On https://github.com/NixOS/nix/issues/8946, we faced a surprising
behaviour wrt. exception when using pthread_cancel. In a nutshell when
a thread is inside a catch block and it's getting pthread_cancel by
another one, then the original exception is bubbled up and crashes the
process.
We now poll on the notification pipe from the thread and exit when the
main thread closes its end. This solution does not exhibit surprising
behaviour wrt. exceptions.
Co-authored-by: Mic92 <joerg@thalheim.io>
Fixes https://github.com/NixOS/nix/issues/8946
See also Lix https://gerrit.lix.systems/c/lix/+/1605 which is very
similar by coincidence. Pulled a comment from that.
Without this :u, :sh and :i repl commands fail with:
> Cannot run 'nix-shell'/`nix-env` because no method of calling the Nix
> CLI was provided. This is a configuration problem pertaining to how
> this program was built.
Remove the default ctor argument as it evidently makes catching
refactoring bugs much harder. `NixRepl` implementation lives completely
in `repl.cc`, so we can be as explicit as necessary.
We want the $doc, $man outputs to be symlinks pointing to nix-manual and
nix-manual.man. Creating the directories first makes the `ln` command
produce symlink $doc/${nix-manual} instead.
```
$file /nix/store/q4dwlnd36gpfajgfcp6hca2xwy068wjq-nix-2.27.1-man/rwh8ky3k040wyrywl8k2v5b3csdfbdg7-nix-manual-2.27.1-man
/nix/store/q4dwlnd36gpfajgfcp6hca2xwy068wjq-nix-2.27.1-man/rwh8ky3k040wyrywl8k2v5b3csdfbdg7-nix-manual-2.27.1-man:
symbolic link to /nix/store/rwh8ky3k040wyrywl8k2v5b3csdfbdg7-nix-manual-2.27.1-man
```
This is the reason `nix-env --help` is once again broken on 2.26/2.27/master
after 4108529.
Instead of calling `worker.waitForAWhile(shared_from_this())` etc.,
the subclasses of Goal instead call protected functions defined in Goal
that abstract over these.
The code for awaiting has also been heavily simplified.
Instead of calling `addWaitee`, then suspending,
`co_await await(waitees)` is called once, which also handles the suspend.
The end-goal is to remove all manual `co_await Suspend{}`s.
We now see exception beeing thrown when remote building in master
because of writing to a non-blocking file descriptor from our json logger.
> #0 0x00007f2ea97aea9c in __pthread_kill_implementation () from /nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libc.so.6
> #1 0x00007f2ea975c576 in raise () from /nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libc.so.6
> #2 0x00007f2ea9744935 in abort () from /nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libc.so.6
> #3 0x00007f2ea99e8c2b in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #4 0x00007f2ea99f820a in __cxxabiv1::__terminate(void (*)()) () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #5 0x00007f2ea99f8275 in std::terminate() () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #6 0x00007f2ea99f84c7 in __cxa_throw () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #7 0x00007f2eaa5035c2 in nix::writeFull (fd=2, s=..., allowInterrupts=true) at ../unix/file-descriptor.cc:43
> #8 0x00007f2eaa5633c4 in nix::JSONLogger::write (this=this@entry=0x249a7d40, json=...) at /nix/store/4krab2h0hd4wvxxmscxrw21pl77j4i7j-gcc-13.3.0/include/c++/13.3.0/bits/char_traits.h:358
> #9 0x00007f2eaa5658d7 in nix::JSONLogger::logEI (this=<optimized out>, ei=...) at ../logging.cc:242
> #10 0x00007f2ea9c5d048 in nix::Logger::logEI (ei=..., lvl=nix::lvlError, this=0x249a7d40) at /nix/store/a7cq5bqh0ryvnkv4m19ffchnvi8l9qx6-nix-util-2.27.0-dev/include/nix/logging.hh:108
> #11 nix::handleExceptions (programName="nix", fun=...) at ../shared.cc:343
> #12 0x0000000000465b1f in main (argc=<optimized out>, argv=<optimized out>) at /nix/store/4krab2h0hd4wvxxmscxrw21pl77j4i7j-gcc-13.3.0/include/c++/13.3.0/bits/allocator.h:163
> (gdb) frame 10
> #10 0x00007f2ea9c5d048 in nix::Logger::logEI (ei=..., lvl=nix::lvlError, this=0x249a7d40) at /nix/store/a7cq5bqh0ryvnkv4m19ffchnvi8l9qx6-nix-util-2.27.0-dev/include/nix/logging.hh:108
> 108 logEI(ei);
So far only drainFD sets the non-blocking flag on a "readable" file descriptor,
while this is a "writeable" file descriptor.
It's not clear to me yet, why we see logs after that point, but it's
also not that bad to handle EAGAIN in read/write functions after all.
Now that we have coroutines, we can go back to loops and regular RAII,
which is must less error-proone!
I look forward to removing the other instances!
Perhaps more significantly, it no longer knows about
`LocalDerivationGoal`, and without any effort it also compiles on
Windows just fine. (`local-derivation-goal.{cc,hh}` is currently skipped
on Windows.)
Default: istty(stdout)
This refactors `nix develop` internals a bit to use the `json` type
more. The assertion now operates in the in-memory json instead of
re-parsing it. While this is technically a weaker guarantee, we
should be able to rely on the library to get this right. It's its
most essential purpose.
The underlying issue is that debugger code path was
calling PosTable::operator[] in each eval method.
This has become incredibly expensive since 5d9fdab3de.
While we are it it, I've reworked the code to
not use std::shared_ptr where it really isn't necessary.
As I've documented in previous commits, this is actually
more a workaround for recursive header dependencies now
and is only necessary in `error.hh` code.
Some ad-hoc benchmarking:
After this commit:
```
Benchmark 1: nix eval nixpkgs#hello --impure --ignore-try --no-eval-cache --debugger
Time (mean ± σ): 784.2 ms ± 7.1 ms [User: 561.4 ms, System: 147.7 ms]
Range (min … max): 773.5 ms … 792.6 ms 10 runs
```
On master 3604c7c51:
```
Benchmark 1: nix eval nixpkgs#hello --impure --ignore-try --no-eval-cache --debugger
Time (mean ± σ): 22.914 s ± 0.178 s [User: 18.524 s, System: 4.151 s]
Range (min … max): 22.738 s … 23.290 s 10 runs
```