After the previous commit it should not be necessary. Furthermore, if we
*do* sleep, we'll exacerbate a race condition (in conjunction with
getting rid of the thread cancellation) that will cause test failures.
This was filed as https://github.com/nixos/nix/issues/7584, but as far
as I can tell, the previous solution of POLLHUP works just fine on macOS
14. I've also tested on an ancient machine with macOS 10.15.7, which
also has POLLHUP work correctly.
It's possible this might regress some older versions of macOS that have
a kernel bug, but I went looking through the history on the sources and
didn't find anything that looked terribly convincingly like a bug fix
between 2020 and today. If such a broken version exists, it seems pretty
reasonable to suggest simply updating the OS.
Change-Id: I178a038baa000f927ea2cbc4587d69d8ab786843
Based off of commit 69e2ee5b25752ba5fd8644cef56fb9d627ca4a64. Ericson2314 added
additional other information.
On https://github.com/NixOS/nix/issues/8946, we faced a surprising
behaviour wrt. exception when using pthread_cancel. In a nutshell when
a thread is inside a catch block and it's getting pthread_cancel by
another one, then the original exception is bubbled up and crashes the
process.
We now poll on the notification pipe from the thread and exit when the
main thread closes its end. This solution does not exhibit surprising
behaviour wrt. exceptions.
Co-authored-by: Mic92 <joerg@thalheim.io>
Fixes https://github.com/NixOS/nix/issues/8946
See also Lix https://gerrit.lix.systems/c/lix/+/1605 which is very
similar by coincidence. Pulled a comment from that.
Without this :u, :sh and :i repl commands fail with:
> Cannot run 'nix-shell'/`nix-env` because no method of calling the Nix
> CLI was provided. This is a configuration problem pertaining to how
> this program was built.
Remove the default ctor argument as it evidently makes catching
refactoring bugs much harder. `NixRepl` implementation lives completely
in `repl.cc`, so we can be as explicit as necessary.
Instead of calling `worker.waitForAWhile(shared_from_this())` etc.,
the subclasses of Goal instead call protected functions defined in Goal
that abstract over these.
The code for awaiting has also been heavily simplified.
Instead of calling `addWaitee`, then suspending,
`co_await await(waitees)` is called once, which also handles the suspend.
The end-goal is to remove all manual `co_await Suspend{}`s.
We now see exception beeing thrown when remote building in master
because of writing to a non-blocking file descriptor from our json logger.
> #0 0x00007f2ea97aea9c in __pthread_kill_implementation () from /nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libc.so.6
> #1 0x00007f2ea975c576 in raise () from /nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libc.so.6
> #2 0x00007f2ea9744935 in abort () from /nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libc.so.6
> #3 0x00007f2ea99e8c2b in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #4 0x00007f2ea99f820a in __cxxabiv1::__terminate(void (*)()) () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #5 0x00007f2ea99f8275 in std::terminate() () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #6 0x00007f2ea99f84c7 in __cxa_throw () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #7 0x00007f2eaa5035c2 in nix::writeFull (fd=2, s=..., allowInterrupts=true) at ../unix/file-descriptor.cc:43
> #8 0x00007f2eaa5633c4 in nix::JSONLogger::write (this=this@entry=0x249a7d40, json=...) at /nix/store/4krab2h0hd4wvxxmscxrw21pl77j4i7j-gcc-13.3.0/include/c++/13.3.0/bits/char_traits.h:358
> #9 0x00007f2eaa5658d7 in nix::JSONLogger::logEI (this=<optimized out>, ei=...) at ../logging.cc:242
> #10 0x00007f2ea9c5d048 in nix::Logger::logEI (ei=..., lvl=nix::lvlError, this=0x249a7d40) at /nix/store/a7cq5bqh0ryvnkv4m19ffchnvi8l9qx6-nix-util-2.27.0-dev/include/nix/logging.hh:108
> #11 nix::handleExceptions (programName="nix", fun=...) at ../shared.cc:343
> #12 0x0000000000465b1f in main (argc=<optimized out>, argv=<optimized out>) at /nix/store/4krab2h0hd4wvxxmscxrw21pl77j4i7j-gcc-13.3.0/include/c++/13.3.0/bits/allocator.h:163
> (gdb) frame 10
> #10 0x00007f2ea9c5d048 in nix::Logger::logEI (ei=..., lvl=nix::lvlError, this=0x249a7d40) at /nix/store/a7cq5bqh0ryvnkv4m19ffchnvi8l9qx6-nix-util-2.27.0-dev/include/nix/logging.hh:108
> 108 logEI(ei);
So far only drainFD sets the non-blocking flag on a "readable" file descriptor,
while this is a "writeable" file descriptor.
It's not clear to me yet, why we see logs after that point, but it's
also not that bad to handle EAGAIN in read/write functions after all.
Now that we have coroutines, we can go back to loops and regular RAII,
which is must less error-proone!
I look forward to removing the other instances!
Perhaps more significantly, it no longer knows about
`LocalDerivationGoal`, and without any effort it also compiles on
Windows just fine. (`local-derivation-goal.{cc,hh}` is currently skipped
on Windows.)
Default: istty(stdout)
This refactors `nix develop` internals a bit to use the `json` type
more. The assertion now operates in the in-memory json instead of
re-parsing it. While this is technically a weaker guarantee, we
should be able to rely on the library to get this right. It's its
most essential purpose.
The underlying issue is that debugger code path was
calling PosTable::operator[] in each eval method.
This has become incredibly expensive since 5d9fdab3de.
While we are it it, I've reworked the code to
not use std::shared_ptr where it really isn't necessary.
As I've documented in previous commits, this is actually
more a workaround for recursive header dependencies now
and is only necessary in `error.hh` code.
Some ad-hoc benchmarking:
After this commit:
```
Benchmark 1: nix eval nixpkgs#hello --impure --ignore-try --no-eval-cache --debugger
Time (mean ± σ): 784.2 ms ± 7.1 ms [User: 561.4 ms, System: 147.7 ms]
Range (min … max): 773.5 ms … 792.6 ms 10 runs
```
On master 3604c7c51:
```
Benchmark 1: nix eval nixpkgs#hello --impure --ignore-try --no-eval-cache --debugger
Time (mean ± σ): 22.914 s ± 0.178 s [User: 18.524 s, System: 4.151 s]
Range (min … max): 22.738 s … 23.290 s 10 runs
```
All of this code doesn't actually depend on anything from
libexpr. Because Pos is so tigtly coupled with Error, it
makes sense to have in the same library.
The simplification here is due to a long-standing bug, but it is not
worth fixing the bug at this time. Instead we've finally written up an
issue for the bug, and referenced the issue number in the code.
In the local building case, there is many things which can through
`BuildError`, but in the hook case there is just this one. We can
therefore simplify the code by "cinching" down the logic just to the
spot the error is thrown.
There is other code outside `libstore/build` which also uses
`BuildError`, but I believe those cases are mistakes. The point of
`BuildError` is the narrow technical use-cases of "errors which should
not be fatal with `--keep-going`". Using it outside the
building/scheduling code doesn't really make sense in that regard. It
seems likely that those usages were instead merely because "oh, this
error has something to do with building, so I guess `BuildError` is
better than `Error`".
It is quite likely that I myself used `BuildError` incorrectly as
described above :).
The variables are only set by CGroup mechanisms in `killSandbox` in the
local build. In the build hook case, these variables will not be set, so
there is nothing to do.
We can move this method from `LocalStore` to `Store` --- even if we only
want the actual builder to sign things in many cases, there is no reason
to try to enforce this policy by spurious moving the method to a
subclass.
Now, we might technically sign class, but CA derivations is
experimental, and @Ericson2314 is going to revisit all this stuff with
issue #11896 anyways.