Fixes
$ nix copy --derivation --to /tmp/nix /nix/store/...
error: cannot enqueue a work item while the thread pool is shutting down
The ThreadPoolShutDown exception was hiding the reason for the thread
pool shut down, e.g.
error: cannot add path '/nix/store/03sl46khd8gmjpsad7223m32ma965vy9-fix-static.patch' because it lacks a signature by a trusted key
(cherry picked from commit a8c69cc907)
Fixes a user report of trouble with toybox grep and avoids
potential of same basic issue with other utils.
(cherry picked from commit 6a874c2865)
# Conflicts:
# scripts/sequoia-nixbld-user-migration.sh
First the motivation: I recently faced a bug that I assume is coming
from the topoSortPaths function where the GC was trying to delete a
path having some alive referrers. I resolved this by manually deleting
the faulty path referrers using nix-store --query --referrers. I sadly
did not manage to reproduce this bug.
This bug alone is not a big deal. However, this bug is
triggering a cascading failure: invalidatePathChecked is throwing a
PathInUse exception. This exception is not catched and fails the whole GC
run. From there, the machine (a builder machine) was unable to GC its
Nix store, which led to an almost full disk with no way to
automatically delete the dead Nix paths.
Instead, I think we should log the error for the specific store path
we're trying to delete, specifying we can't delete this path because
it still has referrers. Once we're done with logging that, the GC run
should continue to delete the dead store paths it can delete.
(cherry picked from commit ced8d311a5)
This fixes segfaults with nix copy when there was an error processing
addMultipleToStore.
Running with ASAN/TSAN pointed at an use-after-free with threads from
the pool accessing the graph declared in processGraph after the function
was exiting and destructing the variables.
It turns out that if there is an error before pool.process() is called,
for example while we are still enqueuing tasks, then pool.process()
isn't called and threads are still left to run.
By creating the pool last we ensure that it is stopped first before
running other destructors even if an exception happens early.
[ lix porting note: nix does not name threads so the patch has been
adapted to not pass thread name ]
Link: https://git.lix.systems/lix-project/lix/issues/618
Link: https://gerrit.lix.systems/c/lix/+/2355
(cherry picked from commit afac093b34)
Some of the code before https://github.com/NixOS/nix/pull/11351 did not
have the problem. What remains wasn't all that problematic, but is good
to have anyway.
Since withFramedSink() is now used a lot more than in the past (for
every addToStore() variant), we were creating a lot of threads, e.g.
nix flake show --no-eval-cache --all-systems github:NixOS/nix/afdd12be5e19c0001ff3297dea544301108d298
would create 46418 threads. While threads on Linux are cheap, this is
still substantial overhead.
So instead, just poll from FramedSink before every write whether there
are pending messages from the daemon. This could slightly increase the
latency on log messages from the daemon, but not on exceptions (which
were only synchronously checked from FramedSink anyway).
This speeds up the command above from 19.2s to 17.5s on my machine (a
9% speedup).
(cherry picked from commit 39daa4a0d3)