This wires up the {pre,post}FunctionCallHook machinery
in EvalState::callFunction and migrates FunctionCallTrace
to use the new EvalProfiler mechanisms for tracing.
Note that the branches where the hook gets called are marked with [[unlikely]]
as a hint to the compiler that this is not a hot path. For non-tracing
evaluation this should be a 100% predictable branch, so the performance
cost is effectively nonexistent.
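A minimal self-contained sketch of the shape of such a branch (the flag and hook names here are illustrative, not the actual evaluator API):
```cpp
#include <cstdio>

// Stand-in for "is any profiler attached?"; in the real code this would be
// a check on the EvalProfiler machinery (names here are made up).
static bool tracingEnabled = false;

long apply(long x)
{
    if (tracingEnabled) [[unlikely]] {
        std::printf("pre-call hook\n");   // cold path: only taken when tracing
    }

    long result = x * 2;                  // hot path: the actual work

    if (tracingEnabled) [[unlikely]] {
        std::printf("post-call hook\n");
    }
    return result;
}
```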
Some measurements to support this point:
```
nix build .#nix-cli
nix build github:nixos/nix/d692729759e4e370361cc5105fbeb0e33137ca9e#nix-cli --out-link before
```
(Before)
```
$ taskset -c 2,3 hyperfine "GC_INITIAL_HEAP_SIZE=16g before/bin/nix eval nixpkgs#gnome --no-eval-cache" --warmup 4
Benchmark 1: GC_INITIAL_HEAP_SIZE=16g before/bin/nix eval nixpkgs#gnome --no-eval-cache
Time (mean ± σ): 2.517 s ± 0.032 s [User: 1.464 s, System: 0.476 s]
Range (min … max): 2.464 s … 2.557 s 10 runs
```
(After)
```
$ taskset -c 2,3 hyperfine "GC_INITIAL_HEAP_SIZE=16g result/bin/nix eval nixpkgs#gnome --no-eval-cache" --warmup 4
Benchmark 1: GC_INITIAL_HEAP_SIZE=16g result/bin/nix eval nixpkgs#gnome --no-eval-cache
Time (mean ± σ): 2.499 s ± 0.022 s [User: 1.448 s, System: 0.478 s]
Range (min … max): 2.472 s … 2.537 s 10 runs
```
This patch adds an EvalProfiler and MultiEvalProfiler that can be used
to insert hooks into the evaluation for the purposes of function tracing
(what function-trace currently does) or for flamegraph/tracy profilers.
See the following commits for how this is meant to be integrated into
the evaluator, and for the performance considerations.
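A rough sketch of the idea (names and signatures are illustrative, not the actual Nix interface): one abstract profiler exposing pre/post function-call hooks, plus a "multi" profiler that fans each hook out to all registered backends so several of them (function-trace, flamegraph, tracy, ...) can be active at once.
```cpp
#include <memory>
#include <vector>

// Hypothetical profiler interface: hooks fire around each function call.
struct Profiler
{
    virtual ~Profiler() = default;
    virtual void preFunctionCall(const char * name) = 0;
    virtual void postFunctionCall(const char * name) = 0;
};

// Forwards every hook to each registered profiler.
struct MultiProfiler : Profiler
{
    std::vector<std::unique_ptr<Profiler>> profilers;

    void preFunctionCall(const char * name) override
    {
        for (auto & p : profilers) p->preFunctionCall(name);
    }

    void postFunctionCall(const char * name) override
    {
        for (auto & p : profilers) p->postFunctionCall(name);
    }
};
```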
Ensure relative path inputs are relative to the parent node's _actual_
`outPath`, instead of the subtly different `sourceInfo.outPath`.
Additionally, non-flake inputs now also have a `sourceInfo` attribute.
This fixes the relationship between `self.outPath` and
`self.sourceInfo.outPath` in some edge cases.
Fixes #13164
Previous code had a sneaky bug due to which no caching
actually happened:
```cpp
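// binds a copy of the map entry, so filling it in never updates the cache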
auto linesForInput = (*lines)[origin->offset];
```
That should have been:
```cpp
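// binds a reference, so the entry stored in the cache itself gets populated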
auto & linesForInput = (*lines)[origin->offset];
```
See [1].
Now it also makes sense to bound the cache in size,
so as not to memoize all the sources without ever freeing any memory.
The default cache size has been chosen somewhat arbitrarily to be ~64k
origins. For reference, 25.05 nixpkgs has ~50k .nix files.
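As a very rough illustration of the bounding idea (this is not the cache type actually used, and the eviction policy here is deliberately crude):
```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical bounded cache: once the number of memoized origins reaches
// the cap, drop an entry before inserting a new one instead of growing
// without limit. A real implementation would use an LRU policy.
struct BoundedLinesCache
{
    std::size_t capacity = 65536;  // roughly the ~64k-origin default mentioned above
    std::unordered_map<std::size_t, std::vector<std::string>> entries;

    std::vector<std::string> & operator[](std::size_t originOffset)
    {
        if (entries.size() >= capacity && !entries.count(originOffset))
            entries.erase(entries.begin());  // crude eviction of an arbitrary entry
        return entries[originOffset];
    }
};
```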
Simple benchmark:
```nix
let
pkgs = import <nixpkgs> { };
in
builtins.foldl' (acc: el: acc + el.line) 0 (
builtins.genList (x: builtins.unsafeGetAttrPos "gcc" pkgs) 10000
)
```
(After)
```
$ hyperfine "result/bin/nix eval -f ./test.nix"
Benchmark 1: result/bin/nix eval -f ./test.nix
Time (mean ± σ): 292.7 ms ± 3.9 ms [User: 131.0 ms, System: 120.5 ms]
Range (min … max): 288.1 ms … 300.5 ms 10 runs
```
(Before)
```
hyperfine "nix eval -f ./test.nix"
Benchmark 1: nix eval -f ./test.nix
Time (mean ± σ): 666.7 ms ± 6.4 ms [User: 428.3 ms, System: 191.2 ms]
Range (min … max): 659.7 ms … 681.3 ms 10 runs
```
If the origin happens to be `all-packages.nix` or something similar in size, then the
difference is much more dramatic.
[1]: 22e3f0e987
Try to make `DerivationGoal` care less whether we're working from an
in-memory derivation or not.
It's a clean-up in its own right, but it will also help with other
cleanups under the umbrella of #12628.
Now, each class provides the initial coroutine by value. This avoids
some sketchy virtual function stuff, and will also be further put to
good use in the next commit.
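A sketch of the pattern only, with `std::function` standing in for the actual coroutine handle: the derived class builds its initial work itself and passes it to the base constructor by value, instead of the base obtaining it through a virtual call (which is fragile around construction order).
```cpp
#include <functional>
#include <utility>

// Base class receives the initial work fully constructed, by value.
struct Goal
{
    std::function<void()> work;

    explicit Goal(std::function<void()> initialWork)
        : work(std::move(initialWork))
    { }
};

// Each concrete goal supplies its own initial work at construction time.
struct DerivationGoal : Goal
{
    DerivationGoal()
        : Goal([] { /* resolve and build the derivation */ })
    { }
};
```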
As summarized in
https://github.com/NixOS/nix/issues/77#issuecomment-2843228280, the
motivation is that the complicated retry logic this introduced was
making the cleanup task #12628 harder to accomplish. It was not easy to
ascertain just what policy / semantics the extra control flow was
implementing, which in turn made it hard to figure out a different way
of implementing it.
After I talked to Eelco about it, he decided we could just... get rid of
the feature entirely! It's a bit scary removing a decade+ old feature,
but I think he is right. See the release notes for more explanation.
This reverts commit 299141ecbd.
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
Leverage #10766 to show how we can now resolve a store configuration
without actually opening the store for that resolved configuration.
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
Splicing the list element to the back can be done in
a much simpler and more concise way, without the need for
erasing and re-inserting the element. Doing it this
way is equivalent to just moving node pointers around,
whereas inserting/erasing allocates/deallocates new nodes.
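For illustration, a minimal example of the splice form (the element type and list here are made up):
```cpp
#include <cassert>
#include <iterator>
#include <list>

// Moving an element to the back by splicing within the same list: no
// allocation or deallocation, only pointer adjustments, and the iterator
// to the moved element stays valid.
int main()
{
    std::list<int> l{1, 2, 3, 4};
    auto it = std::next(l.begin());   // points at 2

    // Equivalent in effect to erase(it) followed by push_back(2), but cheaper.
    l.splice(l.end(), l, it);

    assert((l == std::list<int>{1, 3, 4, 2}));
    assert(*it == 2);                 // iterator still valid after the splice
}
```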
The existing header is a bit too big. Now the following use-cases are
separated, and get their own headers:
- Using or implementing an arbitrary store: remaining `store-api.hh`
This is closer to just being about the `Store` (and `StoreConfig`)
classes, as one would expect.
- Opening a store from a textual description: `store-open.hh`
Opening an arbitrary store implementation like this requires some sort
of store registration mechanism to exist, but the caller doesn't need
to know how it works. This just exposes the functions which use such a
mechanism, without exposing the mechanism itself.
- Registering a store implementation: `store-registration.hh`
This requires understanding how the mechanism actually works, and the
mechanism in question involves templated machinery in headers we'd
rather not expose to things that don't need it, as it would slow down
compilation for no reason.
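For illustration, a consumer that only needs to open a store from a textual description would now include just the small header (include paths are simplified here, and `openStore` is the existing entry point for this):
```cpp
#include "store-open.hh"  // exact include path depends on the header layout

// Opens a connection to the local daemon; nothing here needs to know how
// store implementations get registered.
nix::ref<nix::Store> connectToDaemon()
{
    return nix::openStore("daemon");
}
```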
I can't find a good way to benchmark in isolation from the
git cache, but common sense dictates that creating (and destroying)
a 131KiB std::vector for each regular file from the archive imposes
quite a significant overhead regardless of the I/O-bound git cache.
AFAICT there is no reason to keep a copy of the data since
it always gets fed into the sink and there are no coroutines/threads
in sight.
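A sketch of the resulting shape, with a plain callback standing in for the real sink type: file contents go through a small reusable buffer straight into the sink, instead of being accumulated in a freshly allocated per-file vector first.
```cpp
#include <array>
#include <functional>
#include <istream>
#include <string_view>

// Stream the file into the sink chunk by chunk; nothing retains a full copy.
void streamFile(std::istream & in, const std::function<void(std::string_view)> & sink)
{
    std::array<char, 65536> buf;
    while (in) {
        in.read(buf.data(), buf.size());
        auto n = static_cast<size_t>(in.gcount());
        if (n == 0) break;
        sink(std::string_view(buf.data(), n));
    }
}
```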
As it turns out, using `std::regex` is actually the bottleneck
for root discovery. Just substituting `std::` -> `boost::`
makes root discovery twice as fast (3x if counting only userspace time).
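For illustration, the kind of substitution involved (the pattern below is made up, not the actual root-discovery regex); the API mirrors `std::regex`, so only the namespace changes:
```cpp
#include <boost/regex.hpp>
#include <string>

// Same call shape as the std::regex version, just in the boost namespace.
bool looksLikeStorePath(const std::string & s)
{
    static const boost::regex storePathRe("/nix/store/[0-9a-z]{32}-.*");
    return boost::regex_match(s, storePathRe);
}
```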
Some rather ad-hoc measurements to motivate the switch:
(On master)
```
nix build github:nixos/nix/1e822bd4149a8bce1da81ee2ad9404986b07914c#nix-cli --out-link result-1e822bd4149a8bce1da81ee2ad9404986b07914c
taskset -c 2,3 hyperfine "result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0"
Benchmark 1: result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0
Time (mean ± σ): 481.6 ms ± 3.9 ms [User: 336.2 ms, System: 142.0 ms]
Range (min … max): 474.6 ms … 487.7 ms 10 runs
```
(After this patch)
```
taskset -c 2,3 hyperfine "result/bin/nix store gc --dry-run --max 0"
Benchmark 1: result/bin/nix store gc --dry-run --max 0
Time (mean ± σ): 254.7 ms ± 9.7 ms [User: 111.1 ms, System: 141.3 ms]
Range (min … max): 246.5 ms … 281.3 ms 10 runs
```
`boost::regex` is a drop-in replacement for `std::regex`, but much faster.
Doing a simple before/after comparison doesn't surface any change in behavior:
```
result/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
```