This prevents C++-level undefined behavior from affecting
the evaluator. Stdlib implementation details should not affect
eval, regardless of the build platform. Even erroneous usage
of `builtins.sort` should not make it possible to crash the
evaluator or produce results that depend on the host platform.
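To make the hazard concrete, here is a standalone C++ sketch (not the
evaluator's actual call site) of the kind of comparator contract violation
a bad `builtins.sort` comparator could previously translate into:
```
#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> v{3, 1, 2};
    // `<=` is not a strict weak ordering (comp(x, x) must be false),
    // so this call is undefined behavior per the C++ standard.
    std::sort(v.begin(), v.end(), [](int a, int b) { return a <= b; });
}
```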
This overload isn't actually necessary anywhere and
doesn't make much sense. The pointers to `Value`s are
themselves const, but the `Value`s are mutable.
A non-const member function implies that the object itself
can be modified, but this doesn't make much sense considering
the return type: `Value * const *`, which is a pointer
to constant pointers to mutable `Value`s.
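To spell out what that return type permits, a minimal standalone sketch
(not the actual `Value` class):
```
#include <cstddef>

struct Value { int dummy; };

void example(Value * const * elems, std::size_t n)
{
    if (n > 0) {
        elems[0]->dummy = 42;    // fine: the pointed-to Values are mutable
        // elems[0] = nullptr;   // would not compile: the pointers are const
    }
}
```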
This adds a meson.format file that mostly mirrors the project's
existing Meson style, and a pre-commit hook to enforce that style.
Some files with small resulting diffs are reformatted as well.
The `getPrimOp` function was basically identical to the existing
`Value::primOpAppPrimOp`, modulo some trivial differences.
It makes sense to reuse the existing code for that.
This makes the profiler much more useful by actually distinguishing
different derivations being evaluated. This does make the implementation
a bit more convoluted, but I think it's worth it.
Sometimes the profiler might want to do evaluation (e.g. for getting
derivation names). This is not ideal, but is really necessary
to make the profiler stack traces useful for end users.
This patch adds support for a native stack sampling
profiler to the evaluator, which saves collapsed stack
profile information to a configurable location.
Introduced options (in `EvalSettings`):
- `eval-profile-file` - path to the collected profile file.
- `eval-profiler-frequency` - sampling frequency.
- `eval-profiler` - enumeration option for enabling the profiler.
Currently only `flamegraph` is supported, but making this an
enumeration rather than a boolean switch leaves the door open
for other profiler variants (e.g. tracy).
The profile includes the following information on a best-effort basis (e.g. some lambdas might
not have a name). Call stack information contains:
- Call site location (where the function gets called).
- Primop/lambda name of the function being called.
- Functors/partial applications don't have a name attached to them, unlike special-cased primops and lambdas.
For cases where the call site location isn't available, we have to resort to providing
the location where the lambda itself is defined. This removes some of the confusing
`«none»:0` locations in the profile from previous attempts.
Example usage with piping directly into zstd for compression:
```
nix eval --no-eval-cache nixpkgs#nixosTests.gnome \
--eval-profiler flamegraph \
--eval-profile-file >(zstd -of nix.profile.zstd)
```
Co-authored-by: Jörg Thalheim <joerg@thalheim.io>
This wires up the {pre,post}FunctionCallHook machinery
in EvalState::callFunction and migrates FunctionCallTrace
to use the new EvalProfiler mechanisms for tracing.
Note that the branches where the hook gets called are marked with `[[unlikely]]`
as a hint to the compiler that this is not a hot path. For non-tracing
evaluation this should be a 100% predictable branch, so the performance
cost is negligible.
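Roughly, the guarded call sites look like this (a simplified sketch; the
actual hook signatures and member names in `EvalState::callFunction` differ):
```
struct Profiler
{
    bool enabled() const { return active; }
    void preFunctionCallHook() { /* record frame enter */ }
    void postFunctionCallHook() { /* record frame exit */ }
    bool active = false;
};

void callFunction(Profiler & profiler)
{
    if (profiler.enabled()) [[unlikely]]
        profiler.preFunctionCallHook();

    /* ... regular function call machinery ... */

    if (profiler.enabled()) [[unlikely]]
        profiler.postFunctionCallHook();
}
```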
Some measurements to support this point:
```
nix build .#nix-cli
nix build github:nixos/nix/d692729759e4e370361cc5105fbeb0e33137ca9e#nix-cli --out-link before
```
(Before)
```
$ taskset -c 2,3 hyperfine "GC_INITIAL_HEAP_SIZE=16g before/bin/nix eval nixpkgs#gnome --no-eval-cache" --warmup 4
Benchmark 1: GC_INITIAL_HEAP_SIZE=16g before/bin/nix eval nixpkgs#gnome --no-eval-cache
Time (mean ± σ): 2.517 s ± 0.032 s [User: 1.464 s, System: 0.476 s]
Range (min … max): 2.464 s … 2.557 s 10 runs
```
(After)
```
$ taskset -c 2,3 hyperfine "GC_INITIAL_HEAP_SIZE=16g result/bin/nix eval nixpkgs#gnome --no-eval-cache" --warmup 4
Benchmark 1: GC_INITIAL_HEAP_SIZE=16g result/bin/nix eval nixpkgs#gnome --no-eval-cache
Time (mean ± σ): 2.499 s ± 0.022 s [User: 1.448 s, System: 0.478 s]
Range (min … max): 2.472 s … 2.537 s 10 runs
```
This patch adds an EvalProfiler and MultiEvalProfiler that can be used
to insert hooks into the evaluation for the purposes of function tracing
(what function-trace currently does) or for flamegraph/tracy profilers.
See the following commits for how this is supposed to be integrated into
the evaluator, and for the performance considerations.
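The shape of the interface is roughly the following (an illustrative sketch;
the real hooks receive evaluation context such as the function value,
arguments and call position):
```
#include <memory>
#include <vector>

struct EvalProfiler
{
    // Hook parameters elided to keep the sketch small.
    virtual void preFunctionCallHook() = 0;
    virtual void postFunctionCallHook() = 0;
    virtual ~EvalProfiler() = default;
};

// Fans each notification out to any number of registered profilers,
// e.g. the function tracer and a flamegraph sampler at the same time.
struct MultiEvalProfiler : EvalProfiler
{
    std::vector<std::unique_ptr<EvalProfiler>> profilers;

    void preFunctionCallHook() override
    {
        for (auto & p : profilers)
            p->preFunctionCallHook();
    }

    void postFunctionCallHook() override
    {
        for (auto & p : profilers)
            p->postFunctionCallHook();
    }
};
```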
The existing header is a bit too big. Now the following use-cases are
separated, and get their own headers:
- Using or implementing an arbitrary store: remaining `store-api.hh`
This is closer to just being about the `Store` (and `StoreConfig`)
classes, as one would expect.
- Opening a store from a textual description: `store-open.hh`
Opening an arbitrary store implementation like this requires some sort
of store registration mechanism to exist, but the caller doesn't need
to know how it works. This just exposes the functions which use such a
mechanism, without exposing the mechanism itself.
- Registering a store implementation: `store-registration.hh`
This requires understanding how the mechanism actually works, and the
mechanism in question involves templated machinery in headers we would
rather not expose to things that don't need it, as it would slow down
compilation for no reason.
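In practice a consumer now includes only the header matching its use case,
e.g. (hypothetical usage; assuming `openStore` is what `store-open.hh` exposes):
```
#include "store-open.hh"  // only the "open a store from a URI" entry points

void example()
{
    // No registration machinery is pulled into this translation unit.
    auto store = nix::openStore("daemon");
}
```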
Remove an outdated and no longer relevant TODO. It's more confusing
now, since the symbol table must be addressed by uint32_t indices
in order to keep the `Attr` size down to 16 bytes on 64-bit machines.
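For reference, the layout constraint looks roughly like this (a simplified
stand-in; the real field types are `Symbol`, `PosIdx` and `Value *`):
```
#include <cstdint>

struct Value;

struct Attr
{
    std::uint32_t name;  // 32-bit index into the symbol table
    std::uint32_t pos;   // 32-bit position index
    Value * value;       // 8 bytes on 64-bit platforms
};

static_assert(sizeof(Attr) == 16, "no padding on typical 64-bit ABIs");
```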
The intention is to switch to transparent comparators from N3657 for
ordered set containers keyed by strings, and using the alias consistently
would simplify that.
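As a reminder of what a transparent comparator buys (a generic example, not
project code): with `std::less<>` the container can be probed with a
`std::string_view` without materializing a temporary `std::string`.
```
#include <map>
#include <string>
#include <string_view>

std::map<std::string, int, std::less<>> table;

int * lookup(std::string_view key)
{
    // Heterogeneous lookup (N3657): no std::string is constructed for the probe.
    auto it = table.find(key);
    return it == table.end() ? nullptr : &it->second;
}
```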
This trades off some executable size for measurable lexer performance
improvements.
A note on explicitly enabling the 8-bit scanner: this is needed
due to the default behavior of flex (excerpt from the manual [1]):
> Flex’s default behavior is to generate an 8-bit scanner unless you
> use the ‘-Cf’ or ‘-CF’, in which case flex defaults to generating
> 7-bit scanners unless your site was always configured to generate 8-bit
> scanners.
Some quantifiable metrics:
Nixpkgs revision: a6e3f45acf4e817532a861ab0eda4ab5485fecc1
Parsing the largest file in nixpkgs: pkgs/development/haskell-modules/hackage-packages.nix.
(Before this patch)
```
$ nix build github:nixos/nix/9fe3077d4#nix-expr
$ du --apparent-size result/lib/libnixexpr.so
2518 result/lib/libnixexpr.so
$ nix build github:nixos/nix/9fe3077d4#nix-cli
$ taskset -c 2,3 hyperfine "GC_INITIAL_HEAP_SIZE=16g \
result/bin/nix-instantiate --parse \
../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix > /dev/null"
Time (mean ± σ): 375.5 ms ± 6.3 ms [User: 316.9 ms, System: 56.7 ms]
Range (min … max): 368.5 ms … 388.3 ms 10 runs
```
(After the patch)
```
$ nix build .#nix-expr
$ du --apparent-size result/lib/libnixexpr.so
2685 result/lib/libnixexpr.so
$ nix build .#nix-cli
$ taskset -c 2,3 hyperfine "GC_INITIAL_HEAP_SIZE=16g \
result/bin/nix-instantiate --parse \
../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix > /dev/null"
Time (mean ± σ): 326.8 ms ± 4.9 ms [User: 269.5 ms, System: 55.3 ms]
Range (min … max): 319.7 ms … 335.5 ms 10 runs
```
Overall, the change is roughly:
- 2518 KiB -> 2685 KiB: ~170 KiB more machine code
- 375 ms -> 327 ms: ~50 ms faster parsing
The perf uplift for eval-heavy test cases is obviously less noticeable,
but it doesn't make sense not to take this free perf win.
[1]: https://westes.github.io/flex/manual/Options-Affecting-Scanner-Behavior.html#Options-Affecting-Scanner-Behavior
Rather than "mounting" the store inside an empty virtual filesystem,
just return the store as a virtual filesystem. This is more modular.
(FWIW, it also supports two long-term hopes of mine:
1. More capability-based Nix language mode. I dream of a "super pure
eval" where you can only use relative path literals (See #8738), and
any `fetchTree`-fetched stuff + the store are all disjoint (none is
mounted in another) file systems.
2. Windows, where the store dir may include drive letters, etc., and is
thus unsuitable to be the prefix of any `CanonPath`s.
)
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>