1
0
Fork 0
mirror of https://github.com/NixOS/nix synced 2025-07-08 15:13:55 +02:00

Expand manual on derivation outputs

Note, this includes some text adapted from from Eelco's dissertation
This commit is contained in:
John Ericson 2025-02-10 01:08:00 -05:00
parent 31923aaac0
commit 2aa6e0f084
12 changed files with 508 additions and 174 deletions

View file

@ -24,13 +24,17 @@ For the full specification of the algorithms involved, see the [specification of
### File System Objects
With all currently supported store object content addressing methods, the file system object is always [content-addressed][fso-ca] first, and then that hash is incorporated into content address computation for the store object.
With all currently-supported store object content-addressing methods, the file system object is always [content-addressed][fso-ca] first, and then that hash is incorporated into content address computation for the store object.
### References
#### References to other store object#### References to other store objectss
With all currently supported store object content addressing methods,
other objects are referred to by their regular (string-encoded-) [store paths][Store Path].
#### Self-references
Self-references however cannot be referred to by their path, because we are in the midst of describing how to compute that path!
> The alternative would require finding as hash function fixed point, i.e. the solution to an equation in the form
@ -40,7 +44,28 @@ Self-references however cannot be referred to by their path, because we are in t
> which is computationally infeasible.
> As far as we know, this is equivalent to finding a hash collision.
Instead we just have a "has self reference" boolean, which will end up affecting the digest.
Instead we have a "has self reference" boolean, which end up affecting the digest:
In all currently-supported store object content-addressing methods, when hashing the file system object data, any occurence of store objects own store path in the digested data is replaced with a [sentinal value](https://en.wikipedia.org/wiki/Sentinel_value).
The hashes of these modified input streams are used instead.
When validating the content-address of a store object after the fact, the above process works as written.
However, when first creating the store object we don't know the store object's store path, as explained just above.
We therefore, strictly speaking, do not know what value we will be replacing with the sentinental value in the inputs to hash functions.
What instead happens is that the provisional store object --- the data from which we wish to create a store object --- is paired with a provisional "scratch" store path (that presumably was choosen when the data was created).
That provisional store path is instead what is replaced with the sentinal value, rather than the final store object which we do not yet know.
> **Design note**
>
> It is an informal property of content-addressed store objects that the choice of provisional store path should not matter.
> In other words, if a provisional store object is prepared in the same way except for the choice of provision store path, the provisional data need not be identical.
> But, after the sentinal value is substituted in place of each provisional store object's provision store path, the final so-normalized data *should* be identifical.
>
> If, conversely, the data after this normalization process is still different, we'll compute a different content-address.
> The method of preparing the provisional self-referenced data has *failed* to be deterministic in the sense of not *leaking* the choice of provisional store path --- a choice which is supposed to be arbitrary --- into the final store object.
>
> This property is informal because at this stage, we are just described store objects, which have no formal notion of their origin.
> Without such a formal notion, there is nothing to formally accuse of being insufficiently deterministic.
> Later in this chapter, when we cover [derivations](@docroot@/store/derivation/index.md), we will have a chance to make this a formal property, not of content-addressed store objects themselves, but of derivations that *produce* content-addressed store objects.
### Name and Store Directory