Expand manual on derivation outputs

Note: this includes some text adapted from Eelco's dissertation.
2025-07-08 15:13:55 +02:00 · 2025-02-10 01:08:00 -05:00 · 2025-02-10 01:08:00 -05:00 · 2aa6e0f084
commit 2aa6e0f084
parent 31923aaac0
12 changed files with 508 additions and 174 deletions
--- a/doc/manual/source/store/store-object/content-address.md
+++ b/doc/manual/source/store/store-object/content-address.md
@@ -24,13 +24,17 @@ For the full specification of the algorithms involved, see the [specification of

 ### File System Objects

-With all currently supported store object content addressing methods, the file system object is always [content-addressed][fso-ca] first, and then that hash is incorporated into content address computation for the store object.
+With all currently-supported store object content-addressing methods, the file system object is always [content-addressed][fso-ca] first, and then that hash is incorporated into the content-address computation for the store object.

 ### References

+#### References to other store objects
+
 With all currently supported store object content addressing methods,
 other objects are referred to by their regular (string-encoded-) [store paths][Store Path].

+#### Self-references
+
 Self-references however cannot be referred to by their path, because we are in the midst of describing how to compute that path!

 > The alternative would require finding a hash function fixed point, i.e. the solution to an equation in the form
@@ -40,7 +44,28 @@ Self-references however cannot be referred to by their path, because we are in t
 > which is computationally infeasible.
 > As far as we know, this is equivalent to finding a hash collision.

-Instead we just have a "has self reference" boolean, which will end up affecting the digest.
+Instead we have a "has self reference" boolean, which ends up affecting the digest:
+In all currently-supported store object content-addressing methods, when hashing the file system object data, any occurrence of the store object's own store path in the digested data is replaced with a [sentinel value](https://en.wikipedia.org/wiki/Sentinel_value).
+The hashes of these modified input streams are used instead.
+
+When validating the content-address of a store object after the fact, the above process works as written.
+However, when first creating the store object, we don't yet know its store path, as explained just above.
+We therefore, strictly speaking, do not know what value we will be replacing with the sentinel value in the input to the hash function.
+What happens instead is that the provisional store object, i.e. the data from which we wish to create a store object, is paired with a provisional "scratch" store path, which was presumably chosen when the data was created.
+It is this provisional store path that is replaced with the sentinel value, rather than the final store path, which we do not yet know.
+
+> **Design note**
+>
+> It is an informal property of content-addressed store objects that the choice of provisional store path should not matter.
+> In other words, if two provisional store objects are prepared in the same way except for the choice of provisional store path, the provisional data need not be identical.
+> But after the sentinel value is substituted in place of each provisional store object's provisional store path, the resulting normalized data *should* be identical.
+>
+> If, conversely, the data after this normalization process is still different, we'll compute a different content-address.
+> The method of preparing the provisional self-referenced data has then *failed* to be deterministic, in the sense that it *leaks* the choice of provisional store path (a choice which is supposed to be arbitrary) into the final store object.
+>
+> This property is informal because at this stage we are just describing store objects, which have no formal notion of their origin.
+> Without such a formal notion, there is nothing to formally accuse of being insufficiently deterministic.
+> Later in this chapter, when we cover [derivations](@docroot@/store/derivation/index.md), we will have a chance to make this a formal property, not of content-addressed store objects themselves, but of derivations that *produce* content-addressed store objects.

 ### Name and Store Directory
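
The self-reference normalization described in the added text can be illustrated with a small Python sketch. It is not Nix's implementation: SHA-256, the sentinel (a run of NUL bytes as long as the replaced path), and the helpers `normalized_digest`, `make_path`, `add_to_store`, `validate`, `build`, and `leaky_build` are hypothetical. In particular, `make_path` is a stand-in that does not match how real store paths are computed. The sketch normalizes the provisional path away at creation time, checks the final object against its own path after the fact, and demonstrates the design note's informal property that two preparations differing only in their scratch path should yield the same content address.

```python
import hashlib

STORE_DIR = b"/nix/store/"


def sentinel_for(path: bytes) -> bytes:
    # Assumed sentinel: NUL bytes, same length as the path being replaced,
    # so byte offsets in the data do not shift.
    return b"\0" * len(path)


def normalized_digest(data: bytes, self_path: bytes) -> str:
    # Replace each occurrence of the object's own path with the sentinel,
    # then hash the modified stream instead of the raw data.
    normalized = data.replace(self_path, sentinel_for(self_path))
    return hashlib.sha256(normalized).hexdigest()


def make_path(digest: str, name: bytes) -> bytes:
    # Hypothetical path construction: first 32 hex digits of the digest.
    # Real Nix store paths are computed differently.
    return STORE_DIR + digest[:32].encode() + b"-" + name


def add_to_store(data: bytes, provisional_path: bytes, name: bytes):
    # At creation time the final path is unknown, so normalize with the
    # provisional "scratch" path, derive the final path from that digest,
    # and rewrite the provisional path to the final one.
    digest = normalized_digest(data, provisional_path)
    final_path = make_path(digest, name)
    return final_path, data.replace(provisional_path, final_path), digest


def validate(final_path: bytes, final_data: bytes, digest: str) -> bool:
    # After the fact, the object's own path is known, so the same
    # normalization can be run directly against it.
    return normalized_digest(final_data, final_path) == digest


def build(out: bytes) -> bytes:
    # Well-behaved producer: mentions its output only via the path itself.
    return b"#!/bin/sh\nexec " + out + b"/bin/hello\n"


def leaky_build(out: bytes) -> bytes:
    # Badly-behaved producer: also embeds the path reversed, which the
    # substitution cannot recognize, so the scratch path leaks through.
    return build(out) + out[::-1]


p1 = STORE_DIR + b"0" * 32 + b"-hello"
p2 = STORE_DIR + b"1" * 32 + b"-hello"

path1, data1, d1 = add_to_store(build(p1), p1, b"hello")
path2, data2, d2 = add_to_store(build(p2), p2, b"hello")
print(path1 == path2, data1 == data2)  # True True
print(validate(path1, data1, d1))      # True

_, _, l1 = add_to_store(leaky_build(p1), p1, b"hello")
_, _, l2 = add_to_store(leaky_build(p2), p2, b"hello")
print(l1 == l2)                        # False
```

In this sketch the provisional and final paths have equal length (a 32-character hash part plus the same name), which is what lets a same-length sentinel keep offsets stable; `leaky_build` shows how a non-deterministic preparation makes the content address depend on the supposedly arbitrary scratch path.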