Merge pull request #10722 from obsidiansystems/ca-obj-docs

Content addressing store objects
2025-06-25 06:31:14 +02:00 · 2024-05-20 15:58:29 +02:00 · 2024-05-20 15:58:29 +02:00 · 67db9e0c64
commit 67db9e0c64
parent e4be8abe42 4c91bc543c
21 changed files with 240 additions and 65 deletions
--- a/doc/manual/rl-next/derivation-json-change.md
+++ b/doc/manual/rl-next/derivation-json-change.md
@@ -0,0 +1,12 @@
+---
+synopsis: Modify `nix derivation {add,show}` JSON format
+issues: 9866
+prs: 10722
+---
+
+The JSON format for derivations has been slightly revised to better conform to our [JSON guidelines](@docroot@contributing/cli-guideline#returning-future-proof-json).
+In particular, the hash algorithm and content addressing method of content-addressed derivation outputs are now separated into two fields, `hashAlgo` and `method`,
+rather than one field with an arcane `:`-separated format.
+
+This JSON format is only used by the experimental `nix derivation` family of commands, at this time.
+Future revisions are expected as the JSON format is still not entirely in compliance even after these changes.
--- a/doc/manual/src/SUMMARY.md.in
+++ b/doc/manual/src/SUMMARY.md.in
@@ -20,6 +20,7 @@
  - [File System Object](store/file-system-object.md)
    - [Content-Addressing File System Objects](store/file-system-object/content-address.md)
  - [Store Object](store/store-object.md)
+    - [Content-Addressing Store Objects](store/store-object/content-address.md)
  - [Store Path](store/store-path.md)
  - [Store Types](store/types/index.md)
 {{#include ./store/types/SUMMARY.md}}
--- a/doc/manual/src/language/advanced-attributes.md
+++ b/doc/manual/src/language/advanced-attributes.md
@@ -197,37 +197,40 @@ Derivations can declare some infrequently used optional attributes.
    `outputHashAlgo` can only be `null` when `outputHash` follows the SRI format.

    The `outputHashMode` attribute determines how the hash is computed.
-    It must be one of the following two values:
+    It must be one of the following values:

-    <!-- FIXME link to store object content-addressing not file system object content addressing once we have the page for that. -->
-
-      - `"flat"`
-
-        The output must be a non-executable regular file; if it isn’t, the build fails.
-        The hash is
-        [simply computed over the contents of that file](@docroot@/store/file-system-object/content-address.md#serial-flat)
-        (so it’s equal to what Unix commands like `sha256sum` or `sha1sum` produce).
+      - [`"flat"`](@docroot@/store/store-object/content-address.md#method-flat)

        This is the default.

-      - `"recursive"` or `"nar"`
+      - [`"recursive"` or `"nar"`](@docroot@/store/store-object/content-address.md#method-nix-archive)

-        The hash is computed over the
-        [Nix Archive (NAR)](@docroot@/store/file-system-object/content-address.md#serial-nix-archive)
-        dump of the output (i.e., the result of [`nix-store --dump`](@docroot@/command-ref/nix-store/dump.md)).
-        In this case, the output is allowed to be any [file system object], including directories and more.
+        > **Compatibility**
+        >
+        > `"recursive"` is the traditional way of indicating this,
+        > and is supported since 2005 (virtually the entire history of Nix).
+        > `"nar"` is clearer, and consistent with other parts of Nix (such as the CLI),
+        > however, support for it was only added in Nix version 2.21.

-    `"recursive"` is the traditional way of indicating this,
-    and is supported since 2005 (virtually the entire history of Nix).
-    `"nar"` is more clear, and consistent with other parts of Nix (such as the CLI),
-    however support for it is only added in Nix version 2.21.
+      - [`"text"`](@docroot@/store/store-object/content-address.md#method-text)
+
+        > **Warning**
+        >
+        > The use of this method for derivation outputs is part of the [`dynamic-derivations`][xp-feature-dynamic-derivations] experimental feature.
+
+      - [`"git"`](@docroot@/store/store-object/content-address.md#method-git)
+
+        > **Warning**
+        >
+        > This method is part of the [`git-hashing`][xp-feature-git-hashing] experimental feature.

  - [`__contentAddressed`]{#adv-attr-__contentAddressed}
+
    > **Warning**
    > This attribute is part of an [experimental feature](@docroot@/contributing/experimental-features.md).
    >
    > To use this attribute, you must enable the
-    > [`ca-derivations`](@docroot@/contributing/experimental-features.md#xp-feature-ca-derivations) experimental feature.
+    > [`ca-derivations`][xp-feature-ca-derivations] experimental feature.
    > For example, in [nix.conf](../command-ref/conf-file.md) you could add:
    >
    > ```
@@ -359,3 +362,7 @@ Derivations can declare some infrequently used optional attributes.
  ```

  ensures that the derivation can only be built on a machine with the `kvm` feature.
+
+[xp-feature-ca-derivations]: @docroot@/contributing/experimental-features.md#xp-feature-ca-derivations
+[xp-feature-dynamic-derivations]: @docroot@/contributing/experimental-features.md#xp-feature-dynamic-derivations
+[xp-feature-git-hashing]: @docroot@/contributing/experimental-features.md#xp-feature-git-hashing
--- a/doc/manual/src/protocols/json/derivation.md
+++ b/doc/manual/src/protocols/json/derivation.md
@@ -18,10 +18,30 @@ is a JSON object with the following fields:
  Information about the output paths of the derivation.
  This is a JSON object with one member per output, where the key is the output name and the value is a JSON object with these fields:

-  * `path`: The output path.
+  * `path`:
+    The output path, if it is known in advance.
+    Otherwise, `null`.
+
+
+  * `method`:
+    For an output which will be [content addressed], a string representing the [method](@docroot@/store/store-object/content-address.md) of content addressing that is chosen.
+    Valid method strings are:
+
+    - [`flat`](@docroot@/store/store-object/content-address.md#method-flat)
+    - [`nar`](@docroot@/store/store-object/content-address.md#method-nix-archive)
+    - [`text`](@docroot@/store/store-object/content-address.md#method-text)
+    - [`git`](@docroot@/store/store-object/content-address.md#method-git)
+
+    Otherwise, `null`.

  * `hashAlgo`:
-    For fixed-output derivations, the hashing algorithm (e.g. `sha256`), optionally prefixed by `r:` if `hash` denotes a NAR hash rather than a flat file hash.
+    For an output which will be [content addressed], the name of the hash algorithm used.
+    Valid algorithm strings are:
+
+    - `md5`
+    - `sha1`
+    - `sha256`
+    - `sha512`

  * `hash`:
    For fixed-output derivations, the expected content hash in base-16.
@@ -32,7 +52,8 @@ is a JSON object with the following fields:
  > "outputs": {
  >   "out": {
  >     "path": "/nix/store/2543j7c6jn75blc3drf4g5vhb1rhdq29-source",
-  >     "hashAlgo": "r:sha256",
+  >     "method": "nar",
+  >     "hashAlgo": "sha256",
  >     "hash": "6fc80dcc62179dbc12fc0b5881275898f93444833d21b89dfe5f7fbcbb1d0d62"
  >   }
  > }
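
As a rough illustration of the `derivation.md` change above, the old single-field encoding can be split into the two new fields along these lines. This is only a sketch: the helper name `split_legacy_hash_algo` is hypothetical, and it handles only the `r:` prefix described in the removed text. Any other prefix is rejected rather than guessed at.

```python
# Hash algorithm names listed in the new `hashAlgo` documentation.
VALID_ALGOS = {"md5", "sha1", "sha256", "sha512"}


def split_legacy_hash_algo(legacy: str) -> dict:
    """Split an old-style `hashAlgo` value into `method` and `hashAlgo`.

    Hypothetical helper, not part of Nix. Per the removed text, an `r:`
    prefix means a NAR hash and no prefix means a flat file hash.
    """
    if legacy.startswith("r:"):
        method, algo = "nar", legacy[len("r:"):]
    else:
        method, algo = "flat", legacy
    if algo not in VALID_ALGOS:
        # Covers prefixes this sketch does not know about.
        raise ValueError(f"unrecognised hash algorithm: {algo!r}")
    return {"method": method, "hashAlgo": algo}


print(split_legacy_hash_algo("r:sha256"))
# {'method': 'nar', 'hashAlgo': 'sha256'}
```

The result for `"r:sha256"` matches the updated example in the hunk above, where `"hashAlgo": "r:sha256"` becomes `"method": "nar"` plus `"hashAlgo": "sha256"`.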
--- a/doc/manual/src/protocols/store-path.md
+++ b/doc/manual/src/protocols/store-path.md
@@ -36,18 +36,23 @@ where
 - `type` = one of:

  - ```ebnf
-    | "text" ( ":" store-path )*
+    | "text" { ":" store-path }
    ```

-    for encoded derivations written to the store.
+    This is for the
+    ["Text"](@docroot@/store/store-object/content-address.md#method-text)
+    method of content addressing store objects.
    The optional trailing store paths are the references of the store object.

  - ```ebnf
-    | "source" ( ":" store-path )*
+    | "source" { ":" store-path } [ ":self" ]
    ```

-    For paths copied to the store and hashed via a [Nix Archive (NAR)] and [SHA-256][sha-256].
-    Just like in the text case, we can have the store objects referenced by their paths.
+    This is for the
+    ["Nix Archive"](@docroot@/store/store-object/content-address.md#method-nix-archive)
+    method of content addressing store objects,
+    if the hash algorithm is [SHA-256].
+    Just like in the "Text" case, we can have the store objects referenced by their paths.
    Additionally, we can have an optional `:self` label to denote self reference.

  - ```ebnf
@@ -55,8 +60,12 @@ where
    ```

    For either the outputs built from derivations,
-    paths copied to the store hashed that area single file hashed directly, or the via a hash algorithm other than [SHA-256][sha-256].
-    (in that case "source" is used; this is only necessary for compatibility).
+    or content-addressed store objects that are not using one of the two above cases.
+    To be explicit about the latter, that is currently these methods:
+
+    - ["Flat"](@docroot@/store/store-object/content-address.md#method-flat)
+    - ["Git"](@docroot@/store/store-object/content-address.md#method-git)
+    - ["Nix Archive"](@docroot@/store/store-object/content-address.md#method-nix-archive) if the hash algorithm is not [SHA-256].

    `id` is the name of the output (usually, "out").
    For content-addressed store objects, `id`, is always "out".
@@ -116,7 +125,7 @@ where
      Also note that NAR + SHA-256 must not use this case, and instead must use the `type` = `"source:" ...` case.

 [Nix Archive (NAR)]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
-[sha-256]: https://en.m.wikipedia.org/wiki/SHA-256
+[SHA-256]: https://en.m.wikipedia.org/wiki/SHA-256

 ### Historical Note

--- a/doc/manual/src/store/file-system-object/content-address.md
+++ b/doc/manual/src/store/file-system-object/content-address.md
@@ -1,7 +1,9 @@
 # Content-Addressing File System Objects

 For many operations, Nix needs to calculate [a content addresses](@docroot@/glossary.md#gloss-content-address) of [a file system object][file system object].
-Usually this is needed as part of content addressing [store objects], since store objects always have a root file system object.
+Usually this is needed as part of
+[content addressing store objects](../store-object/content-address.md),
+since store objects always have a root file system object.
 But some command-line utilities also just work on "raw" file system objects, not part of any store object.

 Every content addressing scheme Nix uses ultimately involves feeding data into a [hash function](https://en.wikipedia.org/wiki/Hash_function), and getting back an opaque fixed-size digest which is deemed a content address.
@@ -18,6 +20,9 @@ A single file object can just be hashed by its contents.
 This is not enough information to encode the fact that the file system object is a file,
 but if we *already* know that the FSO is a single non-executable file by other means, it is sufficient.

+Because the hashed data is just the raw file, as is, this choice is good for compatibility with other systems.
+For example, Unix commands like `sha256sum` or `sha1sum` will produce hashes for single files that match this.
+
 ### Nix Archive (NAR) { #serial-nix-archive }

 For the other cases of [file system objects][file system object], especially directories with arbitrary descendents, we need a more complex serialisation format.
@@ -69,7 +74,7 @@ every non-directory object is owned by a parent directory, and the entry that re
 However, if the root object is not a directory, then we have no way of knowing which one of an executable file, non-executable file, or symlink it is supposed to be.

 In response to this, we have decided to treat a bare file as non-executable file.
-This is similar to do what we do with [flat serialisation](#flat), which also lacks this information.
+This is similar to what we do with [flat serialisation](#serial-flat), which also lacks this information.
 To avoid an address collision, attempts to hash a bare executable file or symlink will result in an error (just as would happen for flat serialisation also).
 Thus, Git can encode some, but not all of Nix's "File System Objects", and this sort of content-addressing is likewise partial.
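
The flat serialisation hunk earlier in this file says that the hash is taken over the raw file contents, so it should agree with tools such as `sha256sum`. A minimal sketch of that idea, using a hypothetical helper `flat_hash` built on Python's standard `hashlib` (this is not Nix's own code):

```python
import hashlib


def flat_hash(data: bytes, algo: str = "sha256") -> str:
    """Return the base-16 digest of the raw file contents.

    Hypothetical helper: nothing about the file's type or permissions
    is hashed, only its bytes.
    """
    return hashlib.new(algo, data).hexdigest()


# The empty file has a well-known SHA-256 digest.
print(flat_hash(b""))
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

Running `sha256sum` on an empty file should print the same digest, which is the compatibility property the added text describes.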

--- a/doc/manual/src/store/store-object/content-address.md
+++ b/doc/manual/src/store/store-object/content-address.md
@@ -0,0 +1,95 @@
+# Content-Addressing Store Objects
+
+Just [like][fso-ca] [File System Objects][File System Object],
+[Store Objects][Store Object] can also be [content-addressed](@docroot@/glossary.md#gloss-content-addressed),
+unless they are [input-addressed](@docroot@/glossary.md#gloss-input-addressed-store-object).
+
+For store objects, the content address we produce will take the form of a [Store Path] rather than a regular hash.
+In particular, the content-addressing scheme will ensure that the digest of the store path is solely computed from the
+
+- file system object graph (the root one and its children, if it has any)
+- references
+- [store directory](../store-path.md#store-directory)
+- name
+
+of the store object, and not any other information, which would not be an intrinsic property of that store object.
+
+For the full specification of the algorithms involved, see the [specification of store path digests][sp-spec].
+
+[File System Object]: ../file-system-object.md
+[Store Object]: ../store-object.md
+[Store Path]: ../store-path.md
+
+## Content addressing each part of a store object
+
+### File System Objects
+
+With all currently supported store object content addressing methods, the file system object is always [content-addressed][fso-ca] first, and then that hash is incorporated into the content address computation for the store object.
+
+### References
+
+With all currently supported store object content addressing methods,
+other objects are referred to by their regular (string-encoded) [store paths][Store Path].
+
+Self-references, however, cannot be referred to by their path, because we are in the midst of describing how to compute that path!
+
+> The alternative would require finding a hash function fixed point, i.e. the solution to an equation in the form
+> ```
+> digest = hash(..... || digest || ....)
+> ```
+> which is computationally infeasible.
+> As far as we know, this is equivalent to finding a hash collision.
+
+Instead we just have a "has self reference" boolean, which will end up affecting the digest.
+
+### Name and Store Directory
+
+These two items affect the digest in a way that is standard for store path digest computations and not specific to content-addressing.
+Consult the [specification of store path digests][sp-spec] for further details.
+
+## Content addressing Methods
+
+For historical reasons, we don't support all features in all combinations.
+Each currently supported method of content addressing chooses a single method of file system object hashing, and may impose some restrictions on references.
+The names and store directories are unrestricted, however.
+
+### Flat { #method-flat }
+
+This uses the corresponding [Flat](../file-system-object/content-address.md#serial-flat) method of file system object content addressing.
+
+References are not supported: store objects with flat hashing *and* references cannot be created.
+
+### Text { #method-text }
+
+This also uses the corresponding [Flat](../file-system-object/content-address.md#serial-flat) method of file system object content addressing.
+
+References to other store objects are supported, but self references are not.
+
+This is the only store-object content-addressing method that is not named identically with a corresponding file system object method.
+It is somewhat obscure, mainly used for "drv files"
+(derivations serialized as store objects in their ["ATerm" file format](@docroot@/protocols/derivation-aterm.md)).
+Prefer another method if possible.
+
+### Nix Archive { #method-nix-archive }
+
+This uses the corresponding [Nix Archive](../file-system-object/content-address.md#serial-nix-archive) method of file system object content addressing.
+
+References (to other store objects and self references alike) are supported so long as the hash algorithm is SHA-256; with any other hash algorithm, neither kind of reference is supported.
+
+### Git { #method-git }
+
+> **Warning**
+>
+> This method is part of the [`git-hashing`][xp-feature-git-hashing] experimental feature.
+
+This uses the corresponding [Git](../file-system-object/content-address.md#serial-git) method of file system object content addressing.
+
+References are not supported.
+
+Only SHA-1 is supported at this time.
+If [SHA-256-based Git](https://git-scm.com/docs/hash-function-transition)
+becomes more widespread, this restriction will be revisited.
+
+[fso-ca]: ../file-system-object/content-address.md
+[sp-spec]: @docroot@/protocols/store-path.md
+[xp-feature-git-hashing]: @docroot@/contributing/experimental-features.md#xp-feature-git-hashing
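
Taken together, the `store-path.md` and `store-object/content-address.md` hunks describe how a content addressing method, hash algorithm, and references determine the `type` string of the store path. The following is one possible reading of those rules, not Nix's implementation. The function name `store_path_type` is hypothetical. The `"output:out"` form is assumed from the part of the store path specification that this diff elides (the text shown only states that `id` is always "out" for content-addressed objects). Reference ordering is left to the caller.

```python
from typing import Sequence


def store_path_type(
    method: str,
    hash_algo: str,
    references: Sequence[str] = (),
    self_reference: bool = False,
) -> str:
    """Pick the store path `type` string for a content-addressed object.

    Hypothetical sketch of the rules in the diff above.
    """
    if method == "text":
        # Text: references to other objects are allowed, self references are not.
        if self_reference:
            raise ValueError("text method does not support self references")
        return ":".join(["text", *references])
    if method == "nar" and hash_algo == "sha256":
        # NAR + SHA-256: both kinds of reference are allowed.
        parts = ["source", *references]
        if self_reference:
            parts.append("self")
        return ":".join(parts)
    if method in ("flat", "git", "nar"):
        # Flat, Git, and NAR with another algorithm: no references at all.
        if references or self_reference:
            raise ValueError(f"{method}/{hash_algo} does not support references")
        if method == "git" and hash_algo != "sha1":
            raise ValueError("git method currently supports only sha1")
        # Assumption: `"output:" id` with id = "out", per the elided spec text.
        return "output:out"
    raise ValueError(f"unknown content addressing method: {method!r}")


dep = "/nix/store/2543j7c6jn75blc3drf4g5vhb1rhdq29-source"
print(store_path_type("nar", "sha256", [dep], self_reference=True))
# source:/nix/store/2543j7c6jn75blc3drf4g5vhb1rhdq29-source:self
```

The sketch rejects unsupported combinations outright, mirroring the statement that such store objects cannot be created. It deliberately does not compute the digest itself, which is covered by the store path specification.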