Expanded test suite

* Lang now verifies errors and parse output
* Some new miscellaneous tests
* Easy way to update the tests
* Document workflow in manual
* Use `!` not `~` as separator char for sed

  It is confusing to use `~` when we are talking about paths and home directories!
* Test test suite itself (`test/lang-test/infra.sh`)

Additionally, run shellcheck on `tests/lang.sh` to help ensure it is correct, now that it is more complex.

Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
Co-authored-by: Valentin Gagarin <valentin.gagarin@tweag.io>
2025-06-25 10:41:16 +02:00 · 2015-09-04 14:23:08 -06:00 · 2015-09-04 14:23:08 -06:00 · c70484454f
commit c70484454f
parent c2c8187118
73 changed files with 762 additions and 36 deletions
--- a/doc/manual/src/contributing/testing.md
+++ b/doc/manual/src/contributing/testing.md
@@ -86,6 +86,31 @@ GNU gdb (GDB) 12.1
 One can debug the Nix invocation in all the usual ways.
 For example, enter `run` to start the Nix invocation.

+### Characterization testing
+
+Occasionally, Nix utilizes a technique called [Characterization Testing](https://en.wikipedia.org/wiki/Characterization_test) as part of the functional tests.
+This technique records the exact output or behavior of a former version of Nix in a test, in order to check that Nix continues to produce the same behavior going forward.
+
+For example, this technique is used for the language tests, to check both the printed final value if evaluation was successful, and any errors and warnings encountered.
+
+It is frequently useful to regenerate the expected output.
+To do that, rerun the failed test with `_NIX_TEST_ACCEPT=1`.
+(At least, this is the convention we've used for `tests/lang.sh`.
+If we add more characterization testing we should always strive to be consistent.)
+
+An interesting situation to document is the case when these tests are "overfitted".
+The language tests are, again, an example of this.
+The expected successful output of evaluation is supposed to be highly stable – we do not intend to make breaking changes to (the stable parts of) the Nix language.
+However, the errors and warnings during evaluation (successful or not) are not stable in this way.
+We are free to change how they are displayed at any time.
+
+It may be surprising that we would test non-normative behavior like diagnostic outputs.
+Diagnostic outputs are indeed not a stable interface, but they still are important to users.
+By recording the expected output, the test suite guards against accidental changes and ensures that the *result* of the diagnostic code paths (not just the code that implements them) is under code review.
+Regressions are caught, and improvements always show up in code review.
+
+To ensure that characterization testing doesn't make it harder to intentionally change these interfaces, there must always be an easy way to regenerate the expected output, as we do with `_NIX_TEST_ACCEPT=1`.
+
 ## Integration tests

 The integration tests are defined in the Nix flake under the `hydraJobs.tests` attribute.
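
The accept workflow described in the added manual section (compare against recorded output, and regenerate it when `_NIX_TEST_ACCEPT=1` is set) can be illustrated with a small, language-agnostic sketch. This is not the logic in `tests/lang.sh`: the function names, the status strings, the `example.exp` file name, and the choice to treat only the value `1` as enabling accept mode are all assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def accept_mode() -> bool:
    # Assumption: only the value "1" enables accept mode in this sketch.
    return os.environ.get("_NIX_TEST_ACCEPT") == "1"


def check_recorded(actual: str, expected_file: Path, accept: bool) -> str:
    """Compare `actual` with the recorded output in `expected_file`.

    Returns "pass", "fail", or "accepted" (hypothetical status names).
    """
    expected = expected_file.read_text() if expected_file.exists() else None
    if actual == expected:
        return "pass"
    if accept:
        # Overwrite the recorded output; the change then appears in the
        # diff that goes through code review.
        expected_file.write_text(actual)
        return "accepted"
    return "fail"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exp = Path(tmp) / "example.exp"  # hypothetical file name
        exp.write_text("old diagnostic\n")
        print(check_recorded("new diagnostic\n", exp, accept=False))  # fail
        print(check_recorded("new diagnostic\n", exp, accept=True))   # accepted
        print(check_recorded("new diagnostic\n", exp, accept=False))  # pass
```

A real test driver would pass `accept_mode()` as the `accept` argument. Returning a distinct "accepted" status, rather than "pass", is one possible design: it keeps a regenerated expectation visible in the test log so that it is not mistaken for an unchanged result.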