From 4fac767b5295542da040d02807dc4b3a175c0337 Mon Sep 17 00:00:00 2001
From: Andrew Poelstra
Date: Sun, 12 Jan 2025 18:36:32 +0000
Subject: [PATCH] gc: replace ordered sets with unordered sets for in-memory caches

During garbage collection we cache several things -- a set of known-dead
paths, a set of known-alive paths, and a map of paths to their derivers.
Currently they use STL maps and sets, which are ordered structures that
typically are backed by binary trees. Since we are putting pseudorandom
paths into these and looking them up by exact key, we don't need the
ordering, and we're paying a nontrivial cost per insertion.

The existing maps require O(n) memory and have O(log n) insertion and
lookup time. We could instead use unordered maps, which are typically
backed by hashmaps. These also require O(n) memory but have O(1)
average-case insertion and lookup time.

On my system this appears to result in a dramatic speedup -- prior to
this patch I was able to delete 400k paths out of 9.5 million over the
course of 34.5 hours. After this patch the same result took 89 minutes.

This result should NOT be taken at face value because the two runs
aren't really comparable; in particular the first started when I had
9.5 million store paths and the second started with 7.8 million, so we
are deleting a different set of paths starting from a much cleaner
filesystem. But I do think it's indicative.
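For readers unfamiliar with the two container families, here is a
minimal sketch of the exact-key lookup pattern described above. It is
not code from gc.cc: DemoPath, DemoCaches, buildDemoCaches and
isKnownDead are hypothetical names invented for illustration, and the
hash simply forwards to std::hash<std::string>. The point it shows is
that an unordered container needs only equality and a hash for its key
type, not an ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a store path; NOT the real nix::StorePath.
struct DemoPath {
    std::string name;
    bool operator==(const DemoPath & other) const { return name == other.name; }
    // Only needed by the ordered std::set used as the map's value type.
    bool operator<(const DemoPath & other) const { return name < other.name; }
};

// Unordered containers require a hash for the key type.
namespace std {
template<>
struct hash<DemoPath> {
    size_t operator()(const DemoPath & p) const noexcept {
        return hash<string>{}(p.name);
    }
};
}

using DemoPathSet = std::set<DemoPath>;

// Illustrative caches, loosely mirroring the ones named in the message.
struct DemoCaches {
    std::unordered_set<DemoPath> dead, alive;
    std::unordered_map<DemoPath, DemoPathSet> referrers;
};

// Fill the caches with a few made-up entries.
DemoCaches buildDemoCaches() {
    DemoCaches c;
    c.alive.insert(DemoPath{"aaa-hello"});
    c.dead.insert(DemoPath{"bbb-old"});
    c.referrers[DemoPath{"aaa-hello"}].insert(DemoPath{"ccc-user"});
    return c;
}

// Exact-key membership test: hashes the key, no ordering comparison.
bool isKnownDead(const DemoCaches & c, const DemoPath & p) {
    return c.dead.count(p) > 0;
}
```

Note that iteration order over the unordered containers is unspecified,
so this substitution is only safe where, as here, nothing relies on
visiting entries in sorted order.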

Related: https://github.com/NixOS/nix/issues/9581
---
 src/libstore/gc.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/libstore/gc.cc b/src/libstore/gc.cc
index 45dfe4ad8..ac354f3fa 100644
--- a/src/libstore/gc.cc
+++ b/src/libstore/gc.cc
@@ -455,7 +455,7 @@ void LocalStore::collectGarbage(const GCOptions & options, GCResults & results)
     bool gcKeepOutputs = settings.gcKeepOutputs;
     bool gcKeepDerivations = settings.gcKeepDerivations;
 
-    StorePathSet roots, dead, alive;
+    std::unordered_set<StorePath> roots, dead, alive;
 
     struct Shared
     {
@@ -661,7 +661,7 @@ void LocalStore::collectGarbage(const GCOptions & options, GCResults & results)
         }
     };
 
-    std::map<StorePath, StorePathSet> referrersCache;
+    std::unordered_map<StorePath, StorePathSet> referrersCache;
 
     /* Helper function that visits all paths reachable from `start`
        via the referrers edges and optionally derivers and derivation