[Agda-dev] More efficient handling of TopLevelModuleNames
Andreas Abel
abela at chalmers.se
Tue Dec 2 23:15:48 CET 2014
During AIM XX we investigated the high serialization times, noting that
for the standard library, 3000000 TopLevelModuleNames are serialized
(that would be e.g. a file name like Data.Nat.Properties.Simple).
Nisse and I talked briefly about this problem last Friday, and one idea
is to use cryptographic hashes of module names instead of their string
representation.
Of course, there is the risk of collisions (which would be fatal).
https://en.wikipedia.org/wiki/Birthday_attack
shows a table with collision probabilities. Interestingly, with a
32-bit or 64-bit hash one can virtually store no module names if one
wants to stay collision free. For a collision probability < 10^-18 we
could only store 6 (!) different module names.
But even more interestingly, with 128 bit one can distinguish already
10^10 (!) module names with sufficiently high probability. This is more
we ever need, most probably (at least for a while, Agda developments
will stay way below 10^10 files ;-)).
A 128bit hash seems to be an alternative to a String representation of
TopLevelModuleNames which could lead to a significant speed up of
serialization.
--
Andreas Abel <>< Du bist der geliebte Mensch.
Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden
andreas.abel at gu.se
http://www2.tcs.ifi.lmu.de/~abel/
More information about the Agda-dev
mailing list