[Agda-dev] More efficient handling of TopLevelModuleNames

Andreas Abel abela at chalmers.se
Tue Dec 2 23:15:48 CET 2014


During AIM XX we investigated the high serialization times, noting that 
for the standard library, 3000000 TopLevelModuleNames are serialized 
(that would be e.g. a file name like Data.Nat.Properties.Simple).

Nisse and I talked briefly about this problem last Friday, and one idea 
is to use cryptographic hashes of module names instead of their string 
representation.

Of course, there is the risk of collisions (which would be fatal).

   https://en.wikipedia.org/wiki/Birthday_attack

shows a table with collision probabilities.  Interestingly, with a 
32-bit or 64-bit hash one can virtually store no module names if one 
wants to stay collision free.  For a collision probability < 10^-18 we 
could only store 6 (!) different module names.

But even more interestingly, with 128 bit one can distinguish already 
10^10 (!) module names with sufficiently high probability.  This is more 
we ever need, most probably (at least for a while, Agda developments 
will stay way below 10^10 files ;-)).

A 128bit hash seems to be an alternative to a String representation of 
TopLevelModuleNames which could lead to a significant speed up of 
serialization.


-- 
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

andreas.abel at gu.se
http://www2.tcs.ifi.lmu.de/~abel/


More information about the Agda-dev mailing list