[Agda-dev] More efficient handling of TopLevelModuleNames

Nicolas Pouillard np at nicolaspouillard.fr
Wed Dec 3 10:11:02 CET 2014


On 12/02/2014 11:15 PM, Andreas Abel wrote:
> During AIM XX we investigated the high serialization times, noting 
> that for the standard library, 3000000 TopLevelModuleNames are 
> serialized (that would be e.g. a file name like 
> Data.Nat.Properties.Simple).
>
> Nisse and I talked briefly about this problem last Friday, and one 
> idea is to use cryptographic hashes of module names instead of their 
> string representation.
>
> Of course, there is the risk of collisions (which would be fatal).
>
>   https://en.wikipedia.org/wiki/Birthday_attack
>
> shows a table with collision probabilities.  Interestingly, with a 
> 32-bit or 64-bit hash one can virtually store no module names if one 
> wants to stay collision free.  For a collision probability < 10^-18 we 
> could only store 6 (!) different module names.
>
> But even more interestingly, with 128 bit one can distinguish already 
> 10^10 (!) module names with sufficiently high probability.  This is 
> more we ever need, most probably (at least for a while, Agda 
> developments will stay way below 10^10 files ;-)).
>
> A 128bit hash seems to be an alternative to a String representation of 
> TopLevelModuleNames which could lead to a significant speed up of 
> serialization.
I support the choice of 128bit hash. As much as one can experiment with 
different hash functions to test performances. I would recommend 
choosing SHA256 truncated to 128bits. Another option would be to look at 
SHA3 which is faster (if I recall well) and secure as well.

Best regards,
-- NP


More information about the Agda-dev mailing list