[Agda-dev] More efficient handling of TopLevelModuleNames
Nicolas Pouillard
np at nicolaspouillard.fr
Wed Dec 3 10:11:02 CET 2014
On 12/02/2014 11:15 PM, Andreas Abel wrote:
> During AIM XX we investigated the high serialization times, noting
> that for the standard library, 3000000 TopLevelModuleNames are
> serialized (that would be e.g. a file name like
> Data.Nat.Properties.Simple).
>
> Nisse and I talked briefly about this problem last Friday, and one
> idea is to use cryptographic hashes of module names instead of their
> string representation.
>
> Of course, there is the risk of collisions (which would be fatal).
>
> https://en.wikipedia.org/wiki/Birthday_attack
>
> shows a table with collision probabilities. Interestingly, with a
> 32-bit or 64-bit hash one can virtually store no module names if one
> wants to stay collision free. For a collision probability < 10^-18 we
> could only store 6 (!) different module names.
>
> But even more interestingly, with 128 bit one can distinguish already
> 10^10 (!) module names with sufficiently high probability. This is
> more we ever need, most probably (at least for a while, Agda
> developments will stay way below 10^10 files ;-)).
>
> A 128bit hash seems to be an alternative to a String representation of
> TopLevelModuleNames which could lead to a significant speed up of
> serialization.
I support the choice of 128bit hash. As much as one can experiment with
different hash functions to test performances. I would recommend
choosing SHA256 truncated to 128bits. Another option would be to look at
SHA3 which is faster (if I recall well) and secure as well.
Best regards,
-- NP
More information about the Agda-dev
mailing list