[Agda-dev] More efficient handling of TopLevelModuleNames

Dominique Devriese dominique.devriese at cs.kuleuven.be
Mon Dec 8 10:06:16 CET 2014


FWIW, working with some form of integer representations of names might
also speed up type-checking.  In GHC, for example, scope-checking
turns string names in the source code into a representation of names
that carry a unique integer in addition to the string version of the
name, in order to speed up name comparisons and provide faster "name
to *"-maps.

Regards,
Dominique

2014-12-04 13:31 GMT+01:00 Andreas Abel <abela at chalmers.se>:
> These strings are not serialized as such, but turned to a UID during
> serialization, via a HashMap.  The HashMap is serialized as well. Upon
> deserialization, the UIDs are turned into strings again.
>
> If we used, say 128bit hashes instead of strings, one could either serialize
> these directly, or turn into UIDs again via a HashMap.  Even the second
> alternative would probably be faster, because equality checking for 128bit
> Bytestrings should be faster than for strings.  Also a HashMap for hashes
> should be faster, since as getting n-bit hashes of 128bit hashes is very
> quick: just take the first n bits.
>
> Cheers,
> Andreas
>
> On 04.12.2014 11:31, Guillaume Brunerie wrote:
>>
>> I'm missing something.
>> Why would it be faster to compute 3000000 hashes as opposed to just
>> keeping the strings as is?
>>
>> Le 2 déc. 2014 23:15, "Andreas Abel" <abela at chalmers.se
>> <mailto:abela at chalmers.se>> a écrit :
>>
>>     During AIM XX we investigated the high serialization times, noting
>>     that for the standard library, 3000000 TopLevelModuleNames are
>>     serialized (that would be e.g. a file name like
>>     Data.Nat.Properties.Simple).
>>
>>     Nisse and I talked briefly about this problem last Friday, and one
>>     idea is to use cryptographic hashes of module names instead of their
>>     string representation.
>>
>>     Of course, there is the risk of collisions (which would be fatal).
>>
>>     https://en.wikipedia.org/wiki/__Birthday_attack
>>     <https://en.wikipedia.org/wiki/Birthday_attack>
>>
>>     shows a table with collision probabilities.  Interestingly, with a
>>     32-bit or 64-bit hash one can virtually store no module names if one
>>     wants to stay collision free.  For a collision probability < 10^-18
>>     we could only store 6 (!) different module names.
>>
>>     But even more interestingly, with 128 bit one can distinguish
>>     already 10^10 (!) module names with sufficiently high probability.
>>     This is more we ever need, most probably (at least for a while, Agda
>>     developments will stay way below 10^10 files ;-)).
>>
>>     A 128bit hash seems to be an alternative to a String representation
>>     of TopLevelModuleNames which could lead to a significant speed up of
>>     serialization.
>>
>>
>>     --
>>     Andreas Abel  <><      Du bist der geliebte Mensch.
>>
>>     Department of Computer Science and Engineering
>>     Chalmers and Gothenburg University, Sweden
>>
>>     andreas.abel at gu.se <mailto:andreas.abel at gu.se>
>>     http://www2.tcs.ifi.lmu.de/~__abel/
>> <http://www2.tcs.ifi.lmu.de/~abel/>
>>     _________________________________________________
>>     Agda-dev mailing list
>>     Agda-dev at lists.chalmers.se <mailto:Agda-dev at lists.chalmers.se>
>>     https://lists.chalmers.se/__mailman/listinfo/agda-dev
>>     <https://lists.chalmers.se/mailman/listinfo/agda-dev>
>>
>
> --
> Andreas Abel  <><      Du bist der geliebte Mensch.
>
> Department of Computer Science and Engineering
> Chalmers and Gothenburg University, Sweden
>
> andreas.abel at gu.se
> http://www2.tcs.ifi.lmu.de/~abel/
> _______________________________________________
> Agda-dev mailing list
> Agda-dev at lists.chalmers.se
> https://lists.chalmers.se/mailman/listinfo/agda-dev


More information about the Agda-dev mailing list