[Agda-dev] More efficient handling of TopLevelModuleNames

Andreas Abel abela at chalmers.se
Mon Dec 8 11:13:17 CET 2014


Yes, we already do this, see Agda.Syntax.Abstract.Name.

The TopLevelModuleName thing is only relevant for 
serialization/deserialization (and linking maybe, in some future version 
of Agda with a proper compiler).

Cheers,
Andreas

On 08.12.2014 10:06, Dominique Devriese wrote:
> FWIW, working with some form of integer representations of names might
> also speed up type-checking.  In GHC, for example, scope-checking
> turns string names in the source code into a representation of names
> that carry a unique integer in addition to the string version of the
> name, in order to speed up name comparisons and provide faster "name
> to *"-maps.
>
> Regards,
> Dominique
>
> 2014-12-04 13:31 GMT+01:00 Andreas Abel <abela at chalmers.se>:
>> These strings are not serialized as such, but turned to a UID during
>> serialization, via a HashMap.  The HashMap is serialized as well. Upon
>> deserialization, the UIDs are turned into strings again.
>>
>> If we used, say 128bit hashes instead of strings, one could either serialize
>> these directly, or turn into UIDs again via a HashMap.  Even the second
>> alternative would probably be faster, because equality checking for 128bit
>> Bytestrings should be faster than for strings.  Also a HashMap for hashes
>> should be faster, since as getting n-bit hashes of 128bit hashes is very
>> quick: just take the first n bits.
>>
>> Cheers,
>> Andreas
>>
>> On 04.12.2014 11:31, Guillaume Brunerie wrote:
>>>
>>> I'm missing something.
>>> Why would it be faster to compute 3000000 hashes as opposed to just
>>> keeping the strings as is?
>>>
>>> Le 2 déc. 2014 23:15, "Andreas Abel" <abela at chalmers.se
>>> <mailto:abela at chalmers.se>> a écrit :
>>>
>>>      During AIM XX we investigated the high serialization times, noting
>>>      that for the standard library, 3000000 TopLevelModuleNames are
>>>      serialized (that would be e.g. a file name like
>>>      Data.Nat.Properties.Simple).
>>>
>>>      Nisse and I talked briefly about this problem last Friday, and one
>>>      idea is to use cryptographic hashes of module names instead of their
>>>      string representation.
>>>
>>>      Of course, there is the risk of collisions (which would be fatal).
>>>
>>>      https://en.wikipedia.org/wiki/__Birthday_attack
>>>      <https://en.wikipedia.org/wiki/Birthday_attack>
>>>
>>>      shows a table with collision probabilities.  Interestingly, with a
>>>      32-bit or 64-bit hash one can virtually store no module names if one
>>>      wants to stay collision free.  For a collision probability < 10^-18
>>>      we could only store 6 (!) different module names.
>>>
>>>      But even more interestingly, with 128 bit one can distinguish
>>>      already 10^10 (!) module names with sufficiently high probability.
>>>      This is more we ever need, most probably (at least for a while, Agda
>>>      developments will stay way below 10^10 files ;-)).
>>>
>>>      A 128bit hash seems to be an alternative to a String representation
>>>      of TopLevelModuleNames which could lead to a significant speed up of
>>>      serialization.
>>>
>>>
>>>      --
>>>      Andreas Abel  <><      Du bist der geliebte Mensch.
>>>
>>>      Department of Computer Science and Engineering
>>>      Chalmers and Gothenburg University, Sweden
>>>
>>>      andreas.abel at gu.se <mailto:andreas.abel at gu.se>
>>>      http://www2.tcs.ifi.lmu.de/~__abel/
>>> <http://www2.tcs.ifi.lmu.de/~abel/>
>>>      _________________________________________________
>>>      Agda-dev mailing list
>>>      Agda-dev at lists.chalmers.se <mailto:Agda-dev at lists.chalmers.se>
>>>      https://lists.chalmers.se/__mailman/listinfo/agda-dev
>>>      <https://lists.chalmers.se/mailman/listinfo/agda-dev>
>>>
>>
>> --
>> Andreas Abel  <><      Du bist der geliebte Mensch.
>>
>> Department of Computer Science and Engineering
>> Chalmers and Gothenburg University, Sweden
>>
>> andreas.abel at gu.se
>> http://www2.tcs.ifi.lmu.de/~abel/
>> _______________________________________________
>> Agda-dev mailing list
>> Agda-dev at lists.chalmers.se
>> https://lists.chalmers.se/mailman/listinfo/agda-dev
> _______________________________________________
> Agda-dev mailing list
> Agda-dev at lists.chalmers.se
> https://lists.chalmers.se/mailman/listinfo/agda-dev
>

-- 
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

andreas.abel at gu.se
http://www2.tcs.ifi.lmu.de/~abel/


More information about the Agda-dev mailing list