[Agda-dev] More efficient handling of TopLevelModuleNames

Dominique Devriese dominique.devriese at cs.kuleuven.be
Mon Dec 8 11:20:08 CET 2014


Funny, I completely missed this, sorry :).

Regards
Dominique

2014-12-08 11:13 GMT+01:00 Andreas Abel <abela at chalmers.se>:
> Yes, we already do this, see Agda.Syntax.Abstract.Name.
>
> The TopLevelModuleName thing is only relevant for
> serialization/deserialization (and linking maybe, in some future version of
> Agda with a proper compiler).
>
> Cheers,
> Andreas
>
>
> On 08.12.2014 10:06, Dominique Devriese wrote:
>>
>> FWIW, working with some form of integer representations of names might
>> also speed up type-checking.  In GHC, for example, scope-checking
>> turns string names in the source code into a representation of names
>> that carry a unique integer in addition to the string version of the
>> name, in order to speed up name comparisons and provide faster "name
>> to *"-maps.
>>
>> Regards,
>> Dominique
>>
>> 2014-12-04 13:31 GMT+01:00 Andreas Abel <abela at chalmers.se>:
>>>
>>> These strings are not serialized as such, but turned to a UID during
>>> serialization, via a HashMap.  The HashMap is serialized as well. Upon
>>> deserialization, the UIDs are turned into strings again.
>>>
>>> If we used, say 128bit hashes instead of strings, one could either
>>> serialize
>>> these directly, or turn into UIDs again via a HashMap.  Even the second
>>> alternative would probably be faster, because equality checking for
>>> 128bit
>>> Bytestrings should be faster than for strings.  Also a HashMap for hashes
>>> should be faster, since as getting n-bit hashes of 128bit hashes is very
>>> quick: just take the first n bits.
>>>
>>> Cheers,
>>> Andreas
>>>
>>> On 04.12.2014 11:31, Guillaume Brunerie wrote:
>>>>
>>>>
>>>> I'm missing something.
>>>> Why would it be faster to compute 3000000 hashes as opposed to just
>>>> keeping the strings as is?
>>>>
>>>> Le 2 déc. 2014 23:15, "Andreas Abel" <abela at chalmers.se
>>>> <mailto:abela at chalmers.se>> a écrit :
>>>>
>>>>      During AIM XX we investigated the high serialization times, noting
>>>>      that for the standard library, 3000000 TopLevelModuleNames are
>>>>      serialized (that would be e.g. a file name like
>>>>      Data.Nat.Properties.Simple).
>>>>
>>>>      Nisse and I talked briefly about this problem last Friday, and one
>>>>      idea is to use cryptographic hashes of module names instead of
>>>> their
>>>>      string representation.
>>>>
>>>>      Of course, there is the risk of collisions (which would be fatal).
>>>>
>>>>      https://en.wikipedia.org/wiki/__Birthday_attack
>>>>      <https://en.wikipedia.org/wiki/Birthday_attack>
>>>>
>>>>      shows a table with collision probabilities.  Interestingly, with a
>>>>      32-bit or 64-bit hash one can virtually store no module names if
>>>> one
>>>>      wants to stay collision free.  For a collision probability < 10^-18
>>>>      we could only store 6 (!) different module names.
>>>>
>>>>      But even more interestingly, with 128 bit one can distinguish
>>>>      already 10^10 (!) module names with sufficiently high probability.
>>>>      This is more we ever need, most probably (at least for a while,
>>>> Agda
>>>>      developments will stay way below 10^10 files ;-)).
>>>>
>>>>      A 128bit hash seems to be an alternative to a String representation
>>>>      of TopLevelModuleNames which could lead to a significant speed up
>>>> of
>>>>      serialization.
>>>>
>>>>
>>>>      --
>>>>      Andreas Abel  <><      Du bist der geliebte Mensch.
>>>>
>>>>      Department of Computer Science and Engineering
>>>>      Chalmers and Gothenburg University, Sweden
>>>>
>>>>      andreas.abel at gu.se <mailto:andreas.abel at gu.se>
>>>>      http://www2.tcs.ifi.lmu.de/~__abel/
>>>> <http://www2.tcs.ifi.lmu.de/~abel/>
>>>>      _________________________________________________
>>>>      Agda-dev mailing list
>>>>      Agda-dev at lists.chalmers.se <mailto:Agda-dev at lists.chalmers.se>
>>>>      https://lists.chalmers.se/__mailman/listinfo/agda-dev
>>>>      <https://lists.chalmers.se/mailman/listinfo/agda-dev>
>>>>
>>>
>>> --
>>> Andreas Abel  <><      Du bist der geliebte Mensch.
>>>
>>> Department of Computer Science and Engineering
>>> Chalmers and Gothenburg University, Sweden
>>>
>>> andreas.abel at gu.se
>>> http://www2.tcs.ifi.lmu.de/~abel/
>>> _______________________________________________
>>> Agda-dev mailing list
>>> Agda-dev at lists.chalmers.se
>>> https://lists.chalmers.se/mailman/listinfo/agda-dev
>>
>> _______________________________________________
>> Agda-dev mailing list
>> Agda-dev at lists.chalmers.se
>> https://lists.chalmers.se/mailman/listinfo/agda-dev
>>
>
> --
> Andreas Abel  <><      Du bist der geliebte Mensch.
>
> Department of Computer Science and Engineering
> Chalmers and Gothenburg University, Sweden
>
> andreas.abel at gu.se
> http://www2.tcs.ifi.lmu.de/~abel/


More information about the Agda-dev mailing list