[Agda-dev] More efficient handling of TopLevelModuleNames

Wed Dec 3 11:26:03 CET 2014

I'd certainly prefer a clean solution like you outline.  The question is 
where such a table should be located.

- If you do it on a per-file basis, meaning each file has its module 
table, you need to translate from one local index to another.  Ex:

   file A             defines x maps A->0, A.x -> 0.x
   file B             defines x maps B->0, B.x -> 0.x
   file C imports A,B           maps A.x -> ?, B.x -> ?

- If you have a central registry, you cannot move .agdai files from one 
machine to another; they will be meaningful only w.r.t. to a global 
module table.

- One could introduce a concept of project/package and do it 
package-local.  .agdai files would then make sense relative to a package 
context, but we get the same problem on a different level: we need 
unique package identifiers (strings?).

The current solution is that the module name (like Data.Nat.Properties) 
uniquely defines a module, and there can be only one file 
Data.Nat.Properties.*agda in the search path.

The hashing hack would just swap the module name for its hash, otherwise 
change nothing in the current handling of files and modules.

Cheers,
Andreas

On 03.12.2014 11:01, Dominique Devriese wrote:
> Just out of curiosity: are you sure that a hash is what is needed?  An
> alternative might be to emit a table with the module names that will
> be used and then refer to them later by their index in the previously
> emitted table?  This way, one might save more bits with 32 or 64 bit
> indices instead of 128bit hashes?
>
> Regards
> Dominique
>
> 2014-12-02 23:15 GMT+01:00 Andreas Abel <abela at chalmers.se>:
>> During AIM XX we investigated the high serialization times, noting that for
>> the standard library, 3000000 TopLevelModuleNames are serialized (that would
>> be e.g. a file name like Data.Nat.Properties.Simple).
>>
>> Nisse and I talked briefly about this problem last Friday, and one idea is
>> to use cryptographic hashes of module names instead of their string
>> representation.
>>
>> Of course, there is the risk of collisions (which would be fatal).
>>
>>    https://en.wikipedia.org/wiki/Birthday_attack
>>
>> shows a table with collision probabilities.  Interestingly, with a 32-bit or
>> 64-bit hash one can virtually store no module names if one wants to stay
>> collision free.  For a collision probability < 10^-18 we could only store 6
>> (!) different module names.
>>
>> But even more interestingly, with 128 bit one can distinguish already 10^10
>> (!) module names with sufficiently high probability.  This is more we ever
>> need, most probably (at least for a while, Agda developments will stay way
>> below 10^10 files ;-)).
>>
>> A 128bit hash seems to be an alternative to a String representation of
>> TopLevelModuleNames which could lead to a significant speed up of
>> serialization.
>>
>>
>> --
>> Andreas Abel  <><      Du bist der geliebte Mensch.
>>
>> Department of Computer Science and Engineering
>> Chalmers and Gothenburg University, Sweden
>>
>> andreas.abel at gu.se
>> http://www2.tcs.ifi.lmu.de/~abel/
>> _______________________________________________
>> Agda-dev mailing list
>> Agda-dev at lists.chalmers.se
>> https://lists.chalmers.se/mailman/listinfo/agda-dev
> _______________________________________________
> Agda-dev mailing list
> Agda-dev at lists.chalmers.se
> https://lists.chalmers.se/mailman/listinfo/agda-dev
>

-- 
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

andreas.abel at gu.se
http://www2.tcs.ifi.lmu.de/~abel/