[Agda-dev] More efficient handling of TopLevelModuleNames

Andreas Abel abela at chalmers.se
Thu Dec 4 10:10:57 CET 2014


Well, we want the properties of what are called "cryptographic" hash 
functions: non-locality and absence of collisions.  The Java string hash

   \sum_i s[i]*31^i

potentially does not have these properties in a sufficient way.  (I 
might be wrong, but I would need to see sufficient evidence to convince 
me of the opposite.)

In any case, even for the optimally distributing hash functions, 32 or 
64 bit hashes cannot avoid collisions.

But your actual question is probably whether the "cryptographic" 
character is needed.  We should always check that the module table (map 
from TopLevelModuleNames to hashes) is injective.  If not, we can only 
exit with a fatal error stating that one of the TopLevelModules need to 
be renamed.  Such an error should only occur with virtually 0 
propability, as such a renaming might not be feasible (e.g. if you use 
third-party libraries).

A cryptographic hash will prevent the engineering of a collision by a 
maliciously minded user, say, who wants to discredit Agda.

Cheers,
Andreas

On 04.12.2014 07:38, Dominique Devriese wrote:
> I may still be missing something, but are we sure that *cryptographic*
> hashes are needed?  Why not a simpler (and probably faster),
> non-cryptographic hash function, like
>
>    http://en.wikipedia.org/wiki/Java_hashCode()#The_java.lang.String_hash_function
> ?
>
> Regards,
> Dominique
>
>
> 2014-12-03 21:18 GMT+01:00 Guilhem Moulin <guilhem.moulin at chalmers.se>:
>> On Wed, 03 Dec 2014 at 10:11:02 +0100, Nicolas Pouillard wrote:
>>> On 12/02/2014 11:15 PM, Andreas Abel wrote:
>>>> A 128bit hash seems to be an alternative to a String representation of
>>>> TopLevelModuleNames which could lead to a significant speed up of
>>>> serialization.
>>>
>>> I support the choice of 128bit hash. As much as one can experiment with
>>> different hash functions to test performances. I would recommend choosing
>>> SHA256 truncated to 128bits. Another option would be to look at SHA3 which
>>> is faster (if I recall well) and secure as well.
>>
>> Actually SHA-512 can be faster than SHA-256 on 64 bits platforms [1],
>> essentially because the word size is larger.  But anyway, the common
>> cryptographic hash functions (MD5 and SHA-{1,2,3}) were optimized for
>> large messages, while I guess a typical QId will hardly be longer than,
>> say 128 bytes.
>>
>> I'm unsure how significant the overhead would be, but one may want to
>> benchmark BLAKE2 (derived of BLAKE, a SHA-3 finalist) which
>>
>>    “is extremely fast both when hashing a single large file and when
>>     hashing many small files. For example, BLAKE2s hashes 64-byte files
>>     quicker than SHA-1, MD5, and even MD4.” — https://blake2.net/
>>
>> --
>> Guilhem.
>>
>> [1] https://community.emc.com/community/edn/rsashare/blog/2010/11/01/sha-2-algorithms-when-sha-512-is-more-secure-and-faster
>>
>> _______________________________________________
>> Agda-dev mailing list
>> Agda-dev at lists.chalmers.se
>> https://lists.chalmers.se/mailman/listinfo/agda-dev
>>
> _______________________________________________
> Agda-dev mailing list
> Agda-dev at lists.chalmers.se
> https://lists.chalmers.se/mailman/listinfo/agda-dev
>

-- 
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

andreas.abel at gu.se
http://www2.tcs.ifi.lmu.de/~abel/


More information about the Agda-dev mailing list