[Pharo-project] string hash collisions?

Andres Valloud avalloud at smalltalk.comcastbiz.net
Mon Jan 2 06:00:35 CET 2012


Part of why those attacks are so easy is that the factor for the 
multiplicative hash function is very poor (i.e.: 33, 31, etc).  There is 
nothing spectacular about this reasoning, those "cool looking" factors 
chosen using the digital (finger) method should have been stamped out a 
long time ago.  The hashing book I wrote has a ton of additional details 
on how to choose a factor properly if you want to use multiplicative 
hash functions, see here:

http://www.lulu.com/spotlight/avsmalltalkbooks

Of course you can avoid all that trouble by using a crypto hash, at the 
cost of the (mostly unnecessary) crypto hashing.

The authors suggest using a randomized hash function, but how do 
persistent hash tables survive across invocations of the randomizer? 
Moreover, I suspect the following:

1.  The multiplicative hash function they show as "improved" in Perl 
5.8.1 seems to use effectively 1025 as a factor which is generally 
terrible (slide 77).

2.  How is the seed determined?  What criteria are used to guarantee it 
is a good seed value?

3.  It is likely you can still abuse the hash function with series of 
null bytes, or by constructing strings that abuse that poor hash 
function into producing tons of collisions mod 2^x, x < 32.

To sum it up...

1.  If you use multiplicative hash functions, then you must choose the 
factor properly.  Go read the literature.

2.  If you are expecting attackers, consider making it difficult to find 
collisions in your hash function.  A way to make it REALLY hard is to 
use a crypto hash.  But the applicability depends on the application. 
Generally speaking, making something like a crypto hash function the 
default hash function is a bad idea.

3.  Just "randomizing" a hash function can be as useful as "randomizing" 
a random number generator.  Obfuscation is not equivalent to a fix. 
Without a clear understanding of why the results have to be good, 
chances are the results will be poor.

On 12/30/11 7:58 , Philippe Marschall wrote:
> Hi
>
> As you probably noted string hash collisions are all the rage these days
> [1]. Has anybody looked into whether this applies to Pharo as well?
>
>    [1] http://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html
>
> Cheers
> Philippe
>
>



More information about the Pharo-project mailing list