I happened to want to create a hash table with integer keys and went looking for a suitable function. As usual, Google is your friend. And as usual, once you start researching things on the 'net, hours can go by.

Thomas Wang has a good discussion of various integer hash functions, but that also lead me elsewhere to discussions of good hash functions in general.

In the past, I've found that many of the hash functions that are claimed as being better than Knuth's classic string hash function don't actually prove to be any better by most metrics, and some seem to be much worse.

For example, one popular hash on the street these days seems to be Paul Hsieh's SuperFastHash. It does run quickly, and on the whole its statistical properties seem to shake out reasonably well. But when you look at the actual integers it returns, in my tests using /usr/share/dict/web2 on my Mac, there seem to be a far more collisions than you'd statistically expect. Statisitically, you'd expect about six collisions in the 32-bit space. Knuth's hash function has only five, and they're very dissimilar words, namely:

227010540:  autovivisection grovelings
890239928:  dialypetalous mumpishness
2851341963: anisostemonous umbellifer
3508170762: ctenodactyl fuliginousness
3909438781: prerogativity puzzleheaded

The SuperFastHash function, on the other hand has 59 collisions, an order of magnitude more. Here are a representative few:

432696082: Cotinga Cotonam
535511585: miscoin misfond
631000912: amidine aminity
668950620: untossed unworked
738886349: hennin penman
749072160: revisible rewirable

Notice that the words that hash the same seem somehow similar. That's just weird.

In addition to his own hash function, Paul Hsieh also has some other useful code on his site, including a hash test program comparing several different hash implementations for speed, and a portable implementation of stdint.h.

The FNV (a.k.a. Fowler/Noll/Vo) hash is another hash function that seems popular these days. It seems broadly similar to Knuth's hash function, but does a better job of distributing hashes for short strings across the full 32-bit space for hashes. For example, Knuth's hash hashes bat and cat to 137867 and 139236 respectively, but FNV hashes them to 950299920 and 1587996537. Like Knuth's hash function, FNV places single-letter words in adjacent spots (although there is an alternative version, FNV-1a, that avoids this problem), Here are the collisions in 32-bit space for FNV.

374764810:  diabolically koilanaglyphic
1055878936: deuteropathic vertebrosacral
1290893597: parer vila
1408982841: basiotribe narcotinic
1713658462: averral climatical
3129894270: Scorpididae transposer

Bob Jenkins published an article in Dr. Dobbs journal in 1997, providing a good hash function of his own, and has continued to tweak his code since. His page on hashing has lots of good stuff, including links to his code. His hash function is no slouch, and is the only one I looked at that maps single characters to radically different positions. Below are his 32-bit collisions, again with about the distribution you'd expect:

728135544: chorda fingerbreadth
733592810: stockily virginally
893264706: combaron unlimited
1456871225: gaspingly secularistic
1486736111: unbodied Yankee
2683815022: blackpoll Paharia
2947362466: Borinqueno unskewed
3298503807: distributress granulator

Thanks to Paul Hsieh's test program, here are some performance numbers for these different implementations (as benchmarked on my aging PowerBook G4):

FNVHash         :  3.9300s
knuthHash       :  2.9700s
BobJenkins      :  2.4600s
SuperFastHash   :  2.2800s

Running on some other architectures, I find that FNV and Knuth are really about the same (the difference between the two seems to be a G4 artifact). On the whole, although it may look like there's a big difference between the algorithms, in my experience, I've found that the hash function, (or even the whole hash table implementation!) isn't really the bottleneck. In other words, if you make your hash function twice as fast, usually no one will notice.

Paul Hsieh's SuperFastHash may be a tiny bit faster than Bob Jenkins's hash, but I think not enough to really stand out, and its strange collisions worry me. Bob Jenkins's hash function is probably the best and the one to use if you want an industrial-strength hash, but it is massive and complex. FNV may be slower, but it's short and sweet, just two mystery constants to remember. But if I have to write it myself, from memory, I'm still going to go with Knuth. Usually, Knuth's slightly odd pattern really won't matter.

For more, see Wikipedia's coverage of hash tables, which also has pretty good coverage of hash functions.