Minimal Perfect Hashing for Game Assets
Perfect hash functions are the holy grail of hashing functions. They guarantee no key collisions and work really well when dealing with data that is either static, where all keys are known in advance, or rarely changing.
Minimal perfect hash functions (MPHF) are even better. They guarantee 100% load factor, so your hash tables will contain only one entry for each key, and no empty slots.
Hash tables that use MPHF are more memory efficient and have generally faster retrieval times; they also have excellent cache locality by removing cache misses that arise due to collision resolution schemes required for traditional hash tables.
The downside is that MPHF-backed hash tables are more costly to generate and trickier to implement, as they require more setup than most hash table implementations. This makes them impractical for dynamic data set like the ones you usually find in games.
MPHF for game asset lookups
For the asset system used by Rival Fortress I decided to use MPHF-backed hash tables for asset and font kerning lookups.
The reason for adding hash table lookups to assets, instead of just indexing them statically, is because of the engine’s modding capabilities. Users will be able to add custom assets and refer to them through LUA
, so internally they have to be stored in hash tables.
Fortunately, both engine and mod assets are preprocessed offline, so I can leverage MPHF without having to pay the runtime cost.
Assets are referenced through an unsigned 32-bit integer generated by the preprocessor. The key is used to calculate the intermediate MPHF lookup table and this, in turn, is used to populate the final assets hash table that is then stored in the PAK
file.
The hash tables, generated using an implementation based on the Compress, Hash, and Displace algorithm. I’m using a load factor of 100%, and lambda
values of 1 and 4 for assets and for font kerning data.
Additional resources
I won’t share any example code as my implementation is tightly coupled with the asset preprocessor, but I’ll point you in the direction of some open source implementations:
- (LGPL/MPL) http://cmph.sourceforge.net/
- (GPL) http://www.gnu.org/software/gperf
- (Public domain) http://burtleburtle.net/bob/hash/perfect.html
- (MIT) https://github.com/wahern/phf