Micro-profiling game code
Most game programmers like having quick access to on-screen performance charts so they can keep an eye on potential bottlenecks while developing or testing the game.
Every off-the-shelf game engine offers some kind of profiling HUD that you can enable during development. If you are rolling your own engine, there are many open source libraries you can quickly plug into your game (I suggest taking a look at Sean Barrett’s IProf if you want something lightweight).
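Many of these tools work by wrapping named zones of code in a scoped timer and accumulating the results once per frame. The sketch below shows that general idea using only the standard library. ScopedZone, zone_table and end_frame_report are hypothetical names and are not IProf’s API. A real HUD would draw bars rather than print text.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Accumulated time for one named zone during the current frame.
// Hypothetical structure, not taken from any particular library.
struct ZoneStats {
    double total_ms = 0.0;
    int calls = 0;
};

// Per-frame table of zones, keyed by zone name.
std::map<std::string, ZoneStats>& zone_table() {
    static std::map<std::string, ZoneStats> table;
    return table;
}

// RAII timer: starts on construction and adds the elapsed time to its zone on destruction.
class ScopedZone {
public:
    explicit ScopedZone(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {
        assert(name != nullptr);
    }
    ~ScopedZone() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        ZoneStats& stats = zone_table()[name_];
        stats.total_ms += std::chrono::duration<double, std::milli>(elapsed).count();
        stats.calls += 1;
    }
    ScopedZone(const ScopedZone&) = delete;
    ScopedZone& operator=(const ScopedZone&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Prints every zone for this frame, then clears the table for the next frame.
void end_frame_report() {
    for (const auto& [name, stats] : zone_table())
        std::printf("%-16s %8.3f ms (%d calls)\n", name.c_str(), stats.total_ms, stats.calls);
    zone_table().clear();
}
```

A system would wrap its update in a block such as { ScopedZone zone("physics"); ... } and the game loop would call end_frame_report() once per frame.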
When you find a bottleneck, the best way to optimize it is not always obvious, and finding it often comes down to trial and error.
When I’m doing a performance tuning pass over the codebase, I tend to isolate problem spots into small, self-contained snippets of code and profile them in isolation using micro-benchmarking.
Micro-benchmarking is a very useful tool for tuning the performance of algorithms and other self-contained code.
I find that being able to iterate quickly on a piece of code and benchmark it against the original implementation is often eye-opening. Modern CPUs are complex beasts, and what we think is the optimal implementation is sometimes slower than a seemingly naive one.
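As a rough illustration of that loop, here is a minimal hand-rolled harness built on std::chrono. It assumes you only need a quick side-by-side comparison before reaching for a proper benchmarking library. find_linear and find_binary are hypothetical stand-ins for the original and the candidate implementation. The harness checks that the two agree before timing them. It keeps the fastest of several runs to reduce noise, and it writes every result to a volatile sink so the compiler cannot discard the calls.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// "Original" implementation: linear scan for the first element >= key.
std::size_t find_linear(const std::vector<int>& v, int key) {
    std::size_t i = 0;
    while (i < v.size() && v[i] < key) ++i;
    return i;
}

// Candidate replacement: binary search via std::lower_bound.
std::size_t find_binary(const std::vector<int>& v, int key) {
    return static_cast<std::size_t>(std::lower_bound(v.begin(), v.end(), key) - v.begin());
}

// Runs fn over every key, repeats that `reps` times, and returns the
// fastest repetition in nanoseconds per call.
template <typename Fn>
double best_ns_per_call(Fn fn, const std::vector<int>& data,
                        const std::vector<int>& keys, int reps) {
    volatile std::size_t sink = 0;  // results are stored here so calls are not optimized away
    double best = 1e300;
    for (int r = 0; r < reps; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        for (int k : keys) sink = sink + fn(data, k);
        auto t1 = std::chrono::steady_clock::now();
        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        best = std::min(best, ns / keys.size());
    }
    return best;
}

// Builds a sorted array of n even numbers plus a spread of lookup keys,
// verifies both versions agree, then prints their timings.
void compare(std::size_t n) {
    std::vector<int> data(n);
    for (std::size_t i = 0; i < n; ++i) data[i] = static_cast<int>(i * 2);
    std::vector<int> keys;
    for (std::size_t i = 0; i < 4096; ++i)
        keys.push_back(static_cast<int>((i * 7919) % (2 * n + 1)));

    for (int k : keys) assert(find_linear(data, k) == find_binary(data, k));

    std::printf("n=%zu linear %.2f ns, binary %.2f ns\n", n,
                best_ns_per_call(find_linear, data, keys, 20),
                best_ns_per_call(find_binary, data, keys, 20));
}
```

You can call compare() with a few sizes, for example 16, 256 and 4096, to see whether one version overtakes the other on a given machine. Which one wins depends on the data size and the CPU, and finding that out is the point of the harness.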
The Good of micro-benchmarking
Micro-benchmarking is perfect for exploratory coding.
As an example, I recently needed to tune the string hashing function used by the asset system of Rival Fortress. Most string hashing is done offline, but at runtime the game still has to hash strings coming in from Lua game mods.
After trying several well-known hashing algorithms, I hacked together a custom version of MurmurHash that performed significantly better on the dataset used by the game.
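The custom variant itself isn’t shown in this post. As a hedged starting point, below is the standard public-domain MurmurHash3_x86_32 by Austin Appleby, restated for std::string_view. It is not the tuned version described above. The name murmur3_32 is an assumption. Like the reference code, the sketch assumes a little-endian host, and it uses memcpy to avoid unaligned reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

static inline std::uint32_t rotl32(std::uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

// Final avalanche step: spreads every input bit across the whole hash.
static inline std::uint32_t fmix32(std::uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// MurmurHash3_x86_32 over the bytes of `key`.
std::uint32_t murmur3_32(std::string_view key, std::uint32_t seed = 0) {
    const auto* data = reinterpret_cast<const unsigned char*>(key.data());
    const std::size_t len = key.size();
    const std::size_t nblocks = len / 4;
    const std::uint32_t c1 = 0xcc9e2d51u;
    const std::uint32_t c2 = 0x1b873593u;
    std::uint32_t h1 = seed;

    // Body: mix four bytes at a time.
    for (std::size_t i = 0; i < nblocks; ++i) {
        std::uint32_t k1;
        std::memcpy(&k1, data + i * 4, 4);
        k1 *= c1;
        k1 = rotl32(k1, 15);
        k1 *= c2;
        h1 ^= k1;
        h1 = rotl32(h1, 13);
        h1 = h1 * 5 + 0xe6546b64u;
    }

    // Tail: mix the remaining 0 to 3 bytes.
    const unsigned char* tail = data + nblocks * 4;
    std::uint32_t k1 = 0;
    switch (len & 3) {
        case 3: k1 ^= std::uint32_t(tail[2]) << 16; [[fallthrough]];
        case 2: k1 ^= std::uint32_t(tail[1]) << 8;  [[fallthrough]];
        case 1: k1 ^= tail[0];
                k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2; h1 ^= k1;
    }

    // Finalization: fold in the length, then avalanche.
    h1 ^= static_cast<std::uint32_t>(len);
    return fmix32(h1);
}
```

Hashing a mod-supplied name at runtime is then a single call such as murmur3_32(name). Any variant of the function can be timed against this baseline on the game’s own strings.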
The Bad of micro-benchmarking
Micro-benchmarking isn’t so good at giving you the big picture.
Factors that come into play when many subsystems interact, such as cache locality and memory access patterns, often get lost or skewed when you focus on a small piece of code in isolation.
So it’s important to keep in mind the context in which the code you are optimizing will actually run.
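One way to see this skew is to time the same routine in two conditions. The first is a tight loop, where its data stays hot in cache as it would in a typical micro-benchmark. The second streams through a large unrelated buffer between calls, loosely mimicking other subsystems running in the same frame. This is a rough sketch. The 64-byte cache line and the 64 MiB scratch size, which is meant to exceed the last-level cache, are assumptions about the target machine. sum_table is a hypothetical stand-in for whatever code you are tuning.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

volatile std::uint64_t g_sink = 0;  // consumes results so the work is not optimized away

// Routine under test: sums a lookup table.
std::uint64_t sum_table(const std::vector<std::uint32_t>& table) {
    std::uint64_t total = 0;
    for (std::uint32_t x : table) total += x;
    return total;
}

// Writes one byte per assumed 64-byte cache line of a large buffer,
// which tends to push previously used data out of the CPU caches.
void evict_caches(std::vector<std::uint8_t>& scratch) {
    for (std::size_t i = 0; i < scratch.size(); i += 64)
        scratch[i] = static_cast<std::uint8_t>(scratch[i] + 1);
    g_sink = g_sink + scratch[0];
}

// Average nanoseconds per sum_table call. When `cold` is true the caches are
// disturbed before each call. Only the sum_table call itself is timed.
double time_sum(const std::vector<std::uint32_t>& table,
                std::vector<std::uint8_t>& scratch, bool cold, int reps) {
    double total_ns = 0.0;
    for (int r = 0; r < reps; ++r) {
        if (cold) evict_caches(scratch);
        auto t0 = std::chrono::steady_clock::now();
        g_sink = g_sink + sum_table(table);
        auto t1 = std::chrono::steady_clock::now();
        total_ns += std::chrono::duration<double, std::nano>(t1 - t0).count();
    }
    return total_ns / reps;
}

void hot_vs_cold() {
    std::vector<std::uint32_t> table(64 * 1024, 1);       // 256 KiB of data
    std::vector<std::uint8_t> scratch(64 * 1024 * 1024);  // 64 MiB eviction buffer
    assert(sum_table(table) == table.size());              // sanity check before timing
    std::printf("hot : %.0f ns/call\n", time_sum(table, scratch, false, 20));
    std::printf("cold: %.0f ns/call\n", time_sum(table, scratch, true, 20));
}
```

If the cold figure comes out much higher than the hot one, a micro-benchmark on its own is probably flattering that code.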
Google benchmark
My go-to library for micro-benchmarking is Google benchmark, an open source C++ library that lets you quickly run benchmarks over snippets of C/C++ code. Take a look at the examples in its GitHub repository.
Chandler Carruth’s excellent CppCon 2015 talk, “Tuning C++: Benchmarks, and CPUs, and Compilers!”, goes into some detail on how to use Google benchmark, how to understand profiler output, and how to use perf on Linux to dig deeper into optimized C++ executables.
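One pitfall covered in that talk is that an optimizing compiler will happily delete work whose result nobody observes, which makes the benchmark meaningless. The talk shows two tiny helpers, escape and clobber, built on empty inline assembly. Google benchmark offers the same idea as benchmark::DoNotOptimize and benchmark::ClobberMemory. The sketch below assumes GCC or Clang, because MSVC does not accept this GCC-style inline assembly. bench_push_back is a hypothetical benchmark body.

```cpp
#include <cassert>
#include <vector>

// Tells the compiler that the memory behind p escapes to code it cannot see,
// so writes through p must actually happen. GCC/Clang syntax only.
static void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

// Tells the compiler that any memory may have been read or written here.
static void clobber() {
    asm volatile("" : : : "memory");
}

// Benchmark body in the style of the talk. Without escape() and clobber()
// the optimizer could remove the vector work, since nothing observes it.
static void bench_push_back(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::vector<int> v;
        v.reserve(1);
        escape(v.data());
        v.push_back(42);
        clobber();
    }
}
```

Wrapping the call to bench_push_back in a timer turns it into a measurement. Inside a Google benchmark function, the library’s own DoNotOptimize and ClobberMemory fill the same role.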