Avoiding Automatic Structure Padding in C
Note: This post is x86 centric. On other architectures your mileage may vary.
Compiling with the -Wpadded flag on GCC/Clang, or -we4820 -we4121 on MSVC, warns you when the compiler inserts padding in your structs.
These warnings are off by default, because in most circumstances automatic padding is quite convenient, but if you are writing high performance code or embedded systems, padding explicitly and trying to avoid padding may be the better choice.
Why compiler padding is a thing
Compilers insert padding to keep data structures aligned, thus avoiding misaligned reads and making memory access faster.
The rules for when compilers insert padding depend on the target architecture’s word size and on the size of each member field in relation to the member that follows it, as explained in “Typical alignment of C structs on x86”.
If the structure contains members with explicit alignment (i.e. types declared with __attribute__((aligned(n))) on GCC/Clang or __declspec(align(n)) on MSVC), then the compiler will take that into consideration when padding.
Examples of automatic padding
Take the following structure:
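A minimal sketch of such a structure, assuming X is a char and Y is an int (the member types are inferred from the sizes discussed in this section, so treat them as an assumption):

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

/* Assumed member types: a 1-byte char followed by a 4-byte int. */
typedef struct Paddee {
    char X;
    int  Y;
} Paddee;

/* Both checks hold on x86 and x86_64, where int is 4 bytes wide
   and must start on a 4-byte boundary. */
static_assert(offsetof(Paddee, Y) == 4, "3 bytes of padding follow X");
static_assert(sizeof(Paddee) == 8, "8 bytes, not 5");
```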
It is compiled on x86_64 into an equivalent of the following:
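Roughly, with the padding written out by hand (again assuming X is a char and Y is an int):

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

typedef struct Paddee {
    char X;
    /* Illustrative only: this member stands in for the bytes the
       compiler inserts. Names starting with an underscore and a capital
       letter are reserved in C, so real code may prefer another name. */
    char _COMPILER_PADDING[3];
    int  Y;
} Paddee;

/* With the padding spelled out, the layout is unchanged. */
static_assert(offsetof(Paddee, Y) == 4, "Y still starts at offset 4");
static_assert(sizeof(Paddee) == 8, "still 8 bytes");
```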
The member field _COMPILER_PADDING is not really there, but it is as if it were, because the compiler inserted the 3 bytes of padding between X and Y. If you compile the struct yourself and check sizeof(Paddee) you will see that it is 8 bytes, not 5.
Padding is also added at the tail end of the struct, so if you rearrange the members the compiler will pad like so:
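A sketch of the rearranged layout, assuming the same char and int members as before. Rearranged is a hypothetical name for the struct as you would write it, and RearrangedExplicit shows what the compiler effectively produces:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

/* As written in the source: the larger member first. */
typedef struct Rearranged {
    int  Y;
    char X;
} Rearranged;

/* What it is laid out as: 3 bytes of tail padding so that the next
   element of an array of these starts on a 4-byte boundary. */
typedef struct RearrangedExplicit {
    int  Y;
    char X;
    char _COMPILER_PADDING[3];
} RearrangedExplicit;

static_assert(offsetof(Rearranged, X) == 4, "X directly follows Y");
static_assert(sizeof(Rearranged) == 8, "tail padding rounds 5 up to 8");
static_assert(sizeof(RearrangedExplicit) == sizeof(Rearranged),
              "explicit and implicit padding give the same size");
```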
For a more realistic example, in the game engine for Rival Fortress I have a 4x4 matrix type that looks something like this:
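A minimal sketch of such a type. The macro definitions and the member name M are assumptions made for illustration, not the engine's actual code:

```c
#include <assert.h>  /* static_assert (C11) */

/* Assumed definitions: each macro expands to its compiler's alignment
   specifier and to nothing on the other compiler. */
#if defined(_MSC_VER)
#define MS_ALIGN(n)  __declspec(align(n))
#define GCC_ALIGN(n)
#else
#define MS_ALIGN(n)
#define GCC_ALIGN(n) __attribute__((aligned(n)))
#endif

/* 4x4 float matrix forced onto a 16-byte boundary. */
MS_ALIGN(16) struct MPEMatrix4 {
    float M[4][4];  /* hypothetical member name */
} GCC_ALIGN(16);
typedef struct MPEMatrix4 MPEMatrix4;

static_assert(sizeof(MPEMatrix4) == 64, "16 floats of 4 bytes each");
static_assert(_Alignof(MPEMatrix4) == 16, "forced 16-byte alignment");
```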
The MS_ALIGN and GCC_ALIGN macros expand to the MSVC and GCC/Clang alignment specifiers respectively. They tell the compiler that this type should always be aligned to 16-byte boundaries (in order to play nice with SSE instructions).
Because of the forced alignment constraint, every other type that includes MPEMatrix4 as a member will inherit its alignment requirement. For example, the following dummy type:
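A sketch of what such a type could look like. The Matrix member name, the uint32_t type of Flags, and the macro definitions are assumptions:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>  /* uint32_t */

#if defined(_MSC_VER)
#define MS_ALIGN(n)  __declspec(align(n))
#define GCC_ALIGN(n)
#else
#define MS_ALIGN(n)
#define GCC_ALIGN(n) __attribute__((aligned(n)))
#endif

MS_ALIGN(16) struct MPEMatrix4 {
    float M[4][4];
} GCC_ALIGN(16);
typedef struct MPEMatrix4 MPEMatrix4;

/* Inherits the 16-byte alignment of its Matrix member. */
typedef struct MPEExample {
    MPEMatrix4 Matrix;
    uint32_t   Flags;  /* assumed to be a 4-byte flag word */
} MPEExample;

static_assert(_Alignof(MPEExample) == 16, "alignment is inherited");
/* The size is rounded up to a multiple of 16, so Flags is followed
   by 12 bytes that carry no data. */
static_assert(sizeof(MPEExample) - sizeof(MPEMatrix4) - sizeof(uint32_t) == 12,
              "12 bytes of tail padding");
```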
Requires 12 bytes of padding(!) in order to be properly aligned. This translates into a lot of wasted memory bandwidth when dealing with arrays with thousands of entries, like for example entities in a game.
The solution, in an extreme case like this, is to either rethink MPEExample and move the Flags field somewhere else (maybe a parallel array that you loop through before or after), or fill the 12 empty bytes with useful data in order to eliminate the wasted memory.
Tips for eliminating compiler padding
You avoid automatic padding by making the compiler happy and aligning your structures optimally. The -Wpadded compiler flag is your guide in knowing when a structure needs better alignment.
The common ways to align structures manually are:
- Rearrange fields: reorder the fields in order to maximize packing. I don’t know about GCC and MSVC, but Clang warns you about misaligned fields, so you know where the problem is.
- Group small types: grouping chars, shorts and ints after larger types, like pointers, can lead to better alignment.
- Use smaller/bigger types: when possible choose different primitive types, like a uint16_t instead of an int, or a size_t instead of a uint32_t. You can then tie this back to the previous tip about grouping small types.
- Insert dummy fields: the last resort is to insert fields in your structs between members that require alignment or at the end of the struct. This is what the compiler does, but by doing it yourself you have a reminder that you can act on when you modify the struct. I usually add a byte array named _PADDING of the size required to reach alignment.
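The tips above can be sketched on a small hypothetical struct. The type and member names are made up for illustration, and the byte counts in the comments assume a typical x86_64 ABI:

```c
#include <assert.h>  /* static_assert (C11) */

/* Small fields scattered between larger ones.
   x86_64: 7 bytes of padding after A and 3 after B, 24 bytes total. */
typedef struct Unordered {
    char  A;
    void *P;
    char  B;
    int   I;
} Unordered;

/* Same fields, largest first and small ones grouped at the end.
   x86_64: only 2 bytes of tail padding, 16 bytes total. */
typedef struct Reordered {
    void *P;
    int   I;
    char  A;
    char  B;
} Reordered;

/* Remaining padding spelled out as a dummy field, so -Wpadded stays
   quiet and the spare bytes are visible to whoever edits the struct. */
typedef struct ReorderedExplicit {
    void *P;
    int   I;
    char  A;
    char  B;
    char  _PADDING[2];
} ReorderedExplicit;

static_assert(sizeof(Reordered) < sizeof(Unordered),
              "reordering removes padding");
static_assert(sizeof(ReorderedExplicit) == sizeof(Reordered),
              "the dummy field only names bytes that were already there");
```

Both reordered variants hold exactly the same data as Unordered. The only thing that changes is how many bytes are spent on alignment.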
Keep in mind that for large structures, cache line size becomes relevant, so when restructuring your structs try not to break up groups of fields that you want pulled into the same cache line.
The excellent post The Lost Art of C Structure Packing goes into detail on how to optimally pack structures in C.
What to do when you can’t align, but don’t want padding
Padding is not always a good thing. For example when serializing data types to disk or over the network, it is often better to keep structures tightly packed even if it causes unaligned memory reads.
To selectively disable padding you can use the #pragma pack directive like so:
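A minimal sketch using the push/pop form of the pragma, which GCC, Clang and MSVC all accept. PackedPaddee is a hypothetical struct with plain members; how packing interacts with members that carry an explicit alignment attribute can be compiler specific, so check sizeof on your own toolchain for types like those:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

#pragma pack(push, 1)  /* cap member alignment at 1 byte from here on */
typedef struct PackedPaddee {
    char X;
    int  Y;            /* now starts at offset 1 and may be misaligned */
} PackedPaddee;
#pragma pack(pop)      /* restore the previous packing for later types */

static_assert(offsetof(PackedPaddee, Y) == 1, "no padding after X");
static_assert(sizeof(PackedPaddee) == 5, "sum of the member sizes");
```

Wrapping only the types that need it in push/pop keeps the packing from leaking into every struct declared later in the translation unit.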
This will disable compiler padding and keep sizeof(MPEExample) equal to the sum of the sizes of its members (in this case 20 bytes, instead of 32).