Avoiding Automatic Structure Padding in C

Note: This post is x86-centric. On other architectures your mileage may vary.

Compiling with the -Wpadded flag on GCC/Clang, or with -we4820 -we4121 on MSVC, makes the compiler tell you whenever it inserts padding in your structs. (On MSVC the -we form promotes those two warnings to errors.)

These warnings are off by default because in most circumstances automatic padding is quite convenient, but if you are writing high-performance code or targeting embedded systems, padding explicitly, or avoiding padding altogether, may be the better choice.

Why compiler padding is a thing

Compilers insert padding to keep data structures aligned, thus avoiding misaligned reads and making memory access faster.

The rules for when compilers insert padding depend on the target architecture and on the alignment requirement of each member: a member is placed at the next offset that is a multiple of its alignment, so a gap appears whenever the previous member ends short of that offset. This is explained in “Typical alignment of C structs on x86”.

If the structure contains members with explicit alignment (e.g. types declared with __attribute__((aligned(n))) on GCC/Clang or __declspec(align(n)) on MSVC), then the compiler will take that into consideration when padding.
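
As a rough sketch of how an explicitly aligned member affects the struct that contains it, the following uses the standard C11 _Alignas specifier in place of the compiler-specific attributes. The names Vec4 and Holder are made up for illustration, and the numbers assume a 4 byte float as on x86/x86_64.

```c
#include <stdalign.h> // alignof (C11)
#include <stddef.h>   // offsetof

// Hypothetical vector type forced onto a 16 byte boundary. _Alignas(16)
// stands in for __attribute__((aligned(16))) / __declspec(align(16)).
typedef struct Vec4
{
  _Alignas(16) float V[4]; // 16 bytes
} Vec4;

typedef struct Holder
{
  char Tag;  // 1 byte, followed by 15 bytes of compiler padding
  Vec4 Data; // must start at a multiple of 16
} Holder;

// The container inherits the strictest alignment among its members.
_Static_assert(alignof(Holder) == 16, "Holder inherits the alignment of Vec4");
_Static_assert(offsetof(Holder, Data) == 16, "15 bytes of padding after Tag");
_Static_assert(sizeof(Holder) == 32, "1 + 15 + 16 bytes");
```

The file has no runtime behavior: if it compiles, the layout assumptions hold for the current target.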

Examples of automatic padding

Take the following structure:

typedef struct Paddee
{
  char X; // 1 byte
  int Y;  // 4 bytes
} Paddee;

It is compiled on x86_64 into an equivalent of the following:

typedef struct Paddee
{
  char X;                    // 1 byte
  char _COMPILER_PADDING[3]; // 3 bytes (invisible)
  int Y;                     // 4 bytes
} Paddee;

The member field _COMPILER_PADDING is not really there, but it is as if it were, because the compiler inserted 3 bytes of padding between X and Y so that Y starts at a multiple of 4. If you compile the struct yourself and check sizeof(Paddee) you will see that it is 8 bytes, not 5.

Padding is also added at the tail end of a struct, so that its size stays a multiple of its alignment. If you rearrange the members the compiler will pad like so:
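
If you want to see the layout for yourself, a small helper along these lines prints where each member ended up. PrintPaddeeLayout is a name invented for this sketch, and the values in the comments assume a 4 byte int, as on x86 and x86_64.

```c
#include <stddef.h> // offsetof
#include <stdio.h>  // printf

typedef struct Paddee
{
  char X; // 1 byte
  int Y;  // 4 bytes
} Paddee;

// Prints the offset of each member and the total size of the struct.
// With a 4 byte int, X sits at offset 0, Y at offset 4, and the size is 8.
void PrintPaddeeLayout(void)
{
  printf("offsetof(Paddee, X) = %zu\n", offsetof(Paddee, X));
  printf("offsetof(Paddee, Y) = %zu\n", offsetof(Paddee, Y));
  printf("sizeof(Paddee)      = %zu\n", sizeof(Paddee));
}
```

Call PrintPaddeeLayout() from your own main. Compiling the same file with -Wpadded on GCC/Clang should also flag the padding in front of Y.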

typedef struct Paddee
{
  int Y;                     // 4 bytes
  char X;                    // 1 byte
  char _COMPILER_PADDING[3]; // 3 bytes (invisible)
} Paddee;
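
The reason for tail padding is arrays: elements are laid out back to back, so the size of the struct has to be a multiple of its alignment or the second element would start misaligned. A minimal sketch, assuming a 4 byte int (Reordered and OffsetOfY are names used only here):

```c
#include <stdalign.h> // alignof (C11)
#include <stddef.h>   // offsetof, size_t

typedef struct Reordered
{
  int Y;  // 4 bytes
  char X; // 1 byte, followed by 3 bytes of tail padding
} Reordered;

// The size is rounded up to a multiple of the alignment.
_Static_assert(sizeof(Reordered) == 8, "4 + 1 + 3 bytes of tail padding");
_Static_assert(sizeof(Reordered) % alignof(Reordered) == 0,
               "size is a multiple of alignment");

// Byte offset of Items[Index].Y from the start of an array of Reordered.
// Because of the tail padding the result is always a multiple of 4.
size_t OffsetOfY(size_t Index)
{
  return Index * sizeof(Reordered) + offsetof(Reordered, Y);
}
```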

For a more realistic example, in the game engine for Rival Fortress I have a 4x4 matrix type that looks something like this:

MS_ALIGN(16) typedef union MPEMatrix4
{
  struct
  {
    MPEVec4 Col1;
    MPEVec4 Col2;
    MPEVec4 Col3;
    MPEVec4 Col4;
  };
  struct
  {
    SIMDVec SIMDCol1;
    SIMDVec SIMDCol2;
    SIMDVec SIMDCol3;
    SIMDVec SIMDCol4;
  };
  MPEVec4 VecColumns[4];
  SIMDVec SIMDColumns[4];
  f32 Flat[16];
} MPEMatrix4 GCC_ALIGN(16);

The MS_ALIGN and GCC_ALIGN macros expand to the MSVC and GCC/Clang alignment specifiers, respectively. They tell the compiler that this type should always be aligned to 16 byte boundaries (in order to play nice with SSE instructions).

Because of the forced alignment constraint, every other type that includes MPEMatrix4 as a member will inherit its alignment requirement. For example, the following dummy type:

struct MPEExample
{
  MPEMatrix4 Model; // 64 bytes (16 floats), 16 byte aligned
  u32 Flags;        // 4 bytes
  u8 _PADDING[12];  // 12 bytes
};

It requires 12 bytes of padding(!) after Flags so that the size of the struct stays a multiple of 16. This translates into a lot of wasted memory bandwidth when dealing with arrays with thousands of entries, such as the entities in a game.

The solution, in an extreme case like this, is to either rethink MPEExample and move the Flags field somewhere else (maybe a parallel array that you loop through before or after), or fill the 12 empty bytes with useful data in order to eliminate the wasted memory.
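
The parallel array idea could look roughly like the sketch below. Matrix4 is a simplified stand-in for MPEMatrix4 (just 16 floats aligned to 16 bytes), and EntityPadded, EntityStore, MAX_ENTITIES and CountFlagged are hypothetical names, not code from the engine.

```c
#include <stddef.h> // size_t
#include <stdint.h> // uint32_t

// Stand-in for MPEMatrix4: 16 floats (64 bytes) aligned to 16 bytes.
typedef struct Matrix4
{
  _Alignas(16) float Flat[16];
} Matrix4;

// Matrix and flags side by side: 64 + 4 + 12 bytes of tail padding.
typedef struct EntityPadded
{
  Matrix4 Model;
  uint32_t Flags;
} EntityPadded;

enum { MAX_ENTITIES = 1024 };

// Parallel arrays: entity i is described by Models[i] and Flags[i].
typedef struct EntityStore
{
  Matrix4 Models[MAX_ENTITIES];
  uint32_t Flags[MAX_ENTITIES];
} EntityStore;

_Static_assert(sizeof(EntityPadded) == 80, "12 of the 80 bytes are padding");
_Static_assert(sizeof(EntityStore) == MAX_ENTITIES * 68,
               "no per-entity padding with this entity count");

// Walks only the flags, without pulling any matrix data into the cache.
size_t CountFlagged(const EntityStore *Store, uint32_t Mask)
{
  size_t Count = 0;
  for (size_t Index = 0; Index < MAX_ENTITIES; ++Index)
  {
    if (Store->Flags[Index] & Mask)
    {
      ++Count;
    }
  }
  return Count;
}
```

The store is padding free here only because 1024 flags happen to add up to a multiple of 16 bytes. With other entity counts a few bytes of tail padding can reappear, but once per store rather than once per entity.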

Tips for eliminating compiler padding

You avoid automatic padding by making the compiler happy and aligning your structures optimally. The -Wpadded compiler flag is your guide in knowing when a structure needs better alignment.

The common ways to align structures manually are:

Rearrange fields: reorder the fields in order to maximize packing. I don’t know about GCC and MSVC, but Clang warns you about misaligned fields, so you know where the problem is.
Group small types: grouping chars, shorts and ints after larger types, like pointers, can lead to better alignment.
Use smaller/bigger types: when possible choose different primitive types, like a uint16_t instead of an int, or a size_t instead of a uint32_t. You can then tie this back to the previous tip about grouping small types.
Insert dummy fields: the last resort is to insert fields in your structs between members that require alignment or at the end of the struct. This is what the compiler does, but by doing it yourself you have a reminder that you can act on when you modify the struct. I usually add a byte array named _PADDING of the size required to reach alignment.

Keep in mind that for large structures, cache line size becomes relevant, so try not to break up groups of fields that you want pulled into the same cache line when restructuring your structs.

The excellent post The Lost Art of C Structure Packing goes into detail on how to optimally pack structures in C.
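
To make the tips above concrete, here is a small before and after sketch. Loose and Tight are made-up types, and the sizes assume x86_64, where a uint64_t is aligned to 8 bytes (32 bit x86 ABIs may differ).

```c
#include <stddef.h> // offsetof
#include <stdint.h> // uint8_t, uint64_t

// Declaration order forces two gaps: 1 + 7 (padding) + 8 + 1 + 7 (padding).
typedef struct Loose
{
  uint8_t A;
  uint64_t B;
  uint8_t C;
} Loose;

// Largest member first, small members grouped together, and the
// remaining gap spelled out so it is visible when the struct changes.
typedef struct Tight
{
  uint64_t B;
  uint8_t A;
  uint8_t C;
  uint8_t _PADDING[6]; // free space for future fields
} Tight;

// Compile-time guards: the build breaks if the layout ever drifts.
_Static_assert(sizeof(Loose) == 24, "Loose carries 14 bytes of hidden padding");
_Static_assert(sizeof(Tight) == 16, "Tight has no hidden padding");
_Static_assert(offsetof(Tight, _PADDING) == 10, "explicit padding starts at byte 10");
```

The _Static_assert lines are optional, but they turn the size you expect into something the compiler checks on every build.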

What to do when you can’t align, but don’t want padding

Padding is not always a good thing. For example when serializing data types to disk or over the network, it is often better to keep structures tightly packed even if it causes unaligned memory reads.

To selectively disable padding you can use the #pragma pack directive like so:

#pragma pack(push, 1)
struct MPEExample
{
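  MPEMatrix4 Model; // 64 bytes (16 floats)
  u32 Flags;        // 4 bytes
  // No padding
};
#pragma pack(pop)

This will disable compiler padding and keep sizeof(MPEExample) equal to the sum of the sizes of its members (in this case 68 bytes, instead of 80). How #pragma pack interacts with explicitly aligned types such as MPEMatrix4 is compiler-specific, so check sizeof on every compiler you target.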
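
For the serialization case, a packed header might be sketched like this. PacketHeader and ReadHeader are hypothetical names, and the sketch assumes the bytes were written on a machine with the same byte order.

```c
#include <stddef.h> // offsetof
#include <stdint.h> // uint8_t, uint32_t
#include <string.h> // memcpy

#pragma pack(push, 1)
typedef struct PacketHeader
{
  uint8_t Type;    // 1 byte
  uint32_t Length; // 4 bytes, at offset 1 (unaligned)
} PacketHeader;
#pragma pack(pop)

_Static_assert(sizeof(PacketHeader) == 5, "no padding inside or after the header");
_Static_assert(offsetof(PacketHeader, Length) == 1, "Length directly follows Type");

// Copies a header out of a raw byte buffer. Using memcpy instead of casting
// the buffer pointer avoids dereferencing a misaligned PacketHeader pointer.
PacketHeader ReadHeader(const uint8_t *Bytes)
{
  PacketHeader Header;
  memcpy(&Header, Bytes, sizeof(Header));
  return Header;
}
```

Reading Header.Length afterwards is fine, because the compiler knows the struct is packed and generates an access that tolerates the odd offset.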

Metric Panda Games
