Padding is hard | Dave Cheney

We all know that the empty struct consumes no storage, right ? Here is a curious case where this turns out to not be true.

This shows the “after” graph, prior to CL 15522 each allocation was 320 bytes.

This is a story about trying to speed up the Go compiler. Since Go 1.5 we’ve had the great concurrent GC, which reduces the cost of garbage collection, but no matter how efficient a garbage collector is, it doesn’t make allocations free of cost.

The specific allocation I was targeting was the obj.Prog structure which represents a quasi machine code instruction late in the compile cycle. As you can see, for this compilation (which was cmd/compile/internal/gc) obj.Prog values constitute 34.72% of the total allocations for this program—reducing the size of obj.Prog will pay off in real terms.

obj.Prog itself contains a field of type obj.ProgInfo, which is what this blog post will focus on today. Here is the definition for obj.ProgInfo

type ProgInfo struct {
        Flags    uint32   // flag bits
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        _        struct{} // to prevent unkeyed literals
}

How much space will values of this type consume ? TL;DR, here is a table of the results:

	`GOARCH=386`	`GOARCH=amd64`
Go 1.4.x	28 bytes	32 bytes
Go 1.5.x	32 bytes	40 bytes

Go 1.4 32 bit says the value is 28 bytes large, but Go 1.4 64 bit says it’s 32 bytes large. Worse, Go 1.5 32 bit says the value is 32 bytes, and Go 1.5 64 bit says it’s 40 bytes! This type uses fixed sized integer types, but its total size changes every time we compile it. What the heck is going on here ?

Alignment and padding

The reason for the difference between the 32 bit and 64 bit answers is alignment. The Go spec says the address of a struct’s fields must be naturally aligned. So on a 64 bit machine, the address of any uint64 field must be a multiple of 8 bytes. In effect the compiler sees this:

type ProgInfo struct {
        Flags    uint32   // flag bits
        _        [4]byte  // padding added by compiler
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        _        struct{} // to prevent unkeyed literals
}

On 32 bit machines, word boundaries occur every 4 bytes, so there is no requirement to pad for alignment. Note that operations on 64 bit quantities may still require the value to be properly aligned, this is the infamous issue 599.

So, that explains the difference in results between 32 bit and 64 bit machines, Flags (4 bytes) + Reguse (8 bytes) + Regset (8 bytes) + Regindex (8 bytes) == 28 bytes on 32 bit machines, plus another 4 bytes for padding for 64 bit machines.

Well, except that Go 1.5 seems to be consuming another word over and above the Go 1.4 requirements, where is this memory going ?

To give you a hint, I’ll rearrange the definition slightly:

type ProgInfo struct {
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        Flags    uint32   // flag bits
        _        struct{} // to prevent unkeyed literals
}

Now the results are:

	`GOARCH=386`	`GOARCH=amd64`
Go 1.4.x	28 bytes	32 bytes
Go 1.5.x	32 bytes	32 bytes

Hmm, so some progress. We’ve managed to reduce the storage in the Go 1.5 64 bit case, but the others appear unchanged, especially Go 1.5’s 32 bit case.

For the Go 1.5 64 bit case, the padding that we saw above is still there, but it has moved. Effectively what the compiler is seeing is this:

type ProgInfo struct {
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        Flags    uint32   // flag bits
        _        [4]byte  // padding for alignment of ProgInfo
        _        struct{} // to prevent unkeyed literals
}

The reason this padding is still present is to preserve the alignment of the other fields in this structure. Remember that uint64 fields always have to be naturally aligned, every 4 bytes on 32 bit platforms, and every 8 bytes on 64 bit platforms. Consider this small fragment:

var pp [4]ProgInfo	
fmt.Println(&pp[1].Reguse) // address must be a multiple of 8 on 64 bit platforms

The compiler adds padding at the end of the structure to bring its size up to a multiple of the largest element in the structure.

So, that explains everything except for the Go 1.5 32 bit result. For 32 bit platforms, the structure needs to be aligned on a 4 byte boundary, not 8, so no padding should be required. The size under both Go 1.4 and Go 1.5 should be 28 bytes, not 32 bytes … what is going on ?

That empty struct

You’ve probably figured out by now that the only place the additional storage could come from is the unnamed struct{} field. What is it about the presence of an empty struct at the bottom of the type that causes it to increase the size of the struct ?

The answer is that while empty struct{} values consume no storage, you can take their address. That is, if you have a type

type T struct {
      X uint32
      Y struct{}
}

var t T

It is perfectly valid to take the address of t.Y, and as such, the address of t.Y would point beyond the end of the struct!¹

Now, Go code cannot do anything useful with this invalid pointer, but as it is a pointer, the garbage collector has to follow it, and that may lead it to access unmapped memory or dereference an invalid pointer.

The fix for this is detailed in issue 9401 and delivered in Go 1.5. In summary, zero sized values, like struct{} and [0]byte occurring at the end of a structure are assumed to have a size of one byte². With this knowledge, we can now explain the behaviour of the compiler for the rearranged type.

type ProgInfo struct {
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        Flags    uint32   // flag bits
        _        struct{} // to prevent unkeyed literals
        _        [1]byte  // to prevent unkeyed literals  
}

For a 64 bit compiler, it would observe that the largest field inside ProgInfo is 8 bytes, so it would add three additional bytes of padding after the [1]byte to round up to 32 bytes. This is fine, it was already planning to add 4 bytes after Flags.

For a 32 bit compiler, it would observe the largest field inside ProgInfo is 8 bytes, however the natural alignment of 32 bit machines is 4 bytes and the sum of the fields inside the type is 28 bytes, which is what Go 1.4 reported. However because the final field is of zero length, the compiler has replaced it with a [1]byte, causing the total size of the structure to be 29 bytes, forcing the compiler to add three more bytes of padding to round up to the next word boundary, 32 bytes.

Fortunately this last issue can be corrected by moving the zero field to the top of the structure, restoring the original sizes we observed in Go 1.4.

Wrapping up

Do Go programmers need to care about this minutiae ? Probably not. This is the stuff of brain teasers.

Certainly if you are looking to optimise the memory usage of your program, you need to be looking closely at the definitions of all your data structures, but it is unlikely that you’ll encounter the perfect storm of all these factors.

Not to be outdone by this morning’s excitement, Matthew Dempsky put together a quick tool to spot this inefficiency, and has found two other candidates.

One more thing

At the start of this piece I mentioned that CL 15522 resulted in a saving of 32 bytes, yet the biggest saving demonstrated above was 8 bytes. How did CL 15522 achieve such a reduction ? The answer is a thing known inside the runtime as size classes.

For small allocations, that is, less than a few kilobytes, the overhead of tracking the allocation becomes a significant percentage of the object itself. Imagine if every allocation had one word of overhead, every time you put an int into an interface³ value, you’d incur 100% overhead!

To reduce this overhead, rather than tracking every object on the heap individually, the runtime allocates things of the same size together. This leads to lower overhead—one bit per object of a certain size, and excellent space efficiency—all objects are of the same size, eliminating fragmentation.

However, as Go types can be any size, potentially the runtime would have to allocate a pool for every possible size (obviously not until they are actually allocated), and this would undo the efficiency of the mechanism as each pool is a number of machine pages, yet most pools would be largely unused.

The solution to this problem is to not create pools for every possible size, but to start to round up the allocation to the next largest size class, for example an allocation of 40 bytes may be rounded up to 48 bytes. The details of this mechanism are left to the curiosity of the reader⁴, but suffice to say, as allocations get larger, the distance between size classes increases.

Hence, by shaving 8 bytes from obj.ProgInfo, the size of the enclosing obj.Prog type decreased from 296 bytes to 288 bytes, bringing it down to from the previous 320 byte size class, to the 288 byte size class, a saving of 32 bytes on 64 bit platforms.

The explanation for why this is not a violation of Go’s memory safety guarantee is left as an exercise for the reader.
The explanation for why this does not break the semantics of Go programs is left as an exercise for the reader.
The explanation for why storing an int into an interface causes a heap allocation is left as an exercise for the reader.
runtime/msize.go

Alignment and padding

That empty struct

Wrapping up

One more thing

Related posts: