Padding is hard

We all know that the empty struct consumes no storage, right ? Here is a curious case where this turns out to not be true.

Memory Profile

This shows the “after” graph, prior to CL 15522 each allocation was 320 bytes.

This is a story about trying to speed up the Go compiler. Since Go 1.5 we’ve had the great concurrent GC, which reduces the cost of garbage collection, but no matter how efficient a garbage collector is, it doesn’t make allocations free of cost.

The specific allocation I was targeting was the obj.Prog structure which represents a quasi machine code instruction late in the compile cycle. As you can see, for this compilation (which was cmd/compile/internal/gc) obj.Prog values constitute 34.72% of the total allocations for this program—reducing the size of obj.Prog will pay off in real terms.

obj.Prog itself contains a field of type obj.ProgInfo, which is what this blog post will focus on today. Here is the definition for obj.ProgInfo

type ProgInfo struct {
        Flags    uint32   // flag bits
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        _        struct{} // to prevent unkeyed literals

How much space will values of this type consume ? TL;DR, here is a table of the results:

Go 1.4.x 28 bytes 32 bytes
Go 1.5.x 32 bytes 40 bytes

Go 1.4 32 bit says the value is 28 bytes large, but Go 1.4 64 bit says it’s 32 bytes large. Worse, Go 1.5 32 bit says the value is 32 bytes, and Go 1.5 64 bit says it’s 40 bytes! This type uses fixed sized integer types, but its total size changes every time we compile it. What the heck is going on here ?


Alignment and padding

The reason for the difference between the 32 bit and 64 bit answers is alignment. The Go spec says the address of a struct’s fields must be naturally aligned. So on a 64 bit machine, the address of any uint64 field must be a multiple of 8 bytes. In effect the compiler sees this:

type ProgInfo struct {
        Flags    uint32   // flag bits
        _        [4]byte  // padding added by compiler
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        _        struct{} // to prevent unkeyed literals

On 32 bit machines, word boundaries occur every 4 bytes, so there is no requirement to pad for alignment. Note that operations on 64 bit quantities may still require the value to be properly aligned, this is the infamous issue 599.

So, that explains the difference in results between 32 bit and 64 bit machines, Flags (4 bytes) + Reguse (8 bytes) + Regset (8 bytes) + Regindex (8 bytes) == 28 bytes on 32 bit machines, plus another 4 bytes for padding for 64 bit machines.

Well, except that Go 1.5 seems to be consuming another word over and above the Go 1.4 requirements, where is this memory going ?

To give you a hint, I’ll rearrange the definition slightly:

type ProgInfo struct {
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        Flags    uint32   // flag bits
        _        struct{} // to prevent unkeyed literals

Now the results are:

Go 1.4.x 28 bytes 32 bytes
Go 1.5.x 32 bytes 32 bytes

Hmm, so some progress. We’ve managed to reduce the storage in the Go 1.5 64 bit case, but the others appear unchanged, especially Go 1.5’s 32 bit case.

For the Go 1.5 64 bit case, the padding that we saw above is still there, but it has moved. Effectively what the compiler is seeing is this:

type ProgInfo struct {
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        Flags    uint32   // flag bits
        _        [4]byte  // padding for alignment of ProgInfo
        _        struct{} // to prevent unkeyed literals

The reason this padding is still present is to preserve the alignment of the other fields in this structure. Remember that uint64 fields always have to be naturally aligned, every 4 bytes on 32 bit platforms, and every 8 bytes on 64 bit platforms. Consider this small fragment:

var pp [4]ProgInfo	
fmt.Println(&pp[1].Reguse) // address must be a multiple of 8 on 64 bit platforms

The compiler adds padding at the end of the structure to bring its size up to a multiple of the largest element in the structure.

So, that explains everything except for the Go 1.5 32 bit result. For 32 bit platforms, the structure needs to be aligned on a 4 byte boundary, not 8, so no padding should be required. The size under both Go 1.4 and Go 1.5 should be 28 bytes, not 32 bytes … what is going on ?

That empty struct

You’ve probably figured out by now that the only place the additional storage could come from is the unnamed struct{} field. What is it about the presence of an empty struct at the bottom of the type that causes it to increase the size of the struct ?

The answer is that while empty struct{} values consume no storage, you can take their address. That is, if you have a type

type T struct {
      X uint32
      Y struct{}

var t T

It is perfectly valid to take the address of t.Y, and as such, the address of t.Y would point beyond the end of the struct!1

Now, Go code cannot do anything useful with this invalid pointer, but as it is a pointer, the garbage collector has to follow it, and that may lead it to access unmapped memory or dereference an invalid pointer.

The fix for this is detailed in issue 9401 and delivered in Go 1.5. In summary, zero sized values, like struct{} and [0]byte occurring at the end of a structure are assumed to have a size of one byte2. With this knowledge, we can now explain the behaviour of the compiler for the rearranged type.

type ProgInfo struct {
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        Flags    uint32   // flag bits
        _        struct{} // to prevent unkeyed literals
        _        [1]byte  // to prevent unkeyed literals  

For a 64 bit compiler, it would observe that the largest field inside ProgInfo is 8 bytes, so it would add three additional bytes of padding after the [1]byte to round up to 32 bytes. This is fine, it was already planning to add 4 bytes after Flags.

For a 32 bit compiler, it would observe the largest field inside ProgInfo is 8 bytes, however the natural alignment of 32 bit machines is 4 bytes and the sum of the fields inside the type is 28 bytes, which is what Go 1.4 reported. However because the final field is of zero length, the compiler has replaced it with a [1]byte, causing the total size of the structure to be 29 bytes, forcing the compiler to add three more bytes of padding to round up to the next word boundary, 32 bytes.

Fortunately this last issue can be corrected by moving the zero field to the top of the structure, restoring the original sizes we observed in Go 1.4.

Wrapping up

Do Go programmers need to care about this minutiae ? Probably not. This is the stuff of brain teasers.

Certainly if you are looking to optimise the memory usage of your program, you need to be looking closely at the definitions of all your data structures, but it is unlikely that you’ll encounter the perfect storm of all these factors.

Not to be outdone by this morning’s excitement, Matthew Dempsky put together a quick tool to spot this inefficiency, and has found two other candidates.

One more thing

At the start of this piece I mentioned that CL 15522 resulted in a saving of 32 bytes, yet the biggest saving demonstrated above was 8 bytes. How did CL 15522 achieve such a reduction ? The answer is a thing known inside the runtime as size classes.

For small allocations, that is, less than a few kilobytes, the overhead of tracking the allocation becomes a significant percentage of the object itself. Imagine if every allocation had one word of overhead, every time you put an int into an interface3 value, you’d incur 100% overhead!

To reduce this overhead, rather than tracking every object on the heap individually, the runtime allocates things of the same size together. This leads to lower overhead—one bit per object of a certain size, and excellent space efficiency—all objects are of the same size, eliminating fragmentation.

However, as Go types can be any size, potentially the runtime would have to allocate a pool for every possible size (obviously not until they are actually allocated), and this would undo the efficiency of the mechanism as each pool is a number of machine pages, yet most pools would be largely unused.

The solution to this problem is to not create pools for every possible size, but to start to round up the allocation to the next largest size class, for example an allocation of 40 bytes may be rounded up to 48 bytes. The details of this mechanism are left to the curiosity of the reader4, but suffice to say, as allocations get larger, the distance between size classes increases.

Hence, by shaving 8 bytes from obj.ProgInfo, the size of the enclosing obj.Prog type decreased from 296 bytes to 288 bytes, bringing it down to from the previous 320 byte size class, to the 288 byte size class, a saving of 32 bytes on 64 bit platforms.

  1. The explanation for why this is not a violation of Go’s memory safety guarantee is left as an exercise for the reader.
  2. The explanation for why this does not break the semantics of Go programs is left as an exercise for the reader.
  3. The explanation for why storing an int into an interface causes a heap allocation is left as an exercise for the reader.
  4. runtime/msize.go

Building Go 1.5 on the Raspberry Pi

This is a short post to describe my recommended method for building Go on the Raspberry Pi. This method has been tested on the Raspberry Pi 2 Model B (900Mhz, 1Gb ram) and the older Raspberry Pi 1 Model B+ (700Mhz, 512Mb ram).

This method will build Go 1.5 into you home directory, $HOME/go.

As always, please don’t set $GOROOT. You never need to set $GOROOT when building from source.

Step 1. Getting the bootstrap Go 1.4 compiler

Go 1.5 requires an existing Go 1.4 (or later) compiler to build Go 1.5. If you have built Go from source on your host machine you can generate this tarball directly, but to save time I’ve done it for you.

% cd $HOME
% curl | tar xj

Step 2. Fetch the Go 1.5 source

Fetch the Go 1.5 source tarball and unpack it to $HOME/go

% cd $HOME
% curl | tar xz

Step 3. Configure your environment and build

Go 1.5 builds cleanly on arm devices, this is verified by the build dashboard, however if you want to see ./all.bash pass on the Raspberry Pi, some additional configuration is recommended.

Lower the default stack size from 8mb to 1mb.

This is necessary because the runtime tests create many native operating system threads which at 8mb per thread can exhaust the 32bit user mode address space (especially if you are running a recent Raspbian kernel). See issue 11959 for the details.

% ulimit -s 1024     # set the thread stack limit to 1mb
% ulimit -s          # check that it worked

Increase the scaling factor to avoid test timeouts.

The default scaling factor is good for powerful amd64 machines, but is too aggressive for small 32 bit machines. This is done with the GO_TEST_TIMEOUT_SCALE environment variable.

Step 4. Build

% cd $HOME/go/src
% env GO_TEST_TIMEOUT_SCALE=10 GOROOT_BOOTSTRAP=$HOME/go-linux-arm-bootstrap ./all.bash
# Building C bootstrap tool.

# Building compilers and Go bootstrap tool for host, linux/arm.
##### ../test

##### API check
Go version is "go1.5", ignoring -next /home/pi/go/api/next.txt


Installed Go for linux/arm in /home/pi/go
Installed commands in /home/pi/go/bin

On the Raspberry Pi 2 Model B, this should take around an hour, for the older Raspberry Pi 1 Model B+, it takes more than five!

Alternatively, you can substitute ./make.bash above to skip running the tests, which constitutes more than half of the time — or you could cross compile your program from another computer.

As a final step you should add $HOME/go to your $PATH, and to save disk space you can remove $HOME/go-linux-arm-bootstrap.

Cross compilation with Go 1.5

Now that Go 1.5 is out, lots of gophers are excited to try the much improved cross compilation support. For some background on the changes to the cross compilation story you can read my previous post, or Rakyll’s excellent follow up piece.

I’ll assume that you are using the binary version of Go 1.5, as distributed from the Go website. If you are using Go 1.5 from your operating system’s distribution, or homebrew, the process will be the same, however paths may differ slightly.

How to cross compile

To cross compile a Go program using Go 1.5 the process is as follows:

  1. set GOOS and GOARCH to be the values for the target operating system and architecture.
  2. run go build -v YOURPACKAGE

If the compile is successful you’ll have a binary called YOURPACKAGE (possibly with a .exe extension if you’re targeting Windows) in your current working directory.

-o may be used to alter the name and destination of your binary, but remember that go build takes a value that is relative to your $GOPATH/src, not your working directory, so changing directories then executing the go build command is also an option.


I prefer to combine the two steps outlined above into one, like this:

% env GOOS=linux GOARCH=arm go build -v
% file ./gb
./gb: ELF 32-bit LSB  executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped

and that’s all there is to it.

Using go build vs go install

When cross compiling, you should use go build, not go install. This is the one of the few cases where go build is preferable to go install.

The reason for this is go install always caches compiled packages, .a files, into the pkg/ directory that matches the root of the source code.

For example, if you are building $GOPATH/src/, then the compiled package will be installed into $GOPATH/pkg/$GOOS_$GOARCH/

This logic also holds true for the standard library, which lives in /usr/local/go/src, so will be compiled to /usr/local/go/pkg/$GOOS_$GOARCH. This is a problem, because when cross compiling the go tool needs to rebuild the standard library for your target, but the binary distribution expects that /usr/local/go is not writeable.

Using go build rather that go install is the solution here, because go build builds, then throws away most of the result (rather than caching it for later), leaving you with the final binary in the current directory, which is most likely writeable by you.

Ugh, this is really slow!

In the procedure described above, cross compilation always rebuilds that standard library for the target every time. Depending on your workflow this is either not an issue, or a really big issue. If it’s the latter then I recommend you remove the binary distribution and build from source into a path that is writeable by you, then you’ll have the full gamut of go commands available to you.

Cross compilation support in gb is being actively developed and will not have this restriction.

What about GOARM?

The go tool chooses a reasonable value for GOARM by default. You should not change this unless you have a good reason.

But but, what about GOARM=7?

Sure, knock yourself out, but that means your program won’t run on all models of the Raspberry Pi. The difference between GOARM=6 (the default) and GOARM=7 is enabling a few more floating point registers, and a few more operations that allow floating point double values to be passed to and from the ARMv7 (VPFv3) floating point co processor more efficiently. IMO, with the current Go 1.5 arm compiler, it’s not worth the bother.

Using gb as my every day build tool

For the longest time I had this alias in my .bashrc

alias gb='go install -v'

as an homage to John Asmuth’s gb tool which I was very fond of way back before we had the go tool.

Once gb was written, I had to remove that alias and live in a world where I used the go tool and gb concurrently. Now, with the improvements that have landed since GopherCon I’ve decided it’s time to get serious about eating my own dog food and use gb as a full time replacement for the go tool.

I use gb not just to build gb projects, but to work on Juju, my $DAYJOB, as well as using gb to develop gb itself. Because the definition of a gb project is backwards compatible with $GOPATH I don’t even need to rewrite every Go package I want to work on as a gb project, I can work with them with their source happily in my $GOPATH.

gb isn’t perfect yet, there are still some major missing pieces. I hope by aggressively dogfooding I’ll be able to close the gaps before the end of the year. Specifically:

  • gb build works well, but the gb test story is less solid; I hope to reach parity with go test in the next few months by adding support for test flags, -race and coverage options.
  • Support for cross compilation was always high on my list, and is being actively worked on, and will be available to experiment within the next two weeks. At this point I envisage it will be Go 1.5 only, Go 1.4 support may come in time, but is not a high priority.

If others want to join me and make gb their default Go build tool, I’d welcome the company, and of course, your bug reports.

Performance without the event loop

This article is also available in Japanese, イベントループなしでのハイパフォーマンス – C10K問題へのGoの回答

This article is based on a presentation I gave earlier this year at OSCON. It has been edited for brevity and to address some of the points of feedback I received after the talk.

A common refrain when talking about Go is it’s a language that works well on the server; static binaries, powerful concurrency, and high performance.

This article focuses on the last two items, how the language and the runtime transparently let Go programmers write highly scalable network servers, without having to worry about thread management or blocking I/O.

An argument for an efficient programming language

But before I launch into the technical discussion, I want to make two arguments to illustrate the market that Go targets.

Moore’s Law

The oft mis quoted Moore’s law states that the number of transistors per square inch doubles roughly every 18 months.

However clock speeds, which are a function of entirely different properties, topped out a decade ago with the Pentium 4 and have been slipping backwards ever since.

From space constrained to power constrained

Sun e450

Sun Enterprise e450—about the size of a bar fridge, about the same power consumption. Image credit: eBay

This is the Sun e450. When I started my career, these were the workhorses of the industry.

These things were massive. Three of them, stacked one on top of another would consume an entire 19″ Rack. They only consumed about 500 Watts each.

Over the last decade, data centres have moved from space constrained, to power constrained. The last two data centre rollouts I was involved in, we ran out of power when the rack was barely 1/3rd full.

Because compute densities have improved so rapidly, data centre space is no longer a problem. However, modern servers consume significantly more power, in a much smaller area, making cooling harder yet at the same time critical.

Being power constrained has effects at the macro level—you can’t get enough power for a rack of 1200 Watt 1RU servers—and at the micro level, all this power, hundreds of watts, is being dissipated in a tiny silicon die.

Where does this power consumption come from ?


CMOS Inverter. Image credit: Wikipedia

This is an inverter, one of the simplest logic gates possible. If the input, A, is high, then the output, Q, will be low, and vice versa.

All today’s consumer electronics are built with CMOS logic. CMOS stands for Complementary Metal Oxide Semiconductor. The complementary part is the key. Each logic element inside the CPU is implemented with a pair of transistors, as one switches on, another switches off.

When the circuit is on or off, no current flows directly from the source to the drain. However, during transition there is a brief period where both transistors are conducting, creating a direct short.

Power consumption, and thus heat dissipation, is directly proportional to number of transition per second—the cpu clock speed1.

CPU feature size reductions are primarily aimed at reducing power consumption. Reducing power consumption doesn’t just mean “green”. The primary goal is to keep power consumption, and thus heat dissipation, below levels that will damage the CPU.

With clock speeds falling, and in direct conflict with power consumption, performance improvements come mainly from microarchitecture tweaks and esoteric vector instructions, which are not directly useful for the general computation. Added up, each microarchitecture (5 year cycle) change yields at most 10% improvement per generation, and more recently barely 4-6%.

“The free lunch is over”

Hopefully it is clear to you now that hardware is not getting any faster. If performance and scale are important to you, then you’ll agree with me that the days of throwing hardware at the problem are over, at least in the conventional sense. As Herb Sutter put it “The free lunch is over”.

You need a language which is efficient, because inefficient languages just do not justify themselves in production, at scale, on a capital expenditure basis.

An argument for a concurrent programming language

My second argument follows from my first. CPUs are not getting faster, but they are getting wider. This is where the transistors are going and it shouldn’t be a great surprise.

credit intel

Image credit: Intel

Simultaneous multithreading, or as Intel calls it hyper threading allows a single core to execute multiple instruction streams in parallel with the addition of a modest amount of hardware. Intel uses hyper threading to artificially segment the market for processors, Oracle and Fujitsu apply hyper threading more aggressively to their products using 8 or 16 hardware threads per core.

Dual socket has been a reality since the late 1990s with the Pentium Pro and is now mainstream with most servers supporting dual or quad socket designs. Increasing transistor counts have allowed the entire CPU to be co-located with siblings on the same chip. Dual core on mobile parts, quad core on desktop parts, even more cores on server parts are now the reality. You can buy effectively as many cores in a server as your budget will allow.

And to take advantage of these additional cores, you need a language with a solid concurrency story.

Processes, threads and goroutines

Go has goroutines which are the foundation for its concurrency story. I want to step back for a moment and explore the history that leads us to goroutines.


In the beginning, computers ran one job at a time in a batch processing model. In the 60’s a desire for more interactive forms of computing lead to the development of multiprocessing, or time sharing, operating systems. By the 70’s this idea was well established for network servers, ftp, telnet, rlogin, and later Tim Burners-Lee’s CERN httpd, handled each incoming network connections by forking a child process.

In a time-sharing system, the operating systems maintains the illusion of concurrency by rapidly switching the attention of the CPU between active processes by recording the state of the current process, then restoring the state of another. This is called context switching.

Context switching


Image credit: Immae (CC BY-SA 3.0)

There are three main costs of a context switch.

  • The kernel needs to store the contents of all the CPU registers for that process, then restore the values for another process. Because a process switch can occur at any point in a process’ execution, the operating system needs to store the contents of all of these registers because it does not know which are currently in use 2.
  • The kernel needs to flush the CPU’s virtual address to physical address mappings (TLB cache) 3.
  • Overhead of the operating system context switch, and the overhead of the scheduler function to choose the next process to occupy the CPU.

These costs are relatively fixed by the hardware, and depend on the amount of work done between context switches to amortise their cost—rapid context switching tends to overwhelm the amount of work done between context switches.


This lead to the development of threads, which are conceptually the same as processes, but share the same memory space. As threads share address space, they are lighter to schedule than processes, so are faster to create and faster to switch between.

Threads still have an expensive context switch cost; a lot of state must be retained. Goroutines take the idea of threads a step further.


Rather than relying on the kernel to manage their time sharing, goroutines are cooperatively scheduled. The switch between goroutines only happens at well defined points, when an explicit call is made to the Go runtime scheduler. The major points where a goroutine will yield to the scheduler include:

  • Channel send and receive operations, if those operations would block.
  • The go statement, although there is no guarantee that new goroutine will be scheduled immediately.
  • Blocking syscalls like file and network operations.
  • After being stopped for a garbage collection cycle.

In other words, places where the goroutine cannot continue until it has more data, or more space to put data.

Many goroutines are multiplexed onto a single operating system thread by the Go runtime. This makes goroutines cheap to create and cheap to switch between. Tens of thousands of goroutines in a single process are the norm, hundreds of thousands are not unexpected.

From the point of view of the language, scheduling looks like a function call, and has the same semantics. The compiler knows the registers which are in use and saves them automatically. A thread calls into the scheduler holding a specific goroutine stack, but may return with a different goroutine stack. Compare this to threaded applications, where a thread can be preempted at any time, at any instruction.

This results in relatively few operating system threads per Go process, with the Go runtime taking care of assigning a runnable goroutine to a free operating system thread.

Stack management

In the previous section I discussed how goroutines reduce the overhead of managing many, sometimes hundreds of thousands of concurrent threads of execution. There is another side to the goroutine story, and that is stack management.

Process address space

processThis is a diagram of the typical memory layout of a process. The key thing we are interested in is the locations of the heap and the stack.

Inside the address space of a process, traditionally the heap is at the bottom of memory, just above the program code and grows upwards.

The stack is located at the top of the virtual address space, and grows downwards.

guard-pageBecause the heap and stack overwriting each other would be catastrophic, the operating system arranges an area of inaccessible memory between the stack and the heap.

This is called a guard page, and effectively limits the stack size of a process, usually in the order of several megabytes.

Thread stacks

threadsThreads share the same address space, so for each thread, it must have its own stack and its own guard page.

Because it is hard to predict the stack requirements of a particular thread, a large amount of memory must be reserved for each thread’s stack. The hope is that this will be more than needed and the guard page will never be hit.

The downside is that as the number of threads in your program increases, the amount of available address space is reduced.

Goroutine stack management

The early process model allowed the programmer to view the heap and the stack as large enough to not be a concern. The downside was a complicated and expensive subprocess model.

Threads improved the situation a bit, but require the programmer to guess the most appropriate stack size; too small, your program will abort, too large, you run out of virtual address space.

We’ve seen that the Go runtime schedules a large number of goroutines onto a small number of threads, but what about the stack requirements of those goroutines ?

Goroutine stack growth

stack-growth Each goroutine starts with an small stack, allocated from the heap. The size has fluctuated over time, but in Go 1.5 each goroutine starts with a 2k allocation.

Instead of using guard pages, the Go compiler inserts a check as part of every function call to test if there is sufficient stack for the function to run. If there is sufficient stack space, the function runs as normal.

If there is insufficient space, the runtime will allocate a larger stack segment on the heap, copy the contents of the current stack to the new segment, free the old segment, and the function call is restarted.

Because of this check, a goroutine’s initial stack can be made much smaller, which in turn permits Go programmers to treat goroutines as cheap resources. Goroutine stacks can also shrink if a sufficient portion remains unused. This is handled during garbage collection.

Integrated network poller

In 2002 Dan Kegel published what he called the c10k problem. Simply put, how to write server software that can handle at least 10,000 TCP sessions on the commodity hardware of the day. Since that paper was written, conventional wisdom has suggested that high performance servers require native threads, or more recently, event loops.

Threads carry a high overhead in terms of scheduling cost and memory footprint. Event loops ameliorate those costs, but introduce their own requirements for a complex, callback driven style.

Go provides programmers the best of both worlds.

Go’s answer to c10k

In Go, syscalls are usually blocking operations, this includes reading and writing to file descriptors. The Go scheduler handles this by finding a free thread or spawning another to continue to service goroutines while the original thread blocks. In practice this works well for file IO as a small number of blocking threads can quickly exhaust your local IO bandwidth.

However for network sockets, by design at any one time almost all of your goroutines are going to be blocked waiting for network IO. In a naive implementation this would require as many threads as goroutines, all blocked waiting on network traffic. Go’s integrated network poller handles this efficiently due to the cooperation between the runtime and net packages.

In older versions of Go, the network poller was a single goroutine that was responsible for polling for readiness notification using kqueue or epoll. The polling goroutine would communicate back to waiting goroutines via a channel. This achieved the goal of avoiding a thread per syscall overhead, but used a generalised wakeup mechanism of channel sends. This meant the scheduler was not aware of the source or importance of the wakeup.

In current versions of Go, the network poller has been integrated into the runtime itself. As the runtime knows which goroutine is waiting for the socket to become ready it can put the goroutine back on the same CPU as soon as the packet arrives, reducing latency and increasing throughput.

Goroutines, stack management, and an integrated network poller

In conclusion, goroutines provide a powerful abstraction that free the programmer from worrying about thread pools or event loops.

The stack of a goroutine is as big as it needs to be without being concerned about sizing thread stacks or thread pools.

The integrated network poller lets Go programmers avoid convoluted callback styles while still leveraging the most efficient IO completion logic available from the operating system.

The runtime make sure that there will be just enough threads to service all your goroutines and keep your cores active.

And all of these features are transparent to the Go programmer.


  1. CMOS power consumption is not only caused by the short circuit current when the circuit is switching. Additional power consumption comes from charging the output capacitance of the gate, and leakage current through the MOSFET gate increases as the size of the transistor decreases. You can read more about this from in a the lecture materials from CMU’s ECE322 course. Bill Herd has a published a series of articles on how CMOS works.
  2. This is an oversimplification. In some cases the operating system can avoid saving and restoring infrequently used architectural registers by starting the the process in a mode where access to floating point or MMX/SSE registers will cause the program to fault, thereby informing the kernel that the process will now use those registers and it should from then on save and restore them.
  3. Some CPUs have what is known as a tagged TLB. In the case of tagged TLB support the operating system can tell the processor to associate particular TLB cache entries with an identifier, derived from the process ID, rather than treating each cache entry as global. The upside is this avoids flushing out entries on each process switch if the process is placed back on the same CPU in short order.

Why Go and Rust are not competitors

This is a short blog post explaining why I believe that Go and Rust are not competitors.

Why people think Rust and Go are competitors

To explain why I think Rust and Go are not competitors, I want to to lay out the reasons why I think the question is being asked in the first place.

  • Rust and Go were announced around the same time. Go was conceived in 2007 and became public in November of 2009. Rust appeared a few months later in 2010, although Graydon hints that Rust may have been conceived much earlier. In either case, both languages have a distinguished pedigree of influential predecessors. In the case of Go, Hoare’s CSP, Alef, and Pike’s Newsqueak. Rust is viewed as an extension of the ML family of languages.
  • Rust and Go are both touted as memory safe. While this statement is absolutely true, both languages will not tolerate unsafe memory access, what is more important is that the world will not tolerate any new language which is not memory safe. It just so happens that Go and Rust are the first two languages to emerge after decades of evidence that in the real world, programmers as a whole, cannot safely manage memory manually.
  • Both are young languages, Go achieving 1.0 status in 2012, and Rust earlier this year, implying ambition and upward mobility towards a space occupied by incumbent languages.

These are the arguments that I see used to justify why Rust and Go are competitors. Laying them out like this, it’s hard to find them compelling, indeed they appear circumstantial.

Why I think rust and Go are not competitors

So, why do I think that Go and Rust are not competitors ?

  • Rust is focused on “free of charge” abstractions. If this sounds familiar, this has been the catch cry for C++ for decades. As a language which defers many low level actions to its runtime, Go sacrifices some performance for its goals of simplicity and orthogonality.
  • Rust is designed for interoperability with C; Rust code is expected to be embedded in other large programs which follow the C calling convention. Go does allow some interoperability through cgo, but this is clearly not the expected way that Go programs will be written.
  • Go is focused on concurrency as a first class concept. That is not to say you cannot find aspects of Go’s actor oriented concurrency in Rust, but it is left as an exercise to the programmer.
  • Go is focused on programmer productivity, across the whole software development lifecycle. Rust, as a front end to LLVM, punts on many of those decision.

Rust and Go are not competitors

Go is focused on making large teams of programmers efficient, though rigid application of simplicity — ideas which are not simple or lead to non orthogonal corner cases are rejected.

Rust is focused on a class of programs which cannot tolerate unsafe memory access (neither can Go, I don’t think there is any appetite for a new unsafe programming languages) or runtime overhead — ideas which are unsafe, or come with overhead, are rejected or at least pushed out of the core language.

Rust competes for mindshare with C++ and D for programmers who are prepared to accept more complex syntax and semantics (and presumably higher readability costs) in return for the maximum possible performance. For example, micro controllers, AAA game engines, and web rendering engines.

Go competes for mindshare in the post 2006 Internet 2.0 generation of companies who have outgrown languages like Ruby, Python, and Node.js (v8) and have lost patience with the high deployment costs of JVM based languages.

Listen up!

Today I was devastated to learn of yet another women being driven from the tech industry.

This is so far from being all right it does not even register on the same scale.

The tech industry is a sick, male dominated, misogynistic, flaccid void that continues to permit weak, small, frightened men to abuse and harass women from a presumed position of anonymous safety.

As harassment, especially online, continues unadmonished I can take the only course of action available to me as a civilian; exclusion.

If you harass or abuse women, you will be excluded from any conference, project, meetup, or venture that I organise or participate in. I will also ensure that other conference organisers, meetup organisers, and business owners are fully aware of your repugnant actions, and urge them to exclude you.

If you harass or abuse women, I will not work at any company who chooses to employ you, and I will not permit any company that I work for to hire you.

So, if you think that my statements are unfair or possibly an overreaction, that it’s all in good fun, or maybe I should lighten up, then next time you choose to step out of your fetid cave to abuse a woman, remember that this is not a threat.

It is a promise.

gb, a project based build tool for the Go programming language

A few months ago I introduced gb as a proof of concept to the audience at GDG Berlin. Since then, together with a small band of contributors and an enthusiastic cabal of early adopters, gb has moved from proof of concept, written mostly on trains during a trip through Europe, to something approaching a usable build tool.

This post gives an introduction to gb, and explains the benefits of adopting a project based workflow for building solutions in Go. If you want to read more about the motivations for gb, please read the previous post in this series.


gb is a project based build tool for the Go programming language.

gb projects define an on disk layout that permits repeatable builds via source vendoring. When vendoring (copying) code into a gb project, the original source is not rewritten or modified.

As gb is written in Go, its packages can be used to create plugins to gb that extend its functionality.

Why is a project based approach useful ?

Why is a project based approach, as opposed to a workspace based approach like $GOPATH, useful ?

First and foremost, by structuring your Go application as a project, rather than a piece of a distributed global workspace, you gain control over all the source that goes into your application, including that of its dependencies.

Second, while your project’s layout contains both the code that you have written, and the code that your code depends on, there is a clear separation between the two; they live in different subdirectory trees.

Thirdly, now your project, and by extension the repository that houses it, contains all the source to build your application, having multiple working copies on disk is trivial. If you are part of a team responsible for maintaining multiple releases, this is a very useful property.

Lastly. As your project can be built at any time without going out to the internet to fetch code, you are insulated from political, financial, or technical events that can unexpectedly break your build. Conversely, as the dependencies for your project are included with the project, upgrading those dependencies is atomic and affects everyone on the team automatically without them having to run external steps like go get -u or godeps -u dependencies.txt.

How is gb different ?

In the previous section I outlined the advantages I see in using a project based tool to build Go applications. I want to digress for a moment to explain why gb is different to other existing go build tools.

gb is not a wrapper around the go tool. Not wrapping the go tool means gb is not constrained to solutions that can be implemented with $GOPATH tricks. Not relying on the go tool means gb can ship faster, and fix bugs faster than the fixed pace of releases of the Go toolchain.

You can read more about the rationale for gb here, and reasons for not wrapping the go tool here.

Being a project owner

In the discussions I’ve had about gb, I’ve tried to emphasise the role of the project owner. The owner of the project has a special responsibility; they are responsible for admitting new dependencies into a project, and for curating those dependencies once they are part of the shipping product.

Whether the role of project owner falls to a single engineer, or is distributed across your whole team, it is the project owner who is ultimately responsible for shipping your product, so gb gives you the tools to achieve this without having to rely on a third party.

Project ownership is important. You, the developer, the project owner, the build engineer, needs to own all the source that goes into your product whether you wrote it or not. Don’t be the person who cannot deliver a release because GitHub is down.


No import rewriting

gb is built around a philosophy of leaving the source of a project’s dependency untouched.

The are various technical reasons why I believe import rewriting is a bad idea for Go projects, I won’t repeat them here.

It is my hope that maybe one day, build tools like gb can get a bit smarter about managing dependencies, and avoid the need for whole cloth vendoring, but this cannot happen if imports are rewritten.

Demo time

Enough background, let’s show off gb.

Creating a gb project

Creating a gb project is as simple as creating a directory. This directory can be anywhere of your choosing; it does not need to be inside $GOPATH, in fact gb does not use $GOPATH.

% mkdir -p demo/src
% cd demo

demo is now a gb project. A gb project is defined as any directory that contains a directory called src/. We’ll refer to the root of the project, demo in this case, as the project root, or $PROJECT for short. Let’s go ahead and create a single main package.

% mkdir -p src/cmd/helloworld
% cat > src/cmd/helloworld/main.go <<EOF
package main

import "fmt"

func main() {
       fmt.Println("Hello world from gb")

Commands (main packages) don’t have to be placed in src/cmd/, but that is a nice tradition that has emerged from the Go standard library, so we’ll follow it here. Also note that although gb does not use the go tool to compile Go code, that code must still be structured into packages.

In fact gb is much stricter in this respect, Go code can only be built if it is inside a package, there are no facilities to build or run a single .go source file.

gb supports all the usual ways of compiling one package by passing the name of the package to gb build, but it is simpler to just build the entire project by staying at the root of your project and issuing gb build to build all the source in your project.

gb has support for incremental compilation, so even though gb build is told to build all the source in the project, it will only recompile the parts that have changed; there is no need to point them out to the compiler. Also note that there is no gb install command. gb build both builds and installs (caches packages forincremental compilation later).

With all this said, let’s go ahead an build this project, then run the resulting program

% gb build
% bin/helloworld 
Hello world from gb

By default gb prints out the names of packages it is compiling, you can use the -q flag if you want to suppress this output. When compiling, packages will be built and placed in $PROJECT/pkg/ for possible reuse by latter compilation cycles, main packages (commands), will be placed in $PROJECT/bin/.

If this project contained multiple commands, they would all be built and placed in $PROJECT/bin/. To demonstrated this I created a few more main packages in this project, let’s compile them and look at the result

% gb build
% ls bin/
client  helloworld  server

gb project layout

The previous section walked through the creation of a gb project from scratch and showed using gb build to compile the project.

Let’s have a look at the directory tree of this project an add some annotations to reinforce the gb project concepts.

% tree $(pwd)
├── bin
│   ├── client
│   ├── helloworld
│   └── server
└── src
    └── cmd
        ├── client
        │   └── main.go
        ├── helloworld
        │   └── main.go
        └── server
            └── main.go

6 directories, 6 files

Starting from the top, we have a bin/ directory, this is created by gb build when building main packages to hold the final output of linking executable programs. Inside bin/ we have the the binaries that were built.

Next is the src/ which contains the subdirectory cmd/ and inside that, three packages, client, helloworld, and server.

The final directories you will find inside a gb project is $PROJECT/pkg/ for compiled go packages, and $PROJECT/vendor/ for the source of your project’s dependencies. We’ll discuss vendoring dependencies later in this piece.

Source control

gb doesn’t care about source control, all it cares about is the source of your project is arranged in the format it expects. How those files got there, or who is responsible for tracking changes to them is outside gb’s concern.

Of course, source control is a great idea, and you should be tracking the source of your project using source control. Let’s create a git repo in the $PROJECT root now

% git init .
Initialized empty Git repository in /home/dfc/demo/.git/
% git add src/
% git commit -am 'initial import'
[master (root-commit) aa1acfd] initial import
 3 files changed, 21 insertions(+)
 create mode 100644 src/cmd/client/main.go
 create mode 100644 src/cmd/helloworld/main.go
 create mode 100644 src/cmd/server/main.go

Then of course add a git remote and push to it.

You should not place $PROJECT/bin/ or $PROJECT/pkg/ under source control, as they are temporary directories. You may wish to add a .gitignore or similar to prevent doing so accidentally.

Dependency management

A project which doesn’t have any dependencies, apart from the standard library, is not going to be very a compelling use case for gb. For this next section I’ll walk through creating a new gb project which has several dependencies.

The source for this project is online at Let’s start by creating the project structure and adding some source

% mkdir example-gsftp
% cd example-gsftp
% mkdir -p src/cmd/gsftp # this is our main package
% vim src/cmd/gsftp/main.go

The source for cmd/gsftp/main.go is too long to include, but is available here.

This project depends on package and package, which itself has a dependency on In its current state, if you were to gb build this project it would fail with an error like this

% gb build
FATAL command "build" failed: failed to resolve import path "cmd/gsftp": cannot find package "" in any of:
        /home/dfc/go/src/ (from $GOROOT)
        /home/dfc/example-gsftp/src/ (from $GOPATH)

gb is unable to find the a package with an import path of, in either the project’s source directory, $PROJECT/src/, or the project’s vendored source directory, $PROJECT/vendor/src/.

note the references to $GOPATH are a side effect of reusing the go/build package. gb does not use $GOPATH, this message will be addressed in the future.

Now, I have copies of all of the source of these packages in my $GOPATH, so I can copy them into the $PROJECT/vendor/src directory by hand to satisfy the build.

% mkdir -p vendor/src/
% cp -r $GOPATH/src/* vendor/src/
% mkdir -p vendor/src/
% cp -r $GOPATH/src/* vendor/src/
% mkdir -p vendor/src/
% cp -r $GOPATH/src/* vendor/src/
% gb build
% ls -l bin/gsftp
-rwxrwxr-x 1 dfc dfc 5949744 Jun  8 14:05 bin/gsftp

For completeness’s sake, let’s take a look at the directory structure of the project with these vendored dependencies.

% tree -d $(pwd)                                                                                                                         
├── bin
├── pkg
│   └── linux
│       └── amd64
│           ├──
│           │   ├── kr
│           │   └── pkg
│           └──
│               └── x
│                   └── crypto
│                       └── ssh
├── src
│   └── cmd
│       └── gsftp
└── vendor
    └── src
        │   ├── kr
        │   │   └── fs
        │   └── pkg
        │       └── sftp
        │           └── examples
        │               ├── buffered-read-benchmark
        │               ├── buffered-write-benchmark
        │               ├── streaming-read-benchmark
        │               └── streaming-write-benchmark
            └── x
                └── crypto
                    └── ssh
                        ├── agent
                        ├── terminal
                        ├── test
                        └── testdata

34 directories

Using the gb-vendor plugin

gb’s answer to dependency management is vendoring, copying the source of your project’s dependencies into $PROJECT/vendor/src/. As you saw above, this process can be quite tedious, especially if you do not have the source of the dependency easily to hand.

To assist with this process, gb ships with a plugin called gb-vendor, which aims to automate a lot of this work.

gb-vendor can fetch the dependencies of your project. Let’s use it to automate the steps we just did above.

% rm -rf vendor/src
% gb vendor fetch
% gb vendor fetch
% gb vendor fetch
% gb build

At this point it is a good idea to add your project’s vendor/ directory to source control.

% git add vendor/
% git commit -am 'added vendored dependencies'

gb-vendor also provides commands to update, delete, and report on the vendor dependencies of a project. For example

% gb vendor update

will replace the source of with the latest available upstream.

% gb vendor delete

Will remove $PROJECT/vendor/src/ from disk, and remove its entry from the manifest file.

Lastly, the list subcommand behaves similarly to go list and lets you report on the dependencies recorded in the manifest file.

% gb vendor list     master  f234c3c6540c0358b1802f7fd90c0879af9232eb        master  2788f0dbd16903de03cb8186e5c7d97b69ad387b  master  c10c31b5e94b6f7a0283272dc2bb27163dcea24b

gb-vendor is completely optional

At this point you’re probably saying, “Hang on, aren’t you the person who made a big song and dance about no metadata files ?”.

Yes, it is true that gb-vendor records the dependencies it fetches in a manifest file, ($PROJECT/vendor/manifest), but this manifest file is only used by gb-vendor, and is not part of gb.

gb-vendor is a plugin, it adds a little bit of smarts on top of git clone, or hg checkout, but it isn’t mandatory to use gb-vendor to build gb projects.

All gb cares about is the source on disk, you don’t have to use it. If your workflow works well with svn externals or git subtrees, or maybe just copying the package and recording the revision you copied in the commit message, you can use that approach as well.

gb-vendor is not required to use gb, and gb is completely oblivious to its operation. All that gb cares about is finding the source on disk with the correct layout. You are free to use any method of managing the contents of your $PROJET/vendor/src directory.

How does gb handle the diamond dependency problem ?

In every Go program, regardless of which tool built it (gb, the go tool, Makefile, or by hand), there may only be one copy of a package linked into the final binary.

For project owners this means that if they encounter a situation where two dependencies of their project expect different exported API’s of a third package, they must resolve this problem at the point they introduce these dependencies into their project.

Resolving a diamond dependency conflict requires the project owner choose which copy (Go packages do not have a notion of versions) of the source of that third dependency they will place in $PROJECT/vendor/src/ and adjusting, updating, or replacing other dependencies as required.

Can a gb project be a library ?

In the presentations I’ve made about gb, I have focussed on development teams shipping products written in Go. I see these teams as the ones who have the most to gain, and the least to lose, from adopting gb, so it is reasonable to focus on those teams first.

You can also use gb to build libraries (effectively gb projects that don’t have main packages), and then vendor the source of that project’s src/ directory into another gb project as demonstrated above.

At the moment no automated tools exist to assist with this process, but it is likely that gb-vendor will acquire this ability if there is significant demand in developing libraries in the gb project format.

Wrapping up

This post has described the theory and the practice of using gb. I hope that you have found it useful, and in turn that you may find gb useful if you are part of a team charged with delivering solutions using Go.

  • Project based workflow
  • Repeatable builds via source vendoring without import rewriting
  • Reusable components with a plugin interface

Friday pop quiz: the smallest buffer

bytes.Buffer is a tremendously useful type, but it’s a bit large1.

% sizeof -p bytes Buffer
Buffer 112

… and that is just the overhead, we haven’t put any data into the buffer yet.

This Friday’s2 challenge is to write a replacement for bytes.Buffer that implements io.ReadWriter and allows the caller to discover the length and capacity of the buffer.

The smallest and most creative solution wins fame, adoration, and a first run gb sticker.


  • The code must continue to be correctly formatted.
  • Points will be deducted for arguing with the judge (me).
  • Everything you need to win this challenge is in the description; think laterally.


As I hoped, most readers quickly figured out a good way to save a few lines was to declare Read and Write methods on a []byte, not a struct. This would lead to some small complications dereferencing the value rather than treating it as a struct with a buf []byte field, but everyone seemed to figure that out, which is good, as these are useful skills to have in your Go toolbelt.

A few readers also spotted the deliberate loophole I left in the wording of the question around obtaining the length and the capacity of the buffer. Declaring a new type with an underlying type of a slice gives you access to the len and cap, so finding the length of a slice requires no additional methods on the type.

type Buffer []byte

func main() {
        var b Buffer

Thus, the core of this challenge was to define a new slice type that had Read and Write methods, which would end up taking an overhead of 3 machine words, 24 bytes on 64bit platforms, 12 on 32bit.

One nice property of this arrangement is that if you already have a []byte slice, you can convert it into a Buffer and consume zero additional storage, as you are effectively replacing the 3 words that described the []byte with 3 words which describe your new slice type.

s := []byte{0x01, 0x02, 0x03}
buf := Buffer(s)

However, as usually happens with these quizzes, a solution arrives that wipes the smug smile from my face,

Kevin, I take my imaginary hat off to you, Sir.

For the record, here was the solution I came up with last night. It is longer than I hoped it would be because of the odd contract that the standard library bytes.Buffer tests require. I think a more liberal reading of the io.Reader contract would result in a smaller entry.

// A Buffer is a variable-sized buffer of bytes with Read and Write
// methods. The zero value for Buffer is an empty buffer ready to use.
type Buffer []byte

// Write writes len(p) bytes from p to the Buffer.
func (b *Buffer) Write(p []byte) (int, error) {
        *b = append(*b, p...)
        return len(p), nil

// Read reads up to len(p) bytes into p from the Buffer.
func (b *Buffer) Read(p []byte) (int, error) {
        if len(p) == 0 {
                return 0, nil
        if len(*b) == 0 {
                return 0, io.EOF
        n := copy(p, *b)
        *b = (*b)[n:]
        return n, nil

Playground link

So, prizes and glory to @rf, Ben Lubar, and @kevingillette, with special mentions to Egon Elbre, and Dan Kortschak and Douglas Clark from G+. Some of you were more correct than others, but you were all very quick, and that’s got to count for something. I’ll be in touch with your prize.

If a reader wants to debate their solution, and possibly best ours, consider this an open challenge.

  1. Where do you get the sizeof program ? Why, from Russ Cox of course, (oops, it looks like this doesn’t work with Go 1.4.2, better use tip, or try this online version)
  2. I’m sorry if it isn’t Friday where you live. I can’t help it if Australians live in the future.

Hear me speak about Go performance at OSCON

I’m going to be speaking at OSCON this year about Go performance. The title of the talk is High performance servers without the event loop and will focus on the features of the language and its runtime that transparently let Go programmers write high performance network servers without resorting to event loops and callback spaghetti.

As the OSCON website is a bit light on for details, here are some more to entice you.


Conventional wisdom suggests that high performance servers require native threads, or more recently, event loops.

Neither solution is without its downside. Threads carry a high overhead in terms of scheduling cost and memory footprint. Event loops ameliorate those costs, but introduce their own requirements for a complex callback driven style.

A common refrain when talking about Go is it’s a language that works well on the server; static binaries, powerful concurrency, and high performance.

This talk focuses on the last two items, how the language and the runtime transparently let Go programmers write highly scalable network servers without having to worry about thread management or blocking I/O.

The goal of this talk is to introduce the following features of the language and the runtime:

  • Escape Analysis
  • Stack management
  • Processes and threads vs goroutines
  • Integrated network poller

These four features work in concert to build an argument for the suitability of Go as a language for writing high performance servers.

How does a 20% discount sound ?

O’Reilly have provided me with a discount code DAVEC20 that you can use to save 20% on your registration.

As a bonus, if enough people register for OSCON using the discount code DAVEC20, the organisers have said they will give me a free pass to the event which I plan to donate to Women Who Code. I know that this is a lot of ifs, but it is worth a try.

So, if you’re interested in hearing me, and hundreds of others speak at OSCON this year, you can save yourself 20% and recieve some good karma in the process. What’s too lose ?