Dynamically scoped variables in Go

This is a thought experiment in API design. It starts with the classic Go unit testing idiom:

func TestOpenFile(t *testing.T) {
        f, err := os.Open("notfound")
        if err != nil {

        // ...

What’s the problem with this code? The assertion. if err != nil { ... } is repetitive and in the case where multiple conditions need to be checked, somewhat error prone if the author of the test uses t.Error not t.Fatal, eg:

        f, err := os.Open("notfound")
        if err != nil {
        f.Close() // boom!

What’s the solution? DRY it up, of course, by moving the repetitive assertion logic to a helper:

func TestOpenFile(t *testing.T) {
        f, err := os.Open("notfound")
        check(t, err)

        // ...
func check(t *testing.T, err error) {
       if err != nil {

Using the check helper the code is a little cleaner, and clearer, check the error, and hopefully the indecision between t.Error and t.Fatal has been solved. The downside of abstracting the assertion to a helper function is now you need to pass a testing.T into each and every invocation. Worse, you need to pass a *testing.T to everything that needs to call check, transitively, just in case.

This is ok, I guess, but I will make the observation that the t variable is only needed when the assertion fails — and even in a testing scenario, most of the time, most of the tests pass, so that means reading, and writing, all these t‘s is a constant overhead for the relatively rare occasion that a test fails.

What about if we did something like this instead?

func TestOpenFile(t *testing.T) {
        f, err := os.Open("notfound")
        // ...
func check(err error) {
        if err != nil {

Yeah, that’ll work, but it has a few problems

% go test
--- FAIL: TestOpenFile (0.00s)
panic: open notfound: no such file or directory [recovered]
        panic: open notfound: no such file or directory

goroutine 22 [running]:
        /Users/dfc/go/src/testing/testing.go:874 +0x3a3
panic(0x111b040, 0xc0000866f0)
        /Users/dfc/go/src/runtime/panic.go:679 +0x1b2
        /Users/dfc/src/github.com/pkg/expect/expect_test.go:10 +0xa1
testing.tRunner(0xc0000b4400, 0x115ac90)
        /Users/dfc/go/src/testing/testing.go:909 +0xc9
created by testing.(*T).Run
        /Users/dfc/go/src/testing/testing.go:960 +0x350
exit status 2

Let’s start with the good; we didn’t have to pass a testing.T every place we call check, the test fails immediately, and we get a nice message in the panic — albeit twice. But where the assertion failed is hard to see. It occurred on expect_test.go:11 but you’d be forgiven for not knowing that.

So panic isn’t really a good solution, but there’s something in this stack trace that is — can you see it? Here’s a hint, github.com/pkg/expect_test.TestOpenFile(0xc0000b4400).

TestOpenFile has a t value, it was passed to it by tRunner, so there’s a testing.T in memory at address 0xc0000b4400. What if we could get access to that t inside check? Then we could use it to call t.Helper and t.Fatal. Is that possible?

Dynamic scoping

What we want is to be able to access a variable whose declaration is neither global, or local to the function, but somewhere higher in the call stack. This is called dynamic scoping. Go doesn’t support dynamic scoping, but it turns out, for restricted cases, we can fake it. I’ll skip to the chase:

// getT returns the address of the testing.T passed to testing.tRunner
// which called the function which called getT. If testing.tRunner cannot
// be located in the stack, say if getT is not called from the main test
// goroutine, getT returns nil.
func getT() *testing.T {
        var buf [8192]byte
        n := runtime.Stack(buf[:], false)
        sc := bufio.NewScanner(bytes.NewReader(buf[:n]))
        for sc.Scan() {
                var p uintptr
                n, _ := fmt.Sscanf(sc.Text(), "testing.tRunner(%v", &p)
                if n != 1 {
                return (*testing.T)(unsafe.Pointer(p))
        return nil

We know that each Test is called by the testing package in its own goroutine (see the stack trace above). The testing package launches the test via a function called tRunner which takes a *testing.T and a func(*testing.T) to invoke. Thus we grab a stack trace of the current goroutine, scan through it for the line beginning with testing.tRunner — which can only be the testing package as tRunner is a private function — and parse the address of the first parameter, which is a pointer to a testing.T. With a little unsafe we convert the raw pointer back to a *testing.T and we’re done.

If the search fails then it is likely that getT wasn’t called from a Test. This is actually ok because the reason we needed the *testing.T was to call t.Fatal and the testing package already requires that t.Fatal be called from the main test goroutine.

import "github.com/pkg/expect"

func TestOpenFile(t *testing.T) {
        f, err := os.Open("notfound")
        // ...

Putting it all together we’ve eliminated the assertion boilerplate and possibly made the expectation of the test a little clearer to read, after opening the file err is expected to be nil.

Is this fine?

At this point you should be asking, is this fine? And the answer is, no, this is not fine. You should be screaming internally at this point. But it’s probably worth introspecting those feelings of revulsion.

Apart from the inherent fragility of scrobbling around in a goroutine’s call stack, there are some serious design issues:

  1. The expect.Nil‘s behaviour now depends on who called it. Provided with the same arguments it may have different behaviour depending on where it appears in the call stack — this is unexpected.
  2. Taken to the extreme dynamic scoping effective brings into the scope of a single function all the variables passed into any function that preceded it. It is a side channel for passing data in to and out of functions that is not explicitly documented in function declaration.

Ironically these are precisely the critiques I have of context.Context. I’ll leave it to you to decide if they are justified.

A final word

This is a bad idea, no argument there. This is not a pattern you should ever use in production code. But, this isn’t production code, it’s a test, and perhaps there are different rules that apply to test code. After all, we use mocks, and stubs, and monkey patching, and type assertions, and reflection, and helper functions, and build flags, and global variables, all so we can test our code effectively. None of those, uh, hacks will ever show up in the production code path, so is it really the end of the world?

If you’ve read this far perhaps you’ll agree with me that as unconventional as this approach is, not having to pass a *testing.T into every function that could possibly need to assert something transitively, makes for clearer test code.

So maybe, in this case, the ends do justify the means.

If you’re interested, I’ve put together a small assertion library using this pattern. Caveat emptor.

Complementary engineering indicators

Last year I had the opportunity to watch Cat Swetel’s presentation The Development Metrics You Should Use (but Don’t). The information that could be gleaned from just tracking the start and finish date of work items was eye opening. If you’re using an issue tracker this information is probably already (perhaps with some light data munging) available — no need for TPS reports. Additionally, statistics obtained by data mining your project’s issue tracker are, perhaps, less likely to be juked.

Around the time I saw Cat’s presentation I finished reading Andy Grove’s High Output Management. The hidden gem in this book (assuming becoming a meeting powerhouse isn’t your bag) was Grove’s notion of indicator pairs. An example of a paired indicator might be the number of sales deals closed paired with the customer retention rate. The underling principle being optimising for one indicator will have an adverse impact on the other. In the example, overly aggressive or deceptive tactics could superficially raise the number of sales made, but would be reflected in a dip in the retention rate as customers returned the product or terminated their service prematurely.

These ideas lead me to thinking about indicators you could use for a team delivering a software product. Could those indicators be derived cheaply from the hand to hand combat of software delivery? Could they be structured in a way that aggressively pursuing one metric would be reflected negatively in another? I think so.

These are the three metrics that I’ve been using to track the health of the project that I lead.

  • Date; was the software done when we said it would be done. If you prefer this indicator as a scalar, how many days difference is there between the ship date agreed on at the start of the sprint/milestone/whatever and what was the actual date that you considered it done.
  • Completeness; when the software is done, how many of the things we said we’re going to do actually got delivered in that release.
  • Defects reported; once the software is in the field, what is the rate of bugs reported.

It is relatively easy, for example, to hit a delivery date if you aggressively descope anything risky or simply don’t do it. But in doing so this lack of promised functionality would impact the completeness metric.

Conversely, it’s straight forward to hit your milestone’s completeness target if you let the release date slip and slip. Bringing both the metics into line requires good estimation skills to judge how much can be attempted in milestone and provide direct feedback if your estimation skills needed work.

The third indicator, defects reported in the field, acts as a check on the other two. It would be easy to consistent hit your delivery date with 100% feature completion if your team does a shoddy job. The high fives and :tada: emojis will be short lived if each release brings with it a swathe of high priority bug reports. This indicator also tends to have a second order effect, rushed features to meet a deadline tend to generate remedial work in the following milestones, crowding out promised work or blowing later deadlines.

I consider these to be complementary metrics, they should be considered together, as a group, rather than individually. Ideally your team should be delivering what you promised, when you promised it, with a low defect rate. But more importantly, if that isn’t the case, if one of the indicators is unhealthy, addressing it shouldn’t result in the problem moving to another.

Use internal packages to reduce your public API surface

In the beginning, before the go tool, before Go 1.0, the Go distribution stored the standard library in a subdirectory called pkg/ and the commands which built upon it in cmd/. This wasn’t so much a deliberate taxonomy but a by product of the original make based build system. In September 2014, the Go distribution dropped the pkg/ subdirectory, but then this tribal knowledge had set root in large Go projects and continues to this day.

I tend to view empty directories inside a Go project with suspicion. Often they are a hint that the module’s author may be trying to create a taxonomy of packages rather than ensuring each package’s name, and thus its enclosing directory, uniquely describes its purpose. While the symmetry with cmd/ for package main commands is appealing, a directory that exists only to hold other packages is a potential design smell.

More importantly, the boilerplate of an empty pkg/ directory distracts from the more useful idiom of an internal/ directory. internal/ is a special directory name recognised by the go tool which will prevent one package from being imported by another unless both share a common ancestor. Packages within an internal/ directory are therefore said to be internal packages.

To create an internal package, place it within a directory named internal/. When the go command sees an import of a package with internal/ in the import path, it verifies that the importing package is within the tree rooted at the parent of the internal/ directory.

For example, a package /a/b/c/internal/d/e/f can only be imported by code in the directory tree rooted at /a/b/c. It cannot be imported by code in /a/b/g or in any other repository.

If your project contains multiple packages you may find you have some exported symbols which are intended to be used by other packages in your project, but are not intended to be part of your project’s public API. Although Go has limited visibility modifiers–public, exported, symbols and private, non exported, symbols–internal packages provide a useful mechanism for controlling visibility to parts of your project which would otherwise be considered part of its public versioned API.

You can, of course, promote internal packages later if you want to commit to supporting that API; just move them up a directory level or two. The key is this process is opt-in. As the author, internal packages give you control over which symbols in your project’s public API without being forced to glob concepts together into unwieldy mega packages to avoid exporting them.

Be wary of functions which take several parameters of the same type

APIs should be easy to use and hard to misuse.

— Josh Bloch

A good example of a simple looking, but hard to use correctly, API is one which takes two or more parameters of the same type. Let’s compare two function signatures:

func Max(a, b int) int
func CopyFile(to, from string) error

What’s the difference between these functions? Obviously one returns the maximum of two numbers, the other copies a file, but that’s not the important thing.

Max(8, 10) // 10
Max(10, 8) // 10

Max is commutative; the order of its parameters does not matter. The maximum of eight and ten is ten regardless of if I compare eight and ten or ten and eight.

However, this property does not hold true for CopyFile.

CopyFile("/tmp/backup", "presentation.md")
CopyFile("presentation.md", "/tmp/backup")

Which one of these statements made a backup of your presentation and which one overwrite your presentation with last week’s version? You can’t tell without consulting the documentation. A code reviewer cannot know if you’ve got the order correct without consulting the documentation.

The general advice is to try to avoid this situation. Just like long parameter lists, indistinct parameter lists are a design smell.

A challenge

When this situation is unavoidable my solution to this class of problem is to introduce a helper type which will be responsible for calling CopyFile correctly.

type Source string

func (src Source) CopyTo(dest string) error {
	return CopyFile(dest, string(src))

func main() {
	var from Source = "presentation.md"

In this way CopyFile is always called correctly and, given its poor API can possibly be made private, further reducing the likelihood of misuse.

Can you suggest a better solution?

Don’t force allocations on the callers of your API

This is a post about performance. Most of the time when worrying about the performance of a piece of code the overwhelming advice should be (with apologies to Brendan Gregg) don’t worry about it, yet. However there is one area where I counsel developers to think about the performance implications of a design, and that is API design.

Because of the high cost of retrofitting a change to an API’s signature to address performance concerns, it’s worthwhile considering the performance implications of your API’s design on its caller.

A tale of two API designs

Consider these two Read methods:

func (r *Reader) Read(buf []byte) (int, error)
func (r *Reader) Read() ([]byte, error)

The first method takes a []byte buffer and returns the number of bytes read into that buffer and possibly an error that occurred while reading. The second takes no arguments and returns some data as a []byte or an error.

This first method should be familiar to any Go programmer, it’s io.Reader.Read. As ubiquitous as io.Reader is, it’s not the most convenient API to use. Consider for a moment that io.Reader is the only Go interface in widespread use that returns both a result and an error. Meditate on this for a moment. The standard Go idiom, checking the error and iff it is nil is it safe to consult the other return values, does not apply to Read. In fact the caller must do the opposite. First they must record the number of bytes read into the buffer, reslice the buffer, process that data, and only then, consult the error. This is an unusual API for such a common operation and one that frequently catches out newcomers.

A trap for young players?

Why is it so? Why is one of the central APIs in Go’s standard library written like this? A superficial answer might be io.Reader‘s signature is a reflection of the underlying read(2) syscall, which is indeed true, but misses the point of this post.

If we compare the API of io.Reader to our alternative, func Read() ([]byte, error), this API seems easier to use. Each call to Read() will return the data that was read, no need to reslice buffers, no need to remember the special case to do this before checking the error. Yet this is not the signature of io.Reader.Read. Why would one of Go’s most pervasive interfaces choose such an awkward API? The answer, I believe, lies in the performance implications of the APIs signature on the caller.

Consider again our alternative Read function, func Read() ([]byte, error). On each call Read will read some data into a buffer1 and return the buffer to the caller. Where does this buffer come from? Who allocates it? The answer is the buffer is allocated inside Read. Therefore each call to Read is guaranteed to allocate a buffer which would escape to the heap. The more the program reads, the faster it reads data, the more streams of data it reads concurrently, the more pressure it places on the garbage collector.

The standard libraries’ io.Reader.Read forces the caller to supply a buffer because if the caller is concerned with the number of allocations their program is making this is precisely the kind of thing they want to control. Passing a buffer into Read puts the control of the allocations into the caller’s hands. If they aren’t concerned about allocations they can use higher level helpers like ioutil.ReadAll to read the contents into a []byte, or bufio.Scanner to stream the contents instead.

The opposite, starting with a method like our alternative func Read() ([]byte, error) API, prevents callers from pooling or reusing allocations–no amount of helper methods can fix this. As an API author, if the API cannot be changed you’ll be forced to add a second form to your API taking a supplied buffer and reimplementing your original API in terms of the newer form. Consider, for example, io.CopyBuffer. Other examples of retrofitting APIs for performance reasons are the fmt package and the net/http package which drove the introduction of the sync.Pool type precisely because the Go 1 guarantee prevented the APIs of those packages from changing.

If you want to commit to an API for the long run, consider how its design will impact the size and frequency of allocations the caller will have to make to use it.

Go compiler intrinsics

Go allows authors to write functions in assembly if required. This is called a stub or forward declaration.

package asm

// Add returns the sum of a and b.
func Add(a int64, b int64) int64

Here we’re declaring Add, a function which takes two int64‘s and returns their sum.Add is a normal Go function declaration, except it is missing the function body.

If we were to try to compile this package the compiler, justifiably, complains;

% go build
./decl.go:4:6: missing function body

To satisfy the compiler we must supply a body for Add via assembly, which we do by adding a .s file in the same package.

TEXT ·Add(SB),$0-24
        MOVQ a+0(FP), AX
        ADDQ b+8(FP), AX
        MOVQ AX, ret+16(FP)

Now we can build, test, and use our Add function just like normal Go code. But, there’s a problem, assembly functions cannot be inlined.

This has long been a complaint by Go developers who want to use assembly either for performance or to access operations which are not exposed in the language. Some examples would be vector instructions, atomic instructions, and so on. Without the ability to inline assembly functions writing these functions in Go can have a relatively large overhead.

var Result int64

func BenchmarkAddNative(b *testing.B) {
        var r int64
        for i := 0; i < b.N; i++ {
                r = int64(i) + int64(i)
        Result = r 

func BenchmarkAddAsm(b *testing.B) {
        var r int64
        for i := 0; i < b.N; i++ {
                r = Add(int64(i), int64(i))
        Result = r
BenchmarkAddNative-8    1000000000               0.300 ns/op
BenchmarkAddAsm-8       606165915                1.93 ns/op

Over the years there have been various proposals for an inline assembly syntax similar to gcc’s asm(...) directive. None have been accepted by the Go team. Instead, Go has added intrinsic functions1.

An intrinsic function is Go code written in regular Go. These functions are known the the Go compiler which contains replacements which it can substitute during compilation. As of Go 1.13 the packages which the compiler knows about are:

  • math/bits
  • sync/atomic

The functions in these packages have baroque signatures but this lets the compiler, if your architecture supports a more efficient way of performing the operation, transparently replace the function call with comparable native instructions.

For the remainder of this post we’ll study two different ways the Go compiler produces more efficient code using intrinsics.

Ones count

Population count, the number of 1 bits in a word, is an important cryptographic and compression primitive. Because this is an important operation most modern CPUs provide a native hardware implementation.

The math/bits package exposes support for this operation via the OnesCount series of functions. The various OnesCount functions are recognised by the compiler and, depending on the CPU architecture and the version of Go, will be replaced with the native hardware instruction.

To see how effective this can be lets compare three different ones count implementations. The first is Kernighan’s Algorithm2.

func kernighan(x uint64) int {
        var count int
        for ; x > 0; x &= (x - 1) {
        return count                

This algorithm has a maximum loop count of the number of bits set; the more bits set, the more loops it will take.

The second algorithm is taken from Hacker’s Delight via issue 14813.

func hackersdelight(x uint64) int {
        const m1 = 0x5555555555555555
        const m2 = 0x3333333333333333
        const m4 = 0x0f0f0f0f0f0f0f0f
        const h01 = 0x0101010101010101

        x -= (x >> 1) & m1
        x = (x & m2) + ((x >> 2) & m2)
        x = (x + (x >> 4)) & m4
        return int((x * h01) >> 56)

Lots of clever bit twiddling allows this version to run in constant time and optimises very well if the input is a constant (the whole thing optimises away if the compiler can figure out the answer at compiler time).

Let’s benchmark these implementations against math/bits.OnesCount64.

var Result int

func BenchmarkKernighan(b *testing.B) {
        var r int
        for i := 0; i < b.N; i++ {
                r = kernighan(uint64(i))
        Result = r

func BenchmarkPopcnt(b *testing.B) {
        var r int
        for i := 0; i < b.N; i++ {
                r = hackersdelight(uint64(i))
        Result = r

func BenchmarkMathBitsOnesCount64(b *testing.B) {
        var r int
        for i := 0; i < b.N; i++ {
                r = bits.OnesCount64(uint64(i))
        Result = r

To keep it fair, we’re feeding each function under test the same input; a sequence of integers from zero to b.N. This is fairer to Kernighan’s method as its runtime increases with the number of one bits in the input argument.3

BenchmarkKernighan-8                    100000000               11.2 ns/op
BenchmarkPopcnt-8                       618312062                2.02 ns/op
BenchmarkMathBitsOnesCount64-8          1000000000               0.565 ns/op 

The winner by nearly 4x is math/bits.OnesCount64, but is this really using a hardware instruction, or is the compiler just doing a better job at optimising this code? Let’s check the assembly

% go test -c
% go tool objdump -s MathBitsOnesCount popcnt-intrinsic.test
TEXT examples/popcnt-intrinsic.BenchmarkMathBitsOnesCount64(SB) /examples/popcnt-intrinsic/popcnt_test.go
   popcnt_test.go:45     0x10f8610               65488b0c2530000000      MOVQ GS:0x30, CX
   popcnt_test.go:45     0x10f8619               483b6110                CMPQ 0x10(CX), SP
   popcnt_test.go:45     0x10f861d               7668                    JBE 0x10f8687
   popcnt_test.go:45     0x10f861f               4883ec20                SUBQ $0x20, SP
   popcnt_test.go:45     0x10f8623               48896c2418              MOVQ BP, 0x18(SP)
   popcnt_test.go:45     0x10f8628               488d6c2418              LEAQ 0x18(SP), BP
   popcnt_test.go:47     0x10f862d               488b442428              MOVQ 0x28(SP), AX
   popcnt_test.go:47     0x10f8632               31c9                    XORL CX, CX
   popcnt_test.go:47     0x10f8634               31d2                    XORL DX, DX
   popcnt_test.go:47     0x10f8636               eb03                    JMP 0x10f863b
   popcnt_test.go:47     0x10f8638               48ffc1                  INCQ CX
   popcnt_test.go:47     0x10f863b               48398808010000          CMPQ CX, 0x108(AX)
   popcnt_test.go:47     0x10f8642               7e32                    JLE 0x10f8676
   popcnt_test.go:48     0x10f8644               803d29d5150000          CMPB $0x0, runtime.x86HasPOPCNT(SB)
   popcnt_test.go:48     0x10f864b               740a                    JE 0x10f8657
   popcnt_test.go:48     0x10f864d               4831d2                  XORQ DX, DX
   popcnt_test.go:48     0x10f8650               f3480fb8d1              POPCNT CX, DX // math/bits.OnesCount64
   popcnt_test.go:48     0x10f8655               ebe1                    JMP 0x10f8638
   popcnt_test.go:47     0x10f8657               48894c2410              MOVQ CX, 0x10(SP)
   popcnt_test.go:48     0x10f865c               48890c24                MOVQ CX, 0(SP)
   popcnt_test.go:48     0x10f8660               e87b28f8ff              CALL math/bits.OnesCount64(SB)
   popcnt_test.go:48     0x10f8665               488b542408              MOVQ 0x8(SP), DX
   popcnt_test.go:47     0x10f866a               488b442428              MOVQ 0x28(SP), AX
   popcnt_test.go:47     0x10f866f               488b4c2410              MOVQ 0x10(SP), CX
   popcnt_test.go:48     0x10f8674               ebc2                    JMP 0x10f8638
   popcnt_test.go:50     0x10f8676               48891563d51500          MOVQ DX, examples/popcnt-intrinsic.Result(SB)
   popcnt_test.go:51     0x10f867d               488b6c2418              MOVQ 0x18(SP), BP
   popcnt_test.go:51     0x10f8682               4883c420                ADDQ $0x20, SP
   popcnt_test.go:51     0x10f8686               c3                      RET
   popcnt_test.go:45     0x10f8687               e884eef5ff              CALL runtime.morestack_noctxt(SB)
   popcnt_test.go:45     0x10f868c               eb82                    JMP examples/popcnt-intrinsic.BenchmarkMathBitsOnesCount64(SB)
   :-1                   0x10f868e               cc                      INT $0x3
   :-1                   0x10f868f               cc                      INT $0x3 

There’s quite a bit going on here, but the key take away is on line 48 (taken from the source code of the _test.go file) the program is using the x86 POPCNT instruction as we hoped. This turns out to be faster than bit twiddling.

Of interest is the comparison two instructions prior to the POPCNT,

CMPB $0x0, runtime.x86HasPOPCNT(SB)

As not all intel CPUs support POPCNT the Go runtime records at startup if the CPU has the necessary support and stores the result in runtime.x86HasPOPCNT. Each time through the benchmark loop the program is checking does the CPU have POPCNT support before it issues the POPCNT request.

The value of runtime.x86HasPOPCNT isn’t expected to change during the life of the program’s execution so the result of the check should be highly predictable making the check relatively cheap.

Atomic counter

As well as generating more efficient code, intrinsic functions are just regular Go code, the rules of inlining (including mid stack inlining) apply equally to them.

Here’s an example of an atomic counter type. It’s got methods on types, method calls several layers deep, multiple packages, etc.

import (

type counter uint64

func (c counter) get() uint64 {
         return atomic.LoadUint64((uint64)(c))

func (c counter) inc() uint64 {
        return atomic.AddUint64((uint64)(c), 1)

func (c counter) reset() uint64 {
        return atomic.SwapUint64((uint64)(c), 0)

var c counter

func f() uint64 {
        return c.reset()

You’d be forgiven for thinking this would have a lot of overhead. However, because of the interaction between inlining and compiler intrinsics, this code collapses down to efficient native code on most platforms.

TEXT main.f(SB) examples/counter/counter.go
   counter.go:23         0x10512e0               90                      NOPL
   counter.go:29         0x10512e1               b801000000              MOVL $0x1, AX
   counter.go:13         0x10512e6               488d0d0bca0800          LEAQ main.c(SB), CX
   counter.go:13         0x10512ed               f0480fc101              LOCK XADDQ AX, 0(CX) // c.inc
   counter.go:24         0x10512f2               90                      NOPL
   counter.go:10         0x10512f3               488b05fec90800          MOVQ main.c(SB), AX // c.get
   counter.go:25         0x10512fa               90                      NOPL
   counter.go:16         0x10512fb               31c0                    XORL AX, AX
   counter.go:16         0x10512fd               488701                  XCHGQ AX, 0(CX) // c.reset
   counter.go:16         0x1051300               c3                      RET 

By way of explanation. The first operation, counter.go:13 is c.inc a LOCKed XADDQ, which on x86 is an atomic increment. The second, counter.go:10 is c.get which on x86, due to its strong memory consistency model, is a regular load from memory. The final operation, counter.go:16, c.reset is an atomic exchange of the address in CX with AX which was zeroed on the previous line. This puts the value in AX, zero, into the address stored in CX. The value previously stored at (CX) is discarded.


Intrinsics are a neat solution that give Go programmers access to low level architectural operations without having to extend the specification of the language. If an architecture doesn’t have a specific sync/atomic primitive (like some ARM variants), or a math/bits operation, then the compiler transparently falls back to the operation written in pure Go.

Clear is better than clever

This article is based on my GopherCon Singapore 2019 presentation. In the presentation I referenced material from my post on declaring variables and my GolangUK 2017 presentation on SOLID design. For brevity those parts of the talk have been elided from this article. If you prefer, you can watch the recording of the talk.

Readability is often cited as one of Go’s core tenets, I disagree. In this article I’ll discuss the differences between clarity and readability, show you what I mean by clarity and how it applies to Go code, and argue that Go programmers should strive for clarity–not just readability–in their programs.

Why would I read your code?

Before I pick apart the difference between clarity and readability, perhaps the question to ask is, “why would I read your code?” To be clear, when I say I, I don’t mean me, I mean you. And when I say your code I also mean you, but in the third person. So really what I’m asking is, “why would you read another person’s code?”

I think Russ Cox, paraphrasing Titus Winters, put it best:

Software engineering is what happens to programming when you add time and other programmers.

Russ Cox, GopherCon Singapore 2018

The answer to the question, “why would I read your code” is, because we have to work together. Maybe we don’t work in the same office, or live in the same city, maybe we don’t even work at the same company, but we do collaborate on a piece of software, or more likely consume it as a dependency.

This is the essence of Russ and Titus’ observation; software engineering is the collaboration of software engineers over time. I have to read your code, and you read mine, so that I can understand it, so that you can maintain it, and in short, so that any programmer can change it.

Russ is making the distinction between software programming and software engineering. The former is a program you write for yourself, the latter is a program, ​a project, a service, a product, ​that many people will contribute to over time. Engineers will come and go, teams will grow and shrink, requirements will change, features will be added and bugs fixed. This is the nature of software engineering.

We don’t read code, we decode it

It was sometime after that presentation that I finally realized the obvious: Code is not literature. We don’t read code, we decode it.

Peter Seibel

The author Peter Seibel suggests that programs are not read, but are instead decoded. In hindsight this is obvious, after all we call it source code, not source literature. The source code of a program is an intermediary form, somewhere between our concept–what’s inside our heads–and the computer’s executable notation.

In my experience, the most common complaint when faced with a foreign codebase written by someone, or some team, is the code is unreadable. Perhaps you agree with me?

But readability as a concept is subjective. Readability is nit picking about line length and variable names. Readability is holy wars about brace position. Readability is the hand to hand combat of style guides and code review guidelines that regulate the use of whitespace.

Clarity ≠ Readability

Clarity, on the other hand, is the property of the code on the page. Clear code is independent of the low level details of function names and indentation because clear code is concerned with what the code is doing, not just how it is written down.

When you or I say that a foreign codebase is unreadable, what I think what we really mean is, I don’t understand it. For the remainder of this article I want to try to explore the difference between clear code and code that is simply readable, because the goal is not how quickly you can read a piece of code, but how quickly you can grasp its meaning.

Keep to the left

Go programs are traditionally written in a style that favours guard clauses and preconditions. This encourages the successful path to proceed down the page rather than indented inside a conditional block. Mat Ryer calls this line of sight coding, because, the active part of your function is not at risk of sliding out of sight beyond the right hand margin of your screen.

By keeping conditional blocks short, and for the exceptional condition, we avoid nested blocks and potentially complex value shadowing. The successful flow of control continues down the page. At every point in the sequence of statements, if you’ve arrived at that point, you are confident that a growing set of preconditions holds true.

func ReadConfig(path string) (*Config, error) {
        f, err := os.Open(path)
        if err != nil {
                return nil, err
        defer f.Close()
        // ...

The canonical example of this is the classic Go error check idiom; if err != nil then return it to the caller, else continue with the function. We can generalise this pattern a little and in pseudocode we have:

if some condition {
        // true: cleanup
 // false: continue 

If some condition is true, then return to the caller, else continue onwards towards the end of the function. 

This form holds true for all preconditions, error checks, map lookups, length checks, and so forth. The exact form of the precondition’s check changes, but the pattern is always the same; the cleanup code is inside the block, terminating with a return, the success condition lies outside the block, and is only reachable if the precondition is false.

Even if you are unsure what the preceding and succeeding code does, how the precondition is formed, and how the cleanup code works, it is clear to the reader that this is a guard clause.

Structured programming

Here we have a comp function that takes two ints and returns an int;

func comp(a, b int) int {
        if a < b {
                return -1
        if a > b {
                return 1
        return 0

The comp function is written in a similar form to guard clauses from earlier. If a is less than b, the return -1 path is taken. If a is greater than b, the return 1 path is taken. Else, a and b are by induction equal, so the final return 0 path is taken.

func comp(a, b int) int {
        if condition A {
                body A
        if condition B {
                 body B
        return 0

The problem with comp as written is, unlike the guard clause, someone maintaining this function has to read all of it. To understand when 0 is returned, the reader has to consult the conditions and the body of each clause. This is reasonable when you’re dealing with functions which fit on a slide, but in the real world complicated functions–​the ones we’re paid for our expertise to maintain–are rarely slide sized, and their conditions and bodies are rarely simple.

Let’s address the problem of making it clear under which condition 0 will be returned:

func comp(a, b int) int {
        if a < b {
                return -1
        } else if a > b {
                return 1
        } else {
                return 0

Now, although this code is not what anyone would argue is readable–​long chains of if else if statements are broadly discouraged in Go–​it is clearer to the reader that zero is only returned if none of the conditions are met.

How do we know this? The Go spec declares that each function that returns a value must end in a terminating statement. This means that the body of all conditions must return a value. Thus, this does not compile:

func comp(a, b int) int {
        if a > b {
                a = b // does not compile
        } else if a < b {
                return 1
        } else {
                return 0

Further, it is now clear to the reader that this code isn’t actually a series of conditions. This is an example of selection. Only one path can be taken regardless of the operation of the condition blocks. Based on the inputs one of -1, 0, or 1 will always be returned. 

func comp(a, b int) int {
        if a < b {
                return -1
        } else if a > b {
                return 1
        } else {
                return 0

However this code is hard to read as each of the conditions is written differently, the first is a simple if a < b, the second is the unusual else if a > b, and the last conditional is actually unconditional.

But it turns out there is a statement which we can use to make our intention much clearer to the reader; switch.

func comp(a, b int) int {
        switch {
        case a < b:
                return -1
        case a > b:
                return 1
                return 0

Now it is clear to the reader that this is a selection. Each of the selection conditions are documented in their own case statement, rather than varying else or else if clauses.

By moving the default condition inside the switch, the reader only has to consider the cases that match their condition, as none of the cases can fall out of the switch block because of the default clause.1

Structured programming submerges structure and emphasises behaviour

–Richard Bircher, The limits of software

I found this quote recently and I think it is apt. My arguments for clarity are in truth arguments intended to emphasise the behaviour of the code, rather than be side tracked by minutiae of the structure itself. Said another way, what is the code trying to do, not how is it is trying to do it.

Guiding principles

I opened this article with a discussion of readability vs clarity and hinted that there were other principles of well written Go code. It seems fitting to close on a discussion of those other principles.

Last year Bryan Cantrill gave a wonderful presentation on operating system principles, wherein he highlighted that different operating systems focus on different principles. It is not that they ignore the principles that differ between their competitors, just that when the chips are down, they prioritise a core set. So what is that core set of principles for Go?


If you were going to say readability, hopefully I’ve provided you with an alternative.

Programs must be written for people to read, and only incidentally for machines to execute.

–Hal Abelson and Gerald Sussman. Structure and Interpretation of Computer Programs

Code is read many more times than it is written. A single piece of code will, over its lifetime, be read hundreds, maybe thousands of times. It will be read hundreds or thousands of times because it must be understood. Clarity is important because all software, not just Go programs, is written by people to be read by other people. The fact that software is also consumed by machines is secondary.

The most important skill for a programmer is the ability to effectively communicate ideas.

–Gastón Jorquera

Legal documents are double spaced to aide the reader, but to the layperson that does nothing to help them comprehend what they just read. Readability is a property of how easy it was to read the words on the screen. Clarity, on the other hand, is the answer to the question “did you understand what you just read?”.

If you’re writing a program for yourself, maybe it only has to run once, or you’re the only person who’ll ever see it, then do what ever works for you. But if this is a piece of software that more than one person will contribute to, or that will be used by people over a long enough time that requirements, features, or the environment it runs in may change, then your goal must be for your program to be maintainable.

The first step towards writing maintainable code is making sure intent of the code is clear.


The next principle is obviously simplicity. Some might argue the most important principle for any programming language, perhaps the most important principle full stop.

Why should we strive for simplicity? Why is important that Go programs be simple?

The ability to simplify means to eliminate the unnecessary so that the necessary may speak

–Hans Hofmann

We’ve all been in a situation where we say “I can’t understand this code”. We’ve all worked on programs we were scared to make a change because we worried that it’ll break another part of the program; a part you don’t understand and don’t know how to fix. 

This is complexity. Complexity turns reliable software in unreliable software. Complexity is what leads to unmaintainable software. Complexity is what kills software projects. Clarity and simplicity are interlocking forces that lead to maintainable software.


The last Go principle I want to highlight is productivity. Developer productivity boils down to this; how much time do you spend doing useful work verses waiting for your tools or hopelessly lost in a foreign code-base? Go programmers should feel that they can get a lot done with Go.

“I started another compilation, turned my chair around to face Robert, and started asking pointed questions. Before the compilation was done, we’d roped Ken in and had decided to do something.”

–Rob Pike, Less is Exponentially more

The joke goes that Go was designed while waiting for a C++ program to compile. Fast compilation is a key feature of Go and a key recruiting tool to attract new developers. While compilation speed remains a constant battleground, it is fair to say that compilations which take minutes in other languages, take seconds in Go. This helps Go developers feel as productive as their counterparts working in dynamic languages without the maintenance issues inherent in those languages.

Design is the art of arranging code to work today, and be changeable  forever.

–Sandi Metz

More fundamental to the question of developer productivity, Go programmers realise that code is written to be read and so place the act of reading code above the act of writing it. Go goes so far as to enforce, via tooling and custom, that all code be formatted in a specific style. This removes the friction of learning a project specific dialect and helps spot mistakes because they just look incorrect.

Go programmers don’t spend days debugging inscrutable compile errors. They don’t waste days with complicated build scripts or deploying code to production. And most importantly they don’t spend their time trying to understand what their coworker wrote.

Complexity is anything that makes software hard to understand or to modify.

–John Ousterhout, A Philosophy of Software Design

Something I know about each of you reading this post is you will eventually leave your current employer. Maybe you’ll be moving on to a new role, or perhaps a promotion, perhaps you’ll move cities, or follow your partner overseas. Whatever the reason, we must all consider the succession of the maintainership of the programs we create.

If we strive to write programs that are clear, programs that are simple, and to focus on the productivity of those working with us that will set all Go programmers in good stead.

Because if we don’t, as we move from job to job, we’ll leave behind programs which cannot be maintained. Programs which cannot be changed. Programs which are too hard to onboard new developers, and programs which feel like career digression for those that work on them.

If software cannot be maintained, then it will be rewritten; and that could be the last time your company invests in Go.