Category Archives: Go

A few bytes here, a few there, pretty soon you’re talking real memory

Today’s post comes from a recent Go pop quiz. Consider this benchmark fragment.1

func BenchmarkSortStrings(b *testing.B) {
        s := []string{"heart", "lungs", "brain", "kidneys", "pancreas"}
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
                sort.Strings(s)
        }
}

A convenience wrapper around sort.Sort(sort.StringSlice(s)), sort.Strings sorts the input in place, so it isn’t expected to allocate (or at least that’s what 43% of the tweeps who responded thought). However it turns out that, at least in recent versions of Go, each iteration of the benchmark causes one heap allocation. Why does this happen?

Interfaces, as all Go programmers should know, are implemented as a two word structure. Each interface value contains a field which holds the type of the interface’s contents, and a pointer to the interface’s contents.2

In pseudo Go code, an interface might look something like this:

type interface struct {
        // the ordinal number for the type of the value
        // assigned to the interface 
        type uintptr

        // (usually) a pointer to the value assigned to
        // the interface
        data uintptr
}

interface.data can hold one machine word–8 bytes in most cases–but a []string is 24 bytes; one word for the pointer to the slice’s underlying array; one word for the length; and one for the remaining capacity of the underlying array, so how does Go fit 24 bytes into 8? With the oldest trick in the book, indirection. s, a []string is 24 bytes, but *[]string–a pointer to a slice of strings–is only 8.

Escaping to the heap

To make the example a little more explicit, here’s the benchmark rewritten without the sort.Strings helper function:

func BenchmarkSortStrings(b *testing.B) {
        s := []string{"heart", "lungs", "brain", "kidneys", "pancreas"}
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
                var ss sort.StringSlice = s
                var si sort.Interface = ss // allocation
                sort.Sort(si)
        }
}

To make the interface magic work, the compiler rewrites the assignment as var si sort.Interface = &ss–the address of ss is assigned to the interface value.3 We now have a situation where the interface value holds a pointer to ss, but where does it point? Where in memory does ss live?

It appears that ss is moved to the heap, causing the allocation that the benchmark reports.

  Total:    296.01MB   296.01MB (flat, cum) 99.66%
      8            .          .           func BenchmarkSortStrings(b *testing.B) { 
      9            .          .           	s := []string{"heart", "lungs", "brain", "kidneys", "pancreas"} 
     10            .          .           	b.ReportAllocs() 
     11            .          .           	for i := 0; i < b.N; i++ { 
     12            .          .           		var ss sort.StringSlice = s 
     13     296.01MB   296.01MB           		var si sort.Interface = ss // allocation 
     14            .          .           		sort.Sort(si) 
     15            .          .           	} 
     16            .          .           } 

The allocation occurs because the compiler currently cannot convince itself that ss outlives si. The general attitude amongst Go compiler hackers seems to be that this could be improved, but that’s a discussion for another time. As it stands, ss is allocated on the heap. Thus the question becomes, how many bytes are allocated per iteration? Why don’t we ask the testing package.

% go test -bench=. sort_test.go
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-5650U CPU @ 2.20GHz
BenchmarkSortStrings-4          12591951                91.36 ns/op           24 B/op          1 allocs/op
PASS
ok      command-line-arguments  1.260s

Using Go 1.16beta1, on amd64, 24 bytes are allocated per operation.4 However, the previous Go version, on the same platform, consumes 32 bytes per operation

% go1.15 test -bench=. sort_test.go
goos: darwin
goarch: amd64
BenchmarkSortStrings-4          11453016                96.4 ns/op            32 B/op          1 allocs/op
PASS
ok      command-line-arguments  1.225s

This brings us, circuitously, to the subject of this post, a nifty quality of life improvement coming in Go 1.16. But before I can talk about it, I need to discuss size classes.

Size classes

To explain what a size class is, consider how a theoretical Go runtime could allocate 24 bytes on its heap. A simple way to do this would be keep track of all the memory allocated so far using a pointer to the last allocated byte on the heap. To allocate 24 bytes, the heap pointer is incremented by 24, and the previous value returned to the caller. As long as the code that asked for 24 bytes never writes beyond that mark this mechanism has no overhead. Sadly, in real life, memory allocators don’t just allocate memory, sometimes they have to free it.

Eventually the Go runtime will have to free those 24 bytes, but from the runtime’s point of view, the only thing it knows is the starting address it gave to the caller. It does not know how many bytes after that address were allocated. To permit deallocation, our hypothetical Go runtime allocator would have to record, for each allocation on the heap, its length. Where are the allocation for those lengths allocated? On the heap of course.

In our scenario, when the runtime wants to allocate memory, it could request a little more than it was asked for and use it to store the amount requested. For our slice example, when we asked for 24 bytes, we’d actually consume 24 bytes plus some overhead to store the number 24. How large is this overhead? It turns out the minimum amount that is practical is one word.5

To record a 24 byte allocation the overhead would 8 bytes. 25% is not great, but not terrible, and as the size of the allocation goes up, the overhead would become insignificant. However, what would happen if we wanted to store just one byte on the heap? The overhead would be eight times the amount of data we asked for! Is there are more efficient way to allocate small amounts on the heap?

Rather than storing the length alongside each allocation, what if all the thing of the same size were stored together? If all the 24 byte things are stored together, then the runtime would automatically know how large they are. All the runtime needs is a single bit to indicate if a 24 byte area is in use or not. In Go these areas are known as size classes, because all the things of the same size are stored together (think school class–all the students are the same grade–not a C++ class). When the runtime needs to allocate a small amount, it does so using the smallest size class that can accomodate the allocation.

Unlimited size classes for all

Now we know how size classes work, the obvious question is, where are they stored? Not surprisingly, the memory for a size class comes from the heap. To minimise overhead, the runtime allocates a larger amount from the heap (usually a multiple of the system page size) then dedicates that space for allocations of a single size. But, there’s a problem.

The pattern of allocating a large area to store things of the same size works well6 if there’s a fixed, preferably small, number of allocation sizes, but in a general purpose language programs can ask the runtime for an allocation of any size.7

For example, imagine asking the runtime for 9 bytes. 9 bytes is an uncommon size, so it’s likely that a new size class for things of size 9 would be required. As 9 byte things are uncommon, it’s likely the remainder of the allocation, 4kb or more, would be wasted. Because of this the set of possible size classes is fixed. If a size class of the exact amount is not available, the allocation is rounded up to the next size class. In our example 9 bytes might be allocated in the 12 byte size class. The overhead of 3 unused bytes is better than an entire size class allocation which goes mostly unused.

All together now

This is the final piece of the puzzle. Go 1.15 did not have a 24 byte size class, so the heap allocation of ss was allocated in the 32 byte size class. Thanks to the work of Martin Möhrmann Go 1.16 has a 24 byte size class, which is a perfect fit for slice values assigned to interfaces.

How to dump the GOSSAFUNC graph for a method

The Go compiler’s SSA backend contains a facility to produce HTML debugging output of the compilation phases. This post covers how to print the SSA output for function and methods.

Let’s start with a sample program which contains a function, a value method, and a pointer method:

package main

import (
    "fmt"
)

type Numbers struct {
    vals []int
}

func (n *Numbers) Add(v int) {
    n.vals = append(n.vals, v)
}

func (n Numbers) Average() float64 {
    sum := 0.0
    for _, num := range n.vals {
        sum += float64(num)
    }
    return sum / float64(len(n.vals))
}


func main() {
    var numbers Numbers
    numbers.Add(200)
    numbers.Add(43)
    numbers.Add(-6)
    fmt.Println(numbers.Average())
}

Control of the SSA debugging output is via the GOSSAFUNC environment variable. This variable takes the name of the function to dump. This is not the functions fully qualified name. For func main above the name of the function is main not main.main.

% env GOSSAFUNC=main go build
runtime
dumped SSA to ../../go/src/runtime/ssa.html
t
dumped SSA to ./ssa.html

In this example GOSSAFUNC=main matched both main.main and a function called runtime.main.1 This is a little unfortunate, but in practice probably not a big deal as, if you’re performance tuning your code, it won’t be in a giant spaghetti blob in func main.

What is more likely is your code will be in a method, so you’ve probably landed on this post looking for the correct incantation to dump the SSA output for a method.

To print the SSA debug for the pointer method func (n *Numbers) Add, the equivalent function name is(*Numbers).Add:2

% env "GOSSAFUNC=(*Numbers).Add" go build
t
dumped SSA to ./ssa.html

To print the SSA debug for a value method func (n Numbers) Average, the equivalent function name is (*Numbers).Average even though this is a value method:

% env "GOSSAFUNC=(*Numbers).Average" go build
t
dumped SSA to ./ssa.html

Diamond interface composition in Go 1.14

Per the overlapping interfaces proposal, Go 1.14 now permits embedding of interfaces with overlapping method sets. This is a brief post explain what this change means:

Let’s start with the definition of the three key interfaces from the io package; io.Reader, io.Writer, and io.Closer:

package io

type Reader interface {
    Read([]byte) (int, error)
}

type Writer interface {
    Write([]byte) (int, error)
}

type Closer interface {
    Close() error
}    

Just as embedding a type inside a struct allows the embedded type’s fields and methods to be accessed as if it were declared on the embedding type1, the process is true for interfaces. Thus there is no difference between explicitly declaring

type ReadCloser interface {
    Read([]byte) (int, error)
    Close() error
}

and using embedding to compose the interface

type ReadCloser interface {
    Reader
    Closer
}

You can even mix and match

type WriteCloser interface {
    Write([]byte) (int, error)
    Closer
}

However, prior to Go 1.14, if you continued to compose interface declarations in this manner you would likely find that something like this,

type ReadWriteCloser interface {
    ReadCloser
    WriterCloser
}

would fail to compile

% go build interfaces.go
command-line-arguments
./interfaces.go:27:2: duplicate method Close

Fortunately, with Go 1.14 this is no longer a limitation, thus solving problems that typically occur with diamond-shaped embedding graphs.

However, there is a catch that I ran into attempting to demonstrate this feature to the local user group–this feature is only enabled when the Go compiler uses the 1.14 (or later) spec.

As near as I can make out the rules for which version of the Go spec is used during compilation appear to be:

  1. If your source code is stored inside GOPATH (or you have disabled modules with GO111MODULE=off) then the version of the Go spec used to compile with matches the version of the compiler you are using. Said another way, if you have Go 1.13 installed, your Go version is 1.13. If you have Go 1.14 installed, your version is 1.14. No surprises here.
  2. If your source code is stored outside GOPATH (or you have forced modules on with GO111MODULE=on) then the go tool will take the Go version from the go.mod file.
  3. If there is no Go version listed in go.mod then the version of the spec will be the version of Go installed. This is identical to point 1.
  4. If you are in module mode, either by being outside GOPATH or with GO111MODULE=on, but there is no go.mod file in the current, or any parent, directory then the version of the Go spec used to compile your code defaults to Go 1.13.

The last point caught me out.

Fatih’s question

A few days ago Fatih posted this question on twitter.

I’m going to attempt to give my answer, however to do that I need to apply some simplifications as my previous attempts to answer it involved a lot of phrases like a pointer to a pointer, and other unhelpful waffling. Hopefully my simplified answer can be useful in building a mental framework to answer Fatih’s original question.

Restating the question

Fatih’s original tweet showed four different variations of json.Unmarshal. I’m going to focus on the last two, which I’ll rewrite a little:

package main

import (
    "encoding/json"
    "fmt"
)

type Result struct {
    Foo string `json:"foo"`
}

func main() {
    content := []byte(`{"foo": "bar"}`)
    var result1, result2 *Result

    err := json.Unmarshal(content, &result1)
    fmt.Println(result1, err) // &{bar} <nil>

    err = json.Unmarshal(content, result2)
    fmt.Println(result2, err) // <nil> json: Unmarshal(nil *main.Result)
}

Restated in words, result1 and result2 are the same type; *Result. Decoding into result1 works as expected, whereas decoding into result2 causes the json package to complain that the value passed to Unmarshal is nil. However, both values were declared without an initialiser so both would have taken on the type’s zero value, nil.

Eagle eyed readers will have spotted that the reason for the difference is the first invocation is passed &result1, while the second is passed result2, but this explanation is unsatisfactory because the documentation for json.Unmarshal states:

Unmarshal parses the JSON-encoded data and stores the result in the value pointed to by v. If v is nil or not a pointer, Unmarshal returns an InvalidUnmarshalError.

Which is confusing because result1 and result2 are pointers. Furthermore, without initialisation, both are nil. Now, the documentation is correct (as you’d expect from a package that has been hammered on for a decade), but explaining why takes a little more investigation.

Functions receive a copy of their arguments

Every assignment in Go is a copy, this includes function arguments and return values.

package main

import (
    "fmt"
)

func increment(v int) {
    v++
}

func main() {
    v := 1
    increment(v)
    fmt.Println(v) // 1
}

In this example, increment is operating on a copy of main‘s v. This is because the v declared in main and increment‘s v parameter have different addresses in memory. Thus changes to increment‘s v cannot affect the contents of main‘s v.

package main

import (
    "fmt"
)

func increment(v *int) {
    *v++
}

func main() {
    v := 1
    increment(&v)
    fmt.Println(v) // 2
}

If we wanted to write increment in a way that it could affect the contents of its caller we would need to pass a reference, a pointer, to main.v.1 This example demonstrates why json.Unmarshal needs a pointer to the value to decode JSON into.

Pointers to pointers

Returning to the original question, both result1 and result2 are declared as *Result, that is, pointers to a Result value. We established that you have to pass the address of caller’s value to json.Unmarshal otherwise it won’t be able to alter the contents of the caller’s value. Why then must we pass the address of result1, a **Result, a pointer to a pointer to a Result, for the operation to succeed.

To explain this another detour is required. Consider this code:

package main

import (
    "encoding/json"
    "fmt"
)

type Result struct {
    Foo *string `json:"foo"`
}

func main() {
    content := []byte(`{"foo": "bar"}`)
    var result1 *Result

    err := json.Unmarshal(content, &result1)
    fmt.Printf("%#v %v", result1, err) // &main.Result{Foo:(*string)(0xc0000102f0)} <nil>
}

In this example Result contains a pointer typed field, Foo *string. During JSON decoding Unmarshal allocated a new string value, stored the value bar in it, then placed the address of the string in Result.Foo. This behaviour is quite handy as it frees the caller from having to initialise Result.Foo and makes it easier to detect when a field was not initialised because the JSON did not contain a value. Beyond the convenience this offers for simple examples it would be prohibitively difficult for the caller to properly initialise all the reference type fields in a structure before decoding unknown JSON without first inspecting the incoming JSON which itself may be problematic if the input is coming from an io.Reader without the ability to rewind the input.

To unmarshal JSON into a pointer, Unmarshal first handles the case of the JSON being the JSON literal null. In that case, Unmarshal sets the pointer to nil. Otherwise, Unmarshal unmarshals the JSON into the value pointed at by the pointer. If the pointer is nil, Unmarshal allocates a new value for it to point to.

json.Unmarshal‘s handling of pointer fields is clearly documented, and works as you would expect, allocating a new value whenever there is a need to decode into a pointer shaped field. It is this behaviour that gives us a hint to what is happening in the original example.

We’ve seen that when json.Unmarshal encounters a field which points to nil it will allocate a new value of the correct type and assign its address the field before proceeding. Not only is does behaviour is applied recursively–for example in the case of a complex structure which contains pointers to other structures–but it also applies to the value passed to Unmarshal.

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    content := []byte(`1`)
    var result *int

    err := json.Unmarshal(content, &result)
    fmt.Println(*result, err) // 1 <nil>
}

In this example result is not a struct, but a simple *int which, lacking an initialiser, defaults to nil. After JSON decoding, result now points to an int with the value 1.

Putting the pieces together

Now I think I’m ready to take a shot at answering Fatih’s question.

json.Unmarshal requires the address of the variable you want to decode into, otherwise it would decode into a temporary copy which would be discard on return. Normally this is done by declaring a value, then passing its address, or explicitly initialising the the value

var result1 Result
err := json.Unmarshal(content, &result1) // this is fine

var result2 = new(Result)
err = json.Unmarshal(content, result2) // and this

var result3 = &Result{}
err = json.Unmarshal(content, result3) // this is also fine

In all three cases the address that the *Result points too is not nil, it points to initialised memory that json.Unmarshal decodes into.

Now consider what happens when json.Unmarshal encounters this

var result4 *Result
err = json.Unmarshal(content, result4) // err json: Unmarshal(nil *main.Result)

result2, result3, and the expression &result1 point to a Result. However result4, even though it has the same type as the previous three, does not point to initialised memory, it points to nil. Thus, according to the examples we saw previously, before json.Unmarshal can decode into it, the memory result4 points too must be initialised.

However, because each function receives a copy of its arguments, the caller’s result4 variable and the copy inside json.Unmarshal are unique. json.Unmarshal can allocate a new Result value and decode into it, but it cannot alter result4 to point to this new value because it was not provided with a reference to result4, only a copy of its contents.

Ensmallening Go binaries by prohibiting comparisons

Conventional wisdom dictates that the larger the number of types declared in a Go program, the larger the resulting binary. Intuitively this makes sense, after all, what’s the point in defining a bunch of types if you’re not going to write code that operates on them. However, part of the job of a linker is to detect functions which are not referenced by a program–say they are part of a library of which only a subset of functionality is used–and remove them from the final output. Yet, the adage mo’ types, mo’ binary holds true for the majority of Go programs.

In this post I’ll dig into what equality, in the context of a Go program, means and why changes like this have a measurable impact on the size of a Go program.

Defining equality between two values

The Go spec defines the concepts of assignability and equality. Assignabiity is the act of assigning a value to an identifier. Not everything which is declared can be assigned, for example constants and functions. Equality is the act of comparing two identifies by asking are their contents the same?

Being a strongly typed language, the notion of sameness is fundamentally rooted in the identifier’s type. Two things can only be the same if they are of the same type. Beyond that, the type of the values defines how they are compared.

For example, integers are compared arithmetically. For pointer types, equality is determining if the addresses they point too are the same. Reference types like maps and channels, like pointers, are considered to be the same if they have the same address.

These are all examples of bitwise equality, that is, if the bit patterns of the memory that value occupies are the same, those values are equal. This is known as memcmp, short for memory comparison, as equality is defined by comparing the contents of two areas of memory.

Hold on to this idea, I’ll come back to in a second.

Struct equality

Beyond scalar types like integers, floats, and pointers is the realm of compound types; structs. All structs are laid out in memory in program order, thus this declaration:

type S struct {
    a, b, c, d int64
}

will consume 32 bytes of memory; 8 bytes for a, then 8 bytes for b, and so on. The spec says that struct values are comparable if all their fields are comparable. Thus two structs are equal iff each of their fields are equal.

a := S{1, 2, 3, 4}
b := S{1, 2, 3, 4}
fmt.Println(a == b) // prints true

Under the hood the compiler uses memcmp to compare the 32 bytes of a and b.

Padding and alignment

However the simplistic bitwise comparison strategy will fail in situations like this:

type S struct {
    a byte
    b uint64
    c int16
    d uint32
}

func main()
    a := S{1, 2, 3, 4}
    b := S{1, 2, 3, 4}
    fmt.Println(a == b) // prints true
}

The code compiles, the comparison is still true, but under the hood the compiler cannot rely on comparing the bit patterns of a and b because the structure contains padding.

Go requires each field in a struct to be naturally aligned. 2 byte values must start on an even address, four byte values on an address divisible by 4, and so on1. The compiler inserts padding to ensure the fields are aligned to according to their type and the underlying platform. In effect, after padding, this is what the compiler sees2:

type S struct {
    a byte
    _ [7]byte // padding
    b uint64
    c int16
    _ [2]int16 // padding
    d uint32
}

Padding exists to ensure the correct field alignments, and while it does take up space in memory, the contents of those padding bytes are unknown. You might assume that, being Go, the padding bytes are always zero, but it turns out that’s not the case–the contents of padding bytes are simply not defined. Because they’re not defined to always be a certain value, doing a bitwise comparison may return false because the nine bytes of padding spread throughout the 24 bytes of S are may not be the same.

The Go compiler solves this problem by generating what is known as an equality function. In this case S‘s equality function knows how to compare two values of type S by comparing only the fields in the function while skipping over the padding.

Type algorithms

Phew, that was a lot of setup to illustrate why, for each type defined in a Go program, the compiler may generate several supporting functions, known inside the compiler as the type’s algorithms. In addition to the equality function the compiler will generate a hash function if the type is used as a map key. Like the equality function, the hash function must consider factors like padding when computing its result to ensure it remains stable.

It turns out that it can be hard, and sometimes non obvious, to intuit when the compiler will generate these functions–it’s more than you’d expect–and it can be hard for the linker to eliminate the ones that are not needed as reflection often causes the linker to be more conservative when trimming types.

Reducing binary size by prohibiting comparisons

Now we’re at a point to explain Brad’s change. By adding an incomparable field 3 to the type, the resulting struct is by extension incomparable, thus forcing the compiler to elide the generation of eq and hash algorithms, short circuiting the linkers elimination of those types and, in practice, reducing the size of the final binary. As an example of this technique, this program:

package main

import "fmt"

func main() {
    type t struct {
        // _ [0][]byte uncomment to prevent comparison
        a byte
        b uint16
        c int32
        d uint64
    }
    var a t
    fmt.Println(a)
}

when compiled with Go 1.14.2 (darwin/amd64), decreased from 2174088 to 2174056, a saving of 32 bytes. In isolation this 32 byte saving may seem like small beer, but consider that equality and hash functions can be generated for every type in the transitive closure of your program and all its dependencies, and the size of these functions varies depending on the size of the type and its complexity, prohibiting them can have a sizeable impact on the final binary over and above the old saw of -ldflags="-s -w".

The bottom line, if you don’t wish to make your types comparable, a hack like this enforces it at the source level while contributing to a small reduction in the size of your binary.


Addendum: thanks to Brad’s prodding, Go 1.15 already has a bunch of improvements by Cherry Zhang and Keith Randall that fix the most egregious of the failures to eliminate unnecessary equality and hash functions (although I suspect it was also to avoid the proliferation of this class of CLs).

Mid-stack inlining in Go

In the previous post I discussed how leaf inlining allows the Go compiler to reduce the overhead of function calls and extend optimisation opportunities across function boundaries. In this post I’ll discuss the limits of inlining and leaf vs mid-stack inlining.

The limits of inlining

Inlining a function into its caller removes the call’s overhead and increases the opportunity for the compiler to apply additional optimisations so the question should be asked, if some inlining is good, would more be better, why not inline as much as possible?

Inlining trades possibly larger program sizes for potentially faster execution time. The main reason to limit inlining is creating many inlined copies of a function can increase compile time and result in larger binaries for marginal gain. Even taking into account the opportunities for further optimisation, aggressive inlining tends to increase the size of, and the time too compile, the resulting binary.

Inlining works best for small functions that do relatively little work compared to the overhead of calling them. As the size of a function grows, the time saved avoiding the call’s overhead diminishes relative to the work done inside the function. Larger functions tend to be more complex, thus the benefits of optimising their inlined forms vs in situ are reduced.

Inlining budget

During compilation each function’s inlineabilty is calculated using what is known as the inlining budget1. The cost calculation can be tricky to internalise but is broadly one unit per node in the AST for simple things like unary and binary operations but can be higher for complex operations like make. Consider this example:

package main

func small() string {
    s := "hello, " + "world!"
    return s
}

func large() string {
    s := "a"
    s += "b"
    s += "c"
    s += "d"
    s += "e"
    s += "f"
    s += "g"
    s += "h"
    s += "i"
    s += "j"
    s += "k"
    s += "l"
    s += "m"
    s += "n"
    s += "o"
    s += "p"
    s += "q"
    s += "r"
    s += "s"
    s += "t"
    s += "u"
    s += "v"
    s += "w"
    s += "x"
    s += "y"
    s += "z"
    return s
}

func main() {
    small()
    large()
}

Compiling this function with -gcflags=-m=2 allows us to see the cost the compiler assigns to each function.

% go build -gcflags=-m=2 inl.go 
# command-line-arguments
./inl.go:3:6: can inline small with cost 7 as: func() string { s := "hello, world!"; return s }
./inl.go:8:6: cannot inline large: function too complex: cost 82 exceeds budget 80
./inl.go:38:6: can inline main with cost 68 as: func() { small(); large() }
./inl.go:39:7: inlining call to small func() string { s := "hello, world!"; return s }

The compiler determined that func small() can be inlined due to its cost of 7. func large() was determined to be too expensive. func main()has been marked as eligible and assigned a cost of 68; 7 from the body of small, 57 from the function call to small and the remainder in its own overhead.

The inlining budget can be controlled to some degree with the -gcflag=-l flag. Currently the values that apply are:

  • -gcflags=-l=0 is the default level of inlining.
  • -gcflags=-l (or -gcflags=-l=1) disables inlining.
  • -gcflags=-l=2 and -gcflags=-l=3 are currently unused and have no effect over -gcflags=-l=0
  • -gcflags=-l=4 reduces the cost for inlining non-leaf functions and calls through interfaces.2

Hairy optimisations

Some functions with a relatively low inlining cost may be ineligible because of their complexity. This is known as the function’s hairiness as the semantics of some operations are hard to reason about once inlined, for example recover, break. Others, like select and go, involve co-ordination with the runtime so the extra effort of inlining doesn’t pay for itself.

The list of hairy statements also includes things like for and range which don’t have an inherently large cost, but simply haven’t been optimised yet.

Mid stack inlining

Historically the Go compiler only performed leaf inlining–only functions which did not call other functions were eligible. In the context of the hairiness discussion previously, a function call would disqualify the function from being inlined.

Enter mid stack inlining which, as its name implies, allows functions in the middle of a call stack to be inlined without requiring everything below them to be eligible. Mid stack inlining was introduced by David Lazar in Go 1.9 and improved in subsequent releases. This presentation goes into some of the difficulties with retaining the behaviour of stack traces and runtime.Callers in code paths that had been heavily inlined.

We see an example of mid-stack inlining in the previous example. After inlining, func main() contains the body of func small() and a call to func large(), thus it is considered a non-leaf function. Historically this would have prevented it from being further inlined even though its combined cost was less than the inlining budget.

The primary use case for mid stack inlining is to reduce the overhead of a path through the call stack. Consider this example:

package main

import (
    "fmt"
    "strconv"
)

type Rectangle struct {}

//go:noinline
func (r *Rectangle) Height() int {
    h, _ := strconv.ParseInt("7", 10, 0)
    return int(h)
}

func (r *Rectangle) Width() int {
    return 6
}

func (r *Rectangle) Area() int { return r.Height() * r.Width() }

func main() {
    var r Rectangle
    fmt.Println(r.Area())
}

In this example r.Area() is a simple function which calls two others. r.Width() can be inlined while r.Height(), simulated here with the //go:noinline annotation, cannot. 3

% go build -gcflags='-m=2' square.go                                                                                                          
# command-line-arguments
./square.go:12:6: cannot inline (*Rectangle).Height: marked go:noinline                                                                               
./square.go:17:6: can inline (*Rectangle).Width with cost 2 as: method(*Rectangle) func() int { return 6 }
./square.go:21:6: can inline (*Rectangle).Area with cost 67 as: method(*Rectangle) func() int { return r.Height() * r.Width() }                       ./square.go:21:61: inlining call to (*Rectangle).Width method(*Rectangle) func() int { return 6 }                                                     
./square.go:23:6: cannot inline main: function too complex: cost 150 exceeds budget 80                        
./square.go:25:20: inlining call to (*Rectangle).Area method(*Rectangle) func() int { return r.Height() * r.Width() }
./square.go:25:20: inlining call to (*Rectangle).Width method(*Rectangle) func() int { return 6 }

As the multiplication performed by r.Area() is cheap compared to the overhead of calling it, inlining r.Area()‘s single expression is a net win even if its downstream caller to r.Height() remains ineligible.

Fast path inlining

The most startling example of the power of mid-stack inlining comes from 2019 when Carlo Alberto Ferraris improved the performance of sync.Mutex.Lock() by allowing the fast path of the lock–the uncontended case–to be inlined into its caller. Prior to this change sync.Mutex.Lock() was a large function containing many hairy conditions which made it ineligible to be inlined. Even in the case where the lock was available, the caller had to pay the overhead of calling sync.Mutex.Lock().

Carlo’s change split sync.Mutex.Lock() into two functions (a process he dubbed outlining). The outer sync.Mutex.Lock() method now calls sync/atomic.CompareAndSwapInt32() and returns to the caller immediately if the CAS succeeds. If not, the function falls through to sync.Mutex.lockSlow() which handles the slow path required to register interest on the lock and park the goroutine.4

% go build -gcflags='-m=2 -l=0' sync 2>&1 | grep '(*Mutex).Lock'
../go/src/sync/mutex.go:72:6: can inline (*Mutex).Lock with cost 69 as: method(*Mutex) func() { if "sync/atomic".CompareAndSwapInt32(&m.state, 0, mutexLocked) { if race.Enabled {  }; return  }; m.lockSlow() }

By splitting the function into an easily inlineable outer function, falling through to a complex inner function to handle the slow path Carlo’s combined mid stack inlining and the compiler’s support for intrinsic operations to reduce the cost of an uncontended lock by 14%. Then he repeated the trick for an additional 9% saving in sync.RWMutex.Unlock().

Inlining optimisations in Go

This is a post about how the Go compiler implements inlining and how this optimisation affects your Go code.

n.b. This article focuses on gc, the de facto Go compiler from golang.org. The concepts discussed apply broadly to other Go compilers like gccgo and tinygo but may differ in implementation and efficacy.

What is inlining?

Inlining is the act of combining smaller functions into their respective callers. In the early days of computing this optimisation was typically performed by hand. Nowadays inlining is one of a class of fundamental optimisations performed automatically during the compilation process.

Why is inlining important?

Inlining is important for two reasons. The first is it removes the overhead of the function call itself. The second is it permits the compiler to more effectively apply other optimisation strategies.

Function call overhead

Calling a function1 in any language carries a cost. There are the overheads of marshalling parameters into registers or onto the stack (depending on the ABI) and reversing the process on return. Invoking a function call involves jumping the program counter from one point in the instruction stream to another which can cause a pipeline stall. Once inside the function there is usually some preamble required to prepare a new stack frame for the function to execute and a similar epilogue needed to retire the frame before returning to the caller.

In Go a function call carries additional costs to support dynamic stack growth. On entry the amount of stack space available to the goroutine is compared to the amount required for the function. If insufficient stack space is available, the preamble jumps into the runtime logic that grows the stack by copying it to a new, larger, location. Once this is done the runtime jumps back to the start of the original function, the stack check is performed again, which now passes, and the call continues. In this way goroutines can start with a small stack allocation which grows only when needed.2

This check is cheap–only a few instructions–and because goroutine stacks grows geometrically the check rarely fails. Thus, the branch prediction unit inside a modern processor can hide the cost of the stack check by assuming it will always be successful. In the case where the processor mis-predicts the stack check and has to discard the work done while it was executing speculatively, the cost of the pipeline stall is relatively small compared to the cost of the work needed for the runtime to grow a goroutines stack.

While the overhead of the generic and Go specific components of each function call are well optimised by modern processors using speculative execution techniques, those overheads cannot be entirely eliminated, thus each function call carries with it a performance cost over and above the time it takes to perform useful work. As a function call’s overhead is fixed, smaller functions pay a larger cost relative to larger ones because they tend to do less useful work per invocation.

The solution to eliminating these overheads must therefore be to eliminate the function call itself, which the Go compiler does, under certain conditions, by replacing the call to a function with the contents of the function. This is known as inlining because it brings the body of the function in line with its caller.

Improved optimisation opportunities

Dr. Cliff Click describes inlining as the optimisation performed by modern compilers as it forms the basis for optimisations like constant propagation and dead code elimination. In effect, inlining allows the compiler to see further, allowing it to observe, in the context that a particular function is being called, logic that can be further simplified or eliminated entirely. As inlining can be applied recursively optimisation decisions can be made not only in the context of each individual function, but also applied to the chain of functions in a call path.

Inlining in action

The effects of inlining can be demonstrated with this small example

package main

import "testing"

//go:noinline
func max(a, b int) int {
    if a > b {
        return a
    }
    return b
}

var Result int

func BenchmarkMax(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        r = max(-1, i)
    }
    Result = r
}

Running this benchmark gives the following result:3

% go test -bench=. 
BenchmarkMax-4   530687617         2.24 ns/op

The cost of max(-1, i) is around 2.24 nanoseconds on my 2015 MacBook Air. Now let’s remove the //go:noinline pragma and see the result:

% go test -bench=. 
BenchmarkMax-4   1000000000         0.514 ns/op

From 2.24 ns to 0.51 ns, or according to benchstat, a 78% improvement.

% benchstat {old,new}.txt
name   old time/op  new time/op  delta
Max-4  2.21ns ± 1%  0.49ns ± 6%  -77.96%  (p=0.000 n=18+19)

Where did these improvements come from?

First, the removal of the function call and associated preamble4 was a major contributor. Pulling the contents of max into its caller reduced the number of instructions executed by the processor and eliminated several branches.

Now the contents of max are visible to the compiler as it optimises BenchmarkMax it can make some additional improvements. Consider that once max is inlined, this is what the body of BenchmarkMax looks like to the compiler:

func BenchmarkMax(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        if -1 > i {
            r = -1
        } else {
            r = i
        }
    }
    Result = r
}

Running the benchmark again we see our manually inlined version performs as well as the version inlined by the compiler

% benchstat {old,new}.txt
name   old time/op  new time/op  delta
Max-4  2.21ns ± 1%  0.48ns ± 3%  -78.14%  (p=0.000 n=18+18)

Now the compiler has access to the result of inlining max into BenchmarkMax it can apply optimisation passes which were not possible before. For example, the compiler has noted that i is initialised to 0 and only incremented so any comparison with i can assume i will never be negative. Thus, the condition -1 > i will never be true.5

Having proved that -1 > i will never be true, the compiler can simplify the code to

func BenchmarkMax(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        if false {
            r = -1
        } else {
            r = i
        }
    }
    Result = r
}

and because the branch is now a constant, the compiler can eliminate the unreachable path leaving it with

func BenchmarkMax(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        r = i
    }
    Result = r
}

Thus, through inlining and the optimisations it unlocks, the compiler has reduced the expression r = max(-1, i) to simply r = i.

The limits of inlining

In this article I’ve discussed, so called, leaf inlining; the act of inlining a function at the bottom of a call stack into its direct caller. Inlining is a recursive process, once a function has been inlined into its caller, the compiler may inline the resulting code into its caller, as so on. For example, this code

func BenchmarkMaxMaxMax(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        r = max(max(-1, i), max(0, i))
    }
    Result = r
}

runs as fast as the previous example as the compiler is able to repeatedly apply the optimisations outlined above to reduce the code to the same r = i expression.

In the next article I’ll discuss an alternative inlining strategy when the Go compiler wishes to inline a function in the middle of a call stack. Finally I’ll discuss the limits that the compiler is prepared to go to to inline code, and which Go constructs are currently beyond its capability.

go test -v streaming output

The testing package is one of my favourite packages in the Go standard library, not just because of its low noise approach to unit testing, but, over the lifetime of Go, it has received a steady stream of quality of life improvements driven by real world usage.

The most recent example of this is, in Go 1.14, go test -v will stream t.Log output as it happens, rather than hoarding it til the end of the test run. Here’s an example;

package main

import (
	"fmt"
	"testing"
	"time"
)

func TestLogStreaming(t *testing.T) {
	for i := 0; i < 5; i++ {
		time.Sleep(300 * time.Millisecond)
		fmt.Println("fmt.Println:", i)
		t.Log("t.Log:", i)
	}
}

Note: Calling fmt.Println inside a test is generally considered a no no as it bypasses the testing package’s output buffering irrespective of the -v flag. However, for this example, it‘s necessary to demonstrate the streaming t.Log change.

% go1.13 test -v tlog_test.go
=== RUN   TestLogStreaming
fmt.Println: 0
fmt.Println: 1
fmt.Println: 2
fmt.Println: 3
fmt.Println: 4
--- PASS: TestLogStreaming (1.52s)
    tlog_test.go:13: t.Log: 0
    tlog_test.go:13: t.Log: 1
    tlog_test.go:13: t.Log: 2
    tlog_test.go:13: t.Log: 3
    tlog_test.go:13: t.Log: 4
PASS
ok      command-line-arguments  1.971s

Under Go 1.13 and earlier the fmt.Println lines output immediately. t.Log lines are buffered and are printed after the test completes.

% go1.14 test -v tlog_test.go
=== RUN   TestLogStreaming
fmt.Println: 0
    TestLogStreaming: tlog_test.go:13: t.Log: 0
fmt.Println: 1
    TestLogStreaming: tlog_test.go:13: t.Log: 1
fmt.Println: 2
    TestLogStreaming: tlog_test.go:13: t.Log: 2
fmt.Println: 3
    TestLogStreaming: tlog_test.go:13: t.Log: 3
fmt.Println: 4
    TestLogStreaming: tlog_test.go:13: t.Log: 4
--- PASS: TestLogStreaming (1.51s)
PASS
ok      command-line-arguments  1.809s

Under Go 1.14 the fmt.Println and t.Log lines are interleaved, rather than waiting for the test to complete, demonstrating that test output is streamed when go test -v is used.

This is a great quality of life improvement for integration style tests that often retry for long periods when the test is failing. Streaming t.Log output will help Gophers debug those test failures without having to wait until the entire test times out to receive their output.

Are large slices more expensive than smaller ones?

Programmers have a tendency to be superstitious. Particularly, when a programmer hears that copies are expensive, they start to see them everywhere, especially when they learn that, in Go, every assignment is a copy.

Consider this code; x is three orders of magnitude larger than y, is the assignment of x to a more expensive than the assignment of y to b?

func f() {
       x, y := make([]byte, 9000), make([]byte, 9)
       a := x
       b := y
       // ...
 } 

The answer is; no. x and y have the same type, []byte, that is, a slice of bytes. As both variables have the same type, their assignment involves copying the same amount of data. Both assignments have the same cost.

All slices are the same size; three machine words (three uintptrs). The first word in the slice is a pointer to the slice’s backing array, the storage for the slice, the second word is the slice’s length, and the third is the capacity. Assigning one slice variable to another copies just three machine words.

Further reading: Go slices: usage and internals (blog.golang.org)

The Zen of Go

This article was derived from my GopherCon Israel 2020 presentation. It’s also quite long. If you’d prefer a shorter version, head over to the-zen-of-go.netlify.com.

A recording of the presentation is available on YouTube.


How should I write good code?

Something that I’ve been thinking about a lot recently, when reflecting on the body of my own work, is a common subtitle, how should I write good code? Given nobody actively seeks to write bad code, this leads to the question; how do you know when you’ve written good Go code?

If there’s a continuum between good and bad, how to do we know what the good parts are? What are its properties, its attributes, its hallmarks, its patterns, and its idioms?

Idiomatic Go

Which brings me to idiomatic Go. To say that something is idiomatic is to say that it follows the style of the time. If something is not idiomatic, it is not following the prevailing style. It is unfashionable.

More importantly, to say to someone that their code is not idiomatic does not explain why it’s not idiomatic. Why is this? Like all truths, the answer is found in the dictionary.

idiom (noun): a group of words established by usage as having a meaning not deducible from those of the individual words.

Idioms are hallmarks of shared values. Idiomatic Go is not something you learn from a book, it’s something that you acquire by being part of a community.

My concern with the mantra of idiomatic Go is, in many ways, it can be exclusionary. It’s saying “you can’t sit with us.” After all, isn’t that what we mean when critique of someone’s work as non-idiomatic? They didn’t do It right. It doesn’t look right. It doesn’t follow the style of time.

I offer that idiomatic Go is not a suitable mechanism for teaching how to write good Go code because it is defined, fundamentally, by telling someone they did it wrong. Wouldn’t it be better if the advice we gave didn’t alienate the author right at the point they were most willing to accept it?

Proverbs

Stepping away problematic idioms, what other cultural artefacts do Gophers have? Perhaps we can turn to Rob Pike’s wonderful Go Proverbs. Are these suitable teaching tools? Will these tell newcomers how to write good Go code?

In general, I don’t think so. This is not to dismiss Pike’s work, it is just that the Go Proverbs, like Segoe Kensaku’s original, are observations, not statements of value. Again, the dictionary comes to the rescue:

proverb (noun): a short, well-known pithy saying, stating a general truth or piece of advice.

The goal of the Go Proverbs are to reveal a deeper truth about the design of the language, but how useful is advice like the empty interface says nothing to a novice from a language that doesn’t have structural typing?

It’s important to recognise that, in a growing community, at any time the people learning Go far outnumber those who claim to have mastered the language. Thus proverbs are perhaps not the best teaching tool in this scenario.

Engineering Values

Dan Luu found an old presentation by Mark Lucovsky about the engineering culture of the windows team around the windows NT-windows 2000 timeframe. The reason I mention it is Lukovsky’s description of a culture as a common way of evaluating designs and making tradeoffs.

There are many ways of discussing culture, but with respect to an engineering culture Lucovsky’s description is apt. The central idea is values guide decisions in an unknown design space. The values of the NT team were; portability, reliability, security, and extensibility. Engineering values are, crudely translated, the way things are done around here.

Go’s values

What are the explicit values of Go? What are the core beliefs or philosophy that define the way a Go programmer interprets the world? How are they promulgated? How are they taught? How are they enforced? How do they change over time?

How will you, as a newly minted Go programmer, inculcate the engineering values of Go? Or, how will you, a seasoned Go professional promulgate your values to a future generations? And just so we’re clear, this process of knowledge transfer is not optional. Without new blood and new ideas, our community become myopic and wither.

The values of other languages

To set the scene for what I’m getting at we can look to other languages we see examples of their engineering values.

For example, C++ (and by extension Rust) believe that a programmer should not have to pay for a feature they do not use. If a program does not use some computationally expensive feature of the language, then it shouldn’t be forced to shoulder the cost of that feature. This value extends from the language, to its standard library, and is used as a yardstick for judging the design of all code written in C++.

In Java, and Ruby, and Smalltalk, the core value that everything is an object drives the design of programs around message passing, information hiding, and polymorphism. Designs that shoehorn a procedural style, or even a functional style, into these languages are considered to be wrong–or as Gophers would say, non idiomatic.

Turning to our own community, what are the engineering values that bind Go programmers? Discourse in our community is often fractious, so deriving a set of values from first principles would be a formidable challenge. Consensus is critical, but exponentially more difficult as the number of contributors to the discussion increases. But what if someone had done the hard work for us.

The Zen of Python Go

Several decades ago Tim Peters sat down and penned PEP-20, the Zen of Python. Peters’ attempted to document the engineering values that he saw Guido van Rossum apply in his role as BDFL for Python.

For the remainder of this article, I’m going to look towards the Zen of Python and ask, is there anything that can inform the engineering values of Go programmers?

A good package starts with a good name

Let’s start with something spicy,

“Namespaces are one honking great idea–let’s do more of those!”

The Zen of Python, Item 19

This is pretty unequivocal, Python programmers should use namespaces. Lots of them.

In Go parlance a namespace is a package. I doubt there is any question that grouping things into packages is good for design and potentially reuse. But there might be some confusion, especially if you’re coming with a decade of experience in another language, about the right way to do this.

In Go each package should have a purpose, and the best way to know a package’s purpose is by its name—a noun. A package’s name describes what it provides. So too reinterpret Peters’ words, every Go package should have a single purpose.

This is not a new idea, I’ve been saying this a while, but why should you do this rather than approach where packages are used for fine grained taxonomy? Why, because change.

“Design is the art of arranging code to work today, and be changeable forever.”

Sandi Metz

Change is the name of the game we’re in. What we do as programmers is manage change. When we do that well we call it design, or architecture. When we do it badly we call it technical debt, or legacy code.

If you are writing a program that works perfectly, one time, for one fixed set of inputs then nobody cares if the code is good or bad because ultimately the output of the program is all the business cares about.

But this is never true. Software has bugs, requirements change, inputs change, and very few programs are written solely to be executed once, thus your program will change over time. Maybe it’s you who’ll be tasked with this, more likely it will be someone else, but someone has to change that code. Someone has to maintain that code.

So, how can we make it easy to for programs to change? Interfaces everywhere? Make everything mockable? Pernicious dependency injection? Well, maybe, for some classes of programs, but not many, those techniques will be useful. However, for the majority of programs, designing something to be flexible up front is over engineering.

What if, instead, we take a position that rather than enhancing components, we replace them. Then the best way to know when something needs to be replaced, is when it doesn’t do what it says on the tin.

A good package starts with choosing a good name. Think of your package’s name as an elevator pitch, using just one word, to describe what it provides. When the name no longer matches the requirement, find a replacement.

Simplicity matters

“Simple is better than complex.”

The Zen of Python, Item 3

PEP-20 says simple is better than complex, I couldn’t agree more. A couple of years ago I made this tweet;

https://twitter.com/davecheney/status/539576755254611968

My observation, at least at the time, was that I couldn’t think of a language introduced in my life time that didn’t purport to be simple. Each new language offered as a justification, and an enticement, their inherent simplicity. But as I researched, I found that simplicity was not a core value of the many of the languages considered Go’s contemporaries. 1 Maybe this is just a cheap shot, but could it be that either these languages aren’t simple, or they don’t think of themselves as being simple. They don’t consider simplicity to be a core value.

Call me old fashioned, but when did being simple fall out of style? Why does the commercial software development industry continually, gleefully, forget this fundamental truth?

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”

C. A. R. Hoare, The Emperor’s Old Clothes, 1980 Turing Award Lecture

Simple does not mean easy, we know that. Often it is more work to make something simple to use, than easy to build.

“Simplicity is prerequisite for reliability.”

Edsger W Dijkstra, EWD498, 18 June 1975

Why should we strive for simplicity? Why is important that Go programs be simple? Simple doesn’t mean crude, it means readable and maintainable. Simple doesn’t mean unsophisticated, it means reliable, relatable, and understandable.

“Controlling complexity is the essence of computer programming.”

Brian W. Kernighan, Software Tools (1976)

Whether Python abides by its mantra of simplicity is a matter for debate, but Go holds simplicity as a core value. I think that we can all agree that when it comes to Go, simple code is preferable to clever code.

Avoid package level state

“Explicit is better than implicit.”

The Zen of Python, Item 2

This is a place where I think Peters’ was more aspirational than factual. Many things in Python are not explicit; decorators, dunder methods, and so on. Without doubt they are powerful, there’s a reason those features exists. Each feature is something someone cared enough about to do the work to implement it, especially the complicated ones. But heavy use of those features makes is harder for the reader to predict the cost of an operation.

The good news is we have a choice, as Go programmers, to choose to make our code explicit. Explicit could mean many things, perhaps you may be thinking explicit is just a nice way of saying bureaucratic and long winded, but that’s a superficial interpretation. It’s a misnomer to focus only on the syntax on the page, to fret about line lengths and DRYing up expressions. The more valuable, in my opinon, place to be explicit are to do with coupling and with state.

Coupling is a measure of the amount one thing depends on another. If two things are tightly coupled, they move together. An action that affects one is directly reflected in another. Imagine a train, each carriage joined–ironically the correct word is coupled–together; where the engine goes, the carriages follow.

Another way to describe coupling is the word cohesion. Cohesion measures how well two things naturally belong together. We talk about a cohesive argument, or a cohesive team; all their parts fit together as if they were designed that way.

Why does coupling matter? Because just like trains, when you need to change a piece of code, all the code that is tightly coupled to it must change. A prime example, someone release a new version of their API and now your code doesn’t compile.

APIs are an unavoidable source of coupling but there are more insidious forms of coupling. Clearly everyone knows that if an API’s signature changes the data passing into and out of that call changes. It’s right there in the signature of the function; I take values of these types and return values of other types. But what if the API passed data another way? What if every time you called this API the result was based on the previous time you called that API even though you didn’t change your parameters.

This is state, and management of state is the problem in computer science.

package counter

var count int

func Increment(n int) int {
        count += n
        return count
}

Suppose we have this simple counter package. You can call Increment to increment the counter, you can even get the value back if you Increment with a value of zero.

Suppose you had to test this code, how would you reset the counter after each test? Suppose you wanted to run those tests in parallel, could you do it? Now suppose that you wanted to count more than one thing per program, could you do it?

No, of course not. Clearly the answer is to encapsulate the count variable in a type.

package counter

type Counter struct {
        count int
}

func (c *Counter) Increment(n int) int {
        c.count += n
        return c.count
}

Now imagine that this problem isn’t restricted to just counters, but your applications main business logic. Can you test it in isolation? Can you test it in parallel? Can you use more than one instance at a time? If the answer those question is no, the reason is package level state.

Avoid package level state. Reduce coupling and spooky action at a distance by providing the dependencies a type needs as fields on that type rather than using package variables.

Plan for failure, not success

“Errors should never pass silently.”

The Zen of Python, Item 10

It’s been said of languages that favour exception handling follow the Samurai principle; return victorious or not at all. In exception based languages functions only return valid results. If they don’t succeed then control flow takes an entirely different path.

Unchecked exceptions are clearly an unsafe model to program in. How can you possibly write code that is robust in the presence of errors when you don’t know which statements could throw an exception? Java tried to make exceptions safer by introducing the notion of a checked exception which, to the best of my knowledge, has not been repeated in another mainstream language. There are plenty of languages which use exceptions but they all, with the singular exception of Java, do so in the unchecked variety.

Obviously Go chose a different path. Go programmers believe that robust programs are composed from pieces that handle the failure cases before they handle the happy path. In the space that Go was designed for; server programs, multi threaded programs, programs that handle input over the network, dealing with unexpected data, timeouts, connection failures and corrupted data must be front and centre of the programmer’s mind if they are to produce robust programs.

“I think that error handling should be explicit, this should be a core value of the language.”

Peter Bourgon, GoTime #91

I want to echo Peter’s assertion, as it was the impetus for this article. I think so much of the success of Go is due to the explicit way errors are handled. Go programmers thinks about the failure case first. We solve the “what if…​” case first. This leads to programs where failures are handled at the point of writing, rather than the point they occur in production.

The verbosity of

if err != nil {
    return err
}

is outweighed by the value of deliberately handling each failure condition at the point at which they occur. Key to this is the cultural value of handling each and every error explicitly.

Return early rather than nesting deeply

“Flat is better than nested.”

The Zen of Python, Item 5

This is sage advice coming from a language where indentation is the primary form of control flow. How can we interpret this advice in terms of Go? gofmt controls the overall whitespace of a Go program so there’s not thing doing there.

I wrote earlier about package names, and there is probably some advice here about avoiding a complicated package hierarchy. In my experience the more a programmer tries to subdivide and taxonimise their Go codebase the more they risk hitting the dead end that is package import loops.

I think the best application of item 5’s advice is the control flow within a function. Simply put, avoid control flow that requires deep indentation.

“Line of sight is a straight line along which an observer has unobstructed vision.”

May Ryer, Code: Align the happy path to the left edge

Mat Ryer describes this idea as line of sight coding. Light of sight coding means things like:

  • Using guard clauses to return early if a precondition is not met.
  • Placing the successful return statement at the end of the function rather than inside a conditional block.
  • Reducing the overall indentation level of the function by extracting functions and methods.

Key to this advice is the thing that you care about, the thing that the function does, is never in danger of sliding out of sight to the right of your screen. This style has a bonus side effect that you’ll avoid pointless arguments about line lengths on your team.

Every time you indent you add another precondition to the programmers stack, consuming one of their 7 ±2 short term memory slots. Rather than nesting deeply, keep the successful path of the function close to the left hand side of your screen.

If you think it’s slow, prove it with a benchmark

“In the face of ambiguity, refuse the temptation to guess.”

The Zen of Python, Item 12

Programming is based on mathematics and logic, two concepts which rarely involve the element of chance. But there are many things we, as programmers, guess about every day. What does this variable do? What does this parameter do? What happens if I pass nil here? What happens if I call Register twice? There’s actually a lot of guesswork in modern programming, especially when it comes to using libraries you didn’t write.

“APIs should be easy to use and hard to misuse.”

Josh Bloch

One of the best ways I know to help a programmer avoid having to guess is to, when building an API, focus on the default use case. Make it as easy as you can for the caller to do the most common thing. However, I’ve written and talked a lot about API design in the past, so instead my interpretation of item 12 is; don’t guess about performance.

Despite how you may feel about Knuth’s advice, one of the drivers of Go’s success is its efficient execution. You can write efficient programs in Go and thus people will choose Go because of this. There are a lot of misconceptions about performance, so my request is, when you’re looking to performance tune your code or you’re facing some dogmatic advice like defer is slow, CGO is expensive, or always use atomics not mutexes, don’t guess.

Don’t complicate your code because of outdated dogma, and, if you think something is slow, first prove it with a benchmark. Go has excellent benchmarking and profiling tools that come in the distribution for free. Use them to find your bottlenecks.

Before you launch a goroutine, know when it will stop

At this point I think I think I’ve mined the valuable points from PEP-20 and possibly stretched its reinterpretation beyond the point of good taste. I think that’s fine, because although this was a useful rhetorical device, ultimately we are talking about two different languages.

“You type g o, a space, and then a function call. Three keystrokes, you can’t make it much shorter than that. Three keystrokes and you’ve just started a sub process.”

Rob Pike, Simplicity is Complicated, dotGo 2015

The next two suggestions I’ll dedicate to goroutines. Goroutines are the signature feature of the language, our answer for first class concurrency. They are so easy to use, just put the word go in front of the statement and you’ve launched that function asynchronously. It’s so simple, no threads, no stack sizes, no thread pool executors, no ID’s, no tracking completion status.

Goroutines are cheap. Because of the runtime’s ability to multiplex goroutines onto a small pool of threads (which you don’t have to manage), hundreds of thousands, millions of goroutines are easily accommodated. This opens up designs that would be not be practical under competing concurrency models like threads or evented callbacks.

But as cheap as goroutines are, they’re not free. At a minimum there’s a few kilobytes for their stack, which, when you’re getting up into the 10^6 goroutines, does start to add up. This is not to say you shouldn’t use millions of goroutines if that is what the design calls for, but when you do, it’s critical that you keep track of them because 10^6 of anything can consume a non trivial amount of resources in aggregate.

Goroutines are the key to resource ownership in Go. To be useful a goroutine has to do something, and that means it almost always holds reference to, or ownership of, a resource; a lock, a network connection, a buffer with data, the sending end of a channel. While that goroutine is alive, the lock is held, the network connection remains open, the buffer retained and the receivers of the channel will continue to wait for more data.

The simplest way to free those resources is to tie them to the lifetime of the goroutine–when the goroutine exits, the resource has been freed. So while it’s near trivial to start a goroutine, before you write those three letters, g o and a space, make sure you have an answer to these questions:

  • Under what condition will a goroutine stop? Go doesn’t have a way to tell a goroutine to exit. There is no stop or kill function, for good reason. If we cannot command a goroutine to stop, we must instead ask it, politely. Almost always this comes down to a channel operation. Range loops over a channel exit when the channel is closed. A channel will become selectable if it is closed. The signal from one goroutine to another is best expressed as a closed channel.
  • What is required for that condition to arise? If channels are both the vehicle to communicate between goroutines and the mechanism for them to signal completion, the next question to the programmer becomes, who will close the channel, when will that happen?
  • What signal will you use to know the goroutine has stopped? When you signal a goroutine to stop, that stopping will happen at some time in the future relative to the goroutine’s frame of reference. It might happen quickly in terms of human perception, but computers execute billions of instructions every second, and from the point of view of each goroutine, their execution of instructions is unsynchronised. The solution is often to use a channel to signal back or a waitgroup where a fan in approach is needed.

Leave concurrency to the caller

It is likely that in any serious Go program you write there will be concurrency involved. This raises the problem, many of the libraries and code that we write fall into this a one goroutine per connection, or worker pattern. How will you manage the lifetime of those goroutines?

net/http is a prime example. Shutting down the server owning the listening socket is relatively straight forward, but what about a goroutines spawned from that accepting socket? net/http does provide a context object inside the request object which can be used to signal–to code that is listening–that the request should be canceled, thereby terminating the goroutine, however it is less clear how to know when all of these things have been done. It’s one thing to call context.Cancel, its another to know that the cancellation has completed.2

The point I want to make about net/http is that its a counter example to good practice. Because each connection is handled by a goroutine spawned inside the net/http.Server type, the program, living outside the net/http package, does not have an ability to control the goroutines spawned for the accepting socket.

This is an area of design that is still evolving, with efforts like go-kit’s run.Group and the Go team’s ErrGroup which provide a framework to execute, cancel and wait on functions run asynchronously.

The bigger design maxim here is for library writers, or anyone writing code that could be run asynchronously, leave the responsibility of starting to goroutine to your caller. Let the caller choose how they want to start, track, and wait on your functions execution.

Write tests to lock in the behaviour of your package’s API

Perhaps you were hoping to read an article from me where I didn’t rant about testing. Sadly, today is not that day.

Your tests are the contract about what your software does and does not do. Unit tests at the package level should lock in the behaviour of the package’s API. They describe, in code, what the package promises to do. If there is a unit test for each input permutation, you have defined the contract for what the code will do in code, not documentation.

This is a contract you can assert as simply as typing go test. At any stage, you can know with a high degree of confidence, that the behaviour people relied on before your change continues to function after your change.

Tests lock in api behaviour. Any change that adds, modifies or removes a public api must include changes to its tests.

Moderation is a virtue

Go is a simple language, only 25 keywords. In some ways this makes the features that are built into the language stand out. Equally these are the features that the language sells itself on, lightweight concurrency, structural typing.

I think all of us have experienced the confusion that comes from trying to use all of Go’s features at once. Who was so excited to use channels that they used them as much as they could, as often as they could? Personally for me I found the result was hard to test, fragile, and ultimately overcomplicated. Am I alone?

I had the same experience with goroutines, attempting to break the work into tiny units I created a hard to manage hurd of Goroutines and ultimately missed the observation that most of my goroutines were always blocked waiting for their predecessor– the code was ultimately sequential and I had added a lot of complexity for little real world benefit. Who has experienced something like this?

I had the same experience with embedding. Initially I mistook it for inheritance. Then later I recreated the fragile base class problem by composing complicated types, which already had several responsibilities, into more complicated mega types.

This is potentially the least actionable piece of advice, but one I think is important enough to mention. The advice is always the same, all things in moderation, and Go’s features are no exception. If you can, don’t reach for a goroutine, or a channel, or embed a struct, anonymous functions, going overboard with packages, interfaces for everything, instead prefer simpler approach rather than the clever approach.

Maintainability counts

I want to close with one final item from PEP-20,

“Readability Counts.”

The Zen of Python, Item 7

So much has been said, about the importance of readability, not just in Go, but all programming languages. People like me who stand on stages advocating for Go use words like simplicity, readability, clarity, productivity, but ultimately they are all synonyms for one word–maintainability.

The real goal is to write maintainable code. Code that can live on after the original author. Code that can exist not just as a point in time investment, but as a foundation for future value. It’s not that readability doesn’t matter, maintainability matters more.

Go is not a language that optimises for clever one liners. Go is not a language which optimises for the least number of lines in a program. We’re not optimising for the size of the source code on disk, nor how long it takes to type the program into an editor. Rather, we want to optimise our code to be clear to the reader. Because its the reader who’s going to have to maintain this code.

If you’re writing a program for yourself, maybe it only has to run once, or you’re the only person who’ll ever see it, then do what ever works for you. But if this is a piece of software that more than one person will contribute to, or that will be used by people over a long enough time that requirements, features, or the environment it runs in may change, then your goal must be for your program to be maintainable. If software cannot be maintained, then it will be rewritten; and that could be the last time your company will invest in Go.

Can the thing you worked hard to build be maintained after you’re gone? What can you do today to make it easier for someone to maintain your code tomorrow?

the-zen-of-go.netlify.com