
How the Go runtime implements maps efficiently (without generics)

This post discusses how maps are implemented in Go. It is based on a presentation I gave at the GoCon Spring 2018 conference in Tokyo, Japan.

What is a map function?

To understand how a map works, let’s first talk about the idea of the map function. A map function maps one value to another. Given one value, called a key, it will return a second, the value.

map(key) → value

Now, a map isn’t going to be very useful unless we can put some data in the map. We’ll need a function that adds data to the map

insert(map, key, value)

and a function that removes data from the map

delete(map, key)

There are other interesting properties of map implementations, like querying if a key is present in the map, but they’re outside the scope of what we’re going to discuss today. Instead, we’re just going to focus on these properties of a map: insertion, deletion, and mapping keys to values.
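In Go, all three operations are built into the language. Here is a minimal sketch. The project name and star count are illustrative values, not real data.

```go
package main

import "fmt"

// demo runs insert, lookup and delete against a built-in Go map.
func demo() (got, missing, finalLen int) {
	stars := map[string]int{}    // project name → star count (illustrative)
	stars["golang/go"] = 100     // insert(map, key, value)
	got = stars["golang/go"]     // map(key) → value
	missing = stars["not/there"] // an absent key yields the zero value
	delete(stars, "golang/go")   // delete(map, key)
	return got, missing, len(stars)
}

func main() {
	fmt.Println(demo()) // 100 0 0
}
```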

Go’s map is a hashmap

The specific map implementation I’m going to talk about is the hashmap, because this is the implementation that the Go runtime uses. A hashmap is a classic data structure offering O(1) lookups on average and O(n) in the worst case. That is, when things are working well, the time to execute the map function is nearly constant.

The size of this constant is part of the hashmap design, and the point at which the map moves from O(1) to O(n) access time is determined by its hash function.

The hash function

What is a hash function? A hash function takes a key of an unknown length and returns a value with a fixed length.

hash(key) → integer

This hash value is almost always an integer, for reasons that we’ll see in a moment.

Hash and map functions are similar. They both take a key and return a value. However, in the case of the former, it returns a value derived from the key, not the value associated with the key.

Important properties of a hash function

It’s important to talk about the properties of a good hash function as the quality of the hash function determines how likely the map function is to run near O(1).

When used with a hashmap, hash functions have two important properties. The first is stability. The hash function must be stable. Given the same key, your hash function must return the same answer. If it doesn’t you will not be able to find things you put into the map.

The second property is good distribution. Given two near-identical keys, the results should be wildly different. This is important for two reasons. Firstly, as we’ll see, values in a hashmap should be distributed evenly across buckets; otherwise the access time is not O(1). Secondly, as the user can control some aspects of the input to the hash function, they may be able to control its output, leading to poor distribution, which has been a DDoS vector for some languages. This property is also known as collision resistance.
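Both properties can be checked with a small experiment. The sketch compares two hashes:

- a deliberately poor hash, which is stable but makes every key of the same length collide;
- the seeded hash from the standard library’s hash/maphash package (Go 1.14 or later).

The key set, and the use of the low three bits to pick one of eight buckets, are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/maphash"
	"strconv"
)

// poorHash is deliberately bad: stable, but every key of the same
// length produces the same hash.
func poorHash(key string) uint64 { return uint64(len(key)) }

// seed is fixed for the life of the process, so goodHash is stable.
var seed = maphash.MakeSeed()

func goodHash(key string) uint64 {
	var h maphash.Hash
	h.SetSeed(seed)
	h.WriteString(key)
	return h.Sum64()
}

// bucketCounts hashes 8000 near-identical keys ("key1000" through
// "key8999") and counts how many land in each of 8 buckets (low 3 bits).
func bucketCounts(hash func(string) uint64) [8]int {
	var counts [8]int
	for i := 1000; i < 9000; i++ {
		counts[hash("key"+strconv.Itoa(i))&7]++
	}
	return counts
}

func main() {
	fmt.Println("poor:", bucketCounts(poorHash)) // every key in bucket 7
	fmt.Println("good:", bucketCounts(goodHash)) // roughly 1000 per bucket
}
```

With the poor hash, all 8000 keys land in bucket 7, so every lookup degrades to a linear scan of one bucket.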

The hashmap data structure

The second part of a hashmap is the way data is stored.
The classical hashmap is an array of buckets, each of which contains a pointer to an array of key/value entries. In this case our hashmap has eight buckets (as this is the value that the Go implementation uses), and each bucket can hold up to eight entries (again drawn from the Go implementation). Using powers of two allows the use of cheap bit masks and shifts rather than expensive division.
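The bit-mask trick works because of how powers of two behave. Masking a hash with one less than the bucket count keeps exactly the low bits, and that is the same as taking the remainder. Here is a minimal sketch; bucketMask is a hypothetical helper that takes log2 of the bucket count.

```go
package main

import "fmt"

// bucketMask returns the mask for a table of 2^B buckets (hypothetical helper).
func bucketMask(B uint8) uint64 { return 1<<B - 1 }

func main() {
	const B = 3 // 2^3 = 8 buckets, so the mask is 0b111
	for _, h := range []uint64{3, 13, 0xdeadbeef} {
		// Masking keeps the low B bits: the same as h % 8, without a division.
		fmt.Println(h&bucketMask(B), h%(1<<B))
	}
}
```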

As entries are added to a map, assuming a good hash function distribution, then the buckets will fill at roughly the same rate. Once the number of entries across each bucket passes some percentage of their total size, known as the load factor, then the map will grow by doubling the number of buckets and redistributing the entries across them.

With this data structure in mind, if we had a map of project names to GitHub stars, how would we go about inserting a value into the map?

We start with the key, feed it through our hash function, then mask off the bottom few bits to get the correct offset into our bucket array. This is the bucket that will hold all the entries whose hash ends in three (011 in binary). Finally we walk down the list of entries in the bucket until we find a free slot and we insert our key and value there. If the key was already present, we’d just overwrite the value.

Now, let’s use the same diagram to look up a value in our map. The process is similar. We hash the key as before, then mask off the lower 3 bits (our bucket array contains 8 entries) to navigate to bucket five (101 in binary). If our hash function is correct, then the string "moby/moby" will always hash to the same value, so we know that the key will not be in any other bucket. Now it’s a case of a linear search through the bucket, comparing the key provided with the one stored in each entry.

Four properties of a hash map

That was a very high-level explanation of the classical hashmap. We’ve seen there are four properties you need to implement a hashmap:

1. You need a hash function for the key.
2. You need an equality function to compare keys.
3. You need to know the size of the key, and
4. You need to know the size of the value. Together with the key size, this determines the size of the bucket structure. The compiler needs that size so that, as you walk or insert into that structure, it knows how far to advance in memory.

Hashmaps in other languages

Before we talk about the way Go implements a hashmap, I wanted to give a brief overview of how two popular languages implement hashmaps. I’ve chosen these languages as both offer a single map type that works across a variety of keys and values.

C++

The first language we’ll discuss is C++. The C++ Standard Template Library (STL) provides std::unordered_map which is usually implemented as a hashmap.

This is the declaration for std::unordered_map. It’s a template, so the actual values of the parameters depend on how the template is instantiated.

template<
    class Key,                                           // the type of the key
    class T,                                             // the type of the value
    class Hash = std::hash<Key>,                         // the hash function
    class KeyEqual = std::equal_to<Key>,                 // the key equality function
    class Allocator = std::allocator< std::pair<const Key, T> >
> class unordered_map;

There is a lot here, but the important things to take away are:

The template takes the type of the key and value as parameters, so it knows their size.
The template takes a std::hash function specialised on the key type, so it knows how to hash a key passed to it.
And the template takes a std::equal_to function, also specialised on the key type, so it knows how to compare two keys.

Now that we know how the four properties of a hashmap are communicated to the compiler in C++’s std::unordered_map, let’s look at how they work in practice.

First we take the key, pass it to the std::hash function to obtain the hash value of the key. We mask and index into the bucket array, then walk the entries in that bucket comparing the keys using the std::equal_to function.

Java

The second language we’ll discuss is Java. In Java, the hashmap type is called, unsurprisingly, java.util.HashMap.

In Java, the java.util.HashMap type can only operate on objects. That is fine, because in Java almost everything is a subclass of java.lang.Object. As every object in Java descends from java.lang.Object, it inherits, or overrides, a hashCode and an equals method.

However, you cannot directly store the eight primitive types (boolean, int, short, long, byte, char, float, and double), because they are not subclasses of java.lang.Object. You cannot use them as a key, and you cannot store them as a value. To work around this limitation, those types are silently converted into objects representing their primitive values. This is known as boxing.

Putting this limitation to one side for the moment, let’s look at how a lookup in Java’s hashmap would operate.

First we take the key and call its hashCode method to obtain the hash value of the key. We mask and index into the bucket array. In Java, each slot of that array holds a reference to an Entry. An Entry holds a key, a value, and a pointer to the next Entry in the bucket, forming a linked list of entries. Finally, we walk that list, comparing keys using the equals method.

Tradeoffs

Now that we’ve seen how C++ and Java implement a hashmap, let’s compare their relative advantages and disadvantages.

C++ templated `std::unordered_map`

Advantages

Size of the key and value types known at compile time.
Data structures are always exactly the right size; no need for boxing or indirection.
As code is specialised at compile time, other compile time optimisations like inlining, constant folding, and dead code elimination, can come into play.

In short, maps in C++ can be as fast as hand-writing a custom map for each key/value combination, because that is what is happening.

Disadvantages

Code bloat. Each different map is a different type. For N map types in your source, you will have N copies of the map code in your binary.
Compile time bloat. Due to the way header files and templates work, the source code for a std::unordered_map implementation has to be generated, compiled, and optimised in each file that mentions it.

Java `java.util.HashMap`

Advantages

One implementation of a map that works for any subclass of java.lang.Object. Only one copy of java.util.HashMap is compiled, and it’s referenced from every single class.

Disadvantages

Everything must be an object, even things which are not objects. This means maps of primitive values must be converted to objects via boxing. That adds GC pressure for the wrapper objects, and cache pressure because of additional pointer indirections (each object is effectively another pointer lookup).
Buckets are stored as linked lists, not sequential arrays. This leads to lots of pointer chasing while comparing objects.
Hash and equality functions are left as an exercise to the author of the class. Incorrect hash and equals functions can slow down maps using those types, or worse, fail to implement the map behaviour.

Go’s hashmap implementation

Now, let’s talk about how the hashmap implementation in Go allows us to retain many of the benefits of the best map implementations we’ve seen, without paying for the disadvantages.

Just as C++’s and Java’s hashmaps are written in their own languages, Go’s hashmap is written in Go. But Go does not provide generic types, so how can we write a hashmap that works for (almost) any type, in Go?

Does the Go runtime use interface{} ?

No, the Go runtime does not use interface{} to implement its hashmap. While we have the container/{list,heap} packages which do use the empty interface, the runtime’s map implementation does not use interface{}.

Does the compiler use code generation?

No. There is only one copy of the map implementation in a Go binary, and unlike Java, it doesn’t use interface{} boxing. So, how does it work?

There are two parts to the answer, and they both involve co-operation between the compiler and the runtime.

Compile time rewriting

The first part of the answer is to understand that map lookups, insertion, and removal are implemented in the runtime package. During compilation, map operations are rewritten into calls to the runtime, e.g.:

v := m["key"]     → runtime.mapaccess1(m, "key", &v)
v, ok := m["key"] → runtime.mapaccess2(m, "key", &v, &ok)
m["key"] = 9001   → runtime.mapassign(m, "key", 9001)
delete(m, "key")  → runtime.mapdelete(m, "key")

It’s also useful to note that the same thing happens with channels, but not with slices.

The reason for this is that channels are complicated data types. Send, receive, and select have complex interactions with the scheduler, so that’s delegated to the runtime. By comparison, slices are much simpler data structures. The compiler natively handles operations like slice access, len and cap, while deferring the complicated cases in copy and append to the runtime.

Only one copy of the map code

Now we know that the compiler rewrites map operations to calls to the runtime. We also know that inside the runtime, because this is Go, there is only one function called mapaccess1, one function called mapaccess2, and so on.

So, how can the compiler rewrite this

v := m["key"]

into this

runtime.mapaccess1(m, "key", &v)

without using something like interface{}? The easiest way to explain how map types work in Go is to show you the actual signature of runtime.mapaccess1.

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer

Let’s walk through the parameters.

key is a pointer to the key; this is the value you provided as the key.
h is a pointer to a runtime.hmap structure. hmap is the runtime’s hashmap structure that holds the buckets and other housekeeping values ¹.
t is a pointer to a maptype, which is odd.

Why do we need a *maptype if we already have a *hmap? *maptype is the special sauce that makes the generic *hmap work for (almost) any combination of key and value types. There is a maptype value for each unique map declaration in your program. There will be one that describes maps from strings to ints, from strings to http.Headers, and so on.

Rather than having, as C++ has, a complete map implementation for each unique map declaration, the Go compiler creates a maptype during compilation and uses that value when calling into the runtime’s map functions.

type maptype struct {
        typ           _type
        key           *_type
        elem          *_type
        bucket        *_type // internal type representing a hash bucket
        hmap          *_type // internal type representing a hmap
        keysize       uint8  // size of key slot
        indirectkey   bool   // store ptr to key instead of key itself
        valuesize     uint8  // size of value slot
        indirectvalue bool   // store ptr to value instead of value itself
        bucketsize    uint16 // size of bucket
        reflexivekey  bool   // true if k==k for all keys
        needkeyupdate bool   // true if we need to update key on overwrite
}

Each maptype contains details about the properties of this kind of map, from key to elem. It contains information about the key and the elements. maptype.key describes the type of the key, which we were passed only as a pointer. We call these type descriptors.

type _type struct {
        size       uintptr
        ptrdata    uintptr // size of memory prefix holding all pointers
        hash       uint32
        tflag      tflag
        align      uint8
        fieldalign uint8
        kind       uint8
        alg        *typeAlg
        // gcdata stores the GC type data for the garbage collector.
        // If the KindGCProg bit is set in kind, gcdata is a GC program.
        // Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
        gcdata    *byte
        str       nameOff
        ptrToThis typeOff
}

The _type type records several things we need:

- Its size. This matters because we only have a pointer to the key value, but we need to know how large it is.
- What kind of type it is: is it an integer, is it a struct, and so on.
- How to compare values of this type and how to hash them. That is what the _type.alg field is for.

type typeAlg struct {
        // function for hashing objects of this type
        // (ptr to object, seed) -> hash
        hash func(unsafe.Pointer, uintptr) uintptr
        // function for comparing objects of this type
        // (ptr to object A, ptr to object B) -> ==?
        equal func(unsafe.Pointer, unsafe.Pointer) bool
}

There is one typeAlg value for each type in your Go program.

Putting it all together, here is the (slightly edited for clarity) runtime.mapaccess1 function.

// mapaccess1 returns a pointer to h[key].  Never returns nil, instead
// it will return a reference to the zero object for the value type if
// the key is not in the map.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
        if h == nil || h.count == 0 {
                return unsafe.Pointer(&zeroVal[0])
        }
        alg := t.key.alg
        hash := alg.hash(key, uintptr(h.hash0))
        m := bucketMask(h.B)
        b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
        // ... search bucket b (and its overflow buckets) for key,
        // comparing keys with alg.equal ...
}

One thing to note is the h.hash0 parameter passed into alg.hash. h.hash0 is a random seed generated when the map is created. It is how the Go runtime defends against deliberately engineered hash collisions.

Anyone can read the Go source code, so they could come up with a set of values which, using the hash algorithm that Go uses, all hash to the same bucket. The seed value adds an amount of randomness to the hash function, providing some protection against collision attacks.

Conclusion

I was inspired to give this presentation at GoCon because Go’s map implementation is a delightful compromise between C++’s and Java’s, taking most of the good without having to accommodate most of the bad.

Unlike Java, you can use scalar values like characters and integers without the overhead of boxing. Unlike C++, instead of N runtime.hashmap implementations in the final binary, there are only N runtime.maptype values, a substantial saving in program space and compile time.

Now I want to be clear that I am not trying to tell you that Go should not have generics. My goal today was to describe the situation we have today in Go 1 and how the map type in Go works under the hood. The Go map implementation we have today is very fast and provides most of the benefits of templated types, without the downsides of code generation and compile time bloat.

I see this as a case study in design that deserves recognition.

Go’s hidden #pragmas

This is an article about compiler directives; or as they are commonly known, pragmas. It’s derived from a talk of a similar name that I gave last year at GopherChina in Shanghai.

But first, a history lesson

Before we talk about Go, let’s talk a little about pragmas, and their history. Many languages have the notion of an attribute, or directive, that changes the way source code is interpreted during compilation. For example, Perl has the use function:

use strict;
use strict "vars";
use strict "refs";
use strict "subs";

use enables features, or makes the compiler interpret the source of the program differently, by making the compiler more pedantic or enabling a new syntax mode.

JavaScript has something similar. ECMAScript 5 extended the language with optional modes, such as:

"use strict";

When the JavaScript interpreter comes across the words "use strict"; it enables so-called Strict Mode when parsing your JavaScript source. ²

Rust is similar; it uses attribute syntax to pass instructions to the compiler, from inlining hints and conditional compilation to enabling unstable features in the compiler or standard library.

#[inline(always)]
fn super_fast_fn() { ... }

#[cfg(target_os = "macos")]
mod macos_only { ... }

The inline(always) attribute asks the compiler to always inline super_fast_fn. The cfg(target_os = "macos") attribute tells the compiler to only compile the macos_only module on OS X.

The name pragma comes from ALGOL 68, where they were called pragmats, which was itself shorthand for the word pragmatic. When they were adopted by C in the 1970’s, the name was shortened again to #pragma, and due to the widespread use of C, became fully integrated into the programmer zeitgeist.

#pragma pack(2)
struct T {
    int i;      // offset 0
    short j;    // offset 4
    double k;   // offset 6 (8 without the pragma)
};

This example says to the compiler that the structure should be packed on a two byte boundary; so the double, k, will start at an offset of 6 bytes from the address of T, not the usual 8.

C’s #pragma directive spawned a host of compiler specific extensions, like gcc’s __builtin directive.

Does Go have pragmas?

Now that we know a little bit of the history of pragmas, maybe we can now ask the question, does Go have pragmas?

In C-style languages, #pragma, like #include and #define, is implemented with a preprocessor. Go does not have a preprocessor, or macros. So the question remains: does Go have pragmas?

It turns out that, yes, even though Go does not have macros, or a preprocessor, Go does indeed support pragmas. They are implemented by the compiler as comments.

Just to drive home the point, they’re actually called pragmas in the source of the Go compiler.

So, clearly the name pragma, along with the idea, isn’t going away.

This article focuses on only a few of the pragmas that the compiler recognises, partly because the list changes frequently, but mostly because not all of them are usable by you as programmers.

Here are some examples to whet your appetite:

//go:noescape 
func gettimeofday(tv *Timeval) (err Errno)

This is an example of the noescape directive on the gettimeofday stub from the syscall package.

//go:noinline
func lshNop1(x uint64) uint64 {
        // two outer shifts should be removed
        return (((x << 5) >> 2) << 2)
}

This is an example of the noinline directive from a test fixture in the compiler tests.

//go:nosplit
func atomicstorep(ptr unsafe.Pointer, new unsafe.Pointer) {
        writebarrierptr_prewrite((*uintptr)(ptr), uintptr(new))
        atomic.StorepNoWB(noescape(ptr), new)
}

This is an example of the nosplit directive inside the runtime’s atomic support functions.

Don’t worry if this was all a bit quick, we’re going to explore these examples, and more, during the remainder of this article.

A word of caution

Before I continue, I want to offer a word of caution.

Pragmas are not part of the language. They might be implemented by the gc compiler, but you will not find them in the spec. At a higher level, the idea of adding pragmas to the language caused considerable debate, especially after the first few established a precedent. In a debate about adding the //go:noinline directive, Rob Pike opined:

“Useful” is always true for a feature request. The question is, does the usefulness justify the cost? The cost here is continued proliferation of magic comments, which are becoming too numerous already.
–Rob Pike

I’ll leave you to decide if adding pragmas to Go was a good idea or not.

As I mentioned earlier pragma directives are placed in Go comments with a precise syntax. The syntax has the general form:

//go:directive

The go: prefix can be replaced with another, so you can see that the Go team were at least considering future growth, even though they don’t encourage it. It’s also important to note that there is no space between the // and the go: prefix. This is partly an accident of history, but it also makes the directive less likely to conflict with a regular comment.
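As a quick illustration of the syntax, //go:noinline is one of the directives that ordinary code can use. In this sketch (function names hypothetical), only the first comment is a directive. The second has a space after //, so it is an ordinary comment and is ignored. Building with go build -gcflags=-m=2 should show the compiler refusing to inline addNoInline while treating addMaybeInlined as inlinable. The exact diagnostic wording varies between Go versions.

```go
package main

import "fmt"

//go:noinline
func addNoInline(a, b int) int { return a + b } // directive applies: no space after //

// go:noinline
func addMaybeInlined(a, b int) int { return a + b } // plain comment: the space disables it

func main() {
	fmt.Println(addNoInline(1, 2), addMaybeInlined(3, 4)) // 3 7
}
```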

Lastly, some of these directives require you to do one or more of the following:

import the unsafe package.
compile with the undocumented -+ flag.
be part of the runtime package.

If you get it wrong, your directive might be ignored, and in most cases your code will compile but might be slower or behave incorrectly.

//go:noescape

Enough with the preflight safety checks.

Early in Go’s life, the parts that went into a complete Go program would include Go code (obviously), some C code from the runtime, and some assembly code, again from the runtime or syscall package. The takeaway is that, inside a package, you could expect to occasionally find functions which were not implemented in Go.

Now, normally this mixing of languages wouldn’t be a problem, except when it interacts with escape analysis. In Go it’s very common to do something like this:

func NewBook() *Book {
        b := Book{Mice: 12, Men: 9}
        return &b
}

That is, inside NewBook we declare and initialise a new Book variable b, then return the address of b. We do this so often inside Go it probably doesn’t sink in that if you were to do something like this in C, the result would be pernicious memory corruption as the address returned from NewBook would point to the location on the stack where b was temporarily allocated.

Escape analysis

Escape analysis identifies variables whose lifetimes extend beyond the lifetime of the function in which they are declared, and moves the location where the variable is allocated from the stack to the heap. Technically we say that b escapes to the heap.

Obviously there is a cost. Heap-allocated variables have to be garbage collected when they are no longer reachable, whereas stack-allocated variables are automatically freed when their function returns. Keep that in mind.

func BuildLibrary() {
        b := Book{Mice: 99, Men: 3}
        AddToCollection(&b)
}

Now, let’s consider a slightly different version of what we saw above. In this contrived example, BuildLibrary declares a new Book, b, and passes the address of b to AddToCollection. The question is, “does b escape to the heap?”

The answer is, it depends. It depends on what AddToCollection does with the *Book passed to it. If AddToCollection did something like this,

func AddToCollection(b *Book) {
        b.Classification = "fiction" 
}

then that’s fine. AddToCollection can address those fields in Book irrespective of whether b points to an address on the stack or on the heap. Escape analysis would conclude that the b declared in BuildLibrary did not escape, because AddToCollection did not retain a copy of the *Book passed to it. So b can be allocated cheaply on the stack.

However, if AddToCollection did something like this,

var AvailableForLoan []*Book

func AddToCollection(b *Book) {
        AvailableForLoan = append(AvailableForLoan, b) 
}

that is, if it keeps a copy of b in some long-lived slice, then that will have an impact on the b declared in BuildLibrary. b must be allocated on the heap so that it lives beyond the lifetime of AddToCollection and BuildLibrary. Escape analysis has to know what AddToCollection does, what functions it calls, and so on, to know if a value should be heap or stack allocated. This is the essence of escape analysis.

os.File.Read

That was a lot of background, so let’s get back to the //go:noescape pragma. Now that we know the call stack of functions affects whether a value escapes or not, consider this very common situation (error handling elided for brevity):

f, _ := os.Open("/tmp/foo")
buf := make([]byte, 4096)
n, _ := f.Read(buf)

We open a file, make a buffer, and we read into that buffer. Is buf allocated on the stack, or on the heap?

As we saw above, it depends on what happens inside os.File.Read. os.File.Read calls down through a few layers to syscall.Read, and this is where it gets complicated. syscall.Read calls down into syscall.Syscall to do the operating system call. syscall.Syscall is implemented in assembly. Because syscall.Syscall is implemented in assembly, the compiler, which works on Go code, cannot “see” into that function, so it cannot see if the values passed to syscall.Syscall escape or not. Because the compiler cannot know if the value might escape, it must assume it will escape.

This was the situation in issue 4099. Suppose you wrote a small bit of glue code in assembly, as the bytes, md5, and syscall packages do. Anything you passed to it would be forced onto the heap, even if you knew that it doesn’t escape.

package bytes

//go:noescape
// IndexByte returns the index of the first instance of c in s,
// or -1 if c is not present in s.
func IndexByte(s []byte, c byte) int // ../runtime/asm_$GOARCH.s

So this is precisely what the //go:noescape pragma does. It says to the compiler, “the next function declaration you see, assume that none of the arguments escape.” We’ve said to the compiler; trust us, IndexByte and its children do not keep a reference to the byte slice.

In this example from Go 1.5 you can see that bytes.IndexByte is implemented in assembly ³. By marking this function //go:noescape, it will avoid stack allocated []byte slices escaping to the heap unnecessarily.

Can you use //go:noescape in your code?

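Can you use //go:noescape in your own code? Yes, but it can only be used on forward declarations: function declarations without a Go body, whose implementation lives elsewhere, such as in an assembly file.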

package main

import "fmt"

//go:noescape
func length(s string) int // implemented in an .s file

func main() {
        s := "hello world"
        l := length(s)
        fmt.Println(l)
}

Note: you’re bypassing the compiler’s checks. If you get this wrong you’ll corrupt memory, and no tool will be able to spot it.

//go:norace

Forking in a multithreaded program is complicated. The child process gets a complete, independent copy of the parent’s memory. So things like locks, implemented as values in memory, can become corrupt when two copies of the same program suddenly see locks in different states.

Fork/exec in the Go runtime is handled with care by the syscall package, which coordinates to make sure that the runtime is in a quiescent state during the brief fork period. However, when the race runtime is in effect, this becomes harder.

To spot races, when compiling in race mode, the program is rewritten so every read and write goes via the race detector framework to detect unsafe memory access. I’ll let the commit explain.

// TODO(rsc): Remove. Put //go:norace on forkAndExecInChild instead.
func isforkfunc(fn *Node) bool {
        // Special case for syscall.forkAndExecInChild.
        // In the child, this function must not acquire any locks, because
        // they might have been locked at the time of the fork. This means
        // no rescheduling, no malloc calls, and no new stack segments.
        // Race instrumentation does all of the above.
        return myimportpath != "" && myimportpath == "syscall" &&
                fn.Func.Nname.Sym.Name == "forkAndExecInChild"
}

As Russ’s comment shows above, the special casing in the compiler was removed in favour of a directive on the syscall.forkAndExecInChild functions in the syscall package.

// Fork, dup fd onto 0..len(fd), and exec(argv0, argvv, envv) in child.
// If a dup or exec fails, write the errno error to pipe.
// (Pipe is close-on-exec so if exec succeeds, it will be closed.)
// In the child, this function must not acquire any locks, because
// they might have been locked at the time of the fork. This means
// no rescheduling, no malloc calls, and no new stack segments.
// For the same reason compiler does not race instrument it.
// The calls to RawSyscall are okay because they are assembly
// functions that do not grow the stack.
//go:norace
func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte,
        attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) {

The special case was replaced by the //go:norace annotation, added by Ian Lance Taylor in Go 1.6, which removed the special case from the compiler. However, //go:norace is still only used in one place in the standard library.

Should you use //go:norace in your own code?

Should you use //go:norace in your own code? Using //go:norace instructs the compiler not to instrument the function, so the race detector will not detect any data races in it. This program contains a data race, which will not be reported by the race detector because of the //go:norace annotation.

package main

import "sync"

var v int

//go:norace
func add(wg *sync.WaitGroup) {
        defer wg.Done()
        v++ // unsynchronised read-modify-write: a data race
}

func main() {
        var wg sync.WaitGroup
        for i := 0; i < 5; i++ {
                wg.Add(1)
                go add(&wg)
        }
        wg.Wait() // ensure the racing goroutines actually run
}

Given the race detector has no known false positives, there should be very little reason to exclude a function from its scope.
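If a function really does share data between goroutines, the fix is to synchronise the access rather than hide it from the race detector. Here is a minimal race-free version of the counter above, using sync/atomic and sync.WaitGroup; the function name is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addConcurrently runs n goroutines that each increment a shared counter.
// atomic.AddInt64 makes each increment race-free, so there is nothing for
// the race detector to report and no need for //go:norace.
func addConcurrently(n int) int64 {
	var v int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&v, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&v)
}

func main() {
	fmt.Println(addConcurrently(5)) // 5
}
```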

//go:nosplit

Hopefully by now everyone knows that a goroutine’s stack is not a static allocation. Instead each goroutine starts with a few kilobytes of stack and, if necessary, will grow.
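Stack growth is invisible to Go code. In this sketch, a hypothetical recursive function needs more than ten megabytes of stack, far more than a goroutine starts with. It still runs without any special handling, because the runtime grows the stack on demand.

```go
package main

import "fmt"

// depth recurses n times, each frame holding a 128-byte array, and
// returns n.
func depth(n int) int {
	var pad [128]byte
	pad[n%len(pad)] = byte(n)
	if n == 0 {
		return int(pad[0]) // pad[0] was just set to byte(0)
	}
	return depth(n-1) + 1
}

func main() {
	done := make(chan int)
	go func() { done <- depth(100000) }() // the new goroutine's stack grows as needed
	fmt.Println(<-done)                   // 100000
}
```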

The technique that the runtime uses to manage a goroutine’s stack relies on each goroutine keeping track of its current stack usage. During the function preamble, a check is made to ensure there is enough stack space for the function to run. If not, the code traps into the runtime to grow the current stack allocation.

"".fn t=1 size=120 args=0x0 locals=0x80 
        0x0000 00000 (main.go:5)  TEXT    "".fn(SB), $128-0
         0x0000 00000 (main.go:5)  MOVQ    (TLS), CX
         0x0009 00009 (main.go:5)  CMPQ    SP, 16(CX)
         0x000d 00013 (main.go:5)  JLS     113

Now, this preamble is quite small; as we can see, it’s only a few instructions on x86:

A load from an offset of the current g register, which holds a pointer to the current goroutine.
A compare against the stack usage for this function, which is a constant known at compile time.
And a branch to the slow path, which is rare and easily predictable.

But sometimes even this overhead is unacceptable, and occasionally unsafe, if you’re the runtime package itself. So a mechanism exists, via an annotation in the compiled form of the function, to skip the stack check preamble. It should also be noted that the stack check is inserted by the linker, not the compiler, so it applies to assembly functions and, while they existed, C functions.

Up until Go 1.4, the runtime was implemented in a mix of Go, C and assembly.

// All reads and writes of g's status go through readgstatus, casgstatus
// castogscanstatus, casfromgscanstatus.
#pragma textflag NOSPLIT
uint32 runtime·readgstatus(G *gp) {
        return runtime·atomicload(&gp->atomicstatus);
}

In this example, runtime.readgstatus, we can see the C style #pragma textflag NOSPLIT. ⁴

When the runtime was rewritten in Go, a way to say that a particular function should not have the stack split check was still required. This was often needed as taking a stack split inside the runtime was forbidden because a stack split implicitly needs to allocate memory, which would lead to recursive behaviour. Hence #pragma textflag NOSPLIT became //go:nosplit.

// All reads and writes of g's status go through
// readgstatus, casgstatus, castogscanstatus,
// casfrom_Gscanstatus.
//go:nosplit
func readgstatus(gp *g) uint32 {
        return atomic.Load(&gp.atomicstatus)
}

But this leads to a problem: what happens if you run out of stack with //go:nosplit?

If a function, written in Go or otherwise, uses //go:nosplit to say “I don’t want to grow the stack at this point”, the compiler still has to ensure it’s safe to run the function. Go is a memory safe language; we cannot let functions use more stack than they are allowed just because they want to avoid the overhead of the stack check. If we did, they would almost certainly corrupt the heap or another goroutine’s memory.

To do this, the compiler maintains a buffer called the redzone, a 768-byte allocation⁵ at the bottom of each goroutine’s stack, which is guaranteed to be available.

The compiler keeps track of the stack requirements of each function. When it encounters a nosplit function it accumulates that function’s stack allocation against the redzone. In this way, carefully written nosplit functions can execute safely against the redzone buffer while avoiding stack growth at inconvenient times.

This program uses //go:nosplit in an attempt to avoid stack splitting:

package main

type T [256]byte // a large stack allocated type

//go:nosplit
func A(t T) {
        B(t)
}

//go:nosplit
func B(t T) {
        C(t)
}

//go:nosplit
func C(t T) {
        D(t)
}

//go:nosplit
//go:noinline
func D(t T) {}

func main() {
        var t T
        A(t)
}

But it fails to build, because the toolchain detects that the redzone would be exhausted:

# command-line-arguments
main.C: nosplit stack overflow
        744     assumed on entry to main.A (nosplit)
        480     after main.A (nosplit) uses 264
        472     on entry to main.B (nosplit)
        208     after main.B (nosplit) uses 264
        200     on entry to main.C (nosplit)
        -64     after main.C (nosplit) uses 264

We occasionally hit this error in the -N (no optimisation) build on the dashboard. The redzone is sufficient when optimisations are on, largely because small functions are inlined. When inlining is disabled, call stacks are deeper and stack frames contain more allocations that are not optimised away.

Can you use //go:nosplit in your own code?

Can you use //go:nosplit in your own functions? Yes, I just showed you that you can. But it’s probably not necessary. The small functions that would benefit most from this optimisation are already good candidates for inlining, and inlining is far more effective at eliminating the overhead of function calls than //go:nosplit.

You’ll note that in the example above I had to use //go:noinline to disable inlining. Otherwise the compiler would have detected that D() does nothing and optimised away the entire call tree.

Of all the pragmas this one is the safest to use, as it will get spotted at compile time, and should generally not affect the correctness of your program, only the performance.

//go:noinline

This leads us to inlining.

Inlining ameliorates the cost of the stack check preamble, and in fact all the overheads of a function call, by copying the code of the inlined function into its caller. It’s a small trade-off: possibly increased program size in exchange for reduced runtime, by avoiding the function call overhead. Inlining is the key compiler optimisation because it unlocks many other optimisations.

Inlining is most effective with small, simple functions, as they do relatively little work compared to their overhead. For large functions, inlining offers less benefit as the overhead of the function call is small compared to the time spent doing work. However, what if you don’t want a function inlined? It turned out this was the case when developing the new SSA backend, as inlining would cause the nascent compiler to crash. I’ll let Keith Randall explain.

We particularly need this feature on the SSA branch because if a function is inlined, the code contained in that function might switch from being SSA-compiled to old-compiler-compiled. Without some sort of noinline mark the SSA-specific tests might not be testing the SSA backend at all.

The decision about what can be inlined is made by a function inside the compiler called ishairy. Hairy statements are things like closures, for loops, range loops, select, switch, and defer. If you wanted to write a small function that you do not want to be inlined, and didn’t want to add any overhead to the function, which of those would you use? It turns out the answer is switch.

Prior to the SSA compiler, switch {} would prevent a function being inlined, whilst also optimising to nothing, and this was used heavily in compiler test fixtures to isolate individual operations.

func f3a_ssa(x int) *int {
        switch {
        }
        return &x
}

With the introduction of the SSA compiler, switch was no longer considered hairy as switch is logically the same as a list of if ... else if statements, so switch{} stopped being a placeholder to prevent inlining. The compiler developers debated how to represent the construct “please don’t inline this function, ever”, and settled on a new pragma, //go:noinline.
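For comparison, here is a sketch of the same kind of fixture written with the new pragma. escape is a hypothetical name. The directive, rather than an empty switch, now keeps the function out of line. Building with go build -gcflags=-m prints the compiler’s inlining and escape-analysis decisions, which is one way to check the effect.

```go
package main

import "fmt"

// escape mirrors f3a_ssa: it returns the address of its argument, so x
// escapes to the heap. The directive stops the compiler from inlining it,
// so the operation stays isolated in its own function.
//
//go:noinline
func escape(x int) *int {
	return &x
}

func main() {
	p := escape(42)
	fmt.Println(*p) // 42
}
```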

Can you use //go:noinline in your own code?

Absolutely, although I cannot think of any reason to do so off hand, save silly examples like this article.

But what about …

But wait, there are many more pragmas that Go supports that aren’t part of this set we’re discussing.

// +build is implemented by the go tool, not the compiler, to filter the files passed to the compiler for build or test.

//go:generate uses the same syntax as a pragma, but is only recognised by the generate tool.

package pdf // import "rsc.io/pdf"

What about the canonical import pragma added in Go 1.4, which forces the go tool to refuse to compile packages not imported by their “canonical” name?

//line /foo/bar.go:123

What about the //line directive that can renumber the line numbers in stack traces?

Wrapping up

Pragmas in Go have a rich history. I hope the retelling of this history has been interesting to you.

The wider arc of Go’s pragmas is that they are used inside the standard library to gain a foothold to implement the runtime, including the garbage collector, in Go itself. Pragmas allowed the runtime developers to extend the language just enough to meet the requirements of the problem. You’ll find pragmas used, sparingly, inside the standard library, although you’ll never find them listed in godoc.

Should you use these pragmas in your own programs? Possibly. //go:noescape is useful when writing assembly glue, which is done quite often in the crypto packages. For the other pragmas, outside demos and presentations like this, I don’t think there is much call for using them.

But please remember that magic comments are not part of the language spec. If you use GopherJS, llgo, or gccgo, your code will still compile, but may operate differently. So please use this advice sparingly.

Caveat emptor.

If aligned memory writes are atomic, why do we need the sync/atomic package?

This is a post inspired by a question on the Go Forum. The question, paraphrased, was “If properly aligned writes are guaranteed to be atomic by the processor, why does the race detector complain?”

The answer is that there are two uses of the word atomic in play here. The first, the one the OP references, is a property of most microprocessors: as long as the address of the write is naturally aligned (if it’s a 32-bit value, say, then it is always written to an address which is a multiple of four), nothing will observe a half-written value.

To explain what that means, consider the opposite, an unaligned write where a 32-bit value is written to an address whose bottom two bits are not zero. In this case the processor has to split the write into two, spanning the boundary. This is known as a torn write as an observer on the bus could see this partially updated value.⁶

This terminology comes from a time before multiple processors were common. At that time the observers of a torn read or write would most likely have been other agents on the ISA, VESA, or PCI bus, like disk controllers or video cards. However, we now live in the multi-core age, so we need to talk about caches and visibility.

Since almost the beginning of computing, the CPU has run faster than main memory, which means the performance of a computer is strongly related to the performance of its memory. This is known as the processor/memory gap. To bridge this gap, processors have adopted caches, which store recently accessed memory in a small, fast store closer to the processor.⁷ Because caches also buffer writes back to main memory, the property that an aligned write is atomic still holds, but when that write becomes visible has become less deterministic.⁸ This is the domain of the second use of the word atomic, the one implemented by the sync/atomic package.

In a modern multiprocessor system, a write to main memory will be buffered in multiple levels of caches before hitting main memory. This is done to hide the latency of main memory, but it means that communicating between processors using main memory is now imprecise. A value read from memory may already have been overwritten by another processor, while the new value has not yet made its way through the various caches.

To solve this ambiguity you need to use a memory fence, also known as a memory barrier. A memory write barrier operation tells the processor that it has to wait until all the outstanding operations in its pipeline, specifically writes, have been flushed to main memory. This operation also invalidates the caches⁹ held by other processors, forcing them to retrieve the new value directly from memory. The same is true for reads: you use a memory read barrier to tell the processor to stop and synchronise with any outstanding writes to memory.

In terms of Go, read and write memory barrier operations are handled by the sync/atomic package, specifically the family of atomic.Load and atomic.Store functions respectively.¹⁰
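Here is a minimal sketch of that pairing, built around a hypothetical mailbox type. The writer fills in payload with an ordinary write and then publishes it with atomic.StoreInt32. The reader spins on atomic.LoadInt32 and only afterwards reads payload. Under the current Go memory model, an atomic store observed by an atomic load is synchronised before that load, so the earlier payload write is visible. Replacing either atomic call with a plain access is exactly the kind of race the detector reports.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// mailbox is a hypothetical one-shot channel built from plain memory.
type mailbox struct {
	payload int   // written with an ordinary store
	ready   int32 // 0 = empty, 1 = published; only accessed atomically
}

// publish writes the payload, then sets ready with an atomic store.
func (m *mailbox) publish(v int) {
	m.payload = v
	atomic.StoreInt32(&m.ready, 1)
}

// consume spins until an atomic load observes ready == 1; from then on
// the earlier write to payload is guaranteed to be visible.
func (m *mailbox) consume() int {
	for atomic.LoadInt32(&m.ready) == 0 {
		runtime.Gosched() // yield so the publisher gets to run
	}
	return m.payload
}

func main() {
	m := &mailbox{}
	go m.publish(42)
	fmt.Println(m.consume()) // 42
}
```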

In answer to the OP’s question: to safely use a value in memory as a communication channel between two goroutines, you need the sync/atomic package; without it, the race detector will complain.

I’m talking about Go at DevFest Siberia 2017

In September I’ll be speaking about Go at events in Russia and Taiwan.

DevFest Siberia 2017, September 23rd and 24th

I’ve been accepted to give two presentations at the GDG Novosibirsk DevFest Siberia 2017 event in Russia.

High performance servers without the event loop

Conventional wisdom suggests that the key to high performance servers is native threads, or more recently event loops. Neither solution is without downside. Threads carry a high overhead in terms of scheduling cost and memory footprint. Event loops lessen those costs, but introduce their own requirement for a complex, callback-driven style.

Go is a general purpose programming language in use in a wide range of domains and is well suited to writing network software. Go was introduced in 2009 with the explicit goal of helping programmers write programs that could solve problems of Google’s scale, and that means writing high performance servers.

This talk will focus on the features of the Go language and runtime environment that allow programmers to write simple, high performance network services without resorting to native threads or event loop-driven callbacks.

Workshop: Exploring the Go execution tracer

As a complement to my conference talk I’ll be teaching a workshop on the Go execution tracer. This workshop follows on from my GolangUK presentation from last year and my High Performance Go workshop, and specifically focuses on the Go execution tracer.

The execution tracer is a new profiling and tracing facility integrated into Go since version 1.5. Unlike “external” profiling tools like pprof, valgrind, or perf, the execution tracer is integrated directly into the Go runtime, giving it detailed knowledge of the scheduler, the network poller, and the garbage collector.

In this workshop I will explain the operation of the execution tracer, how to collect, then analyse, the results of a trace. The audience will step through a set of problems, framed as the trace output of unknown programs to learn how to interpret the results from the execution tracer, improve our code to address performance or scalability bottlenecks, and verify the results.

You can find more information and purchase tickets for the event at the DevFest 2017 website.

Go Taiwan Meetup, Taipei, September 26th

I’ll be visiting the Go meetup in Taipei, Taiwan on the 26th of September. You can find details of the meetup soon on the GolangTW website.

Russian translation by Elena Grahovac
In September I’ll be talking about Go at events in Russia and Taiwan.

DevFest Siberia 2017, Novosibirsk, 23-24 September

The organising committee of DevFest Siberia 2017, which will take place in Novosibirsk (Russia), has accepted my proposals for two talks.

High performance servers without the event loop

There is a widespread belief that the key to writing high performance servers is the use of native threads, whose place has more recently been taken by event loops. However, both of these solutions have their drawbacks. Threads carry high overheads in terms of scheduling cost and memory footprint. Event loops reduce these costs, but impose their own requirements in the form of convoluted, callback-based development.

Go is a general purpose programming language that is used in a wide range of domains and is well suited to writing network software. Go was introduced in 2009 with the goal of helping developers write programs that could solve problems of Google’s scale, that is, the problem of writing high performance servers.

This talk will examine the features of the Go language and runtime that allow programmers to write simple, high performance network services without resorting to native threads or event loop-driven callbacks.

Workshop: Exploring the Go execution tracer

As a complement to the talk, I’ll be running a workshop on the Go execution tracer. This workshop follows on from my talk “Seven ways to profile a Go program” at last year’s GolangUK conference and from my “High Performance Go” workshop. The new workshop focuses on the Go execution tracer.

The execution tracer is a new profiling and tracing facility integrated into Go since version 1.5. Unlike “external” profiling tools such as pprof, valgrind, or perf, the execution tracer is integrated directly into the Go runtime, providing detailed information about the scheduler, the network poller, and the garbage collector.

In the workshop I will explain how the execution tracer works, and how to collect and then analyse the results of a trace. Step by step, participants will work through a set of problems, framed as the trace output of unknown programs, and learn how to interpret the results from the tracer, improve their code, eliminate performance or scalability bottlenecks, and verify the results.

You can find more information and buy tickets on the DevFest Siberia 2017 website.

Go Taiwan Meetup, Taipei, 26 September

I’ll be visiting the Go meetup in Taipei (Taiwan) on the 26th of September. Details of the event will soon appear on the GolangTW website.

Context isn’t for cancellation

This is an experience report about the use of, and difficulties with, the context.Context facility in Go.

Many authors, including myself, have written about the use of, misuse of, and how they would change, context.Context in a future iteration of Go. While opinions differ on many aspects of context.Context, one thing is clear: there is almost unanimous agreement that the bag of values supplied by context.WithValue, and read back through the Value method on the context.Context interface, is orthogonal to the type’s role as a mechanism to control the lifetime of request scoped resources.

Many proposals have emerged to address this apparent overloading of context.Context with a copy-on-write bag of values. Most approximate thread-local storage, so they are unlikely to be accepted on ideological grounds.

This post explores the relationship between context.Context and lifecycle management and asks the question: are attempts to fix context.WithValue solving the wrong problem?

Context is a request scoped paradigm

The documentation for the context package strongly recommends that context.Context is only for request scoped values:

Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx:
func DoSomething(ctx context.Context, arg Arg) error {
        // ... use ctx ...
}

Specifically context.Context values should only live in function arguments, never stored in a field or global. This makes context.Context applicable only to the lifetime of resources in a request’s scope. Given Go’s lineage on the server, this is a compelling use case. However, there exist other use cases for cancellation where the lifetime of the resource extends beyond a single request. For example, a background goroutine as part of an agent or pipeline.

Context as a hook for cancellation

The stated goal of the context package is:

Package context defines the Context type, which carries deadlines, cancelation signals, and other request-scoped values across API boundaries and between processes.

Which sounds great, but belies its catch-all nature. context.Context is used in three independent, yet sometimes conflated, scenarios:

Cancellation via context.WithCancel.
Timeout via context.WithDeadline.
A bag of values via context.WithValue.

At any point, a context.Context value can represent any one, or all three of these independent concerns. However, context.Context‘s most important facility, broadcasting a cancellation signal, is incomplete as there is no way to wait for the signal to be acknowledged.

Looking to the past

As this is an experience report, it would be germane to highlight some actual experience. In 2012 Gustavo Niemeyer wrote a package for goroutine lifecycle management called tomb which is used by Juju for the management of the worker goroutines within the various agents in the Juju system.

tomb.Tombs are concerned only with lifecycle management. Importantly, this is a generic notion of a lifecycle, not tied exclusively to a request, or a goroutine. The scope of the resource’s lifetime is defined simply by holding a reference to the tomb value.

A tomb.Tomb value has three properties:

The ability to signal the owner of the tomb to shut down.
The ability to wait until that signal has been acknowledged.
A way to capture a final error value.

However, tomb.Tombs have one drawback: they cannot be shared across multiple goroutines. Consider this prototypical network server, where a tomb.Tomb cannot replace the use of sync.WaitGroup.

func serve(l net.Listener) error {
        var wg sync.WaitGroup
        var conn net.Conn
        var err error
        for {
                conn, err = l.Accept()
                if err != nil {
                        break
                }
                wg.Add(1)
                go func(c net.Conn) {
                        defer wg.Done()
                        handle(c)
                }(conn)
        }
        wg.Wait()
        return err
}

To be fair, context.Context cannot do this either as it provides no built in mechanism to acknowledge cancellation. What is needed is a form of sync.WaitGroup that allows cancellation, as well as waiting for its participants to call wg.Done.

Context should become, well, just context

The purpose of the context.Context type is in its name:

context /kɒntɛkst/ noun
The circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood.

I propose context.Context becomes just that; a request scoped association list of copy on write values.

Decoupling lifetime management from context.Context as a store of request scoped values will hopefully highlight that request context and lifecycle management are orthogonal concerns.

Best of all, we don’t need to wait until Go 2.0 to explore ideas like Gustavo’s tomb package.

Typed nils in Go 2

This is an experience report about a gotcha in Go that catches every Go programmer at least once. The following program is extracted from a larger version that caused my co-workers to lose several hours today.

package main

import "fmt"

type T struct{}

func (t T) F() {}

type P interface {
        F()
}

func newT() *T { return new(T) }

type Thing struct {
        P
}

func factory(p P) *Thing {
        return &Thing{P: p}
}

const ENABLE_FEATURE = false

func main() {
        t := newT()
        t2 := t
        if !ENABLE_FEATURE {
                t2 = nil
        }
        thing := factory(t2)
        fmt.Println(thing.P == nil)
}

This distilled version of the program in question, while nonsensical, contains all the attributes of the original. Take some time to study the program and ask yourself: does the program print true or false?

nil != nil

Not to spoil the surprise, but the program prints false. The reason is that, while nil is assigned to t2, when t2 is passed to factory it is “boxed” into a variable of type P, an interface. Thus, thing.P does not equal nil because, while the value of P was nil, its concrete type was *T.

Typed `nil`

You’ve probably realised the cause of this problem is the dreaded typed nil, a gotcha that has its own entry in the Go FAQ. The typed nil emerges as a result of the definition of an interface type: a structure which contains the concrete type of the value stored in the interface, and the value itself. This structure can’t be expressed in pure Go, but can be visualised with this example:

var n int = 200
var i interface{} = n

The interface value i is assigned a copy of the value of n, so i’s type slot holds n’s type, int, and its data slot holds the value 200. We can write this more concisely as (int, 200).

In the original program we effectively have the following:

var t2 *T = nil
var p P = t2

Which results in p, using our nomenclature, holding the value (*T, nil). So then, why does the expression p == nil evaluate to false? The explanation I prefer is:

nil is a compile time constant which is converted to whatever type is required, just as constant literals like 200 are converted to the required integer type automatically.
Given the expression p == nil, both operands must be of the same type, therefore nil is converted to the same type as p, which is an interface type. So we can rewrite the expression as (*T, nil) == (nil, nil).
As equality in Go almost always operates as a bitwise comparison it is clear that the memory bits which hold the interface value (*T, nil) are different to the bits that hold (nil, nil) thus the expression evaluates to false.

Put simply, an interface value is only equal to nil if both the type and the value stored inside the interface are nil.
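The (type, value) pairs above can be checked directly. The sketch below reuses T and P from the program above. Comparing p with a typed nil pointer converts (*T)(nil) to the interface type, giving (*T, nil), which does compare equal.

```go
package main

import "fmt"

type T struct{}

func (t T) F() {}

type P interface{ F() }

func main() {
	var t2 *T = nil
	var p P = t2 // p now holds (*T, nil)

	fmt.Println(t2 == nil)      // true: a plain pointer compared with nil
	fmt.Println(p == nil)       // false: (*T, nil) != (nil, nil)
	fmt.Println(p == (*T)(nil)) // true: both sides are (*T, nil)
	fmt.Printf("%T %v\n", p, p) // *main.T <nil>
}
```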

For a detailed explanation of the mechanics behind Go’s interface implementation, Russ Cox has a great post on his blog.

The future of typed `nil`s in Go 2

Typed nils are an entirely logical result of the way dynamic types, aka interfaces, are implemented, but are almost never what the programmer wanted. To tie this back to Russ’s GopherCon keynote, I believe typed nils are an example where Go fails to scale for programming teams.

This gotcha has taken 700 words (and several hours over chat today) to explain, and in the end my co-workers were left with a bad taste in their mouths. The clarity of interfaces was soured by a suspicion that gotchas like this were lurking in their codebase. As an experienced Go programmer I’ve learnt to be wary of the possibility of a typed nil during code review, but it is unfortunate that typed nils remain something that each Go programmer has to learn the hard way.

For Go 2.0 I’d like to start the discussion of what it would mean if comparing an interface value to nil considered the value portion of the interface such that the following evaluated to true:

var b *bytes.Buffer
var r io.Reader = b
fmt.Println(r == nil)

There are obviously some subtleties that this pithy demand fails to capture, but a desire to make this seemingly straightforward comparison less error prone would, at least in my mind, make Go 2 easier to scale to larger development teams.

Should Go 2.0 support generics?

A long time ago, someone–I normally attribute this to David Symonds, but I can’t be sure he was the first to say it–said that the reason for adding generics to Go would be the reason for calling it Go 2.0. That is to say, adding generics to the language would be half baked if they were not used throughout the standard library. I wrote about this in a series of blog posts where I explored what I felt would be the repercussions of integrating templated types into Go.

Do I think Go should have generics? Well, there are really two answers to that question.

As I argued in my Simplicity Debt posts, mainstream programmers in 2017 expect a set of features in their languages. Many of us work in polyglot environments. Even if we want to be writing in Go as much as possible, there’s usually some Javascript, some CSS, some Python, maybe some Java, Swift, C#, PHP or even C++ in the project. Maybe this will change in the future, but right now, if you’re a commercial programmer working for a crust, every day you’ll touch a bunch of languages, so their differences tend to rub against one another.

Mainstream programmers expect static typing, not for performance, but for readability and maintainability–just look at what Typescript and Dart are bringing to Javascript, and Python’s formative efforts with optional typing.
Mainstream programmers expect concurrency. They expect to be able to do more than one thing at a time–just look at node.js and the compromises programmers were prepared to make to move away from heavy-weight thread per connection models. Go is obviously well positioned here.
Mainstream programmers expect some form of templated types because they’re used to it in the other languages they interact with alongside Go.

So my first answer is: Go should have some form of generics because it is a mainstream, imperative, block scoped language and it is expected these days.

My second answer is this: the designers of the language (and keep in mind that I am not one of the language designers, only an exuberant fan) may choose not to add templated types or parameterised functions because, as I wrote in my series of posts, the repercussions for the simplicity and readability of the language may prove too jarring. If that were to happen, my recommendation would be that Go should own that decision.

What do I mean by that? Well, the best explanation I can give is a counterexample. Let’s look at Haskell. Haskell is what most functional programmers consider to be the baseline for a real FP language, and thus it looks pretty much like nothing programmers schooled in side effect ridden, block structured, imperative languages are used to. But Haskell programmers own that, they own their difference. A Haskell programmer doesn’t see a reason to make their language work more like PHP, or C++, or Rust, or even Go, and they are happy to explain the Haskell way of doing things to anyone who asks. My point is that if Go is not going to have a story for templated types, then we need to own it, just like Haskell programmers own their decisions.

This isn’t simply a case of saying “nope, sorry, no generics for Go 2.0, maybe in another 5 years”, but a more fundamental statement that they are not something that will be implemented in Go because we believe there is a better way to solve the underlying problem. Note that I did not say a better way to implement a templated type or parameterised function, but a better way to solve the underlying business problem. There is a difference.

This isn’t without precedent. Go was one of the first C style languages to eschew type inheritance, a decision which led to a radical simplification of the language and a focus on the mantras of communicating intent via interfaces, and encapsulation over inheritance. Before Go, it was assumed that a mainstream language would have classes and a type hierarchy; nowadays that is less true.

So, should Go 2.0 have generics? If the decision is to add them then I’m sure it can be done, after all the syntax is the least important part of the decision, and there is a wealth of prior art in other languages to guide us. However, if the decision is not to add templated types, then it should be made so explicitly. Then it is incumbent upon all Go programmers to explain the Go Way of solving problems.

How to find out which Go version built your binary

This is a short post describing the procedure for discovering which version of Go was used to compile a Go binary.

This procedure relies on the fact that each Go program includes a copy of the version string reported by runtime.Version(). Linker magic ensures that this value will be present in the final binary irrespective of whether runtime.Version() is called by the resulting program. The value in question is stored in the runtime.buildVersion variable and can be recovered by a debugger.

The rest of this post describes the mechanisms for recovering the contents of runtime.buildVersion on various platforms.

Linux/FreeBSD/OpenBSD/NetBSD

If you’re on a Linux or *BSD platform, you can recover the binary build version with gdb.

% gdb $HOME/bin/godoc
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
(gdb) p 'runtime.buildVersion'
$1 = 0xa9ceb8 "go1.8.3"

Darwin

The debugging situation on OS X isn’t great, but here are several options.

gdb

gdb was removed from the XCode toolchain following the switch from gcc to llvm. If you are running a version of XCode that has gdb, you should use the instructions from the previous section.

Delve

Delve can be used to print the value of runtime.buildVersion.

% dlv exec $HOME/bin/godoc
Type 'help' for list of commands.
(dlv) b main.main
Breakpoint 1 set at 0x15596eb for main.main() ./golang.org/x/tools/cmd/godoc/main.go:156
(dlv) c
> main.main() ./golang.org/x/tools/cmd/godoc/main.go:156 (hits goroutine(1):1 total:1) (PC: 0x15596eb)
   151:                 }
   152:         }
   153:         log.Fatalf("too many redirects")
   154: }
   155:
=> 156: func main() {
   157:         flag.Usage = usage
   158:         flag.Parse()
   159:
   160:         playEnabled = *showPlayground
   161:
(dlv) p runtime.buildVersion
"go1.8.1"

lldb

Christian Witts reports on Twitter that Xcode 8.3.3 ships with lldb version 370.0.42, which can interpret Go strings.

$ lldb $HOME/bin/godoc
(lldb) b main.main
(lldb) run
(lldb) p runtime.buildVersion

I’ve tested earlier versions of lldb and found they do not work. Instead, use Delve.

Windows

Good news, everyone. Brian Ketelsen, of GopherCon and GoTime.fm fame, reports that Delve works perfectly on Windows for recovering a binary’s build version.

PS C:\Users\bkete\go\src\github.com\derekparker\delve\cmd\dlv> dlv exec C:\Users\bkete\go\bin\dlv.exe
Type 'help' for list of commands.
(dlv) b main.main
Breakpoint 1 set at 0x8ec666 for main.main() c:/Users/bkete/go/src/github.com/derekparker/delve/cmd/dlv/main.go:11
(dlv) c
> main.main() c:/Users/bkete/go/src/github.com/derekparker/delve/cmd/dlv/main.go:11 (hits goroutine(1):1 total:1) (PC: 0x8ec666)
     6: )
     7:
     8: // Build is the git sha of this binaries build.
     9: var Build string
    10:
=>  11: func main() {
    12:         version.DelveVersion.Build = Build
    13:         cmds.New().Execute()
    14: }
(dlv) p runtime.buildVersion
"go1.8.1"

If someone wants to figure out the correct WinDbg or Visual Studio Debugger incantation, please let me know and I’ll link to you from this post.

Simplicity Debt Redux

In my previous post I discussed my concerns about the additional complexity that adding generics or immutability would bring to a future Go 2.0. As it was an opinion piece, I tried to keep it to around 500 words. This post is an exploration of the most important (and possibly overlooked) point of that post.

Indeed, the addition of [generics and/or immutability] would have a knock-on effect that would profoundly alter the way error handling, collections, and concurrency are implemented.

Specifically, this post explores what I believe the possible knock-on effects of adding generics or immutability to the language would be.

Error handling

A powerful motivation for adding generic types to Go is to enable programmers to adopt a monadic error handling pattern. My concerns with this approach have little to do with the notion of the maybe monad itself. Instead I want to explore the question of how this additional form of error handling might be integrated into the stdlib, and thus the general population of Go programmers.

Right now, to understand how io.Reader works you need to know how slices work, how interfaces work, and how nil works. If the if err != nil { return err } idiom was replaced by an option type or maybe monad, then everyone who wanted to do basic things like read input or write output would have to understand how option types or maybe monads work, in addition to what templated types are and how they are implemented in Go.

Obviously it’s not impossible to learn, but it is more complex than what we have today. Newcomers to the language would have to integrate more concepts before they could understand basic things, like reading from a file.

The next question is: would this monadic form become the single way errors are handled? It seems confusing, and gives unclear guidance to newcomers to Go 2.0, to continue to support both the error interface model and a new monadic maybe type. Also, if some form of templated maybe type was added, would it be a built in, like error, or would it have to be imported in almost every package? Note: we’ve been here before with os.Error.

What began as the simple request to create the ability to write a templated maybe or option type has ballooned into a set of questions that would affect every single Go package ever written.

Collections

Another reason to add templated types to Go is to facilitate custom collection types without the need for interface{} boxing and type assertions.
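Here is a sketch of the status quo for readers who haven't run into it. A container written against interface{} accepts any value. Every value is boxed on the way in, callers must type-assert on the way out, and a wrong assertion is only caught at run time. Stack is a hypothetical example type.

```go
package main

import "fmt"

// Stack is a LIFO container of arbitrary values. Without type parameters the
// element type has to be interface{}, so each value is boxed on Push.
type Stack struct {
	items []interface{}
}

// Push adds v to the top of the stack.
func (s *Stack) Push(v interface{}) { s.items = append(s.items, v) }

// Pop removes and returns the top element; ok is false if the stack is empty.
func (s *Stack) Pop() (interface{}, bool) {
	if len(s.items) == 0 {
		return nil, false
	}
	last := len(s.items) - 1
	v := s.items[last]
	s.items[last] = nil // drop the reference so the boxed value can be collected
	s.items = s.items[:last]
	return v, true
}

func main() {
	var s Stack
	s.Push(1)
	s.Push("two") // compiles: nothing stops mixing element types

	v, _ := s.Pop()
	if n, ok := v.(int); ok { // the caller must check the dynamic type
		fmt.Println("int", n)
	} else {
		fmt.Println("not an int:", v)
	}
}
```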

On the surface this sounds like a grand idea, especially as these types are leaking into the standard library anyway. But that leaves the question of what to do with the built in slice and map types. Should slices and maps co-exist with user defined collections, or should they be removed in favour of defining everything as a generic type?

Keeping both sounds redundant and confusing, as all Go developers would have to be fluent in both and develop a sophisticated design sensibility about when and where to choose one over the other. But removing slices and maps in favour of collection types provided by a library raises other questions.

Slicing

For example, if there is no slice type, only types like a vector or linked list, what happens to slicing? Does it go away? If so, how would that affect common operations like handling the result of a call to io.Reader.Read? If slicing doesn’t go away, would that require the addition of operator overloading so that user defined collection types can implement a slice operator?

Then there are questions on how to marry the built in map type with a user defined map or set. Should user defined maps support the index and assignment operators? If so, how could a user defined map offer both the one and two return value forms of lookup without requiring polymorphic dispatch based on the number of return arguments? How would those operators work in the presence of set operations which have no value, only a key?
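For reference, these are the built-in map forms in question. The one-value lookup returns the zero value for a missing key. The two-value "comma ok" form also reports whether the key is present. A set is conventionally written as a map whose values are empty structs. The function name lookup in the sketch below is illustrative.

```go
package main

import "fmt"

// lookup exercises both index forms of the built-in map.
func lookup(m map[string]int, k string) (oneValue, twoValue int, present bool) {
	oneValue = m[k]          // zero value if k is absent
	twoValue, present = m[k] // comma-ok form distinguishes absent from zero
	return
}

func main() {
	ages := map[string]int{"alice": 30, "bob": 0}
	fmt.Println(lookup(ages, "bob"))   // key present, value is zero
	fmt.Println(lookup(ages, "carol")) // key absent

	// A set: only the keys matter; the struct{} value takes no storage.
	seen := map[string]struct{}{}
	seen["alice"] = struct{}{}
	_, ok := seen["alice"]
	fmt.Println(ok)
}
```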

Which types could use the delete function? Would delete need to be modified to work with types that implement some kind of Deleteable interface? The same questions apply to append, len, cap, and copy.
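As things stand, these built-ins are defined only for the language's own types. That restriction is what lets them work across element types without generics. The following is a quick sketch. The helpers firstN and without are illustrative names.

```go
package main

import "fmt"

// firstN returns a copy of at most n leading elements of s.
func firstN(s []int, n int) []int {
	dst := make([]int, n)
	k := copy(dst, s) // copy moves min(len(dst), len(s)) elements
	return dst[:k]
}

// without deletes k from m and reports how many keys remain.
func without(m map[string]int, k string) int {
	delete(m, k) // deleting an absent key is a no-op
	return len(m)
}

func main() {
	s := make([]int, 0, 2)
	s = append(s, 1, 2, 3) // exceeds cap 2, so append allocates a larger array
	fmt.Println(len(s), cap(s), firstN(s, 2))

	m := map[string]int{"a": 1, "b": 2}
	fmt.Println(without(m, "a"), without(m, "missing"))
}
```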

What about addressability? Values in the built in map type are not addressable, but should that be permitted or disallowed for user defined map types? How would that interact with operator overloading designed to make user defined maps look more like the built in map?
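Here is what non-addressability means in practice for the built-in map. You cannot take the address of a map element, and you cannot assign to a field of a struct stored by value. The usual workarounds are copy-modify-store, or storing pointers in the map. The following is a sketch. The counter type and the bump helpers are hypothetical.

```go
package main

import "fmt"

type counter struct{ hits int }

// bumpValue increments a struct stored by value in m.
func bumpValue(m map[string]counter, k string) {
	// m[k].hits++ does not compile: map elements are not addressable,
	// so the value must be copied out, modified, and stored back.
	c := m[k]
	c.hits++
	m[k] = c
}

// bumpPtr increments through a pointer stored in m. It assumes k is present;
// a missing key yields a nil pointer and the increment panics.
func bumpPtr(m map[string]*counter, k string) {
	m[k].hits++ // legal: this modifies the pointed-to struct, not the map slot
}

func main() {
	byValue := map[string]counter{"home": {}}
	byPtr := map[string]*counter{"home": {}}
	bumpValue(byValue, "home")
	bumpPtr(byPtr, "home")
	fmt.Println(byValue["home"].hits, byPtr["home"].hits)
}
```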

What sounded like a good idea on paper—make it possible for programmers to define their own efficient collection data types—has highlighted how deeply integrated the built in map and slice are and spawned not only a requirement for templated types, but operator overloading, polymorphic dispatch, and some kind of return value addressability semantics.

How could you implement a vector?

So, maybe you make the argument that now we have templated types we can do away with the built in slice and map, and replace them with a Java-esque list of collection types.

Go’s Pascal-like array type has a fixed size known at compile time. How could you implement a growable vector without resorting to unsafe hacks? I’ll leave that as an exercise to the reader. But I put it to you that if you cannot implement a simple templated vector type with the memory safety we enjoy today with slices, then that is a very strong design smell.
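One way to see the problem is to sketch a growable vector with type parameters (Go 1.18 and later). Its growth strategy looks self-contained: double the capacity and copy the old elements across. But it quietly depends on make and copy, which means it depends on the built-in slice it was meant to replace. Vector and its methods are hypothetical.

```go
package main

import "fmt"

// Vector is a hypothetical growable array. It tracks its own length, but its
// storage still comes from a built-in slice.
type Vector[T any] struct {
	data []T // backing store; len(data) is the capacity
	n    int // number of elements in use
}

// Push appends x, doubling the backing store when it is full.
func (v *Vector[T]) Push(x T) {
	if v.n == len(v.data) {
		newCap := 2 * len(v.data)
		if newCap == 0 {
			newCap = 4
		}
		grown := make([]T, newCap) // the built-in slice again
		copy(grown, v.data[:v.n])
		v.data = grown
	}
	v.data[v.n] = x
	v.n++
}

// At returns element i, panicking like a slice would if i is out of range.
func (v *Vector[T]) At(i int) T {
	if i < 0 || i >= v.n {
		panic(fmt.Sprintf("index %d out of range [0:%d]", i, v.n))
	}
	return v.data[i]
}

// Len reports the number of elements.
func (v *Vector[T]) Len() int { return v.n }

func main() {
	var v Vector[int]
	for i := 0; i < 10; i++ {
		v.Push(i * i)
	}
	fmt.Println(v.Len(), v.At(9))
}
```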

Iteration

I’ll admit that the inability to use the for ... range statement over my own types was something that frustrated me for a long time when I came to Go, as I was accustomed to the flexibility of the iterator types in the Java collections library.

But iterating over in-memory data structures is boring; what you really want to be able to do is compose iterators over database results and network requests. In short, data from outside your process, and when data is outside your process, retrieving it might fail. In that case you have a choice: does your Iterable interface return a value, a value and an error, or do you go down the option type route? Each would require a new form of range loop syntactic sugar in an area which already contains its share of footguns.

You can see that adding the ability to write templated collection types sounds great on paper, but in practice it would perpetuate a situation where the built in collection types live on in addition to their user defined counterparts. Each would have their strengths and weaknesses, and a Go developer would have to become proficient in both. This is something that Go developers just don’t have to think about today, as slices and maps are practically ubiquitous.

Immutability

Russ wrote at the start of the year that a story for reference immutability was an important area of exploration for the future of Go. Having surveyed hundreds of Go packages and found few which are written with an understanding of the problem of data races—let alone actually tried running their tests under the race detector—it is tempting to agree with Russ that the ‘after the fact’ model of checking for races at run time has some problems.
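For readers who haven't used it, the race detector is enabled with the -race flag (go test -race, go run -race, go build -race). The following is a minimal sketch of a shared counter of the kind the detector flags when it is left unsynchronised, along with the usual fix of guarding the shared state with a sync.Mutex. The function name count is illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// count increments a shared total from n goroutines. Remove the mutex and
// the increments race; running under -race would then report it.
func count(n int) int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		total int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			total++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(count(1000))
}
```

This also illustrates the "after the fact" limitation. The detector only reports races that actually happen during an execution, so code paths your tests never exercise stay unchecked.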

On balance, after thinking about the problems of integrating templated types into Go, I think if I had to choose between generics and immutability, I’d choose the latter.

But the ability to mark a function parameter as const is insufficient, because while it restricts the receiver from mutating the value, it does not prohibit the caller from doing so, which accounts for the majority of the data races I see in Go programs today. Perhaps what Go needs is not immutability, but ownership semantics.
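To illustrate the distinction, suppose a worker promises never to modify its argument. The caller can still write to the same slice while the worker is reading it. Today Go leaves this to convention. One such convention is to copy data before handing it to another goroutine, so the receiver effectively owns its copy. The following is a sketch. The name sumAsync is hypothetical.

```go
package main

import "fmt"

// sumAsync sums xs on another goroutine. It hands the goroutine a private
// copy, so the caller may keep mutating xs without causing a data race.
func sumAsync(xs []int) <-chan int {
	owned := make([]int, len(xs))
	copy(owned, xs) // "transfer ownership" by copying before the goroutine starts

	result := make(chan int, 1)
	go func() {
		total := 0
		for _, x := range owned {
			total += x
		}
		result <- total
	}()
	return result
}

func main() {
	data := []int{1, 2, 3}
	ch := sumAsync(data)
	data[0] = 100 // safe: the goroutine never sees this slice
	fmt.Println(<-ch)
}
```

Nothing in the type system enforces the copy. Forgetting it still compiles, and that is precisely the gap this paragraph describes.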

While the Rust ownership model is undoubtedly correct (iff your program compiles, it has no data races), nobody could argue that the ownership model is simple or easy for newcomers. Nor would adding an extra dimension of immutability to every variable declaration in Go be simple, as it would force every user of the language to write their programs from the most pessimistic standpoint of assuming every variable will be shared and will be mutated concurrently.

In conclusion

These are some of the knock-on effects that I see of adding generics or immutability to Go. To be clear, I’m not saying that it should not be done; in fact, in my previous post I argued the opposite.

What I want to make clear is that adding generics or immutability has nothing to do with the syntax of those features, little to do with their underlying implementation, and everything to do with the complexity these features would unlock, and its impact on the overall complexity budget of the language and its libraries.

David Symonds argued years ago that there would be no benefit in adding generics to Go if they were not used heavily in the stdlib. The question, and concern, I have is: would the result be more complex than what we have today with our quaint built in slice, map, and error types?

I think it is worth keeping in mind the guiding principles of the language: simplicity and readability. The design of Go does not follow the accretive model of C++ or Java. The goal is not to reinvent those languages, minus the semicolons.

Simplicity Debt

Fifteen years ago Python’s GIL wasn’t a big issue. Concurrency was something dismissed as probably unnecessary. What people really needed was a faster interpreter; after all, who had more than one CPU? But, slowly, as the requirement for concurrency increased, so did the problems with the GIL.

By the time this decade rolled around, Node.js and Go had arrived on the scene, highlighting the need for concurrency as a first class concept. Various async contortions papered over the single threaded cracks of Python programs, but it was too late. Other languages had shown that concurrency must be a built-in facility, and Python had missed the boat.

When Go launched in 2009, it didn’t have a story for templated types. First we said they were important, but we didn’t know how to implement them. Then we argued that you probably didn’t need them, instead Go programmers should focus on interfaces, not types. Meanwhile Rust, Nim, Pony, Crystal, and Swift showed that basic templated types are a useful, and increasingly, expected feature of any language—just like concurrency.

There is no question that templated types and immutability are on their way to becoming mandatory in any modern programming language. But there is equally no question that adding these features to Go would make it more complex.

Just as efforts to improve Go’s dependency management situation have made it easier to build programs that consume larger dependency graphs, producing larger and more complex pieces of software, efforts to add templated types and immutability to the language would unlock the ability to write more complex, less readable software. Indeed, the addition of these features would have a knock-on effect that would profoundly alter the way error handling, collections, and concurrency are implemented.

I have no doubt that adding templated types to Go will make it a more complicated language, just as I have no doubt that not adding them would be a mistake–lest Go find itself, like Python, on the wrong side of history. But, no matter how important and useful templated types and immutability would be, integrating them into a hypothetical Go 2 would decrease its readability and increase compilation times—two things which Go was designed to address. They would, in effect, impose a simplicity debt.

If you want generics, immutability, ownership semantics, option types, etc, those are already available in other languages. There is a reason Go programmers choose to program in Go, and I believe that reason stems from our core tenets of simplicity and readability. The question is, how can we pay down the cost in complexity of adding templated types or immutability to Go?

Go 2 isn’t here yet, but its arrival is a lot more certain than previously believed. As it stands now, generics or immutability can’t just be added to Go if we still want to call it simple. As important as the discussions on how to add these features to Go 2 would be, equal weight must be given to the discussion of how to first offset their inherent complexity.

We have to build up a bankroll to spend on the complexity generics and immutability would add, otherwise Go 2 will start its life in simplicity debt.

Next: Simplicity Debt Redux