Are Go maps sensitive to data races ?

Panic messages from unexpected program crashes are often reported on the Go issue tracker. An overwhelming number of these panics are caused by data races, and an overwhelming number of those reports centre around Go’s built in map type.

unexpected fault address 0x0
fatal error: fault
[signal 0x7 code=0x80 addr=0x0 pc=0x40873b]

goroutine 97699 [running]:
runtime.throw(0x17f5cc0, 0x5)
    /usr/local/go/src/runtime/panic.go:527
runtime.sigpanic()
    /usr/local/go/src/runtime/sigpanic_unix.go:21
runtime.mapassign1(0x12c6fe0, 0xc88283b998, 0xc8c9b63c68, 0xc8c9b63cd8)
    /usr/local/go/src/runtime/hashmap.go:446

Why is this so ? Why is a map commonly involved with a crash ? Is Go’s map implementation inherently fragile ?

To cut to the chase: no, there is nothing wrong with Go’s map implementation. But if there is nothing wrong with the implementation, why do maps and panic reports commonly find themselves in close proximity ?

There are three reasons that I can think of.

Maps are often used for shared state

Maps are fabulously useful data structures and this makes them perfect for tasks such as a shared cache of precomputed data or a lookup table of outstanding requests. The common theme here is the map is being used to store data shared across multiple goroutines.

Maps are more complex structures

Compared to the other built in data types like channels and slices, Go maps are more complex — they aren’t just views onto a backing array of elements. Go maps contain significant internal state, and map iterators (for k, v := range m) contain even more.

Go maps are not goroutine safe, you must use a sync.Mutex, sync.RWMutex or other memory barrier primitive to ensure reads and writes are properly synchronised. Getting your locking wrong will corrupt the internal structure of the map.

Maps move things

Of all of Go’s built in data structures, maps are the only ones that move data internally. When you insert or delete entries, the map may need to rebalance itself to retain its O(1) guarantee. This is why map values are not addressable.

Without proper synchronisation different CPUs will have different representations of the map’s internal structure in their caches. Although the language lawyers will tell you that a program with a data race exhibits undefined behaviour, it’s easy to see how having a stale copy of a map’s internal structure can lead to following a stale pointer to oblivion.

Please use the race detector

Go ships with a data race detector that works on Windows, Linux, FreeBSD and OSX. The race detector will spot this issue, and many more.

Please use it when testing your code.

Ice cream makers and data races

This is a post about data races. The code for this post lives on Github, github.com/davecheney/benandjerry.

The example program simulates two Ice cream makers, Ben and Jerry, who greet their customers randomly.

package main

import "fmt"

type IceCreamMaker interface {
        // Hello greets a customer
        Hello()
}

type Ben struct {
        name string
}

func (b *Ben) Hello() {
        fmt.Printf("Ben says, \"Hello my name is %s\"\n", b.name)
}

type Jerry struct {
        name string
}

func (j *Jerry) Hello() {
        fmt.Printf("Jerry says, \"Hello my name is %s\"\n", j.name)
}

func main() {
        var ben = &Ben{"Ben"}
        var jerry = &Jerry{"Jerry"}
        var maker IceCreamMaker = ben

        var loop0, loop1 func()

        loop0 = func() {
                maker = ben
                go loop1()
        }

        loop1 = func() {
                maker = jerry
                go loop0()
        }

        go loop0()

        for {
                maker.Hello()
        }
}

It’s a data race, silly

Most programmers should easily spot the data race in this program.

The loop functions are changing the value of maker without using a lock, so it is undefined which implementation of Hello will be called when maker.Hello() is executed by the for loop in the main function.

Some programmers appear to be happy with this data race; either Ben or Jerry will greet the customer, it doesn’t matter which.

Lets run this code, and see what happens.

% env GOMAXPROCS=2 go run main.go
...
Ben says, "Hello my name is Ben"
Jerry says, "Hello my name is Jerry"
Jerry says, "Hello my name is Jerry"
Ben says, "Hello my name is Jerry"
Ben says, "Hello my name is Ben"
...

What! Hold up. Ben sometimes thinks that he is Jerry. How is this possible?

Interface values

The key to understanding this race is to understand how interface values are represented in memory.

An interface is conceptually a struct with two fields.
Interface
If we were to describe an interface in Go, it would look something like this.

type interface struct {
       Type uintptr     // points to the type of the interface implementation
       Data uintptr     // holds the data for the interface's receiver
}

Type points to a structure that describes the type of the value that implements this interface. Data points to the value of the implementation itself. The contents of Data are passed as the receiver of any method called via the interface.

For the statement var maker IceCreamMaker = ben, the compiler will generate code that does the following.
var marker IceCreamMaker = ben
The interface’s Type field is set to point to the definition of the *Ben type, and the Data field contains a copy of ben, that is, a pointer to a Ben value.

When loop1() executes the statement, maker = jerry, both fields of the interface value must be updated.
marker = jerry

Type now points to the definition of a *Jerry and Data contains a pointer to an instance of Jerry.

The Go memory model says that writes to a single machine word will be atomic, but interfaces are two word values. It is possible that another goroutine may observe the contents of the interface value while it is being changed. In this case it may see something like this
Data race in progress
And so Jerry‘s Hello() function is called with ben as the receiver.

Conclusion

There is no such thing as a safe data race. Your program either has no data races, or its operation is undefined.

In this example, the layout of the Ben and Jerry structs were identical in memory, so they were in some sense compatible. Imagine the chaos that would occur if they had different memory representations (this is left as an exercise to the reader).

The Go race detector will spot this error, and many others, and is as simple to use as adding the -race flag to your go test, build, or install command.

Bonus question

In the example code, the Hello method is declared on Ben or Jerry‘s pointer receiver. If this were instead declared as a method on the Ben or Jerry value, would this solve the data race ?

Futher reading