Tag Archives: data race

If aligned memory writes are atomic, why do we need the sync/atomic package?

This is a post inspired by a question on the Go Forum. The question, paraphrased, was “If properly aligned writes are guaranteed to be atomic by the processor, why does the race detector complain?”

The answer is, there are two uses of the word atomic in play here. The first, the one the OP references, is a property of most microprocessors that, as long as the address of the write is naturally aligned–if it’s a 32-bit value, say, then it is always written to an address which is a multiple of four–then nothing will observe a half written value.

To explain what that means, consider the opposite, an unaligned write where a 32-bit value is written to an address whose bottom two bits are not zero. In this case the processor has to split the write into two, spanning the boundary. This is known as a torn write as an observer on the bus could see this partially updated value.¹

These words comes from a time before multiple processors were common. At that time the observers of a torn read or write would most likely be other agents on the ISA, VESA, or PCI bus like disk controllers or video cards. However, we now live in the multi-core age so we need to talk about caches and visibility.

Since almost the beginning of computing, the CPU has run faster than main memory. That is to say, the performance of a computer is strongly related to the performance of its memory. This is known as the processor/memory gap. To bridge this gap processors have adopted caches which store recently accessed memory in a small, fast, store, closer to the processor.² Because caches also buffer writes back to main memory, while the property that an aligned address will be atomic remains, when that write occurs has become less deterministic.³ This is the domain of second use of the word atomic, the one implemented by the sync/atomic package.

In a modern multiprocessor system, a write to main memory will be buffered in multiple levels of caches before hitting main memory. This is done to to hide the latency of main memory, but in doing so it means that communicating between processors using main memory is now imprecise; a value read from memory may have already been overwritten by one processor, however the new value has not made its way through the various caches yet.

To solve this ambiguity you need to use a memory fence, also known as a memory barrier. A memory write barrier operation tells the processor that it has to wait until all the outstanding operations in its pipeline, specifically writes, have been flushed to main memory. This operation also invalidates the caches

⁴

held by other processors, forcing them to retrieve the new value directly from memory. The same is true for reads, you use a memory read barrier to tell the processor to stop and synchronise with any outstanding writes to memory.

In terms of Go, read and write memory barrier operations are handled by the sync/atomic package, specifically the family of atomic.Load and atomic.Store functions respectively.

⁵

In answer to the OP’s question: to safely use a value in memory as a communication channel between two goroutines, the race detector will complain unless the sync/atomic package is used.

Are Go maps sensitive to data races ?

Panic messages from unexpected program crashes are often reported on the Go issue tracker. An overwhelming number of these panics are caused by data races, and an overwhelming number of those reports centre around Go’s built in map type.

unexpected fault address 0x0
fatal error: fault
[signal 0x7 code=0x80 addr=0x0 pc=0x40873b]

goroutine 97699 [running]:
runtime.throw(0x17f5cc0, 0x5)
    /usr/local/go/src/runtime/panic.go:527
runtime.sigpanic()
    /usr/local/go/src/runtime/sigpanic_unix.go:21
runtime.mapassign1(0x12c6fe0, 0xc88283b998, 0xc8c9b63c68, 0xc8c9b63cd8)
    /usr/local/go/src/runtime/hashmap.go:446

Why is this so ? Why is a map commonly involved with a crash ? Is Go’s map implementation inherently fragile ?

To cut to the chase: no, there is nothing wrong with Go’s map implementation. But if there is nothing wrong with the implementation, why do maps and panic reports commonly find themselves in close proximity ?

There are three reasons that I can think of.

Maps are often used for shared state

Maps are fabulously useful data structures and this makes them perfect for tasks such as a shared cache of precomputed data or a lookup table of outstanding requests. The common theme here is the map is being used to store data shared across multiple goroutines.

Maps are more complex structures

Compared to the other built in data types like channels and slices, Go maps are more complex — they aren’t just views onto a backing array of elements. Go maps contain significant internal state, and map iterators (for k, v := range m) contain even more.

Go maps are not goroutine safe, you must use a sync.Mutex, sync.RWMutex or other memory barrier primitive to ensure reads and writes are properly synchronised. Getting your locking wrong will corrupt the internal structure of the map.

Maps move things

Of all of Go’s built in data structures, maps are the only ones that move data internally. When you insert or delete entries, the map may need to rebalance itself to retain its O(1) guarantee. This is why map values are not addressable.

Without proper synchronisation different CPUs will have different representations of the map’s internal structure in their caches. Although the language lawyers will tell you that a program with a data race exhibits undefined behaviour, it’s easy to see how having a stale copy of a map’s internal structure can lead to following a stale pointer to oblivion.

Please use the race detector

Go ships with a data race detector that works on Windows, Linux, FreeBSD and OSX. The race detector will spot this issue, and many more.

Please use it when testing your code.

Ice cream makers and data races

This is a post about data races. The code for this post lives on Github, github.com/davecheney/benandjerry.

The example program simulates two Ice cream makers, Ben and Jerry, who greet their customers randomly.

package main

import "fmt"

type IceCreamMaker interface {
        // Hello greets a customer
        Hello()
}

type Ben struct {
        name string
}

func (b *Ben) Hello() {
        fmt.Printf("Ben says, \"Hello my name is %s\"\n", b.name)
}

type Jerry struct {
        name string
}

func (j *Jerry) Hello() {
        fmt.Printf("Jerry says, \"Hello my name is %s\"\n", j.name)
}

func main() {
        var ben = &Ben{"Ben"}
        var jerry = &Jerry{"Jerry"}
        var maker IceCreamMaker = ben

        var loop0, loop1 func()

        loop0 = func() {
                maker = ben
                go loop1()
        }

        loop1 = func() {
                maker = jerry
                go loop0()
        }

        go loop0()

        for {
                maker.Hello()
        }
}

It’s a data race, silly

Most programmers should easily spot the data race in this program.

The loop functions are changing the value of maker without using a lock, so it is undefined which implementation of Hello will be called when maker.Hello() is executed by the for loop in the main function.

Some programmers appear to be happy with this data race; either Ben or Jerry will greet the customer, it doesn’t matter which.

Lets run this code, and see what happens.

% env GOMAXPROCS=2 go run main.go
...
Ben says, "Hello my name is Ben"
Jerry says, "Hello my name is Jerry"
Jerry says, "Hello my name is Jerry"
Ben says, "Hello my name is Jerry"
Ben says, "Hello my name is Ben"
...

What! Hold up. Ben sometimes thinks that he is Jerry. How is this possible?

Interface values

The key to understanding this race is to understand how interface values are represented in memory.

An interface is conceptually a struct with two fields.

If we were to describe an interface in Go, it would look something like this.

type interface struct {
       Type uintptr     // points to the type of the interface implementation
       Data uintptr     // holds the data for the interface's receiver
}

Type points to a structure that describes the type of the value that implements this interface. Data points to the value of the implementation itself. The contents of Data are passed as the receiver of any method called via the interface.

For the statement var maker IceCreamMaker = ben, the compiler will generate code that does the following.

The interface’s Type field is set to point to the definition of the *Ben type, and the Data field contains a copy of ben, that is, a pointer to a Ben value.

When loop1() executes the statement, maker = jerry, both fields of the interface value must be updated.

Type now points to the definition of a *Jerry and Data contains a pointer to an instance of Jerry.

The Go memory model says that writes to a single machine word will be atomic, but interfaces are two word values. It is possible that another goroutine may observe the contents of the interface value while it is being changed. In this case it may see something like this

And so Jerry‘s Hello() function is called with ben as the receiver.

Conclusion

There is no such thing as a safe data race. Your program either has no data races, or its operation is undefined.

In this example, the layout of the Ben and Jerry structs were identical in memory, so they were in some sense compatible. Imagine the chaos that would occur if they had different memory representations (this is left as an exercise to the reader).

The Go race detector will spot this error, and many others, and is as simple to use as adding the -race flag to your go test, build, or install command.

Bonus question

In the example code, the Hello method is declared on Ben or Jerry‘s pointer receiver. If this were instead declared as a method on the Ben or Jerry value, would this solve the data race ?

Futher reading

Russ Cox’s original piece on Go interfaces, and while you’re there, you should read the piece Russ wrote explaining this issue.
The Go race detector blog post.

Dave Cheney

The acme of foolishness