Context isn’t for cancellation

This is an experience report about the use of, and difficulties with, the context.Context facility in Go.

Many authors, including myself, have written about the use of, misuse of, and how they would changecontext.Context in a future iteration of Go. While opinions differs on many aspects of context.Context, one thing is clear–there is almost unanimous agreement that the Context.WithValue method on the context.Context interface is orthogonal to the type’s role as a mechanism to control the lifetime of request scoped resources.

Many proposals have emerged to address this apparent overloading of context.Context with a copy on write bag of values. Most approximate thread local storage so are unlikely to be accepted on ideological grounds.

This post explores the relationship between context.Context and lifecycle management and asks the question, are attempts to fix Context.WithValue solving the wrong problem?

Context is a request scoped paradigm

The documentation for the context package strongly recommends that context.Context is only for request scoped values:

Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx:

func DoSomething(ctx context.Context, arg Arg) error {
        // ... use ctx ...
}

Specifically context.Context values should only live in function arguments, never stored in a field or global. This makes context.Context applicable only to the lifetime of resources in a request’s scope. Given Go’s lineage on the server, this is a compelling use case. However, there exist other use cases for cancellation where the lifetime of the resource extends beyond a single request. For example, a background goroutine as part of an agent or pipeline.

Context as a hook for cancellation

The stated goal of the context package is:

Package context defines the Context type, which carries deadlines, cancelation signals, and other request-scoped values across API boundaries and between processes.

Which sounds great, but belies its catch-all nature. context.Context is used in three independent, yet sometimes conflated, scenarios:

  • Cancellation via context.WithCancel.
  • Timeout via context.WithDeadline.
  • A bag of values via context.WithValue.

At any point, a context.Context value can represent any one, or all three of these independent concerns. However, context.Context‘s most important facility, broadcasting a cancellation signal, is incomplete as there is no way to wait for the signal to be acknowledged.

Looking to the past

As this is an experience report, it would be germane to highlight some actual experience. In 2012 Gustavo Niemeyer wrote a package for goroutine lifecycle management called tomb which is used by Juju for the management of the worker goroutines within the various agents in the Juju system.

tomb.Tombs are concerned only with lifecycle management. Importantly, this is a generic notion of a lifecycle, not tied exclusively to a request, or a goroutine. The scope of the resource’s lifetime is defined simply by holding a reference to the tomb value.

A tomb.Tomb value has three properties:

  1. The ability to signal the owner of the tomb to shut down.
  2. The ability to wait until that signal has been acknowledged.
  3. A way to capture a final error value.

However, tomb.Tombs have one drawback, they cannot be shared across multiple goroutines. Consider this prototypical network server where a tomb.Tomb cannot replace the use of sync.WaitGroup.

func serve(l net.Listener) error {
        var wg sync.WaitGroup
        var conn net.Conn
        var err error
        for {
                conn, err = l.Accept()
                if err != nil {
                        break
                }
                wg.Add(1)
                go func(c net.Conn) {
                        defer wg.Done()
                        handle(c)
                }(conn)
        }
        wg.Wait()
        return err
}

To be fair, context.Context cannot do this either as it provides no built in mechanism to acknowledge cancellation. What is needed is a form of sync.WaitGroup that allows cancellation, as well as waiting for its participants to call wg.Done.

Context should become, well, just context

The purpose of the context.Context type is in it’s name:

context /kɒntɛkst/ noun
The circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood.

I propose context.Context becomes just that; a request scoped association list of copy on write values.

Decoupling lifetime management from context.Context as a store of request scoped values will hopefully highlight that request context and lifecycle management are orthogonal concerns.

Best of all, we don’t need to wait til Go 2.0 to explore these ideas like Gustavo’s tomb package.

Typed nils in Go 2

This is an experience report about a gotcha in Go that catches every Go programmer at least once. The following program is extracted from a larger version that caused my co-workers to lose several hours today.

package main

import "fmt"

type T struct{}

func (t T) F() {}

type P interface {
F()
}

func newT() *T { return new(T) }

type Thing struct {
P
}

func factory(p P) *Thing {
return &Thing{P: p}
}

const ENABLE_FEATURE = false

func main() {
t := newT()
t2 := t
if !ENABLE_FEATURE {
t2 = nil
}
thing := factory(t2)
fmt.Println(thing.P == nil)
}

This distilled version of the program in question, while non-sensical, contains all the attributes of the original. Take some time to study the program and ask yourself, does the program print true or false?

nil != nil

Not to spoil the surprise, but the program prints false. The reason is, while nil is assigned to t2, when t2 is passed to factory it is “boxed” into an variable of type P; an interface. Thus, thing.P does not equal nil because while the value of P was nil, its concrete type was *T.

Typed nil

You’ve probably realised the cause of this problem is the dreaded typed nil, a gotcha that has its own entry in the Go FAQ. The typed nil emerges as a result of the definition of a interface type; a structure which contains the concrete type of the value stored in the interface, and the value itself. This structure can’t be expressed in pure Go, but can be visualised with this example:

var n int = 200
var i interface{} = n

The interface value i is assigned a copy of the value of n, so i‘s type slot holds n‘s type; int, and it’s data slot holds the value 200. We can write this more concisely as (int, 200).

In the original program we effectively have the following:

var t2 *T = nil
var p P = t2

Which results in p, using our nomenclature, holding the value (*T, nil). So then, why does the expression p == nil evaluate to false? The explanation I prefer is:

  • nil is a compile time constant which is converted to whatever type is required, just as constant literals like 200 are converted to the required integer type automatically.
  • Given the expression p == nil, both arguments must be of the same type, therefore nil is converted to the same type as p, which is an interface type. So we can rewrite the expression as (*T, nil) == (nil, nil).
  • As equality in Go almost always operates as a bitwise comparison it is clear that the memory bits which hold the interface value (*T, nil) are different to the bits that hold (nil, nil) thus the expression evaluates to false.

Put simply, an interface value is only equal to nil if both the type and the value stored inside the interface are both nil.

For a detailed explanation of the mechanics behind Go’s interface implementation, Russ Cox has a great post on his blog.

The future of typed nils in Go 2

Typed nils are an entirely logical result of the way dynamic types, aka interfaces, are implemented, but are almost never what the programmer wanted. To tie this back to Russ’s GopherCon keynote, I believe typed nils are an example where Go fails to scale for programming teams.

This explanation has consumed 700 words–and several hours over chat today–to explain, and in the end my co-workers were left with a bad taste in their mouths. The clarity of interfaces was soured by a suspicion that gotchas like this were lurking in their codebase. As an experienced Go programmer I’ve learnt to be wary of the possibility of a typed nil during code review, but it is unfortunate that they remain something that each Go programmer has to learn the hard way.

For Go 2.0 I’d like to start the discussion of what it would mean if comparing an interface value to nil considered the value portion of the interface such that the following evaluated to true:

var b *bytes.Buffer
var r io.Reader = b
fmt.Println(r == nil)

There are obviously some subtleties that this pithy demand fails to capture, but a desire to make this seemingly straight forward comparison less error prone would, at least in my mind, make Go 2 easier to scale to larger development teams.

Should Go 2.0 support generics?

A long time ago, someone–I normally attribute this to David Symonds, but I can’t be sure he was the first to say it–said that the reason for adding generics to Go would be the reason for calling it Go 2.0. That is to say, adding generics to the language would be half baked if they were not used throughout the standard library. I wrote about this in a series of blog posts where I explored what I felt would be the repercussions of integrating templated types into Go.

Do I think Go should have generics? Well, there are really two answers to that question.

As I argued in my Simplicity Debt posts, mainstream programmers in 2017 expect a set of features in their languages. Many of us work in polyglot environments. Even if we want to be writing in Go as much as possible, there’s usually some Javascript, some CSS, some Python, maybe some Java, Swift, C#, PHP or even C++ in the project. Maybe this will change in the future, but right now, if you’re a commercial programmer working for a crust, every day you’ll touch a bunch of languages, so their differences tend to rub against one another.

  • Mainstream programmers expect static typing, not for performance, but for readability and maintainability–just look at what Typescript and Dart are bringing to Javascript, and Python’s formative efforts with optional typing.
  • Mainstream programmers expect concurrency. They expect to be able to do more than one thing at a time–just look at node.js and the compromises programmers were prepared to make to move away from heavy-weight thread per connection models. Go is obviously well positioned here.
  • Mainstream programmers expect some form of templated types because they’re used to it in the other languages they interact with alongside Go.

So my first answer is: Go should have some form of generics because it is a mainstream, imperative, block scoped language and it is expected these days.

My second answer is if the designers of the language choose not to add templated types or parameterised functions–and keep in mind that I am not one of the language designers, only an exuberant fan–because, as I wrote in my series of posts, the repercussions for the simplicity and readability of the language may prove too jarring. If that were to happen, my recommendation would be that Go should own that decision.

What do I mean by that? Well, the best explanation I can give is a counterexample. Let’s look at Haskell. Haskell is what most functional programmers consider to be the baseline for a real FP language, and thus it looks pretty much like nothing programmers schooled in side effect ridden, block structured, imperative languages are used to. But Haskell programmers own that, they own their difference. A Haskell programmer doesn’t see a reason to make their language work more like PHP, or C++, or Rust, or even Go, and they are happy to explain the Haskell way of doing things to anyone who asks. My point is that if Go is not going to have a story for templated types, then we need to own it, just like Haskell programmers own their decisions.

This isn’t simply a case of saying “nope, sorry, no generics for Go 2.0, maybe in another 5 years”, but a more fundamental statement that they are not something that will be implemented in Go because we believe there is a better way to solve the underlying problem. Note that I did not say a better way to implement a templated type or parameterised function, but a better way to solve the underlying business problem. There is a difference.

This isn’t without precedent, Go was one of the first C style languages to eschew type inheritance, a decision which lead to a radical simplification of the language and a focus on the mantras of communicating intent via interfaces, and encapsulation over inheritance. Before Go, it was assumed that a mainstream language would have classes and a type hierarchy, nowadays that is less true.

So, should Go 2.0 have generics? If the decision is to add them then I’m sure it can be done, after all the syntax is the least important part of the decision, and there is a wealth of prior art in other languages to guide us. However, if the decision is not to add templated types, then it should be made so explicitly. Then it is incumbent upon all Go programmers to explain the Go Way of solving problems.

How to find out which Go version built your binary

This is a short post describing the procedure for discovering which version of Go was used to compile a Go binary.

This procedure relies on the fact that each Go program includes a copy of the version string reported by runtime.Version() . Linker magic ensures that this value will be present in the final binary irrespective of whether runtime.Version() is called by the resulting program. The value in question is stored in the runtime.buildVersion variable and can be recovered by a debugger.

The rest of this post describes the mechanisms for recovering the contents of runtime.buildVersion on various platforms.

Linux/FreeBSD/OpenBSD/NetBSD

If you’re on a Linux or *BSD platform, you can recover the binary build version with gdb.

% gdb $HOME/bin/godoc
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
(gdb) p 'runtime.buildVersion'
$1 = 0xa9ceb8 "go1.8.3"

Darwin

The debugging situation on OS X isn’t great, but here are several options.

gdb

gdb was removed from the XCode toolchain following the switch from gcc to llvm. If you are running a version of XCode that has gdb, you should used the instructions from the previous section.

Delve

Delve can be used to print the value of runtime.buildVersion.

% dlv exec $HOME/bin/godoc
Type 'help' for list of commands.
(dlv) b main.main
Breakpoint 1 set at 0x15596eb for main.main() ./golang.org/x/tools/cmd/godoc/main.go:156
(dlv) c
> main.main() ./golang.org/x/tools/cmd/godoc/main.go:156 (hits goroutine(1):1 total:1) (PC: 0x15596eb)
   151:                 }
   152:         }
   153:         log.Fatalf("too many redirects")
   154: }
   155:
=> 156: func main() {
   157:         flag.Usage = usage
   158:         flag.Parse()
   159:
   160:         playEnabled = *showPlayground
   161:
(dlv) p runtime.buildVersion
"go1.8.1"

lldb

Christian Witts reports on Twitter that XCode 8.3.3 ships with a version of lldb, version 370.0.42, that can interpret the Go string syntax.

$ lldb $HOME/bin/godoc
(lldb) b main.main
(lldb) run
(lldb) p runtime.buildVersion

I’ve tested earlier versions of lldb and found they do not work. Instread, use delve

Windows

Good news, everyone. Brian Ketelsen of GopherCon and GoTime.fm fame, reports that delve works perfectly on Windows for recovering this binaries’ build version.

PS C:\Users\bkete\go\src\http://github.com \derekparker\delve\cmd\dlv> dlv exec C:\Users\bkete\go\bin\dlv.exe
Type 'help' for list of commands.
(dlv) b main.main
Breakpoint 1 set at 0x8ec666 for main.main() c:/Users/bkete/go/src/github.com/derekparker/delve/cmd/dlv/main.go:11
(dlv) c
> main.main() c:/Users/bkete/go/src/github.com/derekparker/delve/cmd/dlv/main.go:11 (hits goroutine(1):1 total:1) (PC: 0x8ec666)
     6: )
     7:
     8: // Build is the git sha of this binaries build.
     9: var Build string
    10:
=>  11: func main() {
    12:         http://version.DelveVersion.Build  = Build
    13:         http://cmds.New ().Execute()
    14: }
(dlv) p runtime.buildVersion
"go1.8.1"

If someone wants to figure out the correct WinDbg or Visual Studio Debugger incantation, please let me know and I’ll link to you from this post.

Simplicity Debt Redux

In my previous post I discussed my concerns the additional complexity adding generics or immutability would bring to a future Go 2.0. As it was an opinion piece, I tried to keep it around 500 words. This post is an exploration of the most important (and possibly overlooked) point of that post.

Indeed, the addition of [generics and/or immutability] would have a knock-on effect that would profoundly alter the way error handling, collections, and concurrency are implemented. 

Specifically, what I believe would be the possible knock-on effect of adding generics or immutability to the language.

Error handling

A powerful motivation for adding generic types to Go is to enable programmers to adopt a monadic error handling pattern. My concerns with this approach have little to do with the notion of the maybe monad itself. Instead I want to explore the question of how this additional form of error handling might be integrated into the stdlib, and thus the general population of Go programmers.

Right now, to understand how io.Reader works you need to know how slices work, how interfaces work, and know how nil works. If the if err != nil { return err } idiom was replaced by an option type or maybe monad, then everyone who wanted to do basic things like read input or write output would have to understand how option types or maybe monads work in addition to discussion of what templated types are, and how they are implemented in Go.

Obviously it’s not impossible to learn, but it is more complex than what we have today. Newcomers to the language would have to integrate more concepts before they could understand basic things, like reading from a file.

The next question is, would this monadic form become the single way errors are handled? It seems confusing, and gives unclear guidiance to newcomers to Go 2.0, to continue to support both the error interface model and a new monadic maybe type. Also, if some form of templated maybe type was added, would it be a built in, like error, or would it have to be imported in almost every package. Note: we’ve been here before with os.Error.

What began as the simple request to create the ability to write a templated maybe or option type has ballooned into a set of question that would affect every single Go package ever written.

Collections

Another reason to add templated types to Go is to facilitate custom collection types without the need for interface{} boxing and type assertions.

On the surface this sounds like a grand idea, especially as these types are leaking into the standard library anyway. But that leaves the question of what to do with the built in slice and map types. Should slices and maps co-exist with user defined collections, or should they be removed in favour of defining everything as a generic type?

To keep both sounds redundant and confusing, as all Go developers would have to be fluent in both and develop a sophisticated design sensibility about when and where to choose one over the other. But to remove slices and maps in favour of collection types provided by a library raises other questions.

Slicing

For example, if there is no slice type, only types like a vector or linked list, what happens to slicing? Does it go away, if so, how would that impact common operations like handling the result a call to io.Reader.Read? If slicing doesn’t go away, would that require the addition of operator overloading so that user defined collection types can implement a slice operator?

Then there are questions on how to marry the built in map type with a user defined map or set. Should user defined maps support the index and assignment operators? If so, how could a user defined map offer both the one and two return value forms of lookup without requiring polymophic dispatch based on the number of return arguments? How would those operators work in the presence of set operations which have no value, only a key?

Which types could use the delete function? Would delete need to be modified to work with types that implement some kind of Deleteable interface? The same questions apply to append, lencap, and copy.

What about addressability? Values in the built in map type are not addressable, but should that be permitted or disallowed for user defined map types? How would that interact with operator overloading designed to make user defined maps look more like the built in map?

What sounded like a good idea on paper—make it possible for programmers to define their own efficient collection data types—has highlighted how deeply integrated the built in map and slice are and spawned not only a requirement for templated types, but operator overloading, polymorphic dispatch, and some kind of return value addressability semantics.

How could you implement a vector?

So, maybe you make the argument that now we have templated types we can do away with the built in slice and map, and replace them with a Java-esque list of collection types.

Go’s Pascal-like array type has a fixed size known at compile time. How could you implement a growable vector without resorting to unsafe hacks? I’ll leave that as an exercise to the reader. But I put it to you that if you cannot implement simple templated vector type with the memory safety we enjoy today with slices, then that is a very strong design smell.

Iteration

I’ll admit that the inability to use the for ... range statement over my own types was something that frustrated me for a long time when I came to Go, as I was accustomed to the flexibility of the iterator types in the Java collections library.

But iterating over in-memory data structures is boring—what you really want to be able to do is compose iterators over database results and network requests. In short, data from outside your process—and when data is outside your process, retrieving it might fail. In that case you have a choice, does your Iterable interface return a value, a value and an error, or perhaps you go down the option type route. Each would require a new form of range loop semantic sugar in an area which already contains its share of footguns.

You can see that adding the ability to write template collection types sounds great on paper, but in practice it would perpetuate a situation where the built in collection types live on in addition to their user defined counterparts. Each would have their strengths and weaknesses, and a Go developer would have to become proficient in both. This is something that Go developers just don’t have to think about today as slices and maps are practically ubiquitous.

Immutability

Russ wrote at the start of the year that a story for reference immutability was an important area of exploration for the future of Go. Having surveyed hundreds of Go packages and found few which are written with an understanding of the problem of data races—let alone actually tried running their tests under the race detector—it is tempting to agree with Russ that the ‘after the fact’ model of checking for races at run time has some problems.

On balance, after thinking about the problems of integrating templated types into Go, I think if I had to choose between generics and immutability, I’d choose the latter.

But the ability to mark a function parameter as const is insufficient, because while it restricts the receiver from mutating the value, it does not prohibit the caller from doing so, which is the majority of the data races I see in Go programs today. Perhaps what Go needs is not immutability, but ownership semantics.

While the Rust ownership model is undoubtedly correctiff your program complies, it has no data races—nobody can argue that the ownership model is simple or easy for newcomers. Nor would adding an extra dimension of immutability to every variable declaration in Go be simple as it would force every user of the language to write their programs from the most pessimistic standpoint of assuming every variable will be shared and will be mutated concurrently.

In conclusion

These are some of the knock on effects that I see of adding generics or immutability to Go. To be clear, I’m not saying that it should not be done, in fact in my previous post I argued the opposite.

What I want to make clear is adding generics or immutability has nothing to do with the syntax of those features, little to do with their underlying implementation, and everything to do with the impact on the overall complexity budget of the language and its libraries, that these features would unlock.

David Symonds argued years ago that there would be no benefit in adding generics to Go if they were not used heavily in the stdlib. The question, and concern, I have is; would the result be more complex than what we have today with our quaint built in slice, map, and error types?

I think it is worth keeping in mind the guiding principals of the language—simplicity and readability. The design of Go does not follow the accretive model of C++ or Java The goal is not to reinvent those languages, minus the semicolons.

Simplicity Debt

Fifteen years ago Python’s GIL wasn’t a big issue. Concurrency was something dismissed as probably unnecessary. What people really was needed was a faster interpreter, after all, who had more than one CPU? But, slowly, as the requirement for concurrency increased, the problems with the GIL increased.

By the time this decade rolled around, Node.js and Go had arrived on the scene, highlighting the need for concurrency as a first class concept. Various async contortions papered over the single threaded cracks of Python programs, but it was too late. Other languages had shown that concurrency must be a built-in facility, and Python had missed the boat.

When Go launched in 2009, it didn’t have a story for templated types. First we said they were important, but we didn’t know how to implement them. Then we argued that you probably didn’t need them, instead Go programmers should focus on interfaces, not types. Meanwhile Rust, Nim, Pony, Crystal, and Swift showed that basic templated types are a useful, and increasingly, expected feature of any language—just like concurrency.

There is no question that templated types and immutability are on their way to becoming mandatory in any modern programming language. But there is equally no question that adding these features to Go would make it more complex.

Just as efforts to improve Go’s dependency management situation have made it easier to build programs that consume larger dependency graphs, producing larger and more complex pieces of software, efforts to add templated types and immutability to the language would unlock the ability to write more complex, less readable software. Indeed, the addition of these features would have a knock on effect that would profoundly alter the way error handling, collections, and concurrency are implemented.

I have no doubt that adding templated types to Go will make it a more complicated language, just as I have no doubt that not adding them would be a mistake–lest Go find itself, like Python, on the wrong side of history. But, no matter how important and useful templated types and immutability would be, integrating them into a hypothetical Go 2 would decrease its readability and increase compilation times—two things which Go was designed to address. They would, in effect, impose a simplicity debt.

If you want generics, immutability, ownership semantics, option types, etc, those are already available in other languages. There is a reason Go programmers choose to program in Go, and I believe that reason stems from our core tenets of simplicity and readability. The question is, how can we pay down the cost in complexity of adding templated types or immutability to Go?

Go 2 isn’t here yet, but its arrival is a lot more certain than previously believed. As it stands now, generics or immutability can’t just be added to Go and still call it simple. As important as the discussions on how to add these features to Go 2 would be, equal weight must be given to the discussion of how to first offset their inherent complexity.

We have to build up a bankroll to spend on the complexity generics and immutability would add, otherwise Go 2 will start its life in simplicity debt.

Next: Simplicity Debt Redux

Go, without package scoped variables

This is a thought experiment, what would Go look like if we could no longer declare variables at the package level? What would be the impact of removing package scoped variable declarations, and what could we learn about the design of Go programs?

I’m only talking about expunging var, the other five top level declarations would still be permitted as they are effectively constant at compile time. You can, of course, continue to declare variables at the function or block scope.

Why are package scoped variables bad?

But first, why are package scoped variables bad? Putting aside the problem of globally visible mutable state in a heavily concurrent language, package scoped variables are fundamentally singletons, used to smuggle state between unrelated concerns, encourage tight coupling and makes the code that relies on them hard to test.

As Peter Bourgon wrote recently:

tl;dr: magic is bad; global state is magic → [therefore, you want] no package level vars; no func init.

Removing package scoped variables, in practice

To put this idea to the test I surveyed the most popular Go code base in existence; the standard library, to see how package scoped variables were used, and assessed the effect applying this experiment would have.

Errors

One of the most frequent uses of public package level var declarations are errors; io.EOF,
sql.ErrNoRowscrypto/x509.ErrUnsupportedAlgorithm, and so on. Removing the use of package scoped variables would remove the ability to use public variables for sentinel error values. But what could be used to replace them?

I’ve written previously that you should prefer behaviour over type or identity when inspecting errors. Where that isn’t possible, declaring error constants removes the potential for modification which retaining their identity semantics.

The remaining error variables are private declarations which give a symbolic name to an error message. These error values are unexported so they cannot be used for comparison by callers outside the package. Declaring them at the package level, rather than at the point they occur inside a function negates the opportunity to add additional context to the error. Instead I recommend using something like pkg/errors to capture a stack trace at the point the error occurs.

Registration

A registration pattern is followed by several packages in the standard library such as net/http, database/sql, flag, and to a lesser extent log. It commonly involves a package scoped private map or struct which is mutated by a public function—a textbook singleton.

Not being able to create a package scoped placeholder for this state would remove the side effects in the image, database/sql, and crypto packages to register image decoders, database drivers and cryptographic schemes. However, this is precisely the magic that Peter is referring to–importing a package for the side effect of changing some global state of your program is truly spooky action at a distance.

Registration also promotes duplicated business logic. The net/http/pprof package registers itself, via a side effect with net/http.DefaultServeMux, which is both a potential security issue—other code cannot use the default mux without exposing the pprof endpoints—and makes it difficult to convince the net/http/pprof package to register its handlers with another mux.

If package scoped variables were no longer used, packages like net/http/pprof could provide a function that registers routes on a supplied http.ServeMux, rather than relying on side effects to altering global state.

Removing the ability to apply the registry pattern would also solve the issues encountered when multiple copies of the same package are imported in the final binary and try to register themselves during startup.

Interface satisfaction assertions

The interface satisfaction idiom

var _ SomeInterface = new(SomeType)

occurred at least 19 times in the standard library. In my opinion these assertions are tests. They don’t need to be compiled, only to be eliminated, every time you build your package. Instead they should be moved to the corresponding _test.go file. But if we’re prohibiting package scoped variables, this prohibition also applies to tests, so how can we keep this test?

One option is to move the declaration from package scope to function scope, which will still fail to compile if SomeType stop implementing SomeInterface

func TestSomeTypeImplementsSomeInterface(t *testing.T) {
       // won't compile if SomeType does not implement SomeInterface
       var _ SomeInterface = new(SomeType)
}

But, as this is actually a test, it’s not hard to rewrite this idiom as a standard Go test.

func TestSomeTypeImplementsSomeInterface(t *testing.T) {
       var i interface{} = new(SomeType)
       if _, ok := i.(SomeInterface); !ok {
               t.Fatalf("expected %t to implement SomeInterface", i)
       }
}

As a side note, because the spec says that assignment to the blank identifier must fully evaluate the right hand side of the expression, there are probably a few suspicious package level initialisation constructs hidden in those var declarations.

It’s not all beer and skittles

The previous sections showed that avoiding package scoped variables might be possible, but there are some areas of the standard library which have proved more difficult to apply this idea.

Real singletons

While I think that the singleton pattern is generally overplayed, especially in its registration form, there are always some real singleton values in every program. A good example of this is  os.Stdout and friends.

package os 

var (
        Stdin  = NewFile(uintptr(syscall.Stdin), "/dev/stdin")
        Stdout = NewFile(uintptr(syscall.Stdout), "/dev/stdout")
        Stderr = NewFile(uintptr(syscall.Stderr), "/dev/stderr")
)

There are a few problems with this declaration. Firstly Stdin, Stdout, and Stderr are of type *os.File, not their respective io.Reader or io.Writer interfaces. This makes replacing them with alternatives problematic. However the notion of replacing them is exactly the kind of magic that this experiment seeks to avoid.

As the previous constant error example showed, we can retain the singleton nature of the standard IO file descriptors, such that packages like log and fmt can address them directly, but avoid declaring them as mutable public variables with something like this:

package main

import (
        "fmt"
        "syscall"
)

type readfd int

func (r readfd) Read(buf []byte) (int, error) {
        return syscall.Read(int(r), buf)
}

type writefd int

func (w writefd) Write(buf []byte) (int, error) {
        return syscall.Write(int(w), buf)
}

const (
        Stdin  = readfd(0)
        Stdout = writefd(1)
        Stderr = writefd(2)
)

func main() {
        fmt.Fprintf(Stdout, "Hello world")
}

Caches

The second most common use of unexported package scoped variables are caches. These come in two forms; real caches made out of maps (see the registration pattern above) and sync.Pool, and quasi constant variables that ameliorate the cost of a compilation.

As an example the crypto/ecsda package has a zr type whose Read method zeros any buffer passed to it. The package keeps a single instance of zr around because it is embedded in other structs as an io.Reader, potentially escaping to the heap each time it is instantiated.

package ecdsa 

type zr struct {
        io.Reader
}

// Read replaces the contents of dst with zeros.
func (z *zr) Read(dst []byte) (n int, err error) {
        for i := range dst {
                dst[i] = 0
        }
        return len(dst), nil
}

var zeroReader = &zr{}

However zr doesn’t embed an io.Reader, it is an io.Reader, so the unused zr.Reader field could be eliminated, giving zr a width of zero. In my testing this modified type can be created directly where it is used without performance regression.

        csprng := cipher.StreamReader{
                R: zr{},
                S: cipher.NewCTR(block, []byte(aesIV)),
        }

Perhaps some of the caching decision could be revisited as the inlining and escape analysis options available to the compiler have improved significantly since the standard library was first written.

Tables

The last major use of  common use of private package scoped variables is for tables, as seen in the unicode, crypto/*, and math packages. These tables either encode constant data in the form of arrays of integer types, or less commonly simple structs and maps.

Replacing package scoped variables with constants would require a language change along the lines of #20443. So, fundamentally, providing there is no way to modify those tables at run time, they are probably a reasonable exception to this proposal.

A bridge too far

Even though this post was just a thought experiment, it’s clear that forbidding all package scoped variables is too draconian to be workable as a language precept. Addressing the bespoke uses of private var usage may prove impractical from a performance standpoint, would be akin to pinning a “kick me” sign to ones back and inviting all the Go haters to take a free swing.

However, I believe there are a few concrete recommendations that can be drawn from this exercise, without going to the extreme of changing the language spec.

  • Firstly, public var declarations should be eschewed. This is not a controversial conclusion and not one that is unique to Go. The singleton pattern is discouraged, and an unadorned public variable that can be changed at any time by any party that knows its name should be a design, and concurrency, red flag.
  • Secondly, where public package var declarations are used, the type of those variables should be carefully constructed to expose as little surface area as possible. It should not be the default to take a type expected to be used on a per instance basis, and assign it to a package scoped variable.

Private variable declarations are more nuanced, but certain patterns can be observed:

  • Private variables with public setters, which I labelled registries, have the same effect on the overall program design as their public counterparts. Rather than registering dependencies globally, they should instead be passed in during declaration using a constructor function, compact literal, config structure, or option function.
  • Caches of []byte vars can often be expressed as consts at no performance cost.  Don’t forget the compiler is pretty good at avoiding string([]byte) conversions where they don’t escape the function call.
  • Private variables that hold tables, like the unicode package, are an unavoidable consequence of the lack of a constant array type. As long as they are unexported, and do not expose any way to mutate them, they can be considered effectively constant for the purpose of this discussion.

The bottom line; think long and hard about adding package scoped variables that are mutated during the operation of your program. It may be a sign that you’ve introduced magic global state.

If a map isn’t a reference variable, what is it?

In my previous post I showed that Go maps are not reference variables, and are not passed by reference. This leaves the question, if maps are not references variables, what are they?

For the impatient, the answer is:

A map value is a pointer to a runtime.hmap structure.

If you’re not satisfied with this explanation, read on.

What is the type of a map value?

When you write the statement

m := make(map[int]int)

The compiler replaces it with a call to runtime.makemap, which has the signature

// makemap implements a Go map creation make(map[k]v, hint)
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If bucket != nil, bucket can be used as the first bucket.
func makemap(t *maptype, hint int64, h *hmap, bucket unsafe.Pointer) *hmap

As you see, the type of the value returned from runtime.makemap is a pointer to a runtime.hmap structure. We cannot see this from normal Go code, but we can confirm that a map value is the same size as a uintptr–one machine word.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	var m map[int]int
	var p uintptr
	fmt.Println(unsafe.Sizeof(m), unsafe.Sizeof(p)) // 8 8 (linux/amd64)
}

If maps are pointers, shouldn’t they be *map[key]value?

It’s a good question that if maps are pointer values, why does the expression make(map[int]int) return a value with the type map[int]int. Shouldn’t it return a *map[int]int? Ian Taylor answered this recently in a golang-nuts thread1.

In the very early days what we call maps now were written as pointers, so you wrote *map[int]int. We moved away from that when we realized that no one ever wrote `map` without writing `*map`.

Arguably renaming the type from *map[int]int to map[int]int, while confusing because the type does not look like a pointer, was less confusing than a pointer shaped value which cannot be dereferenced.

Conclusion

Maps, like channels, but unlike slices, are just pointers to runtime types. As you saw above, a map is just a pointer to a runtime.hmap structure.

Maps have the same pointer semantics as any other pointer value in a Go program. There is no magic save the rewriting of map syntax by the compiler into calls to functions in runtime/hmap.go.


Notes

  1. If you look far enough back in the history of Go repository, you can find examples of maps created with the new operator.

There is no pass-by-reference in Go

My post on pointers provoked a lot of debate about maps and pass by reference semantics. This post is a response to those debates.

To be clear, Go does not have reference variables, so Go does not have pass-by-reference function call semantics.

What is a reference variable?

In languages like C++ you can declare an alias, or an alternate name to an existing variable. This is called a reference variable.

#include <stdio.h>

int main() {
        int a = 10;
        int &b = a;
        int &c = b;

        printf("%p %p %p\n", &a, &b, &c); // 0x7ffe114f0b14 0x7ffe114f0b14 0x7ffe114f0b14
        return 0;
}

You can see that a, b, and c all refer to the same memory location. A write to a will alter the contents of b and c. This is useful when you want to declare reference variables in different scopes–namely function calls.

Go does not have reference variables

Unlike C++, each variable defined in a Go program occupies a unique memory location.

package main

import "fmt"

func main() {
        var a, b, c int
        fmt.Println(&a, &b, &c) // 0x1040a124 0x1040a128 0x1040a12c
}

It is not possible to create a Go program where two variables share the same storage location in memory. It is possible to create two variables whose contents point to the same storage location, but that is not the same thing as two variables who share the same storage location.

package main

import "fmt"

func main() {
        var a int
        var b, c = &a, &a
        fmt.Println(b, c)   // 0x1040a124 0x1040a124
        fmt.Println(&b, &c) // 0x1040c108 0x1040c110
}

In this example, b and c hold the same value–the address of a–however, b and c themselves are stored in unique locations. Updating the contents of b would have no effect on c.

But maps and channels are references, right?

Wrong. Maps and channels are not references. If they were this program would print false.

package main

import "fmt"

func fn(m map[int]int) {
        m = make(map[int]int)
}

func main() {
        var m map[int]int
        fn(m)
        fmt.Println(m == nil)
}

If the map m was a C++ style reference variable, the m declared in main and the m declared in fn would occupy the same storage location in memory. But, because the assignment to m inside fn has no effect on the value of m in main, we can see that maps are not reference variables.

Conclusion

Go does not have pass-by-reference semantics because Go does not have reference variables.

Next: If a map isn’t a reference variable, what is it?

Understand Go pointers in less than 800 words or your money back

This post is for programmers coming to Go who are unfamiliar with the idea of pointers or a pointer type in Go.

What is a pointer?

Simply put, a pointer is a value which points to the address of another. This is the textbook explanation, but if you’re coming from a language that doesn’t let you talk about address of a variable, it could very well be written in Cuneiform.

Let’s break this down.

What is memory?

Computer memory, RAM, can be thought of as a sequence of boxes, placed one after another in a line. Each box, or cell, is labeled with a unique number, which increments sequentially; this is the address of the cell, its memory location.

Each cell holds a single value. If you know the memory address of a cell, you can go to that cell and read its contents. You can place a value in that cell; replacing anything that was in there previously.

That’s all there is to know about memory. Everything the CPU does is expressed as fetching and depositing values into memory cells.

What is a variable?

To write a program that retrieves the value stored in memory location 200, multiples it by 3 and deposits the result into memory location 201, we could write something like this in pseudocode:

  • retrieve the value stored in address 200 and place it in the CPU.
  • multiple the value stored in the CPU by 3.
  • deposit the value stored in the CPU into memory location 201.


This is exactly how early programs were written; programmers would keep a list of memory locations, who used it, when, and what the value stored there represented.

Obviously this was tedious and error prone, and meant every possible value stored in memory had to be assigned an address during the construction of the program. Worse, this arrangement made it difficult to allocate storage to variables dynamically as the program ran– just imagine if you had to write large programs using only global variables.

To address this, the notion of a variable was created. A variable is just a convenient, alphanumeric pseudonym for a memory location; a label, or nickname.

Now, rather than talking about memory locations, we can talk about variables, which are convenient names we give to memory locations. The previous program can now be expressed as:

  • Retrieve the value stored in variable a and place it in the CPU.
  • multiple it by 3
  • deposit the value into the variable b.

This is the same program, with one crucial improvement–because we no longer need to talk about memory locations directly, we no longer need to keep track of them–that drudgery is left to the compiler.

Now we can write a program like

var a = 6
var b = a * 3

And the compiler will make sure that the variables a and b are assigned unique memory locations to hold their value for as long as needed.

What is a pointer?

Now that we know that memory is just a series of numbered cells, and variables are just nicknames for a memory location assigned by the compiler, what is a pointer?

A pointer is a value that points to the memory address of another variable.

The pointer points to memory address of a variable, just as a variable represents the memory address of value.

Let’s have a look at this program fragment

func main() {
        a := 200
        b := &a
        *b++
        fmt.Println(a)
}

On the first line of main we declare a new variable a and assign it the value 200.

Next we declare a variable b and assign it the address a. Remember that we don’t know the exact memory location where a is stored, but we can still store a‘s address in b.

The third line is probably the most confusing, because of the strongly typed nature of Go. b contains the address of variable a, but we want to increment the value stored in a. To do this we must dereference b, follow the pointer from b to a.

Then we add one the value, and store it back in the memory location stored in b.

The final line prints the value of a, showing that it has increased to 201.

Conclusion

If you are coming from a language with no notion of pointers, or where every variable is implicitly a pointer don’t panic, forming a mental model of how variables and pointers relate takes time and practice. Just remember this rule:

A pointer is a value that points to the memory address of another variable.

Next: There is no pass-by-reference in Go