Category Archives: Go

Constant Time

This essay is a derived from my dotGo 2019 presentation about my favourite feature in Go.


Many years ago Rob Pike remarked,

“Numbers are just numbers, you’ll never see 0x80ULL in a .go source file”.

—Rob Pike, The Go Programming Language

Beyond this pithy observation lies the fascinating world of Go’s constants. Something that is perhaps taken for granted because, as Rob noted, is Go numbers–constants–just work.
In this post I intend to show you a few things that perhaps you didn’t know about Go’s const keyword.

What’s so great about constants?

To kick things off, why are constants good? Three things spring to mind:

  • Immutability. Constants are one of the few ways we have in Go to express immutability to the compiler.
  • Clarity. Constants give us a way to extract magic numbers from our code, giving them names and semantic meaning.
  • Performance. The ability to express to the compiler that something will not change is key as it unlocks optimisations such as constant folding, constant propagation, branch and dead code elimination.

But these are generic use cases for constants, they apply to any language. Let’s talk about some of the properties of Go’s constants.

A Challenge

To introduce the power of Go’s constants let’s try a little challenge: declare a constant whose value is the number of bits in the natural machine word.

We can’t use unsafe.SizeOf as it is not a constant expression. We could use a build tag and laboriously record the natural word size of each Go platform, or we could do something like this:

const uintSize = 32 << (^uint(0) >> 32 & 1)

There are many versions of this expression in Go codebases. They all work roughly the same way. If we’re on a 64 bit platform then the exclusive or of the number zero–all zero bits–is a number with all bits set, sixty four of them to be exact.

1111111111111111111111111111111111111111111111111111111111111111

If we shift that value thirty two bits to the right, we get another value with thirty two ones in it.

0000000000000000000000000000000011111111111111111111111111111111

Anding that with a number with one bit in the final position give us, the same thing, 1,

0000000000000000000000000000000011111111111111111111111111111111 & 1 = 1

Finally we shift the number thirty two one place to the right, giving us 641.

32 << 1 = 64

This expression is an example of a constant expression. All of these operations happen at compile time and the result of the expression is itself a constant. If you look in the in runtime package, in particular the garbage collector, you’ll see how constant expressions are used to set up complex invariants based on the word size of the machine the code is compiled on.

So, this is a neat party trick, but most compilers will do this kind of constant folding at compile time for you. Let’s step it up a notch.

Constants are values

In Go, constants are values and each value has a type. In Go, user defined types can declare their own methods. Thus, a constant value can have a method set. If you’re surprised by this, let me show you an example that you probably use every day.

const timeout = 500 * time.Millisecond
fmt.Println("The timeout is", timeout) // 500ms

In the example the untyped literal constant 500 is multiplied by time.Millisecond, itself a constant of type time.Duration. The rule for assignments in Go are, unless otherwise declared, the type on the left hand side of the assignment operator is inferred from the type on the right.500 is an untyped constant so it is converted to a time.Duration then multiplied with the constant time.Millisecond.

Thus timeout is a constant of type time.Duration which holds the value 500000000.
Why then does fmt.Println print 500ms, not 500000000?

The answer is time.Duration has a String method. Thus any time.Duration value, even a constant, knows how to pretty print itself.

Now we know that constant values are typed, and because types can declare methods, we can derive that constant values can fulfil interfaces. In fact we just saw an example of this. fmt.Println doesn’t assert that a value has a String method, it asserts the value implements the Stringer interface.

Let’s talk a little about how we can use this property to make our Go code better, and to do that I’m going to take a brief digression into the Singleton pattern.

Singletons

I’m generally not a fan of the singleton pattern, in Go or any language. Singletons complicate testing and create unnecessary coupling between packages. I feel the singleton pattern is often used not to create a singular instance of a thing, but instead to create a place to coordinate registration. net/http.DefaultServeMux is a good example of this pattern.

package http

// DefaultServeMux is the default ServeMux used by Serve.
var DefaultServeMux = &defaultServeMux

var defaultServeMux ServeMux

There is nothing singular about http.defaultServerMux, nothing prevents you from creating another ServeMux. In fact the http package provides a helper that will create as many ServeMux‘s as you want.

// NewServeMux allocates and returns a new ServeMux.
func NewServeMux() *ServeMux { return new(ServeMux) }

http.DefaultServeMux is not a singleton. Never the less there is a case for things which are truely singletons because they can only represent a single thing. A good example of this are the file descriptors of a process; 0, 1, and 2 which represent stdin, stdout, and stderr respectively.

It doesn’t matter what names you give them, 1 is always stdout, and there can only ever be one file descriptor 1. Thus these two operations are identical:

fmt.Fprintf(os.Stdout, "Hello dotGo\n")
syscall.Write(1, []byte("Hello dotGo\n"))

So let’s look at how the os package defines Stdin, Stdout, and Stderr:

package os

var (
        Stdin  = NewFile(uintptr(syscall.Stdin), "/dev/stdin")
        Stdout = NewFile(uintptr(syscall.Stdout), "/dev/stdout")
        Stderr = NewFile(uintptr(syscall.Stderr), "/dev/stderr")
)

There are a few problems with this declaration. Firstly their type is *os.File not the respective io.Reader or io.Writer interfaces. People have long complained that this makes replacing them with alternatives problematic. However the notion of replacing these variables is precisely the point of this digression. Can you safely change the value of os.Stdout once your program is running without causing a data race?

I argue that, in the general case, you cannot. In general, if something is unsafe to do, as programmers we shouldn’t let our users think that it is safe, lest they begin to depend on that behaviour.

Could we change the definition of os.Stdout and friends so that they retain the observable behaviour of reading and writing, but remain immutable? It turns out, we can do this easily with constants.

type readfd int

func (r readfd) Read(buf []byte) (int, error) {
       return syscall.Read(int(r), buf)
}

type writefd int

func (w writefd) Write(buf []byte) (int, error) {
        return syscall.Write(int(w), buf)
}

const (
        Stdin  = readfd(0)
        Stdout = writefd(1)
        Stderr = writefd(2)
)

func main() {
        fmt.Fprintf(Stdout, "Hello world")
}

In fact this change causes only one compilation failure in the standard library.2

Sentinel error values

Another case of things which look like constants but really aren’t, are sentinel error values. io.EOF, sql.ErrNoRows, crypto/x509.ErrUnsupportedAlgorithm, and so on are all examples of sentinel error values. They all fall into a category of expected errors, and because they are expected, you’re expected to check for them.

To compare the error you have with the one you were expecting, you need to import the package that defines that error. Because, by definition, sentinel errors are exported public variables, any code that imports, for example, the io package could change the value of io.EOF.

package nelson

import "io"

func init() {
        io.EOF = nil // haha!
}

I’ll say that again. If I know the name of io.EOF I can import the package that declares it, which I must if I want to compare it to my error, and thus I could change io.EOF‘s value. Historically convention and a bit of dumb luck discourages people from writing code that does this, but technically there is nothing to prevent you from doing so.

Replacing io.EOF is probably going to be detected almost immediately. But replacing a less frequently used sentinel error may cause some interesting side effects:

package innocent

import "crypto/rsa"

func init() {
        rsa.ErrVerification = nil // 🤔
}

If you were hoping the race detector will spot this subterfuge, I suggest you talk to the folks writing testing frameworks who replace os.Stdout without it triggering the race detector.

Fungibility

I want to digress for a moment to talk about the most important property of constants. Constants aren’t just immutable, its not enough that we cannot overwrite their declaration,
Constants are fungible. This is a tremendously important property that doesn’t get nearly enough attention.

Fungible means identical. Money is a great example of fungibility. If you were to lend me 10 bucks, and I later pay you back, the fact that you gave me a 10 dollar note and I returned to you 10 one dollar bills, with respect to its operation as a financial instrument, is irrelevant. Things which are fungible are by definition equal and equality is a powerful property we can leverage for our programs.

var myEOF = errors.New("EOF") // io/io.go line 38
fmt.Println(myEOF == io.EOF)  // false

Putting aside the effect of malicious actors in your code base the key design challenge with sentinel errors is they behave like singletons, not constants. Even if we follow the exact procedure used by the io package to create our own EOF value, myEOF and io.EOF are not equal. myEOF and io.EOF are not fungible, they cannot be interchanged. Programs can spot the difference.

When you combine the lack of immutability, the lack of fungibility, the lack of equality, you have a set of weird behaviours stemming from the fact that sentinel error values in Go are not constant expressions. But what if they were?

Constant errors

Ideally a sentinel error value should behave as a constant. It should be immutable and fungible. Let’s recap how the built in error interface works in Go.

type error interface {
        Error() string
}

Any type with an Error() string method fulfils the error interface. This includes user defined types, it includes types derived from primitives like string, and it includes constant strings. With that background, consider this error implementation:

type Error string

func (e Error) Error() string {
        return string(e)
}

We can use this error type as a constant expression:

const err = Error("EOF")

Unlike errors.errorString, which is a struct, a compact struct literal initialiser is not a constant expression and cannot be used.

const err2 = errors.errorString{"EOF"} // doesn't compile

As constants of this Error type are not variables, they are immutable.

const err = Error("EOF")
err = Error("not EOF")   // doesn't compile

Additionally, two constant strings are always equal if their contents are equal:

const str1 = "EOF"
const str2 = "EOF"
fmt.Println(str1 == str2) // true

which means two constants of a type derived from string with the same contents are also equal.

type Error string

const err1 = Error("EOF")
const err2 = Error("EOF")
fmt.Println(err1 == err2) // true```

Said another way, equal constant Error values are the same, in the way that the literal constant 1 is the same as every other literal constant 1.

Now we have all the pieces we need to make sentinel errors, like io.EOF, and rsa.ErrVerfication, immutable, fungible, constant expressions.

% git diff
diff --git a/src/io/io.go b/src/io/io.go
index 2010770e6a..355653b4b8 100644
--- a/src/io/io.go
+++ b/src/io/io.go
@@ -35,7 +35,12 @@ var ErrShortBuffer = errors.New("short buffer")
 // If the EOF occurs unexpectedly in a structured data stream,
 // the appropriate error is either ErrUnexpectedEOF or some other error
 // giving more detail.
-var EOF = errors.New("EOF")
+const EOF = ioError("EOF")
+
+type ioError string
+
+func (e ioError) Error() string { return string(e) }

This change is probably a bit of a stretch for the Go 1 contract, but there is no reason you cannot adopt a constant error pattern for your sentinel errors in the packages that you write.

In summary

Go’s constants are powerful. If you only think of them as immutable numbers, you’re missing out. Go’s constants let us compose programs that are more correct and harder to misuse.

Today I’ve outlined three ways to use constants that are more than your typical immutable number.

Now it’s over to you, I’m excited to see where you can take these ideas.

Prefer table driven tests

I’m a big fan of testing, specifically unit testing and TDD (done correctly, of course). A practice that has grown around Go projects is the idea of a table driven test. This post explores the how and why of writing a table driven test.

Let’s say we have a function that splits strings:

// Split slices s into all substrings separated by sep and
// returns a slice of the substrings between those separators.
func Split(s, sep string) []string {
var result []string
i := strings.Index(s, sep)
for i > -1 {
result = append(result, s[:i])
s = s[i+len(sep):]
i = strings.Index(s, sep)
}
return append(result, s)
}

In Go, unit tests are just regular Go functions (with a few rules) so we write a unit test for this function starting with a file in the same directory, with the same package name, strings.

package split

import (
"reflect"
"testing"
)

func TestSplit(t *testing.T) {
got := Split("a/b/c", "/")
want := []string{"a", "b", "c"}
if !reflect.DeepEqual(want, got) {
t.Fatalf("expected: %v, got: %v", want, got)
}
}

Tests are just regular Go functions with a few rules:

  1. The name of the test function must start with Test.
  2. The test function must take one argument of type *testing.T. A *testing.T is a type injected by the testing package itself, to provide ways to print, skip, and fail the test.

In our test we call Split with some inputs, then compare it to the result we expected.

Code coverage

The next question is, what is the coverage of this package? Luckily the go tool has a built in branch coverage. We can invoke it like this:

% go test -coverprofile=c.out
PASS
coverage: 100.0% of statements
ok split 0.010s

Which tells us we have 100% branch coverage, which isn’t really surprising, there’s only one branch in this code.

If we want to dig in to the coverage report the go tool has several options to print the coverage report. We can use go tool cover -func to break down the coverage per function:

% go tool cover -func=c.out
split/split.go:8: Split 100.0%
total: (statements) 100.0%

Which isn’t that exciting as we only have one function in this package, but I’m sure you’ll find more exciting packages to test.

Spray some .bashrc on that

This pair of commands is so useful for me I have a shell alias which runs the test coverage and the report in one command:

cover () {
local t=$(mktemp -t cover)
go test $COVERFLAGS -coverprofile=$t $@ \
&& go tool cover -func=$t \
&& unlink $t
}

Going beyond 100% coverage

So, we wrote one test case, got 100% coverage, but this isn’t really the end of the story. We have good branch coverage but we probably need to test some of the boundary conditions. For example, what happens if we try to split it on comma?

func TestSplitWrongSep(t *testing.T) {
got := Split("a/b/c", ",")
want := []string{"a/b/c"}
if !reflect.DeepEqual(want, got) {
t.Fatalf("expected: %v, got: %v", want, got)
}
}

Or, what happens if there are no separators in the source string?

func TestSplitNoSep(t *testing.T) {
got := Split("abc", "/")
want := []string{"abc"}
if !reflect.DeepEqual(want, got) {
t.Fatalf("expected: %v, got: %v", want, got)
}
}

We’re starting build a set of test cases that exercise boundary conditions. This is good.

Introducing table driven tests

However the there is a lot of duplication in our tests. For each test case only the input, the expected output, and name of the test case change. Everything else is boilerplate. What we’d like to to set up all the inputs and expected outputs and feel them to a single test harness. This is a great time to introduce table driven testing.

func TestSplit(t *testing.T) {
type test struct {
input string
sep string
want []string
}

tests := []test{
{input: "a/b/c", sep: "/", want: []string{"a", "b", "c"}},
{input: "a/b/c", sep: ",", want: []string{"a/b/c"}},
{input: "abc", sep: "/", want: []string{"abc"}},
}

for _, tc := range tests {
got := Split(tc.input, tc.sep)
if !reflect.DeepEqual(tc.want, got) {
t.Fatalf("expected: %v, got: %v", tc.want, got)
}
}
}

We declare a structure to hold our test inputs and expected outputs. This is our table. The tests structure is usually a local declaration because we want to reuse this name for other tests in this package.

In fact, we don’t even need to give the type a name, we can use an anonymous struct literal to reduce the boilerplate like this:

func TestSplit(t *testing.T) {
tests := []struct {
input string
sep string
want []string
}{
{input: "a/b/c", sep: "/", want: []string{"a", "b", "c"}},
{input: "a/b/c", sep: ",", want: []string{"a/b/c"}},
{input: "abc", sep: "/", want: []string{"abc"}},
}

for _, tc := range tests {
got := Split(tc.input, tc.sep)
if !reflect.DeepEqual(tc.want, got) {
t.Fatalf("expected: %v, got: %v", tc.want, got)
}
}
}

Now, adding a new test is a straight forward matter; simply add another line the tests structure. For example, what will happen if our input string has a trailing separator?

{input: "a/b/c", sep: "/", want: []string{"a", "b", "c"}},
{input: "a/b/c", sep: ",", want: []string{"a/b/c"}},
{input: "abc", sep: "/", want: []string{"abc"}},
{input: "a/b/c/", sep: "/", want: []string{"a", "b", "c"}}, // trailing sep

But, when we run go test, we get

% go test
--- FAIL: TestSplit (0.00s)
split_test.go:24: expected: [a b c], got: [a b c ]

Putting aside the test failure, there are a few problems to talk about.

The first is by rewriting each test from a function to a row in a table we’ve lost the name of the failing test. We added a comment in the test file to call out this case, but we don’t have access to that comment in the go test output.

There are a few ways to resolve this. You’ll see a mix of styles in use in Go code bases because the table testing idiom is evolving as people continue to experiment with the form.

Enumerating test cases

As tests are stored in a slice we can print out the index of the test case in the failure message:

func TestSplit(t *testing.T) {
tests := []struct {
input string
sep . string
want []string
}{
{input: "a/b/c", sep: "/", want: []string{"a", "b", "c"}},
{input: "a/b/c", sep: ",", want: []string{"a/b/c"}},
{input: "abc", sep: "/", want: []string{"abc"}},
{input: "a/b/c/", sep: "/", want: []string{"a", "b", "c"}},
}

for i, tc := range tests {
got := Split(tc.input, tc.sep)
if !reflect.DeepEqual(tc.want, got) {
t.Fatalf("test %d: expected: %v, got: %v", i+1, tc.want, got)
}
}
}

Now when we run go test we get this

% go test
--- FAIL: TestSplit (0.00s)
split_test.go:24: test 4: expected: [a b c], got: [a b c ]

Which is a little better. Now we know that the fourth test is failing, although we have to do a little bit of fudging because slice indexing—​and range iteration—​is zero based. This requires consistency across your test cases; if some use zero base reporting and others use one based, it’s going to be confusing. And, if the list of test cases is long, it could be difficult to count braces to figure out exactly which fixture constitutes test case number four.

Give your test cases names

Another common pattern is to include a name field in the test fixture.

func TestSplit(t *testing.T) {
tests := []struct {
name string
input string
sep string
want []string
}{
{name: "simple", input: "a/b/c", sep: "/", want: []string{"a", "b", "c"}},
{name: "wrong sep", input: "a/b/c", sep: ",", want: []string{"a/b/c"}},
{name: "no sep", input: "abc", sep: "/", want: []string{"abc"}},
{name: "trailing sep", input: "a/b/c/", sep: "/", want: []string{"a", "b", "c"}},
}

for _, tc := range tests {
got := Split(tc.input, tc.sep)
if !reflect.DeepEqual(tc.want, got) {
t.Fatalf("%s: expected: %v, got: %v", tc.name, tc.want, got)
}
}
}

Now when the test fails we have a descriptive name for what the test was doing. We no longer have to try to figure it out from the output—​also, now have a string we can search on.

% go test
--- FAIL: TestSplit (0.00s)
split_test.go:25: trailing sep: expected: [a b c], got: [a b c ]

We can dry this up even more using a map literal syntax:

func TestSplit(t *testing.T) {
tests := map[string]struct {
input string
sep string
want []string
}
{
"simple": {input: "a/b/c", sep: "/", want: []string{"a", "b", "c"}},
"wrong sep": {input: "a/b/c", sep: ",", want: []string{"a/b/c"}},
"no sep": {input: "abc", sep: "/", want: []string{"abc"}},
"trailing sep": {input: "a/b/c/", sep: "/", want: []string{"a", "b", "c"}},
}

for name, tc := range tests {
got := Split(tc.input, tc.sep)
if !reflect.DeepEqual(tc.want, got) {
t.Fatalf("%s: expected: %v, got: %v", name, tc.want, got)
}
}
}

Using a map literal syntax we define our test cases not as a slice of structs, but as map of test names to test fixtures. There’s also a side benefit of using a map that is going to potentially improve the utility of our tests.

Map iteration order is undefined 1 This means each time we run go test, our tests are going to be potentially run in a different order.

This is super useful for spotting conditions where test pass when run in statement order, but not otherwise. If you find that happens you probably have some global state that is being mutated by one test with subsequent tests depending on that modification.

Introducing sub tests

Before we fix the failing test there are a few other issues to address in our table driven test harness.

The first is we’re calling t.Fatalf when one of the test cases fails. This means after the first failing test case we stop testing the other cases. Because test cases are run in an undefined order, if there is a test failure, it would be nice to know if it was the only failure or just the first.

The testing package would do this for us if we go to the effort to write out each test case as its own function, but that’s quite verbose. The good news is since Go 1.7 a new feature was added that lets us do this easily for table driven tests. They’re called sub tests.

func TestSplit(t *testing.T) {
tests := map[string]struct {
input string
sep string
want []string
}{
"simple": {input: "a/b/c", sep: "/", want: []string{"a", "b", "c"}},
"wrong sep": {input: "a/b/c", sep: ",", want: []string{"a/b/c"}},
"no sep": {input: "abc", sep: "/", want: []string{"abc"}},
"trailing sep": {input: "a/b/c/", sep: "/", want: []string{"a", "b", "c"}},
}

for name, tc := range tests {
t.Run(name, func(t *testing.T) {
got := Split(tc.input, tc.sep)
if !reflect.DeepEqual(tc.want, got) {
t.Fatalf("expected: %v, got: %v", tc.want, got)
}
})

}
}

As each sub test now has a name we get that name automatically printed out in any test runs.

% go test
--- FAIL: TestSplit (0.00s)
--- FAIL: TestSplit/trailing_sep (0.00s)
split_test.go:25: expected: [a b c], got: [a b c ]

Each subtest is its own anonymous function, therefore we can use t.Fatalft.Skipf, and all the other testing.Thelpers, while retaining the compactness of a table driven test.

Individual sub test cases can be executed directly

Because sub tests have a name, you can run a selection of sub tests by name using the go test -run flag.

% go test -run=.*/trailing -v
=== RUN TestSplit
=== RUN TestSplit/trailing_sep
--- FAIL: TestSplit (0.00s)
--- FAIL: TestSplit/trailing_sep (0.00s)
split_test.go:25: expected: [a b c], got: [a b c ]

Comparing what we got with what we wanted

Now we’re ready to fix the test case. Let’s look at the error.

--- FAIL: TestSplit (0.00s)
--- FAIL: TestSplit/trailing_sep (0.00s)
split_test.go:25: expected: [a b c], got: [a b c ]

Can you spot the problem? Clearly the slices are different, that’s what reflect.DeepEqual is upset about. But spotting the actual difference isn’t easy, you have to spot that extra space after c. This might look simple in this simple example, but it is any thing but when you’re comparing two complicated deeply nested gRPC structures.

We can improve the output if we switch to the %#v syntax to view the value as a Go(ish) declaration:

got := Split(tc.input, tc.sep)
if !reflect.DeepEqual(tc.want, got) {
t.Fatalf("expected: %#v, got: %#v", tc.want, got)
}

Now when we run our test it’s clear that the problem is there is an extra blank element in the slice.

% go test
--- FAIL: TestSplit (0.00s)
--- FAIL: TestSplit/trailing_sep (0.00s)
split_test.go:25: expected: []string{"a", "b", "c"}, got: []string{"a", "b", "c", ""}

But before we go to fix our test failure I want to talk a little bit more about choosing the right way to present test failures. Our Split function is simple, it takes a primitive string and returns a slice of strings, but what if it worked with structs, or worse, pointers to structs?

Here is an example where %#v does not work as well:

func main() {
type T struct {
I int
}
x := []*T{{1}, {2}, {3}}
y := []*T{{1}, {2}, {4}}
fmt.Printf("%v %v\n", x, y)
fmt.Printf("%#v %#v\n", x, y)
}

The first fmt.Printfprints the unhelpful, but expected slice of addresses; [0xc000096000 0xc000096008 0xc000096010] [0xc000096018 0xc000096020 0xc000096028]. However our %#v version doesn’t fare any better, printing a slice of addresses cast to *main.T;[]*main.T{(*main.T)(0xc000096000), (*main.T)(0xc000096008), (*main.T)(0xc000096010)} []*main.T{(*main.T)(0xc000096018), (*main.T)(0xc000096020), (*main.T)(0xc000096028)}

Because of the limitations in using any fmt.Printf verb, I want to introduce the go-cmp library from Google.

The goal of the cmp library is it is specifically to compare two values. This is similar to reflect.DeepEqual, but it has more capabilities. Using the cmp pacakge you can, of course, write:

func main() {
type T struct {
I int
}
x := []*T{{1}, {2}, {3}}
y := []*T{{1}, {2}, {4}}
fmt.Println(cmp.Equal(x, y)) // false
}

But far more useful for us with our test function is the cmp.Diff function which will produce a textual description of what is different between the two values, recursively.

func main() {
type T struct {
I int
}
x := []*T{{1}, {2}, {3}}
y := []*T{{1}, {2}, {4}}
diff := cmp.Diff(x, y)
fmt.Printf(diff)
}

Which instead produces:

% go run
{[]*main.T}[2].I:
-: 3
+: 4

Telling us that at element 2 of the slice of Ts the Ifield was expected to be 3, but was actually 4.

Putting this all together we have our table driven go-cmp test

func TestSplit(t *testing.T) {
tests := map[string]struct {
input string
sep string
want []string
}{
"simple": {input: "a/b/c", sep: "/", want: []string{"a", "b", "c"}},
"wrong sep": {input: "a/b/c", sep: ",", want: []string{"a/b/c"}},
"no sep": {input: "abc", sep: "/", want: []string{"abc"}},
"trailing sep": {input: "a/b/c/", sep: "/", want: []string{"a", "b", "c"}},
}

for name, tc := range tests {
t.Run(name, func(t *testing.T) {
got := Split(tc.input, tc.sep)
diff := cmp.Diff(tc.want, got)
if diff != "" {
t.Fatalf(diff)
}

})
}
}

Running this we get

% go test
--- FAIL: TestSplit (0.00s)
--- FAIL: TestSplit/trailing_sep (0.00s)
split_test.go:27: {[]string}[?->3]:
-: <non-existent>
+: ""
FAIL
exit status 1
FAIL split 0.006s

Using cmp.Diff our test harness isn’t just telling us that what we got and what we wanted were different. Our test is telling us that the strings are different lengths, the third index in the fixture shouldn’t exist, but the actual output we got an empty string, “”. From here fixing the test failure is straight forward.

You shouldn’t name your variables after their types for the same reason you wouldn’t name your pets “dog” or “cat”

The name of a variable should describe its contents, not the type of the contents. Consider this example:

var usersMap map[string]*User

What are some good properties of this declaration? We can see that it’s a map, and it has something to do with the *User type, so that’s probably good. But usersMapis a map and Go, being a statically typed language, won’t let us accidentally use a map where a different type is required, so the Map suffix as a safety precaution is redundant.

Now, consider what happens if we declare other variables using this pattern:

var (
companiesMap map[string]*Company
productsMap map[string]*Products
)

Now we have three map type variables in scope, usersMapcompaniesMap, and productsMap, all mapping strings to different struct types. We know they are maps, and we also know that their declarations prevent us from using one in place of another—​the compiler will throw an error if we try to use companiesMap where the code is expecting a map[string]*User. In this situation it’s clear that the Map suffix does not improve the clarity of the code, its just extra boilerplate to type.

My suggestion is avoid any suffix that resembles the type of the variable. Said another way, if users isn’t descriptive enough, then usersMap won’t be either.

This advice also applies to function parameters. For example:

type Config struct {
//
}

func WriteConfig(w io.Writer, config *Config)

Naming the *Config parameter config is redundant. We know it’s a pointer to a Config, it says so right there in the declaration. Instead consider if conf will do, or maybe just c if the lifetime of the variable is short enough.

This advice is more than just a desire for brevity. If there is more that one *Config in scope at any one time, calling them config1 and config2 is less descriptive than calling them original and updated . The latter are less likely to be accidentally transposed—something the compiler won’t catch—while the former differ only in a one character suffix.

Finally, don’t let package names steal good variable names. The name of an imported identifier includes its package name. For example the Context type in the context package will be known as context.Context when imported into another package . This makes it impossible to use context as a variable or type, unless of course you rename the import, but that’s throwing good after bad. This is why the local declaration for context.Context types is traditionally ctx. eg.

func WriteLog(ctx context.Context, message string)

A variable’s name should be independent of its type. You shouldn’t name your variables after their types for the same reason you wouldn’t name your pets “dog” or “cat”. You shouldn’t include the name of your type in the name of your variable for the same reason.

Eliminate error handling by eliminating errors

Go 2 aims to improve the overhead of error handling, but do you know what is better than an improved syntax for handling errors? Not needing to handle errors at all. Now, I’m not saying “delete your error handling code”, instead I’m suggesting changing your code so you don’t have as many errors to handle.

This article draws inspiration from a chapter in John Ousterhout’s, A philosophy of Software Design, “Define Errors Out of Existence”. I’m going to try to apply his advice to Go.


Here’s a function to count the number of lines in a file,

func CountLines(r io.Reader) (int, error) {
var (
br = bufio.NewReader(r)
lines int
err error
)

for {
_, err = br.ReadString('\n')
lines++
if err != nil {
break
}
}

if err != io.EOF {
return 0, err
}
return lines, nil
}

We construct a bufio.Reader, then sit in a loop calling the ReadString method, incrementing a counter until we reach the end of the file, then we return the number of lines read. That’s the code we wanted to write, instead CountLines is made more complicated by its error handling. For example, there is this strange construction:

                _, err = br.ReadString('\n')
lines++
if err != nil {
break
}

We increment the count of lines before checking the error—​that looks odd. The reason we have to write it this way is ReadString will return an error if it encounters an end-of-file—io.EOF—before hitting a newline character. This can happen if there is no trailing newline.

To address this problem, we rearrange the logic to increment the line count, then see if we need to exit the loop.1

But we’re not done checking errors yet. ReadString will return io.EOF when it hits the end of the file. This is expected, ReadString needs some way of saying stop, there is nothing more to read. So before we return the error to the caller of CountLine, we need to check if the error was not io.EOF, and in that case propagate it up, otherwise we return nil to say that everything worked fine. This is why the final line of the function is not simply

return lines, err

I think this is a good example of Russ Cox’s observation that error handling can obscure the operation of the function. Let’s look at an improved version.

func CountLines(r io.Reader) (int, error) {
sc := bufio.NewScanner(r)
lines := 0

for sc.Scan() {
lines++
}

return lines, sc.Err()
}

This improved version switches from using bufio.Reader to bufio.Scanner. Under the hood bufio.Scanner uses bufio.Reader adding a layer of abstraction which helps remove the error handling which obscured the operation of our previous version of CountLines 2

The method sc.Scan() returns true if the scanner has matched a line of text and has not encountered an error. So, the body of our for loop will be called only when there is a line of text in the scanner’s buffer. This means our revised CountLines correctly handles the case where there is no trailing newline, It also correctly handles the case where the file is empty.

Secondly, as sc.Scan returns false once an error is encountered, our for loop will exit when the end-of-file is reached or an error is encountered. The bufio.Scanner type memoises the first error it encounters and we recover that error once we’ve exited the loop using the sc.Err() method.

Lastly, buffo.Scanner takes care of handling io.EOF and will convert it to a nil if the end of file was reached without encountering another error.


My second example is inspired by Rob Pikes’ Errors are values blog post.

When dealing with opening, writing and closing files, the error handling is present but not overwhelming as, the operations can be encapsulated in helpers like ioutil.ReadFile and ioutil.WriteFile. However, when dealing with low level network protocols it often becomes necessary to build the response directly using I/O primitives, thus the error handling can become repetitive. Consider this fragment of a HTTP server which is constructing a HTTP/1.1 response.

type Header struct {
Key, Value string
}

type Status struct {
Code int
Reason string
}

func WriteResponse(w io.Writer, st Status, headers []Header, body io.Reader) error {
_, err := fmt.Fprintf(w, "HTTP/1.1 %d %s\r\n", st.Code, st.Reason)
if err != nil {
return err
}

for _, h := range headers {
_, err := fmt.Fprintf(w, "%s: %s\r\n", h.Key, h.Value)
if err != nil {
return err
}
}

if _, err := fmt.Fprint(w, "\r\n"); err != nil {
return err
}

_, err = io.Copy(w, body)
return err
}

First we construct the status line using fmt.Fprintf, and check the error. Then for each header we write the header key and value, checking the error each time. Lastly we terminate the header section with an additional \r\n, check the error, and copy the response body to the client. Finally, although we don’t need to check the error from io.Copy, we do need to translate it from the two return value form that io.Copy returns into the single return value that WriteResponse expects.

Not only is this a lot of repetitive work, each operation—fundamentally writing bytes to an io.Writer—has a different form of error handling. But we can make it easier on ourselves by introducing a small wrapper type.

type errWriter struct {
io.Writer
err error
}

func (e *errWriter) Write(buf []byte) (int, error) {
if e.err != nil {
return 0, e.err
}

var n int
n, e.err = e.Writer.Write(buf)
return n, nil
}

errWriter fulfils the io.Writer contract so it can be used to wrap an existing io.WritererrWriter passes writes through to its underlying writer until an error is detected. From that point on, it discards any writes and returns the previous error.

func WriteResponse(w io.Writer, st Status, headers []Header, body io.Reader) error {
ew := &errWriter{Writer: w}
fmt.Fprintf(ew, "HTTP/1.1 %d %s\r\n", st.Code, st.Reason)

for _, h := range headers {
fmt.Fprintf(ew, "%s: %s\r\n", h.Key, h.Value)
}

fmt.Fprint(ew, "\r\n")
io.Copy(ew, body)

return ew.err
}

Applying errWriter to WriteResponse dramatically improves the clarity of the code. Each of the operations no longer needs to bracket itself with an error check. Reporting the error is moved to the end of the function by inspecting the ew.err field, avoiding the annoying translation from io.Copy’s return values.


When you find yourself faced with overbearing error handling, try to extract some of the operations into a helper type.

Avoid package names like base, util, or common

Writing a good Go package starts with its name. Think of your package’s name as an elevator pitch, you have to describe what it does using just one word.

A common cause of poor package names are utility packages. These are packages where helpers and utility code congeal. These packages contain an assortment of unrelated functions, as such their utility is hard to describe in terms of what the package provides. This often leads to a package’s name being derived from what the package contains—utilities.

Package names like utils or helpers are commonly found in projects which have developed deep package hierarchies and want to share helper functions without introducing import loops. Extracting utility functions to new package breaks the import loop, but as the package stems from a design problem in the project, its name doesn’t reflect its purpose, only its function in breaking the import cycle.

[A little] duplication is far cheaper than the wrong abstraction.

Sandy Metz

My recommendation to improve the name of utils or helpers packages is to analyse where they are imported and move the relevant functions into the calling package. Even if this results in some code duplication this is preferable to introducing an import dependency between two packages. In the case where utility functions are used in many places, prefer multiple packages, each focused on a single aspect with a correspondingly descriptive name.

Packages with names like base or common are often found when functionality common to two or more related facilities, for example common types between a client and server or a server and its mock, has been refactored into a separate package. Instead the solution is to reduce the number of packages by combining client, server, and common code into a single package named after the facility the package provides.

For example, the net/http package does not have client and server packages, instead it has client.go and server.go files, each holding their respective types. transport.go holds for the common message transport code used by both HTTP clients and servers.

Name your packages after what they provide, not what they contain.

The office coffee model of concurrent garbage collection

Garbage collection is a field with its own terminology. Concepts like like mutators, card marking, and write barriers create a hurdle to understanding how garbage collectors work. Here’s an analogy to explain the operations of a concurrent garbage collector using everyday items found in the workplace.

Before we discuss the operation of concurrent garbage collection, let’s introduce the dramatis personae. In offices around the world you’ll find one of these:

In the workplace coffee is a natural resource. Employees visit the break room and fill their cups as required. That is, until the point someone goes to fill their cup only to discover the pot is empty!

Immediately the office is thrown into chaos. Meeting are called. Investigations are held. The perpetrator who took the last cup without refilling the machine is found and reprimanded. Despite many passive aggressive notes the situation keeps happening, thus a committee is formed to decide if a larger coffee pot should be requisitioned. Once the coffee maker is again full office productivity slowly returns to normal.

This is the model of stop the world garbage collection. The various parts of your program proceed through their day consuming memory, or in our analogy coffee, without a care about the next allocation that needs to be made. Eventually one unlucky attempt to allocate memory is made only to find the heap, or the coffee pot, exhausted, triggering a stop the world garbage collection.


Down the road at a more enlightened workplace, management have adopted a different strategy for mitigating their break room’s coffee problems. Their policy is simple: if the pot is more than half full, fill your cup and be on your way. However, if the pot is less than half full, before filling your cup, you must add a little coffee and a little water to the top of the machine. In this way, by the time the next person arrives for their re-up, the level in the pot will hopefully have risen higher than when the first person found it.

This policy does come at a cost to office productivity. Rather than filling their cup and hoping for the best, each worker may, depending on the aggregate level of consumption in the office, have to spend a little time refilling the percolator and topping up the water. However, this is time spent by a person who was already heading to the break room. It costs a few extra minutes to maintain the coffee machine, but does not impact their officemates who aren’t in need of caffeination. If several people take a break at the same time, they will all find the level in the pot below the half way mark and all proceed to top up the coffee maker–the more consumption, the greater the rate the machine will be refilled, although this takes a little longer as the break room becomes congested.

This is the model of concurrent garbage collection as practiced by the Go runtime (and probably other language runtimes with concurrent collectors). Rather than each heap allocation proceeding blindly until the heap is exhausted, leading to a long stop the world pause, concurrent collection algorithms spread the work of walking the heap to find memory which is no longer reachable over the parts of the program allocating memory. In this way the parts of the program which allocate memory each pay a small cost–in terms of latency–for those allocations rather than the whole program being forced to halt when the heap is exhausted.

Lastly, in keeping with the office coffee model, if the rate of coffee consumption in the office is so high that management discovers that their staff are always in the break room trying desperately to refill the coffee machine, it’s time to invest in a machine with a bigger pot–or in garbage collection terms, grow the heap.

Internets of Interest #7: Ian Cooper on Test Driven Development

As the tech lead on non SaaS product I spend a lot of my time worrying about testing. Specifically we have tests that cover code, but what is covering the tests? Tests are important to give you certainty that what your product says on the tin is what it will do when people take it home and unwrap it, but what’s backstopping the tests? Testing lets you refactor with impunity, but what if you want to refactor your tests?

This presentation by Ian Cooper takes a little while to get going but is worth persisting with. Cooper’s observations that the unit of the unit test is not a type, or a class, but the API–in Go terms, the public API of a package–was revelatory for me.

Bonus: Michael Feathers’ YOW ! 2016 presentation; Testing Patience.

Maybe adding generics to Go IS about syntax after all

This is a short response to the recently announced Go 2 generics draft proposals

Update: This proposal is incomplete. It cannot replace two common use cases. The first is ensuring that several formal parameters are of the same type:

contract comparable(t T) {
t > t
}

func max(type T comparable)(a, b T) T

Here a, and b must be the same parameterised type — my suggestion would only assert that they had at least the same contract.

Secondly the it would not be possible to parameterise the type of return values:

contract viaStrings(t To, f From) {
var x string = f.String()
t.Set(string(""))
}

func SetViaStrings(type To, From viaStrings)(s []From) []To

Thanks to Ian Dawes and Sam Whited for their insight.

Bummer.


My lasting reaction to the Generics proposal is the proliferation of parenthesis in function declarations.

Although several of the Go team suggested that generics would probably be used sparingly, and the additional syntax would only be burden for the writer of the generic code, not the reader, I am sceptical that this long requested feature will be sufficiently niche as to be unnoticed by most Go developers.

It is true that type parameters can be inferred from their arguments, the declaration of generic functions and methods require a clumsy (type parameter declaration in place of the more common <T> syntaxes found in C++ and Java.

The reason for (type, it was explained to me, is Go is designed to be parsed without a symbol table. This rules out both <T> and [T] syntaxes as the parser needs ahead of time what kind of declaration a T is to avoid interpreting the angle or square braces as comparison or indexing operators respectively.

Contract as a superset of interfaces

The astute Roger Peppe quickly identified that contracts represent a superset of interfaces

Any behaviour you can express with an interface, you can do so and more, with a contract.

The remainder of this post are my suggestions for an alternative generic function declaration syntax that avoids add additional parenthesis by leveraging Roger’s observation.

Contract as a kind of type

The earlier Type Functions proposal showed that a type declaration can support a parameter. If this is correct, then the proposed contract declaration could be rewritten from

contract stringer(x T) {
var s string = x.String()
}

to

type stringer(x T) contract {
var s string = x.String()
}

This supports Roger’s observation that a contract is a superset of an interface. type stringer(x T) contract { ... } introduces a new contract type in the same way type stringer interface { ... }introduces a new interface type.

If you buy my argument that a contract is a kind of type is debatable, but if you’re prepared to take it on faith then the remainder of the syntax introduced in the generics proposal could be further simplified.

If contracts are types, use them as types

If a contract is an identifier then we can use a contract anywhere that a built-in type or interface is used. For example

func Stringify(type T stringer)(s []T) (ret []string) {
for _, v := range s {
ret = append(ret, v.String())
} return ret
}

Could be expressed as

func Stringify(s []stringer) (ret []string) {
for _, v := range s {
ret = append(ret, v.String())
} return ret
}

That is, in place of explicitly binding T to the contract stringer only for T to be referenced seven characters later, we bind the formal parameter s to a slice of stringer s directly. The similarity with the way this would previously be done with a stringer interface emphasises Roger’s observation.

1

Unifying unknown type parameters

The first example in the design proposal introduces an unknown type parameter.

func Print(type T)(s []T) {
for _, v := range s {
fmt.Println(v)
}
}

The operations on unknown types are limited, they are in some senses values that can only be read. Again drawing on Roger’s observation above, the syntax could potentially be expressed as:

func Print(s []contract{}) {
for _, v := range s {
fmt.Println(v)
}
}

Or maybe even

type T contract {} func Print(s []T) {
for _, v := range s {
fmt.Println(v)
}
}

In essence the literal contract{} syntax defines an anonymous unknown type analogous to interface{}‘s anonymous interface type.

Conclusion

The great irony is, after years of my bloviation that “adding generics to Go has nothing to do with the syntax”

2

, it turns out that, actually, yes, the syntax is crucial.

Using Go modules with Travis CI

In my previous post I converted httpstat to use Go 1.11’s upcoming module support. In this post I continue to explore integrating Go modules into a continuous integration workflow via Travis CI.

Life in mixed mode

The first scenario is probably the most likely for existing Go projects, a library or application targeting Go 1.10 and Go 1.11. httpstat has an existing CI story–I’m using Travis CI for my examples, if you use something else, please blog about your experience–and I wanted to test against the current and development versions of Go.

GO111MODULE

The straddling of two worlds is best accomplished via the GO111MODULE environment variable. GO111MODULE dictates when the Go module behaviour will be preferred over the Go 1.5-1.10’s vendor/ directory behaviour. In Go 1.11 the Go module behaviour is disabled by default for packages within $GOPATH (this also includes the default $GOPATH introduced in Go1.8). Thus, without additional configuration, Go1.11 inside Travis CI will behave like Go 1.10.

In my previous post I chose the working directory ~/devel/httpstat to ensure I was not working within a $GOPATH workspace. However CI vendors have worked hard to make sure that their CI bots always check out of the branch under test inside a working $GOPATH.

Fortunately there is a simple workaround for this, add env GO111MODULE=on before any go buildor test invocations in your .travis.yml to force Go module behaviour and ignore any vendor/ directories that may be present inside your repo.

1
language: go
go:
- 1.10.x
- master
os:
- linux
- osx
dist: trusty
sudo: false
install: true
script:
- env GO111MODULE=on go build
- env GO111MODULE=on go test

Creating a go.mod on the fly

You’ll note that I didn’t check in the go.mod module manifest I created in my previous post. This was initially an accident on my part, but one that turned out to be beneficial. By not checking in the go.mod file, the source of truth for dependencies remained httpstat’s Gopkg.toml file. When the call to env GO111MODULE=on go build executes on the Travis CI builder, the go tool converts my Gopkg.toml on the fly, then uses it to fetch dependencies before building.

$ env GO111MODULE=on go build
go: creating new go.mod: module github.com/davecheney/httpstat
go: copying requirements from Gopkg.lock
go: finding github.com/fatih/color v1.5.0
go: finding golang.org/x/sys v0.0.0-20170922123423-429f518978ab
go: finding golang.org/x/net v0.0.0-20170922011244-0744d001aa84
go: finding golang.org/x/text v0.0.0-20170915090833-1cbadb444a80
go: finding github.com/mattn/go-colorable v0.0.9
go: finding github.com/mattn/go-isatty v0.0.3
go: downloading github.com/fatih/color v1.5.0
go: downloading github.com/mattn/go-colorable v0.0.9
go: downloading github.com/mattn/go-isatty v0.0.3
go: downloading golang.org/x/net v0.0.0-20170922011244-0744d001aa84
go: downloading golang.org/x/text v0.0.0-20170915090833-1cbadb444a80

If you’re not using a dependency management tool that go mod knows how to convert from this advice may not work for you and you may have to maintain a go.mod manifest in parallel with you previous dependency management solution.

A clean slate

The second option I investigated, but ultimately did not pursue, was to treat the Travis CI builder, like my fresh Ubuntu 18.04 install, as a blank canvas. Rather than working around Travis CI’s attempts to check the branch out inside a working $GOPATH I experimented with treating the build as a C project

2

then invoking gimme directly. This also required me to check in my go.mod file as without Travis’ language: go support, the checkout was not moved into a $GOPATH folder. The latter seems like a reasonable approach if your project doesn’t intend to be compatible with Go 1.10 or earlier.

language: c
os:
- linux
- osx
dist: trusty
sudo: false
install:
- eval "$(curl -sL https://raw.githubusercontent.com/travis-ci/gimme/master/gimme | GIMME_GO_VERSION=master bash)"
script:
- go build
- go test

You can see the output from this branch here.

Sadly when run in this mode gimme is unable to take advantage of the caching provided by the language: go environment and must build Go 1.11 from source, adding three to four minutes delay to the install phase of the build. Once Go 1.11 is released and gimme can source a binary distribution this will hopefully address the setup latency.

Ultimately this option may end up being redundant if GO111MODULE=on becomes the default behaviour in Go 1.12 and the location Travis places the checkout becomes immaterial.