The story of the one line fix

Picture yourself, an engineer working at the hottest distributed microservices du jour, assigned to fix a bug. You jump into an unfamiliar codebase and quickly locate the line where the problem occurred. The fix is simple, just return early or substitute a default value in the case that one cannot be determined from your input. Boom, problem solved. Code compiles, fix goes out to production, and the card’s in the done pile in time for Tuesday drinks at the pub.

Now consider this alternative scenario. You find the problematic line and, before you make the fix, you decide to add a test so you will know that your fix worked, and hopefully someone will not accidentally revert it in the future.

You’ve figured out that some weird edge case causes that line to be called without being able to determine the right value for y. You can see it clearly, y ends up being zero so the program crashes. To write a test, you need to get the conditions that caused y to be zero to occur on command. There are only two parameters passed into this function, this shouldn’t be hard. Oh, but this is a method, not a function, so you need to construct an instance of the object in the right state to trigger the bug. Hmm, this language uses constructors to make sure people can’t just monkey up the bits of the object they need for a test, so you’ll have to find, mock, stub or build instances of all the object’s dependencies (and their dependencies, and so on), then run the object through the precise series of operations to put it in the state that causes y to be zero. Then you can write the test.

At this point Tuesday night drinks are fading before your eyes. Without being able to write a test for your fix, how will you convince a reviewer that it actually fixed what it fixed? All you can prove is that your “simple” change didn’t break any behaviour that was covered by existing tests.

Fast forward to Friday and your one line change has finally been merged, along with a few new test cases to make sure it doesn’t happen again. Next week you’re going to sit down and try to figure out a bunch of weird crashes you caused while trying to stub out calls to the database. Eventually you gave up and copied some code from another test that uses a copy of production data. It’s much slower than it needs to be, but it worked, and was the only way to put a reasonable estimate on a job that had already consumed your entire week.

The moral of the story: was the test the reason it took nearly a week to make a one line fix?

The Zen of Go

This article was derived from my GopherCon Israel 2020 presentation. It’s also quite long. If you’d prefer a shorter version, head over to the-zen-of-go.netlify.com.

A recording of the presentation is available on YouTube.

How should I write good code?

Something that I’ve been thinking about a lot recently, when reflecting on the body of my own work, is a common subtitle, how should I write good code? Given nobody actively seeks to write bad code, this leads to the question; how do you know when you’ve written good Go code?

If there’s a continuum between good and bad, how do we know what the good parts are? What are its properties, its attributes, its hallmarks, its patterns, and its idioms?

Idiomatic Go

Which brings me to idiomatic Go. To say that something is idiomatic is to say that it follows the style of the time. If something is not idiomatic, it is not following the prevailing style. It is unfashionable.

More importantly, to say to someone that their code is not idiomatic does not explain why it’s not idiomatic. Why is this? Like all truths, the answer is found in the dictionary.

idiom (noun): a group of words established by usage as having a meaning not deducible from those of the individual words.

Idioms are hallmarks of shared values. Idiomatic Go is not something you learn from a book, it’s something that you acquire by being part of a community.

My concern with the mantra of idiomatic Go is that, in many ways, it can be exclusionary. It’s saying “you can’t sit with us.” After all, isn’t that what we mean when we critique someone’s work as non-idiomatic? They didn’t do it right. It doesn’t look right. It doesn’t follow the style of the time.

I offer that idiomatic Go is not a suitable mechanism for teaching how to write good Go code because it is defined, fundamentally, by telling someone they did it wrong. Wouldn’t it be better if the advice we gave didn’t alienate the author right at the point they were most willing to accept it?

Proverbs

Stepping away from problematic idioms, what other cultural artefacts do Gophers have? Perhaps we can turn to Rob Pike’s wonderful Go Proverbs. Are these suitable teaching tools? Will these tell newcomers how to write good Go code?

In general, I don’t think so. This is not to dismiss Pike’s work, it is just that the Go Proverbs, like Segoe Kensaku’s original, are observations, not statements of value. Again, the dictionary comes to the rescue:

proverb (noun): a short, well-known pithy saying, stating a general truth or piece of advice.

The goal of the Go Proverbs is to reveal a deeper truth about the design of the language, but how useful is advice like “the empty interface says nothing” to a novice from a language that doesn’t have structural typing?

It’s important to recognise that, in a growing community, at any time the people learning Go far outnumber those who claim to have mastered the language. Thus proverbs are perhaps not the best teaching tool in this scenario.

Engineering Values

Dan Luu found an old presentation by Mark Lucovsky about the engineering culture of the Windows team around the Windows NT to Windows 2000 timeframe. The reason I mention it is Lucovsky’s description of a culture as a common way of evaluating designs and making tradeoffs.

There are many ways of discussing culture, but with respect to an engineering culture Lucovsky’s description is apt. The central idea is values guide decisions in an unknown design space. The values of the NT team were; portability, reliability, security, and extensibility. Engineering values are, crudely translated, the way things are done around here.

Go’s values

What are the explicit values of Go? What are the core beliefs or philosophy that define the way a Go programmer interprets the world? How are they promulgated? How are they taught? How are they enforced? How do they change over time?

How will you, as a newly minted Go programmer, inculcate the engineering values of Go? Or, how will you, a seasoned Go professional, promulgate your values to future generations? And just so we’re clear, this process of knowledge transfer is not optional. Without new blood and new ideas, our community becomes myopic and withers.

The values of other languages

To set the scene for what I’m getting at, we can look to other languages for examples of their engineering values.

For example, C++ (and by extension Rust) believe that a programmer should not have to pay for a feature they do not use. If a program does not use some computationally expensive feature of the language, then it shouldn’t be forced to shoulder the cost of that feature. This value extends from the language, to its standard library, and is used as a yardstick for judging the design of all code written in C++.

In Java, and Ruby, and Smalltalk, the core value that everything is an object drives the design of programs around message passing, information hiding, and polymorphism. Designs that shoehorn a procedural style, or even a functional style, into these languages are considered to be wrong–or as Gophers would say, non idiomatic.

Turning to our own community, what are the engineering values that bind Go programmers? Discourse in our community is often fractious, so deriving a set of values from first principles would be a formidable challenge. Consensus is critical, but exponentially more difficult as the number of contributors to the discussion increases. But what if someone had done the hard work for us?

The Zen of Python Go

Several decades ago Tim Peters sat down and penned PEP-20, the Zen of Python. Peters attempted to document the engineering values that he saw Guido van Rossum apply in his role as BDFL for Python.

For the remainder of this article, I’m going to look towards the Zen of Python and ask, is there anything that can inform the engineering values of Go programmers?

A good package starts with a good name

Let’s start with something spicy,

“Namespaces are one honking great idea–let’s do more of those!”
The Zen of Python, Item 19

This is pretty unequivocal, Python programmers should use namespaces. Lots of them.

In Go parlance a namespace is a package. I doubt there is any question that grouping things into packages is good for design and potentially reuse. But there might be some confusion, especially if you’re coming with a decade of experience in another language, about the right way to do this.

In Go each package should have a purpose, and the best way to know a package’s purpose is by its name, a noun. A package’s name describes what it provides. So, to reinterpret Peters’ words, every Go package should have a single purpose.

This is not a new idea, I’ve been saying this for a while, but why should you do this rather than take an approach where packages are used for fine grained taxonomy? Why, because change.

“Design is the art of arranging code to work today, and be changeable forever.”
Sandi Metz

Change is the name of the game we’re in. What we do as programmers is manage change. When we do that well we call it design, or architecture. When we do it badly we call it technical debt, or legacy code.

If you are writing a program that works perfectly, one time, for one fixed set of inputs then nobody cares if the code is good or bad because ultimately the output of the program is all the business cares about.

But this is never true. Software has bugs, requirements change, inputs change, and very few programs are written solely to be executed once, thus your program will change over time. Maybe it’s you who’ll be tasked with this, more likely it will be someone else, but someone has to change that code. Someone has to maintain that code.

So, how can we make it easy for programs to change? Interfaces everywhere? Make everything mockable? Pernicious dependency injection? Well, maybe, for some classes of programs, but not many, those techniques will be useful. However, for the majority of programs, designing something to be flexible up front is over engineering.

What if, instead, we take the position that rather than enhancing components, we replace them? Then the best way to know when something needs to be replaced is when it doesn’t do what it says on the tin.

A good package starts with choosing a good name. Think of your package’s name as an elevator pitch, using just one word, to describe what it provides. When the name no longer matches the requirement, find a replacement.

Simplicity matters

“Simple is better than complex.”
The Zen of Python, Item 3

PEP-20 says simple is better than complex, I couldn’t agree more. A couple of years ago I made this tweet;

https://twitter.com/davecheney/status/539576755254611968

My observation, at least at the time, was that I couldn’t think of a language introduced in my life time that didn’t purport to be simple. Each new language offered as a justification, and an enticement, their inherent simplicity. But as I researched, I found that simplicity was not a core value of many of the languages considered Go’s contemporaries. ¹ Maybe this is just a cheap shot, but could it be that either these languages aren’t simple, or they don’t think of themselves as being simple? They don’t consider simplicity to be a core value.

Call me old fashioned, but when did being simple fall out of style? Why does the commercial software development industry continually, gleefully, forget this fundamental truth?

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”
C. A. R. Hoare, The Emperor’s Old Clothes, 1980 Turing Award Lecture

Simple does not mean easy, we know that. Often it is more work to make something simple to use, than easy to build.

“Simplicity is prerequisite for reliability.”
Edsger W Dijkstra, EWD498, 18 June 1975

Why should we strive for simplicity? Why is it important that Go programs be simple? Simple doesn’t mean crude, it means readable and maintainable. Simple doesn’t mean unsophisticated, it means reliable, relatable, and understandable.

“Controlling complexity is the essence of computer programming.”
Brian W. Kernighan, Software Tools (1976)

Whether Python abides by its mantra of simplicity is a matter for debate, but Go holds simplicity as a core value. I think that we can all agree that when it comes to Go, simple code is preferable to clever code.

Avoid package level state

“Explicit is better than implicit.”
The Zen of Python, Item 2

This is a place where I think Peters was more aspirational than factual. Many things in Python are not explicit; decorators, dunder methods, and so on. Without doubt they are powerful, there’s a reason those features exist. Each feature is something someone cared enough about to do the work to implement it, especially the complicated ones. But heavy use of those features makes it harder for the reader to predict the cost of an operation.

The good news is we have a choice, as Go programmers, to make our code explicit. Explicit could mean many things, perhaps you may be thinking explicit is just a nice way of saying bureaucratic and long winded, but that’s a superficial interpretation. It’s a mistake to focus only on the syntax on the page, to fret about line lengths and DRYing up expressions. The more valuable places to be explicit, in my opinion, are to do with coupling and with state.

Coupling is a measure of the amount one thing depends on another. If two things are tightly coupled, they move together. An action that affects one is directly reflected in another. Imagine a train, each carriage joined–ironically the correct word is coupled–together; where the engine goes, the carriages follow.

Another way to describe coupling is the word cohesion. Cohesion measures how well two things naturally belong together. We talk about a cohesive argument, or a cohesive team; all their parts fit together as if they were designed that way.

Why does coupling matter? Because just like trains, when you need to change a piece of code, all the code that is tightly coupled to it must change. A prime example: someone releases a new version of their API and now your code doesn’t compile.

APIs are an unavoidable source of coupling but there are more insidious forms of coupling. Clearly everyone knows that if an API’s signature changes, the data passing into and out of that call changes. It’s right there in the signature of the function; I take values of these types and return values of other types. But what if the API passed data another way? What if every time you called this API the result was based on the previous time you called that API, even though you didn’t change your parameters?

This is state, and management of state is the problem in computer science.

package counter

var count int

func Increment(n int) int {
        count += n
        return count
}

Suppose we have this simple counter package. You can call Increment to increment the counter, you can even get the value back if you Increment with a value of zero.

Suppose you had to test this code, how would you reset the counter after each test? Suppose you wanted to run those tests in parallel, could you do it? Now suppose that you wanted to count more than one thing per program, could you do it?

No, of course not. Clearly the answer is to encapsulate the count variable in a type.

package counter

type Counter struct {
        count int
}

func (c *Counter) Increment(n int) int {
        c.count += n
        return c.count
}

Now imagine that this problem isn’t restricted to just counters, but applies to your application’s main business logic. Can you test it in isolation? Can you test it in parallel? Can you use more than one instance at a time? If the answer to those questions is no, the reason is package level state.
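
As a minimal sketch of what the encapsulated version buys you, the program below creates two independent counters from the Counter type shown above. The variable names requests and failures are made up for illustration.

```go
package main

import "fmt"

// Counter mirrors the type from the listing above; its zero value is ready to use.
type Counter struct {
	count int
}

// Increment adds n to the counter and returns the new total.
func (c *Counter) Increment(n int) int {
	c.count += n
	return c.count
}

func main() {
	// Two independent counters: neither shares state with the other.
	var requests, failures Counter

	requests.Increment(3)
	failures.Increment(1)

	// Incrementing by zero reads the current value without changing it.
	fmt.Println(requests.Increment(0), failures.Increment(0)) // prints "3 1"
}
```

Because each Counter owns its own count, a test can start from a fresh value, and tests that use separate counters can run in parallel without interfering with each other.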

Avoid package level state. Reduce coupling and spooky action at a distance by providing the dependencies a type needs as fields on that type rather than using package variables.

Plan for failure, not success

“Errors should never pass silently.”
The Zen of Python, Item 10

It’s been said that languages that favour exception handling follow the Samurai principle; return victorious or not at all. In exception based languages functions only return valid results. If they don’t succeed then control flow takes an entirely different path.

Unchecked exceptions are clearly an unsafe model to program in. How can you possibly write code that is robust in the presence of errors when you don’t know which statements could throw an exception? Java tried to make exceptions safer by introducing the notion of a checked exception which, to the best of my knowledge, has not been repeated in another mainstream language. There are plenty of languages which use exceptions but they all, with the singular exception of Java, do so in the unchecked variety.

Obviously Go chose a different path. Go programmers believe that robust programs are composed from pieces that handle the failure cases before they handle the happy path. In the space that Go was designed for; server programs, multi threaded programs, programs that handle input over the network, dealing with unexpected data, timeouts, connection failures and corrupted data must be front and centre of the programmer’s mind if they are to produce robust programs.

“I think that error handling should be explicit, this should be a core value of the language.”
Peter Bourgon, GoTime #91

I want to echo Peter’s assertion, as it was the impetus for this article. I think so much of the success of Go is due to the explicit way errors are handled. Go programmers think about the failure case first. We solve the “what if…” case first. This leads to programs where failures are handled at the point of writing, rather than the point they occur in production.

The verbosity of

if err != nil {
    return err
}

is outweighed by the value of deliberately handling each failure condition at the point at which they occur. Key to this is the cultural value of handling each and every error explicitly.

Return early rather than nesting deeply

“Flat is better than nested.”
The Zen of Python, Item 5

This is sage advice coming from a language where indentation is the primary form of control flow. How can we interpret this advice in terms of Go? gofmt controls the overall whitespace of a Go program so there’s nothing doing there.

I wrote earlier about package names, and there is probably some advice here about avoiding a complicated package hierarchy. In my experience the more a programmer tries to subdivide and taxonomise their Go codebase the more they risk hitting the dead end that is package import loops.

I think the best application of item 5’s advice is the control flow within a function. Simply put, avoid control flow that requires deep indentation.

“Line of sight is a straight line along which an observer has unobstructed vision.”
Mat Ryer, Code: Align the happy path to the left edge

Mat Ryer describes this idea as line of sight coding. Line of sight coding means things like:

Using guard clauses to return early if a precondition is not met.
Placing the successful return statement at the end of the function rather than inside a conditional block.
Reducing the overall indentation level of the function by extracting functions and methods.

Key to this advice is the thing that you care about, the thing that the function does, is never in danger of sliding out of sight to the right of your screen. This style has a bonus side effect that you’ll avoid pointless arguments about line lengths on your team.

Every time you indent you add another precondition to the programmer’s stack, consuming one of their 7 ±2 short term memory slots. Rather than nesting deeply, keep the successful path of the function close to the left hand side of your screen.
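
A small sketch of the style, using a hypothetical average function of my own invention: the precondition is checked with a guard clause, and the successful return is the last statement at the left edge.

```go
package main

import (
	"errors"
	"fmt"
)

// average is a hypothetical function written in the line of sight style.
func average(values []float64) (float64, error) {
	// Guard clause: reject bad input and return early.
	if len(values) == 0 {
		return 0, errors.New("average: no values")
	}

	var sum float64
	for _, v := range values {
		sum += v
	}

	// The successful return sits at the end, not inside a conditional block.
	return sum / float64(len(values)), nil
}

func main() {
	avg, err := average([]float64{2, 4, 6})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(avg) // prints "4"
}
```

The alternative, wrapping the loop and the return in an else branch, would push the part of the function you care about one level further to the right for no benefit.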

If you think it’s slow, prove it with a benchmark

“In the face of ambiguity, refuse the temptation to guess.”
The Zen of Python, Item 12

Programming is based on mathematics and logic, two concepts which rarely involve the element of chance. But there are many things we, as programmers, guess about every day. What does this variable do? What does this parameter do? What happens if I pass nil here? What happens if I call Register twice? There’s actually a lot of guesswork in modern programming, especially when it comes to using libraries you didn’t write.

“APIs should be easy to use and hard to misuse.”
Josh Bloch

One of the best ways I know to help a programmer avoid having to guess is to, when building an API, focus on the default use case. Make it as easy as you can for the caller to do the most common thing. However, I’ve written and talked a lot about API design in the past, so instead my interpretation of item 12 is; don’t guess about performance.

Despite how you may feel about Knuth’s advice, one of the drivers of Go’s success is its efficient execution. You can write efficient programs in Go and thus people will choose Go because of this. There are a lot of misconceptions about performance, so my request is, when you’re looking to performance tune your code or you’re facing some dogmatic advice like defer is slow, CGO is expensive, or always use atomics not mutexes, don’t guess.

Don’t complicate your code because of outdated dogma, and, if you think something is slow, first prove it with a benchmark. Go has excellent benchmarking and profiling tools that come in the distribution for free. Use them to find your bottlenecks.
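
To make that concrete, here is a sketch that measures one of those claims, the cost of defer, instead of guessing. The guarded type and the bench functions are hypothetical names. Benchmarks normally live in a _test.go file and run with go test -bench; this sketch uses testing.Benchmark so it can run as an ordinary program.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// guarded is a hypothetical type used only to compare two locking styles.
type guarded struct {
	mu sync.Mutex
	n  int
}

// incDefer releases the lock with defer.
func (g *guarded) incDefer() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.n++
}

// incDirect releases the lock with an explicit call.
func (g *guarded) incDirect() {
	g.mu.Lock()
	g.n++
	g.mu.Unlock()
}

// benchDefer measures the defer based version.
func benchDefer(b *testing.B) {
	var g guarded
	for i := 0; i < b.N; i++ {
		g.incDefer()
	}
}

// benchDirect measures the version without defer.
func benchDirect(b *testing.B) {
	var g guarded
	for i := 0; i < b.N; i++ {
		g.incDirect()
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("defer: ", testing.Benchmark(benchDefer))
	fmt.Println("direct:", testing.Benchmark(benchDirect))
}
```

The numbers will depend on your Go version and hardware, which is the point: measure on the system you care about before you rewrite code on the strength of a rumour.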

Before you launch a goroutine, know when it will stop

At this point I think I’ve mined the valuable points from PEP-20 and possibly stretched its reinterpretation beyond the point of good taste. I think that’s fine, because although this was a useful rhetorical device, ultimately we are talking about two different languages.

“You type g o, a space, and then a function call. Three keystrokes, you can’t make it much shorter than that. Three keystrokes and you’ve just started a sub process.”
Rob Pike, Simplicity is Complicated, dotGo 2015

The next two suggestions I’ll dedicate to goroutines. Goroutines are the signature feature of the language, our answer for first class concurrency. They are so easy to use, just put the word go in front of the statement and you’ve launched that function asynchronously. It’s so simple, no threads, no stack sizes, no thread pool executors, no ID’s, no tracking completion status.

Goroutines are cheap. Because of the runtime’s ability to multiplex goroutines onto a small pool of threads (which you don’t have to manage), hundreds of thousands, even millions, of goroutines are easily accommodated. This opens up designs that would not be practical under competing concurrency models like threads or evented callbacks.

But as cheap as goroutines are, they’re not free. At a minimum there’s a few kilobytes for their stack, which, when you’re getting up into the 10^6 goroutines, does start to add up. This is not to say you shouldn’t use millions of goroutines if that is what the design calls for, but when you do, it’s critical that you keep track of them because 10^6 of anything can consume a non trivial amount of resources in aggregate.

Goroutines are the key to resource ownership in Go. To be useful a goroutine has to do something, and that means it almost always holds reference to, or ownership of, a resource; a lock, a network connection, a buffer with data, the sending end of a channel. While that goroutine is alive, the lock is held, the network connection remains open, the buffer retained and the receivers of the channel will continue to wait for more data.

The simplest way to free those resources is to tie them to the lifetime of the goroutine–when the goroutine exits, the resource has been freed. So while it’s near trivial to start a goroutine, before you write those three letters, g o and a space, make sure you have an answer to these questions:

Under what condition will a goroutine stop? Go doesn’t have a way to tell a goroutine to exit. There is no stop or kill function, for good reason. If we cannot command a goroutine to stop, we must instead ask it, politely. Almost always this comes down to a channel operation. Range loops over a channel exit when the channel is closed. A channel will become selectable if it is closed. The signal from one goroutine to another is best expressed as a closed channel.
What is required for that condition to arise? If channels are both the vehicle to communicate between goroutines and the mechanism for them to signal completion, the next question to the programmer becomes, who will close the channel, when will that happen?
What signal will you use to know the goroutine has stopped? When you signal a goroutine to stop, that stopping will happen at some time in the future relative to the goroutine’s frame of reference. It might happen quickly in terms of human perception, but computers execute billions of instructions every second, and from the point of view of each goroutine, their execution of instructions is unsynchronised. The solution is often to use a channel to signal back or a waitgroup where a fan in approach is needed.

Leave concurrency to the caller

It is likely that in any serious Go program you write there will be concurrency involved. This raises a problem: many of the libraries and much of the code that we write fall into this one goroutine per connection, or worker, pattern. How will you manage the lifetime of those goroutines?

net/http is a prime example. Shutting down the server owning the listening socket is relatively straightforward, but what about the goroutines spawned from that accepting socket? net/http does provide a context object inside the request object which can be used to signal–to code that is listening–that the request should be canceled, thereby terminating the goroutine, however it is less clear how to know when all of these things have been done. It’s one thing to cancel a context, it’s another to know that the cancellation has completed.²

The point I want to make about net/http is that it’s a counter example to good practice. Because each connection is handled by a goroutine spawned inside the net/http.Server type, the program, living outside the net/http package, does not have an ability to control the goroutines spawned for the accepting socket.

This is an area of design that is still evolving, with efforts like go-kit’s run.Group and the Go team’s ErrGroup which provide a framework to execute, cancel and wait on functions run asynchronously.

The bigger design maxim here is for library writers, or anyone writing code that could be run asynchronously: leave the responsibility of starting the goroutine to your caller. Let the caller choose how they want to start, track, and wait on your function’s execution.
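
A minimal sketch of that maxim, with invented names: sumSquares stands in for a library function and is deliberately synchronous, while main plays the caller who decides to run it concurrently and wait with a sync.WaitGroup.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares is a hypothetical library function. It is synchronous and
// starts no goroutines of its own.
func sumSquares(nums []int) int {
	total := 0
	for _, n := range nums {
		total += n * n
	}
	return total
}

func main() {
	inputs := [][]int{{1, 2, 3}, {4, 5}, {6}}
	results := make([]int, len(inputs))

	// The caller chooses to start, track, and wait on the work.
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in []int) {
			defer wg.Done()
			results[i] = sumSquares(in) // each goroutine writes its own slot
		}(i, in)
	}
	wg.Wait()

	fmt.Println(results) // prints "[14 41 36]"
}
```

A caller who prefers a worker pool, or no concurrency at all, can use the same function unchanged. That would not be possible if sumSquares launched its own goroutine.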

Write tests to lock in the behaviour of your package’s API

Perhaps you were hoping to read an article from me where I didn’t rant about testing. Sadly, today is not that day.

Your tests are the contract about what your software does and does not do. Unit tests at the package level should lock in the behaviour of the package’s API. They describe, in code, what the package promises to do. If there is a unit test for each input permutation, you have defined the contract for what the code will do in code, not documentation.

This is a contract you can assert as simply as typing go test. At any stage, you can know with a high degree of confidence, that the behaviour people relied on before your change continues to function after your change.

Tests lock in API behaviour. Any change that adds, modifies or removes a public API must include changes to its tests.

Moderation is a virtue

Go is a simple language, only 25 keywords. In some ways this makes the features that are built into the language stand out. Equally these are the features that the language sells itself on, lightweight concurrency, structural typing.

I think all of us have experienced the confusion that comes from trying to use all of Go’s features at once. Who was so excited to use channels that they used them as much as they could, as often as they could? Personally for me I found the result was hard to test, fragile, and ultimately overcomplicated. Am I alone?

I had the same experience with goroutines. Attempting to break the work into tiny units, I created a hard to manage herd of goroutines and ultimately missed the observation that most of my goroutines were always blocked waiting for their predecessor–the code was ultimately sequential and I had added a lot of complexity for little real world benefit. Who has experienced something like this?

I had the same experience with embedding. Initially I mistook it for inheritance. Then later I recreated the fragile base class problem by composing complicated types, which already had several responsibilities, into more complicated mega types.

This is potentially the least actionable piece of advice, but one I think is important enough to mention. The advice is always the same, all things in moderation, and Go’s features are no exception. If you can, don’t reach for a goroutine, a channel, an embedded struct, an anonymous function, a proliferation of packages, or interfaces for everything; instead prefer the simpler approach over the clever one.

Maintainability counts

I want to close with one final item from PEP-20,

“Readability Counts.”
The Zen of Python, Item 7

So much has been said about the importance of readability, not just in Go, but in all programming languages. People like me who stand on stages advocating for Go use words like simplicity, readability, clarity, productivity, but ultimately they are all synonyms for one word–maintainability.

The real goal is to write maintainable code. Code that can live on after the original author. Code that can exist not just as a point in time investment, but as a foundation for future value. It’s not that readability doesn’t matter, maintainability matters more.

Go is not a language that optimises for clever one liners. Go is not a language which optimises for the least number of lines in a program. We’re not optimising for the size of the source code on disk, nor how long it takes to type the program into an editor. Rather, we want to optimise our code to be clear to the reader. Because it’s the reader who’s going to have to maintain this code.

If you’re writing a program for yourself, maybe it only has to run once, or you’re the only person who’ll ever see it, then do whatever works for you. But if this is a piece of software that more than one person will contribute to, or that will be used by people over a long enough time that requirements, features, or the environment it runs in may change, then your goal must be for your program to be maintainable. If software cannot be maintained, then it will be rewritten; and that could be the last time your company will invest in Go.

Can the thing you worked hard to build be maintained after you’re gone? What can you do today to make it easier for someone to maintain your code tomorrow?

the-zen-of-go.netlify.com

Dynamically scoped variables in Go

This is a thought experiment in API design. It starts with the classic Go unit testing idiom:

func TestOpenFile(t *testing.T) {
        f, err := os.Open("notfound")
        if err != nil {
                t.Fatal(err)
        }

        // ...
}

What’s the problem with this code? The assertion. if err != nil { ... } is repetitive and in the case where multiple conditions need to be checked, somewhat error prone if the author of the test uses t.Error not t.Fatal, eg:

        f, err := os.Open("notfound")
        if err != nil {
                t.Error(err)
        }
        f.Close() // boom!

What’s the solution? DRY it up, of course, by moving the repetitive assertion logic to a helper:

func TestOpenFile(t *testing.T) {
        f, err := os.Open("notfound")
        check(t, err)

        // ...
}
 
func check(t *testing.T, err error) {
        if err != nil {
                t.Helper()
                t.Fatal(err)
        }
}

Using the check helper the code is a little cleaner and clearer: check the error. Hopefully the indecision between t.Error and t.Fatal has been solved too. The downside of abstracting the assertion to a helper function is that you now need to pass a *testing.T into each and every invocation. Worse, you need to pass a *testing.T to everything that needs to call check, transitively, just in case.

This is ok, I guess, but I will make the observation that the t variable is only needed when the assertion fails. Even in a testing scenario, most of the time, most of the tests pass, so reading and writing all these t’s is a constant overhead for the relatively rare occasion that a test fails.

What about if we did something like this instead?

func TestOpenFile(t *testing.T) {
        f, err := os.Open("notfound")
        check(err)
 
        // ...
}
 
func check(err error) {
        if err != nil {
                panic(err.Error())
        }
}

Yeah, that’ll work, but it has a few problems:

% go test
--- FAIL: TestOpenFile (0.00s)
panic: open notfound: no such file or directory [recovered]
        panic: open notfound: no such file or directory

goroutine 22 [running]:
testing.tRunner.func1(0xc0000b4400)
        /Users/dfc/go/src/testing/testing.go:874 +0x3a3
panic(0x111b040, 0xc0000866f0)
        /Users/dfc/go/src/runtime/panic.go:679 +0x1b2
github.com/pkg/expect_test.check(...)
        /Users/dfc/src/github.com/pkg/expect/expect_test.go:18
github.com/pkg/expect_test.TestOpenFile(0xc0000b4400)
        /Users/dfc/src/github.com/pkg/expect/expect_test.go:10 +0xa1
testing.tRunner(0xc0000b4400, 0x115ac90)
        /Users/dfc/go/src/testing/testing.go:909 +0xc9
created by testing.(*T).Run
        /Users/dfc/go/src/testing/testing.go:960 +0x350
exit status 2

Let’s start with the good; we didn’t have to pass a testing.T every place we call check, the test fails immediately, and we get a nice message in the panic — albeit twice. But where the assertion failed is hard to see. It occurred on expect_test.go:11 but you’d be forgiven for not knowing that.

So panic isn’t really a good solution, but there’s something in this stack trace that is — can you see it? Here’s a hint, github.com/pkg/expect_test.TestOpenFile(0xc0000b4400).

TestOpenFile has a t value, it was passed to it by tRunner, so there’s a testing.T in memory at address 0xc0000b4400. What if we could get access to that t inside check? Then we could use it to call t.Helper and t.Fatal. Is that possible?

Dynamic scoping

What we want is to be able to access a variable whose declaration is neither global nor local to the function, but somewhere higher in the call stack. This is called dynamic scoping. Go doesn’t support dynamic scoping, but it turns out, for restricted cases, we can fake it. I’ll cut to the chase:

// getT returns the address of the testing.T passed to testing.tRunner
// which called the function which called getT. If testing.tRunner cannot
// be located in the stack, say if getT is not called from the main test
// goroutine, getT returns nil.
func getT() *testing.T {
        var buf [8192]byte
        n := runtime.Stack(buf[:], false)
        sc := bufio.NewScanner(bytes.NewReader(buf[:n]))
        for sc.Scan() {
                var p uintptr
                n, _ := fmt.Sscanf(sc.Text(), "testing.tRunner(%v", &p)
                if n != 1 {
                        continue
                }
                return (*testing.T)(unsafe.Pointer(p))
        }
        return nil
}

We know that each Test is called by the testing package in its own goroutine (see the stack trace above). The testing package launches the test via a function called tRunner which takes a *testing.T and a func(*testing.T) to invoke. Thus we grab a stack trace of the current goroutine, scan through it for the line beginning with testing.tRunner — which can only be the testing package as tRunner is a private function — and parse the address of the first parameter, which is a pointer to a testing.T. With a little unsafe we convert the raw pointer back to a *testing.T and we’re done.

If the search fails then it is likely that getT wasn’t called from a Test. This is actually ok because the reason we needed the *testing.T was to call t.Fatal and the testing package already requires that t.Fatal be called from the main test goroutine.

import "github.com/pkg/expect"

func TestOpenFile(t *testing.T) {
        f, err := os.Open("notfound")
        expect.Nil(err)
 
        // ...
}

Putting it all together, we’ve eliminated the assertion boilerplate and possibly made the expectation of the test a little clearer to read: after opening the file, err is expected to be nil.

Is this fine?

At this point you should be asking, is this fine? And the answer is, no, this is not fine. You should be screaming internally at this point. But it’s probably worth introspecting those feelings of revulsion.

Apart from the inherent fragility of scrobbling around in a goroutine’s call stack, there are some serious design issues:

The behaviour of expect.Nil now depends on who called it. Provided with the same arguments it may behave differently depending on where it appears in the call stack. This is unexpected.
Taken to the extreme, dynamic scoping effectively brings into the scope of a single function all the variables passed into any function that preceded it. It is a side channel for passing data into and out of functions that is not explicitly documented in the function declaration.

Ironically these are precisely the critiques I have of context.Context. I’ll leave it to you to decide if they are justified.

A final word

This is a bad idea, no argument there. This is not a pattern you should ever use in production code. But, this isn’t production code, it’s a test, and perhaps there are different rules that apply to test code. After all, we use mocks, and stubs, and monkey patching, and type assertions, and reflection, and helper functions, and build flags, and global variables, all so we can test our code effectively. None of those, uh, hacks will ever show up in the production code path, so is it really the end of the world?

If you’ve read this far, perhaps you’ll agree with me that, as unconventional as this approach is, not having to pass a *testing.T into every function that could possibly need to assert something transitively makes for clearer test code.

So maybe, in this case, the ends do justify the means.

If you’re interested, I’ve put together a small assertion library using this pattern. Caveat emptor.

Complementary engineering indicators

Last year I had the opportunity to watch Cat Swetel’s presentation The Development Metrics You Should Use (but Don’t). The information that could be gleaned from just tracking the start and finish date of work items was eye opening. If you’re using an issue tracker this information is probably already (perhaps with some light data munging) available — no need for TPS reports. Additionally, statistics obtained by data mining your project’s issue tracker are, perhaps, less likely to be juked.

Around the time I saw Cat’s presentation I finished reading Andy Grove’s High Output Management. The hidden gem in this book (assuming becoming a meeting powerhouse isn’t your bag) was Grove’s notion of indicator pairs. An example of a paired indicator might be the number of sales deals closed paired with the customer retention rate. The underlying principle is that optimising for one indicator will have an adverse impact on the other. In the example, overly aggressive or deceptive tactics could superficially raise the number of sales made, but would be reflected in a dip in the retention rate as customers returned the product or terminated their service prematurely.

These ideas led me to thinking about indicators you could use for a team delivering a software product. Could those indicators be derived cheaply from the hand to hand combat of software delivery? Could they be structured in a way that aggressively pursuing one metric would be reflected negatively in another? I think so.

These are the three metrics that I’ve been using to track the health of the project that I lead.

Date; was the software done when we said it would be done? If you prefer this indicator as a scalar, how many days’ difference is there between the ship date agreed on at the start of the sprint/milestone/whatever and the actual date that you considered it done?
Completeness; when the software is done, how many of the things we said we were going to do actually got delivered in that release?
Defects reported; once the software is in the field, what is the rate of bugs reported?

It is relatively easy, for example, to hit a delivery date if you aggressively descope anything risky or simply don’t do it. But in doing so this lack of promised functionality would impact the completeness metric.

Conversely, it’s straightforward to hit your milestone’s completeness target if you let the release date slip and slip. Bringing both metrics into line requires good estimation skills to judge how much can be attempted in a milestone, and provides direct feedback if your estimation skills need work.

The third indicator, defects reported in the field, acts as a check on the other two. It would be easy to consistently hit your delivery date with 100% feature completion if your team does a shoddy job. The high fives and :tada: emojis will be short lived if each release brings with it a swathe of high priority bug reports. This indicator also tends to have a second order effect: features rushed to meet a deadline tend to generate remedial work in the following milestones, crowding out promised work or blowing later deadlines.

I consider these to be complementary metrics; they should be considered together, as a group, rather than individually. Ideally your team should be delivering what you promised, when you promised it, with a low defect rate. But more importantly, if that isn’t the case, if one of the indicators is unhealthy, addressing it shouldn’t result in the problem moving to another.

Clear is better than clever

This article is based on my GopherCon Singapore 2019 presentation. In the presentation I referenced material from my post on declaring variables and my GolangUK 2017 presentation on SOLID design. For brevity those parts of the talk have been elided from this article. If you prefer, you can watch the recording of the talk.

Readability is often cited as one of Go’s core tenets. I disagree. In this article I’ll discuss the differences between clarity and readability, show you what I mean by clarity and how it applies to Go code, and argue that Go programmers should strive for clarity–not just readability–in their programs.

Why would I read your code?

Before I pick apart the difference between clarity and readability, perhaps the question to ask is, “why would I read your code?” To be clear, when I say I, I don’t mean me, I mean you. And when I say your code I also mean you, but in the third person. So really what I’m asking is, “why would you read another person’s code?”

I think Russ Cox, paraphrasing Titus Winters, put it best:

Software engineering is what happens to programming when you add time and other programmers.
–Russ Cox, GopherCon Singapore 2018

The answer to the question, “why would I read your code” is, because we have to work together. Maybe we don’t work in the same office, or live in the same city, maybe we don’t even work at the same company, but we do collaborate on a piece of software, or more likely consume it as a dependency.

This is the essence of Russ and Titus’ observation; software engineering is the collaboration of software engineers over time. I have to read your code, and you read mine, so that I can understand it, so that you can maintain it, and in short, so that any programmer can change it.

Russ is making the distinction between software programming and software engineering. The former is a program you write for yourself, the latter is a program, a project, a service, a product, that many people will contribute to over time. Engineers will come and go, teams will grow and shrink, requirements will change, features will be added and bugs fixed. This is the nature of software engineering.

We don’t read code, we decode it

It was sometime after that presentation that I finally realized the obvious: Code is not literature. We don’t read code, we decode it.
–Peter Seibel

The author Peter Seibel suggests that programs are not read, but are instead decoded. In hindsight this is obvious, after all we call it source code, not source literature. The source code of a program is an intermediary form, somewhere between our concept–what’s inside our heads–and the computer’s executable notation.

In my experience, the most common complaint when faced with a foreign codebase written by someone, or some team, is the code is unreadable. Perhaps you agree with me?

But readability as a concept is subjective. Readability is nit picking about line length and variable names. Readability is holy wars about brace position. Readability is the hand to hand combat of style guides and code review guidelines that regulate the use of whitespace.

Clarity ≠ Readability

Clarity, on the other hand, is the property of the code on the page. Clear code is independent of the low level details of function names and indentation because clear code is concerned with what the code is doing, not just how it is written down.

When you or I say that a foreign codebase is unreadable, what I think we really mean is, I don’t understand it. For the remainder of this article I want to try to explore the difference between clear code and code that is simply readable, because the goal is not how quickly you can read a piece of code, but how quickly you can grasp its meaning.

Keep to the left

Go programs are traditionally written in a style that favours guard clauses and preconditions. This encourages the successful path to proceed down the page rather than indented inside a conditional block. Mat Ryer calls this line of sight coding, because the active part of your function is not at risk of sliding out of sight beyond the right hand margin of your screen.

By keeping conditional blocks short, and for the exceptional condition, we avoid nested blocks and potentially complex value shadowing. The successful flow of control continues down the page. At every point in the sequence of statements, if you’ve arrived at that point, you are confident that a growing set of preconditions holds true.

func ReadConfig(path string) (*Config, error) {
        f, err := os.Open(path)
        if err != nil {
                return nil, err
        }
        defer f.Close()
        // ...
}

The canonical example of this is the classic Go error check idiom; if err != nil then return it to the caller, else continue with the function. We can generalise this pattern a little and in pseudocode we have:

if some condition {
        // true: cleanup
        return
}
// false: continue

If some condition is true, then return to the caller, else continue onwards towards the end of the function.

This form holds true for all preconditions, error checks, map lookups, length checks, and so forth. The exact form of the precondition’s check changes, but the pattern is always the same; the cleanup code is inside the block, terminating with a return, the success condition lies outside the block, and is only reachable if the precondition is false.

Even if you are unsure what the preceding and succeeding code does, how the precondition is formed, and how the cleanup code works, it is clear to the reader that this is a guard clause.
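As a sketch of the same shape applied to a precondition, a map lookup, and a length check, consider the following. The firstTag helper and its error messages are hypothetical, written only to illustrate the pattern.

```go
package main

import (
	"errors"
	"fmt"
)

// firstTag is a hypothetical helper written for this sketch. Each guard
// clause handles one exceptional case and returns; the success path stays
// at the left margin and is reached only if every guard was false.
func firstTag(tags map[string][]string, key string) (string, error) {
	if key == "" { // precondition
		return "", errors.New("empty key")
	}
	vals, ok := tags[key]
	if !ok { // map lookup
		return "", fmt.Errorf("no entry for %q", key)
	}
	if len(vals) == 0 { // length check
		return "", fmt.Errorf("entry for %q is empty", key)
	}
	return vals[0], nil
}

func main() {
	tags := map[string][]string{"lang": {"go", "c"}, "os": {}}
	for _, k := range []string{"lang", "os", "arch", ""} {
		v, err := firstTag(tags, k)
		fmt.Printf("%q: %q %v\n", k, v, err)
	}
}
```

By the time the final return is reached, the reader knows the key is non-empty, present in the map, and has at least one value, without having to hold any nested conditions in their head.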

Structured programming

Here we have a comp function that takes two ints and returns an int;

func comp(a, b int) int {
        if a < b {
                return -1
        }
        if a > b {
                return 1
        }
        return 0
}

The comp function is written in a similar form to guard clauses from earlier. If a is less than b, the return -1 path is taken. If a is greater than b, the return 1 path is taken. Else, a and b are by elimination equal, so the final return 0 path is taken.

func comp(a, b int) int {
        if condition A {
                body A
        }
        if condition B {
                body B
        }
        return 0
}

The problem with comp as written is, unlike the guard clause, someone maintaining this function has to read all of it. To understand when 0 is returned, the reader has to consult the conditions and the body of each clause. This is reasonable when you’re dealing with functions which fit on a slide, but in the real world complicated functions–the ones we’re paid for our expertise to maintain–are rarely slide sized, and their conditions and bodies are rarely simple.

Let’s address the problem of making it clear under which condition 0 will be returned:

func comp(a, b int) int {
        if a < b {
                return -1
        } else if a > b {
                return 1
        } else {
                return 0
        }
}

Now, although this code is not what anyone would argue is readable–long chains of if else if statements are broadly discouraged in Go–it is clearer to the reader that zero is only returned if none of the conditions are met.

How do we know this? The Go spec declares that each function that returns a value must end in a terminating statement. This means that the body of all conditions must return a value. Thus, this does not compile:

func comp(a, b int) int {
        if a > b {
                a = b // does not compile
        } else if a < b {
                return 1
        } else {
                return 0
        }
}

Further, it is now clear to the reader that this code isn’t actually a series of conditions. This is an example of selection. Only one path can be taken regardless of the operation of the condition blocks. Based on the inputs one of -1, 0, or 1 will always be returned.

func comp(a, b int) int {
        if a < b {
                return -1
        } else if a > b {
                return 1
        } else {
                return 0
        }
}

However this code is hard to read as each of the conditions is written differently: the first is a simple if a < b, the second is the unusual else if a > b, and the last conditional is actually unconditional.

But it turns out there is a statement which we can use to make our intention much clearer to the reader; switch.

func comp(a, b int) int {
        switch {
        case a < b:
                return -1
        case a > b:
                return 1
        default:
                return 0
        }
}

Now it is clear to the reader that this is a selection. Each of the selection conditions is documented in its own case statement, rather than varying else or else if clauses.

By moving the default condition inside the switch, the reader only has to consider the cases that match their condition, as none of the cases can fall out of the switch block because of the default clause.³
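It is worth stressing that the rewrite changes how the function reads, not what it does. A minimal sketch that puts the first and last forms side by side, under the assumed names compGuard and compSwitch, and checks that they agree:

```go
package main

import "fmt"

// compGuard is the original form: two ifs followed by a trailing return.
func compGuard(a, b int) int {
	if a < b {
		return -1
	}
	if a > b {
		return 1
	}
	return 0
}

// compSwitch is the selection form: exactly one case is taken.
func compSwitch(a, b int) int {
	switch {
	case a < b:
		return -1
	case a > b:
		return 1
	default:
		return 0
	}
}

func main() {
	// Exercise each of the three outcomes with both forms.
	for _, p := range [][2]int{{1, 2}, {2, 1}, {3, 3}} {
		fmt.Println(p[0], p[1], compGuard(p[0], p[1]), compSwitch(p[0], p[1]))
	}
}
```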

Structured programming submerges structure and emphasises behaviour
–Richard Bircher, The limits of software

I found this quote recently and I think it is apt. My arguments for clarity are in truth arguments intended to emphasise the behaviour of the code, rather than be side tracked by minutiae of the structure itself. Said another way, what is the code trying to do, not how it is trying to do it.

Guiding principles

I opened this article with a discussion of readability vs clarity and hinted that there were other principles of well written Go code. It seems fitting to close on a discussion of those other principles.

Last year Bryan Cantrill gave a wonderful presentation on operating system principles, wherein he highlighted that different operating systems focus on different principles. It is not that they ignore the principles that differ between their competitors, just that when the chips are down, they prioritise a core set. So what is that core set of principles for Go?

Clarity

If you were going to say readability, hopefully I’ve provided you with an alternative.

Programs must be written for people to read, and only incidentally for machines to execute.
–Hal Abelson and Gerald Sussman. Structure and Interpretation of Computer Programs

Code is read many more times than it is written. A single piece of code will, over its lifetime, be read hundreds, maybe thousands of times. It will be read hundreds or thousands of times because it must be understood. Clarity is important because all software, not just Go programs, is written by people to be read by other people. The fact that software is also consumed by machines is secondary.

The most important skill for a programmer is the ability to effectively communicate ideas.
–Gastón Jorquera

Legal documents are double spaced to aid the reader, but to the layperson that does nothing to help them comprehend what they just read. Readability is a property of how easy it was to read the words on the screen. Clarity, on the other hand, is the answer to the question “did you understand what you just read?”.

The first step towards writing maintainable code is making sure the intent of the code is clear.

Simplicity

The next principle is obviously simplicity. Some might argue it is the most important principle for any programming language, perhaps the most important principle full stop.

Why should we strive for simplicity? Why is it important that Go programs be simple?

The ability to simplify means to eliminate the unnecessary so that the necessary may speak
–Hans Hofmann

We’ve all been in a situation where we say “I can’t understand this code”. We’ve all worked on programs where we were scared to make a change because we worried that it would break another part of the program; a part we didn’t understand and didn’t know how to fix.

This is complexity. Complexity turns reliable software into unreliable software. Complexity is what leads to unmaintainable software. Complexity is what kills software projects. Clarity and simplicity are interlocking forces that lead to maintainable software.

Productivity

The last Go principle I want to highlight is productivity. Developer productivity boils down to this; how much time do you spend doing useful work versus waiting for your tools or hopelessly lost in a foreign code-base? Go programmers should feel that they can get a lot done with Go.

“I started another compilation, turned my chair around to face Robert, and started asking pointed questions. Before the compilation was done, we’d roped Ken in and had decided to do something.”
–Rob Pike, Less is exponentially more

The joke goes that Go was designed while waiting for a C++ program to compile. Fast compilation is a key feature of Go and a key recruiting tool to attract new developers. While compilation speed remains a constant battleground, it is fair to say that compilations which take minutes in other languages, take seconds in Go. This helps Go developers feel as productive as their counterparts working in dynamic languages without the maintenance issues inherent in those languages.

Design is the art of arranging code to work today, and be changeable forever.
–Sandi Metz

More fundamental to the question of developer productivity, Go programmers realise that code is written to be read and so place the act of reading code above the act of writing it. Go goes so far as to enforce, via tooling and custom, that all code be formatted in a specific style. This removes the friction of learning a project specific dialect and helps spot mistakes because they just look incorrect.

Go programmers don’t spend days debugging inscrutable compile errors. They don’t waste days with complicated build scripts or deploying code to production. And most importantly they don’t spend their time trying to understand what their coworker wrote.

Complexity is anything that makes software hard to understand or to modify.
–John Ousterhout, A Philosophy of Software Design

Something I know about each of you reading this post is you will eventually leave your current employer. Maybe you’ll be moving on to a new role, or perhaps a promotion, perhaps you’ll move cities, or follow your partner overseas. Whatever the reason, we must all consider the succession of the maintainership of the programs we create.

If we strive to write programs that are clear, programs that are simple, and to focus on the productivity of those working with us, that will set all Go programmers in good stead.

Because if we don’t, as we move from job to job, we’ll leave behind programs which cannot be maintained. Programs which cannot be changed. Programs which are too hard to onboard new developers, and programs which feel like career digression for those that work on them.

If software cannot be maintained, then it will be rewritten; and that could be the last time your company invests in Go.

Constant Time

This essay is derived from my dotGo 2019 presentation about my favourite feature in Go.

Many years ago Rob Pike remarked,

“Numbers are just numbers, you’ll never see 0x80ULL in a .go source file”.
—Rob Pike, The Go Programming Language

Beyond this pithy observation lies the fascinating world of Go’s constants. Something that is perhaps taken for granted because, as Rob noted, in Go numbers–constants–just work.
In this post I intend to show you a few things that perhaps you didn’t know about Go’s const keyword.

What’s so great about constants?

To kick things off, why are constants good? Three things spring to mind:

Immutability. Constants are one of the few ways we have in Go to express immutability to the compiler.
Clarity. Constants give us a way to extract magic numbers from our code, giving them names and semantic meaning.
Performance. The ability to express to the compiler that something will not change is key as it unlocks optimisations such as constant folding, constant propagation, branch and dead code elimination.

But these are generic use cases for constants, they apply to any language. Let’s talk about some of the properties of Go’s constants.

A Challenge

To introduce the power of Go’s constants let’s try a little challenge: declare a constant whose value is the number of bits in the natural machine word.

~~We can’t use~~ unsafe.Sizeof ~~as it is not a constant expression~~⁴. We could use a build tag and laboriously record the natural word size of each Go platform, or we could do something like this:

const uintSize = 32 << (^uint(0) >> 32 & 1)

There are many versions of this expression in Go codebases. They all work roughly the same way. If we’re on a 64 bit platform then the bitwise complement of the number zero–all zero bits–is a number with all bits set, sixty four of them to be exact.

1111111111111111111111111111111111111111111111111111111111111111

If we shift that value thirty two bits to the right, we get another value with thirty two ones in it.

0000000000000000000000000000000011111111111111111111111111111111

Anding that with a number with one bit in the final position gives us 1.

0000000000000000000000000000000011111111111111111111111111111111 & 1 = 1

Finally we shift the number thirty two one place to the left, giving us 64⁵.

32 << 1 = 64

This expression is an example of a constant expression. All of these operations happen at compile time and the result of the expression is itself a constant. If you look in the runtime package, in particular the garbage collector, you’ll see how constant expressions are used to set up complex invariants based on the word size of the machine the code is compiled on.

So, this is a neat party trick, but most compilers will do this kind of constant folding at compile time for you. Let’s step it up a notch.

Constants are values

In Go, constants are values and each value has a type. In Go, user defined types can declare their own methods. Thus, a constant value can have a method set. If you’re surprised by this, let me show you an example that you probably use every day.

const timeout = 500 * time.Millisecond
fmt.Println("The timeout is", timeout) // 500ms

In the example the untyped literal constant 500 is multiplied by time.Millisecond, itself a constant of type time.Duration. The rule for assignments in Go is that, unless otherwise declared, the type on the left hand side of the assignment operator is inferred from the type on the right. 500 is an untyped constant so it is converted to a time.Duration then multiplied with the constant time.Millisecond.

Thus timeout is a constant of type time.Duration which holds the value 500000000.
Why then does fmt.Println print 500ms, not 500000000?

The answer is time.Duration has a String method. Thus any time.Duration value, even a constant, knows how to pretty print itself.

Now we know that constant values are typed, and because types can declare methods, we can derive that constant values can fulfil interfaces. In fact we just saw an example of this. fmt.Println doesn’t assert that a value has a String method, it asserts the value implements the Stringer interface.

Let’s talk a little about how we can use this property to make our Go code better, and to do that I’m going to take a brief digression into the Singleton pattern.

Singletons

I’m generally not a fan of the singleton pattern, in Go or any language. Singletons complicate testing and create unnecessary coupling between packages. I feel the singleton pattern is often used not to create a singular instance of a thing, but instead to create a place to coordinate registration. net/http.DefaultServeMux is a good example of this pattern.

package http

// DefaultServeMux is the default ServeMux used by Serve.
var DefaultServeMux = &defaultServeMux

var defaultServeMux ServeMux

There is nothing singular about http.defaultServeMux, nothing prevents you from creating another ServeMux. In fact the http package provides a helper that will create as many ServeMuxes as you want.

// NewServeMux allocates and returns a new ServeMux.
func NewServeMux() *ServeMux { return new(ServeMux) }

http.DefaultServeMux is not a singleton. Nevertheless there is a case for things which are truly singletons because they can only represent a single thing. A good example of this is the file descriptors of a process; 0, 1, and 2 which represent stdin, stdout, and stderr respectively.

It doesn’t matter what names you give them, 1 is always stdout, and there can only ever be one file descriptor 1. Thus these two operations are identical:

fmt.Fprintf(os.Stdout, "Hello dotGo\n")
syscall.Write(1, []byte("Hello dotGo\n"))

So let’s look at how the os package defines Stdin, Stdout, and Stderr:

package os

var (
        Stdin  = NewFile(uintptr(syscall.Stdin), "/dev/stdin")
        Stdout = NewFile(uintptr(syscall.Stdout), "/dev/stdout")
        Stderr = NewFile(uintptr(syscall.Stderr), "/dev/stderr")
)

There are a few problems with this declaration. Firstly their type is *os.File not the respective io.Reader or io.Writer interfaces. People have long complained that this makes replacing them with alternatives problematic. However the notion of replacing these variables is precisely the point of this digression. Can you safely change the value of os.Stdout once your program is running without causing a data race?

I argue that, in the general case, you cannot. In general, if something is unsafe to do, as programmers we shouldn’t let our users think that it is safe, lest they begin to depend on that behaviour.
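For context, this is roughly the kind of replacement people reach for in tests. It is a sketch only: captureStdout and hello are hypothetical names, and the helper assumes the output is small enough to fit in the pipe's buffer. Nothing in it synchronises the assignment to os.Stdout with other goroutines that may be using the variable at the same time.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// captureStdout is a hypothetical test helper. It points os.Stdout at a
// pipe, runs fn, restores the original value, and returns what fn printed.
func captureStdout(fn func()) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	saved := os.Stdout
	os.Stdout = w // unsynchronised write to a package-level variable
	fn()
	os.Stdout = saved // and another one
	w.Close()
	out, err := io.ReadAll(r)
	r.Close()
	return string(out), err
}

// hello prints via fmt.Println, which reads os.Stdout at call time.
func hello() { fmt.Println("Hello dotGo") }

func main() {
	out, err := captureStdout(hello)
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %q\n", out)
}
```

With a single goroutine this behaves as intended. The trouble starts when anything else prints while the variable is swapped.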

Could we change the definition of os.Stdout and friends so that they retain the observable behaviour of reading and writing, but remain immutable? It turns out, we can do this easily with constants.

type readfd int

func (r readfd) Read(buf []byte) (int, error) {
        return syscall.Read(int(r), buf)
}

type writefd int

func (w writefd) Write(buf []byte) (int, error) {
        return syscall.Write(int(w), buf)
}

const (
        Stdin  = readfd(0)
        Stdout = writefd(1)
        Stderr = writefd(2)
)

func main() {
        fmt.Fprintf(Stdout, "Hello world")
}

In fact this change causes only one compilation failure in the standard library.⁶

Sentinel error values

Another case of things which look like constants but really aren’t, are sentinel error values. io.EOF, sql.ErrNoRows, crypto/x509.ErrUnsupportedAlgorithm, and so on are all examples of sentinel error values. They all fall into a category of expected errors, and because they are expected, you’re expected to check for them.

To compare the error you have with the one you were expecting, you need to import the package that defines that error. Because, by definition, sentinel errors are exported public variables, any code that imports, for example, the io package could change the value of io.EOF.

package nelson

import "io"

func init() {
        io.EOF = nil // haha!
}

I’ll say that again. If I know the name of io.EOF I can import the package that declares it, which I must if I want to compare it to my error, and thus I could change io.EOF’s value. Historically, convention and a bit of dumb luck have discouraged people from writing code that does this, but technically there is nothing to prevent you from doing so.

Replacing io.EOF is probably going to be detected almost immediately. But replacing a less frequently used sentinel error may cause some interesting side effects:

package innocent

import "crypto/rsa"

func init() {
        rsa.ErrVerification = nil // 🤔
}

If you were hoping the race detector will spot this subterfuge, I suggest you talk to the folks writing testing frameworks who replace os.Stdout without it triggering the race detector.
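To see the effect without vandalising the standard library, here is a sketch using a sentinel of our own. ErrNotFound, lookup, isNotFound, and clobber are all hypothetical names invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel error, declared the conventional way.
var ErrNotFound = errors.New("not found")

// lookup is a hypothetical function that always fails with the sentinel.
func lookup(key string) error {
	return ErrNotFound
}

// isNotFound is how a caller would normally check for the sentinel.
func isNotFound(err error) bool {
	return err == ErrNotFound
}

// clobber stands in for any code that can name the variable; it replaces
// the sentinel with a new value carrying the same message.
func clobber() {
	ErrNotFound = errors.New("not found")
}

func main() {
	err := lookup("k")
	fmt.Println(isNotFound(err)) // true

	clobber()
	fmt.Println(isNotFound(err)) // false: err still holds the old value
}
```

The error in hand did not change, and its message is identical, yet the comparison every caller relies on quietly stopped matching.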

Fungibility

I want to digress for a moment to talk about the most important property of constants. Constants aren’t just immutable; it’s not enough that we cannot overwrite their declaration. Constants are fungible. This is a tremendously important property that doesn’t get nearly enough attention.

Fungible means identical. Money is a great example of fungibility. If you were to lend me 10 bucks, and I later pay you back, the fact that you gave me a 10 dollar note and I returned to you 10 one dollar bills, with respect to its operation as a financial instrument, is irrelevant. Things which are fungible are by definition equal and equality is a powerful property we can leverage for our programs.

var myEOF = errors.New("EOF") // io/io.go line 38
fmt.Println(myEOF == io.EOF)  // false

Putting aside the effect of malicious actors in your code base, the key design challenge with sentinel errors is they behave like singletons, not constants. Even if we follow the exact procedure used by the io package to create our own EOF value, myEOF and io.EOF are not equal. myEOF and io.EOF are not fungible, they cannot be interchanged. Programs can spot the difference.

When you combine the lack of immutability, the lack of fungibility, the lack of equality, you have a set of weird behaviours stemming from the fact that sentinel error values in Go are not constant expressions. But what if they were?

Constant errors

Ideally a sentinel error value should behave as a constant. It should be immutable and fungible. Let’s recap how the built in error interface works in Go.

type error interface {
        Error() string
}

Any type with an Error() string method fulfils the error interface. This includes user defined types, it includes types derived from primitives like string, and it includes constant strings. With that background, consider this error implementation:

type Error string

func (e Error) Error() string {
        return string(e)
}

We can use this error type as a constant expression:

const err = Error("EOF")

Contrast this with errors.errorString, which is a struct. A compact struct literal initialiser is not a constant expression, so it cannot be used in a const declaration.

const err2 = errors.errorString{"EOF"} // doesn't compile

As constants of this Error type are not variables, they are immutable.

const err = Error("EOF")
err = Error("not EOF")   // doesn't compile

Additionally, two constant strings are always equal if their contents are equal:

const str1 = "EOF"
const str2 = "EOF"
fmt.Println(str1 == str2) // true

which means two constants of a type derived from string with the same contents are also equal.

type Error string

const err1 = Error("EOF")
const err2 = Error("EOF")
fmt.Println(err1 == err2) // true

Said another way, equal constant Error values are the same, in the way that the literal constant 1 is the same as every other literal constant 1.

Now we have all the pieces we need to make sentinel errors, like io.EOF and rsa.ErrVerification, immutable, fungible, constant expressions.

% git diff
diff --git a/src/io/io.go b/src/io/io.go
index 2010770e6a..355653b4b8 100644
--- a/src/io/io.go
+++ b/src/io/io.go
@@ -35,7 +35,12 @@ var ErrShortBuffer = errors.New("short buffer")
 // If the EOF occurs unexpectedly in a structured data stream,
 // the appropriate error is either ErrUnexpectedEOF or some other error
 // giving more detail.
-var EOF = errors.New("EOF")
+const EOF = ioError("EOF")
+
+type ioError string
+
+func (e ioError) Error() string { return string(e) }

This change is probably a bit of a stretch for the Go 1 contract, but there is no reason you cannot adopt a constant error pattern for your sentinel errors in the packages that you write.
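Here is a minimal sketch of what adopting the pattern might look like in a package of your own. The Error type follows the definition given earlier; ErrNotFound and lookup are hypothetical names made up for the example.

```go
package main

import "fmt"

// Error is a string-backed error type, so its values can be constants.
type Error string

func (e Error) Error() string { return string(e) }

// ErrNotFound is a hypothetical sentinel, declared as a constant.
const ErrNotFound = Error("not found")

// lookup is a hypothetical function that always fails with the sentinel.
func lookup(key string) error {
	return ErrNotFound
}

func main() {
	err := lookup("k")
	fmt.Println(err == ErrNotFound)        // true
	fmt.Println(err == Error("not found")) // true: equal contents, equal values

	// ErrNotFound = Error("other") // would not compile: cannot assign to a constant
}
```

Note that the comparison is between values of the same Error type. A string-backed error type declared in a different package is a different type, so its values will not compare equal to these even when the text matches.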

In summary

Go’s constants are powerful. If you only think of them as immutable numbers, you’re missing out. Go’s constants let us compose programs that are more correct and harder to misuse.

Today I’ve outlined three ways to use constants that are more than your typical immutable number.

Now it’s over to you, I’m excited to see where you can take these ideas.

The three Rs of remote work

I started working remotely in 2012. Since then I’ve worked for big companies and small, organisations with outstanding remote working cultures, and others that probably would have difficulty spelling the word without predictive text. I broadly classify my experiences into three tiers:

Little r remote

The first kind of remote work I call little r remote.

Your company has an office, but it’s not convenient or you don’t want to work from there. It could be the commute is too long, or it’s in the next town over, or perhaps a short plane flight away. Sometimes you might go into the office for a day or two a week, and should something serious arise you could join your co-workers onsite for an extended period of time.

If you often hear people say they are going to work from home to get some work done, that’s little r remote.

Big R remote

The next category I call Big R remote. Big R remote differs mainly from little r remote by the tyranny of distance. It’s not impossible to visit your co-workers in person, but it is inconvenient. Meeting face to face requires a day’s flying. Passports and border crossings are frequently involved. The expense and distance necessitate week long sprints and commensurate periods of jetlag recuperation.

Because of timezone differences meetings must be prearranged and periods of overlap closely guarded. Communication becomes less spontaneous and care must be taken to avoid committing to unsustainable working hours.

Gothic ℜ remote

The final category is basically Big R remote working on hard mode. Everything that was hard about Big R remote, timezone, travel schedules, public holidays, daylight savings, video call latency, cultural and language barriers is multiplied for each remote worker.

In person meetings are so rare that without a focus on written asynchronous communication progress can repeatedly stall for days, if not weeks, as miscommunication leads to disillusionment and loss of trust.

In my experience, for knowledge workers, little r remote work offers many benefits over the open office hell scape du jour. Big R remote takes a serious commitment by all parties and if you are the first employee in that category you will bear most of the cost of making Big R remote work for you.

Gothic ℜ remote working should probably be avoided unless all those involved have many years of working in that style and the employer is committed to restructuring the company as a remote first organisation. It is not possible to succeed in a Gothic ℜ remote role without a culture of written communication and asynchronous decision making mandated, and consistently enforced, by the leaders of the company.

Talk, then code

The open source projects that I contribute to follow a philosophy which I describe as talk, then code. I think this is generally a good way to develop software and I want to spend a little time talking about the benefits of this methodology.

Avoiding hurt feelings

The most important reason for discussing the change you want to make is it avoids hurt feelings. Often I see a contributor work hard in isolation on a pull request only to find their work is rejected. This can be for a bunch of reasons; the PR is too large, the PR doesn’t follow the local style, the PR fixes an issue which wasn’t important to the project or was recently fixed indirectly, and many more.

The underlying cause of all these issues is a lack of communication. The goal of the talk, then code philosophy is not to impede or frustrate, but to ensure that a feature lands correctly the first time, without incurring significant maintenance debt, and neither the author of the change, nor the reviewer, has to carry the emotional burden of dealing with hurt feelings when a change appears out of the blue with an implicit “well, I’ve done the work, all you have to do is merge it, right?”

What does discussion look like?

Every new feature or bug fix should be discussed with the maintainer(s) of the project before work commences. It’s fine to experiment privately, but do not send a change without discussing it first.

The definition of talk for simple changes can be as little as a design sketch in a GitHub issue. If your PR fixes a bug, you should link to the bug it fixes. If there isn’t one, you should raise a bug and wait for the maintainers to acknowledge it before sending a PR. This might seem a little backward–who wouldn’t want a bug fixed–but consider the bug could be a misunderstanding in how the software works or it could be a symptom of a larger problem that needs further investigation.

For more complicated changes, especially feature requests, I recommend that a design document be circulated and agreed upon before sending code. This doesn’t have to be a full blown document, a sketch in an issue may be sufficient, but the key is to reach agreement using words, before locking it in stone with code.

In all cases you shouldn’t proceed to send code until there is a positive agreement from the maintainer that the approach is one they are happy with. A pull request is for life, not just for Christmas.

Code review, not design by committee

A code review is not the place for arguments about design. This is for two reasons. First, most code review tools are not suitable for long comment threads; GitHub’s PR interface is very bad at this, Gerrit is better, but few have a team of admins to maintain a Gerrit instance. More importantly, disagreements at the code review stage suggest there wasn’t agreement on how the change should be implemented.

Talk about what you want to code, then code what you talked about. Please don’t do it the other way around.

The office coffee model of concurrent garbage collection

Garbage collection is a field with its own terminology. Concepts like mutators, card marking, and write barriers create a hurdle to understanding how garbage collectors work. Here’s an analogy to explain the operations of a concurrent garbage collector using everyday items found in the workplace.

Before we discuss the operation of concurrent garbage collection, let’s introduce the dramatis personae. In offices around the world you’ll find one of these:

In the workplace coffee is a natural resource. Employees visit the break room and fill their cups as required. That is, until the point someone goes to fill their cup only to discover the pot is empty!

Immediately the office is thrown into chaos. Meetings are called. Investigations are held. The perpetrator who took the last cup without refilling the machine is found and reprimanded. Despite many passive aggressive notes the situation keeps happening, thus a committee is formed to decide if a larger coffee pot should be requisitioned. Once the coffee maker is again full office productivity slowly returns to normal.

This is the model of stop the world garbage collection. The various parts of your program proceed through their day consuming memory, or in our analogy coffee, without a care about the next allocation that needs to be made. Eventually one unlucky attempt to allocate memory is made only to find the heap, or the coffee pot, exhausted, triggering a stop the world garbage collection.

Down the road at a more enlightened workplace, management have adopted a different strategy for mitigating their break room’s coffee problems. Their policy is simple: if the pot is more than half full, fill your cup and be on your way. However, if the pot is less than half full, before filling your cup, you must add a little coffee and a little water to the top of the machine. In this way, by the time the next person arrives for their re-up, the level in the pot will hopefully have risen higher than when the first person found it.

This policy does come at a cost to office productivity. Rather than filling their cup and hoping for the best, each worker may, depending on the aggregate level of consumption in the office, have to spend a little time refilling the percolator and topping up the water. However, this is time spent by a person who was already heading to the break room. It costs a few extra minutes to maintain the coffee machine, but does not impact their officemates who aren’t in need of caffeination. If several people take a break at the same time, they will all find the level in the pot below the half way mark and all proceed to top up the coffee maker–the more consumption, the greater the rate the machine will be refilled, although this takes a little longer as the break room becomes congested.

This is the model of concurrent garbage collection as practiced by the Go runtime (and probably other language runtimes with concurrent collectors). Rather than each heap allocation proceeding blindly until the heap is exhausted, leading to a long stop the world pause, concurrent collection algorithms spread the work of walking the heap to find memory which is no longer reachable over the parts of the program allocating memory. In this way the parts of the program which allocate memory each pay a small cost–in terms of latency–for those allocations rather than the whole program being forced to halt when the heap is exhausted.

Lastly, in keeping with the office coffee model, if the rate of coffee consumption in the office is so high that management discovers that their staff are always in the break room trying desperately to refill the coffee machine, it’s time to invest in a machine with a bigger pot–or in garbage collection terms, grow the heap.
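The half full policy can be sketched as a toy model. To be clear, this simulates the coffee pot, not the Go runtime: the Pot type, its fields, and the numbers used are all invented for illustration, and a real collector paces its work very differently.

```go
package main

import "fmt"

// Pot is a toy model of the break room coffee pot.
type Pot struct {
	Capacity int // cups the pot can hold
	Level    int // cups currently in the pot
	Refill   int // cups added by one top-up
}

// Take pours one cup. If the pot is less than half full the caller first
// tops it up, paying a small cost themselves. It reports whether the
// caller had to do that extra work.
func (p *Pot) Take() (assisted bool) {
	if p.Level*2 < p.Capacity {
		p.Level += p.Refill
		if p.Level > p.Capacity {
			p.Level = p.Capacity
		}
		assisted = true
	}
	p.Level--
	return assisted
}

func main() {
	pot := &Pot{Capacity: 10, Level: 10, Refill: 3}
	assists := 0
	for i := 0; i < 100; i++ {
		if pot.Take() {
			assists++
		}
	}
	fmt.Println("cups poured: 100, top-ups:", assists, "cups left:", pot.Level)
}
```

The pot never runs dry, so nobody triggers an office-wide halt, and the cost of keeping it that way is spread across the people actually drinking the coffee.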

Containers versus Operating Systems

What does a distro provide?

The most popular docker base container image is either busybox, or scratch. This is driven by a movement that is equal parts puritanical and pragmatic. The puritan asks “Why do I need to run init(1) just to run my process?” The pragmatist asks “Why do I need a 700 meg base image to deploy my application?” And both, seeking immutable deployment units ask “Is it a good idea that I can ssh into my container?” But let’s step back for a second and look at the history of how we got to the point where questions like this are even a thing.

In the very beginning, there were no operating systems. Programs ran one at a time with the whole machine at their disposal. While efficient, this created a problem for the keepers of these large and expensive machines. To maximise their investment, the time between one program finishing and another starting must be kept to an absolute minimum; hence monitor programs and batch processing were born.

Monitors started as barely more than watchdog timers. They knew how to load the next program off tape, then set an alarm if the program ran too long. As time went on, monitors became job control–quasi single user operating systems where the operators could schedule batch jobs with slightly more finesse than the previous model of concatenating them in the card reader.⁷

In response to the limitations of batch processing, and with the help of increased computing resources, interactive computing was born. Interactive computing allowed multiple users to interact with the computer directly, time slicing, or time sharing, the resources between users to present the illusion of each program having a whole computer to itself.

“The UNIX kernel is an I/O multiplexer more than a complete operating system. This is as it should be.”
Ken Thompson, BSTJ, 1978

Interactive computing in raw terms was less efficient than batch, however it recognised that the potential to deliver programs faster outweighed a less than optimal utilisation of the processor; a fact borne out by the realisation that programming time was not benefiting from the same economies of scale that Moore’s law was delivering for hardware. Job control evolved to become what we know as the kernel, a supervisor program which sits above the raw hardware, portioning it out and mediating access to hardware devices.

With interactive users came the shell, a place to start programs, and return once they completed. The shell presented an environment, a virtual work space to organise your work, communicate with others, and of course customise. Customise with programs you wrote, programs you got from others, and programs that you collaborated with your coworkers on.

Interactive computing, multi user systems and then networking gave birth to the first wave of client/server computing–the server was your world, your terminal was just a pane of glass to interact with it. Thus begat userspace, a crowded bazaar of programs, written in many languages, traded, sold, swapped and sometimes stolen. A great interbreeding between the UNIX vendors produced a whole far larger than the sum of its parts.

Each server was an island, lovingly tended by operators, living for years, slowly patched and upgraded, becoming ever more unique through the tide of software updates and personnel changes.

Skip forward to Linux and the GNU generation, a kernel by itself does not serve the market, it needs a user space of tools to attract and nurture users accustomed to the full interactive environment.

But that software was hard, and messy, and spread across a million ftp, tucows, sourceforge, and cvs servers. Their installation procedures were each unique, their dependencies unknown or unmanaged–in short, a job for an expert. Thus distributions became experts at packaging open source software to work together as a coherent interactive userspace story.

Container sprawl

We used to just have lots of servers, drawing power, running old software, old operating systems, hidden under people’s desks, and sometimes left running behind dry wall. Along came virtualisation to sweep away all the old, slow, flaky, out of warranty hardware. Yet the software remained, and multiplied.

Vmsprawl, it was called. Now free from a purchase order and a network switch port, virtual machines could spawn faster than rabbits. But their lifespan would be much longer.

Back when physical hardware existed, you could put labels on things, assign them to operators, have someone to blame, or at least ask if the operating system was up to date, but virtual machines became ephemeral, multitudinous, and increasingly, redundant.

Now that the virtual machine’s virtual bulk has given way to containers, what does that mean for the security and patching landscape? Surely it’s as bad, if not worse. Containers can multiply even faster than VMs and at such little cost compared to their bloated cousins that the problem could be magnified many times over. Or will it?

The problem is maintaining the software you didn’t write. Before containers that was everything between you and the hardware; obviously a kernel, that is inescapable, but the much larger surface area (in recent years ballooning to a DVD’s girth) was the userland. The gigabytes of software that existed to haul the machine onto the network, initialise its device drivers, scrub its /tmp partition, and so on.

But what if there was no userland? What if the network was handled for you, truly virtualised at layer 3, not layer 1? What if your volumes were always mounted and your local storage was fleeting, so there was nothing to scrub? What would be the purpose of all those decades of lovingly crafted userland cruft?

If interactive software goes unused, was it ever installed at all?

Immutable images

Netflix tells us that immutable images are the path to enlightenment. Build it once, deploy it often. If there is a problem, an update, a software change, a patch, or a kernel fix, then build another image and roll it out. Never change something in place. This mirrors the trend towards immutability writ large by the functional programming tidal wave.

While Netflix use virtual machines, and so need software to configure their (simulated) hardware and software to plumb their (simulated) network interfaces to get to the point of being able to launch the application, containers leave all these concerns to the host. A container is spawned with any block devices or network interfaces required already mounted or plumbed a priori.

So, if you remove the requirement, and increasingly, the ability, to change the contents of the running image, and you remove the requirement to prepare the environment before starting the application, because the container is created with all its facilities already prepared, why do you need a userland inside a container?

Debugging? Possibly.

Today there are many of my generation who would feel helpless without being able to ssh to a host and run their favourite (and disparate) set of inspection tools. But a container is just a process inside a larger host operating system, so do your diagnosis there instead. Unlike virtual machines, these are not black boxes, the host operating system has far more capability to inspect and diagnose a guest than the guest itself–so leave your diagnosis tools on the host. And your ssh daemon, for good measure.

Updates, updates. Updates!

Why do we have operating system distros? In a word, outsourcing.

Sure, every admin could subscribe to the mailing lists of all the software packages installed on the servers they maintain (you do know all the software installed on the machines you are responsible for, right?) and then download, test, certify, and upgrade the software promptly after being notified. Sounds simple. Any admin worth hiring should be able to do this.

Sure, assuming you can find an admin who wants to do this grunt work, and that they can keep up with the workload, and that they can service more than a few machines before they’re hopelessly chasing their tails.

No, of course not, we outsource this to the operating system vendor. In return for using outdated versions of software, distros will centralise the triage, testing and preparation of upgrades and patches.

This is the reason that a distro and its package management tool of choice are synonymous. Without a tool to automate the dissemination, installation and upgrade of packaged software, distro vendors would have no value. And without someone to marshal unique snowflake open source software into a unified form, package management software would have no value.

No wonder that the revenue model for all open source distro vendors centers around tooling that automates the distribution of update packages.

The last laugh

Ironically, the last laugh in this tale may be the closed source operating system vendors. It was Linux and open source that destroyed the proprietary UNIX market after the first dot com crash.

Linux rode Moore’s law to become the server operating system for the internet, and made kings of the operating system distributors. But it’s Linux that is driving containers, at least in their current form, and Linux containers, or more specifically a program that communicates directly with the kernel syscall api inside a specially prepared process namespace, is defining the new normal for applications.

Containers are eating the very Linux distribution market which enabled their creation.

OSX and Windows may be relegated to second class citizens–the clients in the client/server or client/container equation–but at least nobody is asking difficult questions about the role of their userspace.

Whither distros

What is the future of operating system distributions? Their services, while mature, scalable, well integrated, and expertly staffed, will unfortunately be priced out of the market. History tells us this.

In the first dot com bust, companies retreated from expensive proprietary software, not because it wasn’t good, not because it wasn’t extensible or changeable, but because it was too expensive. With no money coming in, thousands of dollars of opex walking out the door in licence fees was unsustainable.

The companies that survived the crash, or were born in its wreckage, chose open source software. Software that was arguably less mature, less refined, at the time, but with a price tag that was much more approachable. They kept their investors’ money in the bank, rode the wave of hardware improvements, and by pulling together in a million loosely organised software projects created a free (as in free puppy) platform to build their services on top–trading opex for some risk that they may have no-one to blame if their free software balloon sprang a leak.

Now, it is the Linux distributors who are chasing the per seat or per cpu licence fees. Offering scaled out professional services in the form of a stream of software updates and patches, well tested and well integrated.

But, with the exception of the kernel–which is actually provided by the host operating system, not the container–all those patches and updates are for software that is not used by the container, and in the case of our opening examples, busybox and scratch, not present. The temptation to go it alone, cut out the distro vendors, backed by the savings in licence fees is overwhelming.

What can distros do?

What would you do if you woke up one day to find that you owned the best butchers shop in a town that had decided to become vegetarian en masse?