Author Archives: Dave Cheney

About Dave Cheney

A chaotic neutral System Administrator with super cow powers. My weapons are: * fear * cynicism * an almost fanatical devotion to the command line twitter.com/davecheney

Talk, then code

The open source projects that I contribute to follow a philosophy which I describe as talk, then code. I think this is generally a good way to develop software and I want to spend a little time talking about the benefits of this methodology.

Avoiding hurt feelings

The most important reason for discussing the change you want to make is it avoids hurt feelings. Often I see a contributor work hard in isolation on a pull request only to find their work is rejected. This can be for a bunch of reasons; the PR is too large, the PR doesn’t follow the local style, the PR fixes an issue which wasn’t important to the project or was recently fixed indirectly, and many more.

The underlying cause of all these issues is a lack of communication. The goal of the talk, then code philosophy is not to impede or frustrate, but to ensure that a feature lands correctly the first time, without incurring significant maintenance debt, and neither the author of the change, or the reviewer, has to carry the emotional burden of dealing with hurt feelings when a change appears out of the blue with an implicit “well, I’ve done the work, all you have to do is merge it, right?”

What does discussion look like?

Every new feature or bug fix should be discussed with the maintainer(s) of the project before work commences. It’s fine to experiment privately, but do not send a change without discussing it first.

The definition of talk for simple changes can be as little as a design sketch in a GitHub issue. If your PR fixes a bug, you should link to the bug it fixes. If there isn’t one, you should raise a bug and wait for the maintainers to acknowledge it before sending a PR. This might seem a little backward–who wouldn’t want a bug fixed–but consider the bug could be a misunderstanding in how the software works or it could be a symptom of a larger problem that needs further investigation.

For more complicated changes, especially feature requests, I recommend that a design document be circulated and agreed upon before sending code. This doesn’t have to be a full blown document, a sketch in an issue may be sufficient, but the key is to reach agreement using words, before locking it in stone with code.

In all cases you shouldn’t proceed to send code until there is a positive agreement from the maintainer that the approach is one they are happy with. A pull request is for life, not just for Christmas.

Code review, not design by committee

A code review is not the place for arguments about design. This is for two reasons. First, most code review tools are not suitable for long comment threads, GitHub’s PR interface is very bad at this, Gerrit is better, but few have a team of admins to maintain a Gerrit instance. More importantly, disagreements at the code review stage suggests there wasn’t agreement on how the change should be implemented.


Talk about what you want to code, then code what you talked about. Please don’t do it the other way around.

You shouldn’t name your variables after their types for the same reason you wouldn’t name your pets “dog” or “cat”

The name of a variable should describe its contents, not the type of the contents. Consider this example:

var usersMap map[string]*User

What are some good properties of this declaration? We can see that it’s a map, and it has something to do with the *User type, so that’s probably good. But usersMapis a map and Go, being a statically typed language, won’t let us accidentally use a map where a different type is required, so the Map suffix as a safety precaution is redundant.

Now, consider what happens if we declare other variables using this pattern:

var (
companiesMap map[string]*Company
productsMap map[string]*Products
)

Now we have three map type variables in scope, usersMapcompaniesMap, and productsMap, all mapping strings to different struct types. We know they are maps, and we also know that their declarations prevent us from using one in place of another—​the compiler will throw an error if we try to use companiesMap where the code is expecting a map[string]*User. In this situation it’s clear that the Map suffix does not improve the clarity of the code, its just extra boilerplate to type.

My suggestion is avoid any suffix that resembles the type of the variable. Said another way, if users isn’t descriptive enough, then usersMap won’t be either.

This advice also applies to function parameters. For example:

type Config struct {
//
}

func WriteConfig(w io.Writer, config *Config)

Naming the *Config parameter config is redundant. We know it’s a pointer to a Config, it says so right there in the declaration. Instead consider if conf will do, or maybe just c if the lifetime of the variable is short enough.

This advice is more than just a desire for brevity. If there is more that one *Config in scope at any one time, calling them config1 and config2 is less descriptive than calling them original and updated . The latter are less likely to be accidentally transposed—something the compiler won’t catch—while the former differ only in a one character suffix.

Finally, don’t let package names steal good variable names. The name of an imported identifier includes its package name. For example the Context type in the context package will be known as context.Context when imported into another package . This makes it impossible to use context as a variable or type, unless of course you rename the import, but that’s throwing good after bad. This is why the local declaration for context.Context types is traditionally ctx. eg.

func WriteLog(ctx context.Context, message string)

A variable’s name should be independent of its type. You shouldn’t name your variables after their types for the same reason you wouldn’t name your pets “dog” or “cat”. You shouldn’t include the name of your type in the name of your variable for the same reason.

Eliminate error handling by eliminating errors

Go 2 aims to improve the overhead of error handling, but do you know what is better than an improved syntax for handling errors? Not needing to handle errors at all. Now, I’m not saying “delete your error handling code”, instead I’m suggesting changing your code so you don’t have as many errors to handle.

This article draws inspiration from a chapter in John Ousterhout’s, A philosophy of Software Design, “Define Errors Out of Existence”. I’m going to try to apply his advice to Go.


Here’s a function to count the number of lines in a file,

func CountLines(r io.Reader) (int, error) {
var (
br = bufio.NewReader(r)
lines int
err error
)

for {
_, err = br.ReadString('\n')
lines++
if err != nil {
break
}
}

if err != io.EOF {
return 0, err
}
return lines, nil
}

We construct a bufio.Reader, then sit in a loop calling the ReadString method, incrementing a counter until we reach the end of the file, then we return the number of lines read. That’s the code we wanted to write, instead CountLines is made more complicated by its error handling. For example, there is this strange construction:

                _, err = br.ReadString('\n')
lines++
if err != nil {
break
}

We increment the count of lines before checking the error—​that looks odd. The reason we have to write it this way is ReadString will return an error if it encounters an end-of-file—io.EOF—before hitting a newline character. This can happen if there is no trailing newline.

To address this problem, we rearrange the logic to increment the line count, then see if we need to exit the loop.1

But we’re not done checking errors yet. ReadString will return io.EOF when it hits the end of the file. This is expected, ReadString needs some way of saying stop, there is nothing more to read. So before we return the error to the caller of CountLine, we need to check if the error was not io.EOF, and in that case propagate it up, otherwise we return nil to say that everything worked fine. This is why the final line of the function is not simply

return lines, err

I think this is a good example of Russ Cox’s observation that error handling can obscure the operation of the function. Let’s look at an improved version.

func CountLines(r io.Reader) (int, error) {
sc := bufio.NewScanner(r)
lines := 0

for sc.Scan() {
lines++
}

return lines, sc.Err()
}

This improved version switches from using bufio.Reader to bufio.Scanner. Under the hood bufio.Scanner uses bufio.Reader adding a layer of abstraction which helps remove the error handling which obscured the operation of our previous version of CountLines 2

The method sc.Scan() returns true if the scanner has matched a line of text and has not encountered an error. So, the body of our for loop will be called only when there is a line of text in the scanner’s buffer. This means our revised CountLines correctly handles the case where there is no trailing newline, It also correctly handles the case where the file is empty.

Secondly, as sc.Scan returns false once an error is encountered, our for loop will exit when the end-of-file is reached or an error is encountered. The bufio.Scanner type memoises the first error it encounters and we recover that error once we’ve exited the loop using the sc.Err() method.

Lastly, buffo.Scanner takes care of handling io.EOF and will convert it to a nil if the end of file was reached without encountering another error.


My second example is inspired by Rob Pikes’ Errors are values blog post.

When dealing with opening, writing and closing files, the error handling is present but not overwhelming as, the operations can be encapsulated in helpers like ioutil.ReadFile and ioutil.WriteFile. However, when dealing with low level network protocols it often becomes necessary to build the response directly using I/O primitives, thus the error handling can become repetitive. Consider this fragment of a HTTP server which is constructing a HTTP/1.1 response.

type Header struct {
Key, Value string
}

type Status struct {
Code int
Reason string
}

func WriteResponse(w io.Writer, st Status, headers []Header, body io.Reader) error {
_, err := fmt.Fprintf(w, "HTTP/1.1 %d %s\r\n", st.Code, st.Reason)
if err != nil {
return err
}

for _, h := range headers {
_, err := fmt.Fprintf(w, "%s: %s\r\n", h.Key, h.Value)
if err != nil {
return err
}
}

if _, err := fmt.Fprint(w, "\r\n"); err != nil {
return err
}

_, err = io.Copy(w, body)
return err
}

First we construct the status line using fmt.Fprintf, and check the error. Then for each header we write the header key and value, checking the error each time. Lastly we terminate the header section with an additional \r\n, check the error, and copy the response body to the client. Finally, although we don’t need to check the error from io.Copy, we do need to translate it from the two return value form that io.Copy returns into the single return value that WriteResponse expects.

Not only is this a lot of repetitive work, each operation—fundamentally writing bytes to an io.Writer—has a different form of error handling. But we can make it easier on ourselves by introducing a small wrapper type.

type errWriter struct {
io.Writer
err error
}

func (e *errWriter) Write(buf []byte) (int, error) {
if e.err != nil {
return 0, e.err
}

var n int
n, e.err = e.Writer.Write(buf)
return n, nil
}

errWriter fulfils the io.Writer contract so it can be used to wrap an existing io.WritererrWriter passes writes through to its underlying writer until an error is detected. From that point on, it discards any writes and returns the previous error.

func WriteResponse(w io.Writer, st Status, headers []Header, body io.Reader) error {
ew := &errWriter{Writer: w}
fmt.Fprintf(ew, "HTTP/1.1 %d %s\r\n", st.Code, st.Reason)

for _, h := range headers {
fmt.Fprintf(ew, "%s: %s\r\n", h.Key, h.Value)
}

fmt.Fprint(ew, "\r\n")
io.Copy(ew, body)

return ew.err
}

Applying errWriter to WriteResponse dramatically improves the clarity of the code. Each of the operations no longer needs to bracket itself with an error check. Reporting the error is moved to the end of the function by inspecting the ew.err field, avoiding the annoying translation from io.Copy’s return values.


When you find yourself faced with overbearing error handling, try to extract some of the operations into a helper type.

Avoid package names like base, util, or common

Writing a good Go package starts with its name. Think of your package’s name as an elevator pitch, you have to describe what it does using just one word.

A common cause of poor package names are utility packages. These are packages where helpers and utility code congeal. These packages contain an assortment of unrelated functions, as such their utility is hard to describe in terms of what the package provides. This often leads to a package’s name being derived from what the package contains—utilities.

Package names like utils or helpers are commonly found in projects which have developed deep package hierarchies and want to share helper functions without introducing import loops. Extracting utility functions to new package breaks the import loop, but as the package stems from a design problem in the project, its name doesn’t reflect its purpose, only its function in breaking the import cycle.

[A little] duplication is far cheaper than the wrong abstraction.

Sandy Metz

My recommendation to improve the name of utils or helpers packages is to analyse where they are imported and move the relevant functions into the calling package. Even if this results in some code duplication this is preferable to introducing an import dependency between two packages. In the case where utility functions are used in many places, prefer multiple packages, each focused on a single aspect with a correspondingly descriptive name.

Packages with names like base or common are often found when functionality common to two or more related facilities, for example common types between a client and server or a server and its mock, has been refactored into a separate package. Instead the solution is to reduce the number of packages by combining client, server, and common code into a single package named after the facility the package provides.

For example, the net/http package does not have client and server packages, instead it has client.go and server.go files, each holding their respective types. transport.go holds for the common message transport code used by both HTTP clients and servers.

Name your packages after what they provide, not what they contain.

Internets of Interest #11: Yesterday’s Computer of Tomorrow: The Xerox Alto

How did personal computing start? Many credit Apple and IBM for this radical shift, but in 1973, years before the Apple II and IBM PC, Xerox built the Alto, a computer its makers thought could become the “computer of tomorrow.” The Alto embodied for the first time many of the defining features of personal computing that seem natural now, over forty years later: individual use; interactive, graphical displays; networking; graphical interfaces with overlapping windows and icons; WYSIWYG word processing; browsers; email; and the list goes on. The birthplace of this pioneering machine was Xerox’s Palo Alto Research Center (PARC), which assembled a remarkable collection of computer scientists and engineers who made real their idea of “distributed personal computing.”

The office coffee model of concurrent garbage collection

Garbage collection is a field with its own terminology. Concepts like like mutators, card marking, and write barriers create a hurdle to understanding how garbage collectors work. Here’s an analogy to explain the operations of a concurrent garbage collector using everyday items found in the workplace.

Before we discuss the operation of concurrent garbage collection, let’s introduce the dramatis personae. In offices around the world you’ll find one of these:

In the workplace coffee is a natural resource. Employees visit the break room and fill their cups as required. That is, until the point someone goes to fill their cup only to discover the pot is empty!

Immediately the office is thrown into chaos. Meeting are called. Investigations are held. The perpetrator who took the last cup without refilling the machine is found and reprimanded. Despite many passive aggressive notes the situation keeps happening, thus a committee is formed to decide if a larger coffee pot should be requisitioned. Once the coffee maker is again full office productivity slowly returns to normal.

This is the model of stop the world garbage collection. The various parts of your program proceed through their day consuming memory, or in our analogy coffee, without a care about the next allocation that needs to be made. Eventually one unlucky attempt to allocate memory is made only to find the heap, or the coffee pot, exhausted, triggering a stop the world garbage collection.


Down the road at a more enlightened workplace, management have adopted a different strategy for mitigating their break room’s coffee problems. Their policy is simple: if the pot is more than half full, fill your cup and be on your way. However, if the pot is less than half full, before filling your cup, you must add a little coffee and a little water to the top of the machine. In this way, by the time the next person arrives for their re-up, the level in the pot will hopefully have risen higher than when the first person found it.

This policy does come at a cost to office productivity. Rather than filling their cup and hoping for the best, each worker may, depending on the aggregate level of consumption in the office, have to spend a little time refilling the percolator and topping up the water. However, this is time spent by a person who was already heading to the break room. It costs a few extra minutes to maintain the coffee machine, but does not impact their officemates who aren’t in need of caffeination. If several people take a break at the same time, they will all find the level in the pot below the half way mark and all proceed to top up the coffee maker–the more consumption, the greater the rate the machine will be refilled, although this takes a little longer as the break room becomes congested.

This is the model of concurrent garbage collection as practiced by the Go runtime (and probably other language runtimes with concurrent collectors). Rather than each heap allocation proceeding blindly until the heap is exhausted, leading to a long stop the world pause, concurrent collection algorithms spread the work of walking the heap to find memory which is no longer reachable over the parts of the program allocating memory. In this way the parts of the program which allocate memory each pay a small cost–in terms of latency–for those allocations rather than the whole program being forced to halt when the heap is exhausted.

Lastly, in keeping with the office coffee model, if the rate of coffee consumption in the office is so high that management discovers that their staff are always in the break room trying desperately to refill the coffee machine, it’s time to invest in a machine with a bigger pot–or in garbage collection terms, grow the heap.

Internets of Interest #8: Todd Fernandez on the manufacturing of modern semiconductors

Every since I started giving my High Performance Go workshop I’ve been fascinated with the physics of semiconductors. This presentation from Hope Conference ’09 doesn’t cover the latest EUV shenanigans, but does an excellent job of detailing the difficulties in semiconductor manufacturing ten years ago. The problems have only become more complicated as semiconductor fabs attempt to push feature sizes into the single digits.