Author Archives: Dave Cheney

About Dave Cheney

A chaotic neutral System Administrator with super cow powers. My weapons are: * fear * cynicism * an almost fanatical devotion to the command line twitter.com/davecheney

Simulating minicomputers on microcontrollers

This is a short blog post to reference the slides from my builderscon 2016 presentation.

I had a great time at builderscon; the talks, from a wide selection of Japanese makers, were varied and engaging. I’m grateful to the builderscon organisers for accepting my talk and inviting me to present at the inaugural builderscon conference in Tokyo, Japan.

Slides:

Further reading:

Go 1.8 toolchain improvements

This is a progress report on the Go toolchain improvements during the 1.8 development cycle.

Now that we’re well into November, the 1.8 development window is closing fast on the few remaining in-flight change lists, with the remainder being told to wait until the 1.9 development season opens when Go 1.8 ships in February 2017.

For more in this series, read my previous post on the Go 1.8 toolchain improvements from September, and my post on the improvements to the Go toolchain in the 1.7 development cycle.

Faster compilation

Since Go 1.5, released in August 2015, compile times have been significantly slower than Go 1.4. Work on addressing this slowdown started in earnest in the Go 1.7 cycle, and is still ongoing.

Robert Griesemer and Matthew Dempsky worked on rewriting the parser to make it faster and remove many of the package level variables inherited from the previous yacc-based parser. This parser produces a new abstract syntax tree while the rest of the compiler expects the previous yacc syntax tree. For 1.8 the new parser must transform its output into the previous syntax tree for consumption by the rest of the compiler. Even with this extra transformation step the new parser is no slower than the previous version, and plans are being made to remove this transformation requirement in Go 1.9.

Compile time for full build relative to Go 1.4.3

The takeaway is that Go 1.8 is on target to improve compile times by an average of 15% over Go 1.7. Compared to the 3-5% improvements reported two months prior, it’s nice to know that there is still blood in this stone.

Note: The benchmark scripts for jujud, kube-controller-manager, and gogs are online. Please try them yourself and report your findings.

Code generation improvements

The big feature of the previous 1.7 cycle was the new SSA backend for 64 bit Intel. In Go 1.8 the SSA backend has been rolled out to all the other architectures that Go supports and the old backend code has been deleted.

amd64, by virtue of being the most popular production architecture, has always been the fastest. As I reported a few months ago, the results comparing Go 1.8 to Go 1.7 on Intel architectures show middling improvement, driven equally by better code generation, improved escape analysis, and optimisations to the standard library.

name                     old time/op    new time/op    delta
BinaryTree17-4              3.04s ± 1%     3.03s ± 0%     ~     (p=0.222 n=5+5)
Fannkuch11-4                3.27s ± 0%     3.39s ± 1%   +3.74%  (p=0.008 n=5+5)
FmtFprintfEmpty-4          60.0ns ± 3%    58.3ns ± 1%   -2.70%  (p=0.008 n=5+5)
FmtFprintfString-4          177ns ± 2%     164ns ± 2%   -7.47%  (p=0.008 n=5+5)
FmtFprintfInt-4             169ns ± 2%     157ns ± 1%   -7.22%  (p=0.008 n=5+5)
FmtFprintfIntInt-4          264ns ± 1%     243ns ± 1%   -8.10%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-4     254ns ± 2%     244ns ± 1%   -4.02%  (p=0.008 n=5+5)
FmtFprintfFloat-4           357ns ± 1%     348ns ± 2%   -2.35%  (p=0.032 n=5+5)
FmtManyArgs-4              1.10µs ± 1%    0.97µs ± 1%  -11.03%  (p=0.008 n=5+5)
GobDecode-4                9.85ms ± 1%    9.31ms ± 1%   -5.51%  (p=0.008 n=5+5)
GobEncode-4                8.75ms ± 1%    8.17ms ± 1%   -6.67%  (p=0.008 n=5+5)
Gzip-4                      282ms ± 0%     289ms ± 1%   +2.32%  (p=0.008 n=5+5)
Gunzip-4                   50.9ms ± 1%    51.7ms ± 0%   +1.67%  (p=0.008 n=5+5)
HTTPClientServer-4          195µs ± 1%     196µs ± 1%     ~     (p=0.095 n=5+5)
JSONEncode-4               21.6ms ± 6%    19.8ms ± 3%   -8.37%  (p=0.008 n=5+5)
JSONDecode-4               70.2ms ± 3%    71.0ms ± 1%     ~     (p=0.310 n=5+5)
Mandelbrot200-4            5.20ms ± 0%    4.73ms ± 1%   -9.05%  (p=0.008 n=5+5)
GoParse-4                  4.38ms ± 3%    4.28ms ± 2%     ~     (p=0.056 n=5+5)
RegexpMatchEasy0_32-4      96.7ns ± 2%    98.1ns ± 0%     ~     (p=0.127 n=5+5)
RegexpMatchEasy0_1K-4       311ns ± 1%     313ns ± 0%     ~     (p=0.214 n=5+5)
RegexpMatchEasy1_32-4      97.9ns ± 2%    89.8ns ± 2%   -8.33%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K-4       519ns ± 0%     510ns ± 2%   -1.70%  (p=0.040 n=5+5)
RegexpMatchMedium_32-4      158ns ± 2%     146ns ± 0%   -7.71%  (p=0.016 n=5+4)
RegexpMatchMedium_1K-4     46.3µs ± 1%    47.8µs ± 2%   +3.12%  (p=0.008 n=5+5)
RegexpMatchHard_32-4       2.53µs ± 3%    2.46µs ± 0%   -2.91%  (p=0.008 n=5+5)
RegexpMatchHard_1K-4       76.1µs ± 0%    74.5µs ± 2%   -2.12%  (p=0.008 n=5+5)
Revcomp-4                   563ms ± 2%     531ms ± 1%   -5.78%  (p=0.008 n=5+5)
Template-4                 86.7ms ± 1%    82.2ms ± 1%   -5.16%  (p=0.008 n=5+5)
TimeParse-4                 433ns ± 3%     399ns ± 4%   -7.90%  (p=0.008 n=5+5)
TimeFormat-4                467ns ± 2%     430ns ± 1%   -7.76%  (p=0.008 n=5+5)

name                     old speed      new speed      delta
GobDecode-4              77.9MB/s ± 1%  82.5MB/s ± 1%   +5.84%  (p=0.008 n=5+5)
GobEncode-4              87.7MB/s ± 1%  94.0MB/s ± 1%   +7.15%  (p=0.008 n=5+5)
Gzip-4                   68.8MB/s ± 0%  67.2MB/s ± 1%   -2.27%  (p=0.008 n=5+5)
Gunzip-4                  381MB/s ± 1%   375MB/s ± 0%   -1.65%  (p=0.008 n=5+5)
JSONEncode-4             89.9MB/s ± 5%  98.1MB/s ± 3%   +9.11%  (p=0.008 n=5+5)
JSONDecode-4             27.6MB/s ± 3%  27.3MB/s ± 1%     ~     (p=0.310 n=5+5)
GoParse-4                13.2MB/s ± 3%  13.5MB/s ± 2%     ~     (p=0.056 n=5+5)
RegexpMatchEasy0_32-4     331MB/s ± 2%   326MB/s ± 0%     ~     (p=0.151 n=5+5)
RegexpMatchEasy0_1K-4    3.29GB/s ± 1%  3.27GB/s ± 0%     ~     (p=0.222 n=5+5)
RegexpMatchEasy1_32-4     327MB/s ± 2%   357MB/s ± 2%   +9.20%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K-4    1.97GB/s ± 0%  2.01GB/s ± 2%   +1.76%  (p=0.032 n=5+5)
RegexpMatchMedium_32-4   6.31MB/s ± 2%  6.83MB/s ± 1%   +8.31%  (p=0.008 n=5+5)
RegexpMatchMedium_1K-4   22.1MB/s ± 1%  21.4MB/s ± 2%   -3.01%  (p=0.008 n=5+5)
RegexpMatchHard_32-4     12.6MB/s ± 3%  13.0MB/s ± 0%   +2.98%  (p=0.008 n=5+5)
RegexpMatchHard_1K-4     13.4MB/s ± 0%  13.7MB/s ± 2%   +2.19%  (p=0.008 n=5+5)
Revcomp-4                 451MB/s ± 2%   479MB/s ± 1%   +6.12%  (p=0.008 n=5+5)
Template-4               22.4MB/s ± 1%  23.6MB/s ± 1%   +5.43%  (p=0.008 n=5+5)

The big improvements from the switch to the SSA backend show up on non-Intel architectures. Here are the results for arm64:

name                     old time/op    new time/op     delta
BinaryTree17-8              10.6s ± 0%       8.1s ± 1%  -23.62%  (p=0.016 n=4+5)
Fannkuch11-8                9.19s ± 0%      5.95s ± 0%  -35.27%  (p=0.008 n=5+5)
FmtFprintfEmpty-8           136ns ± 0%      118ns ± 1%  -13.53%  (p=0.008 n=5+5)
FmtFprintfString-8          472ns ± 1%      331ns ± 1%  -29.82%  (p=0.008 n=5+5)
FmtFprintfInt-8             388ns ± 3%      273ns ± 0%  -29.61%  (p=0.008 n=5+5)
FmtFprintfIntInt-8          640ns ± 2%      438ns ± 0%  -31.61%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8     580ns ± 0%      423ns ± 0%  -27.09%  (p=0.008 n=5+5)
FmtFprintfFloat-8           823ns ± 0%      613ns ± 1%  -25.57%  (p=0.008 n=5+5)
FmtManyArgs-8              2.69µs ± 0%     1.96µs ± 0%  -27.12%  (p=0.016 n=4+5)
GobDecode-8                24.4ms ± 0%     17.3ms ± 0%  -28.88%  (p=0.008 n=5+5)
GobEncode-8                18.6ms ± 0%     15.1ms ± 1%  -18.65%  (p=0.008 n=5+5)
Gzip-8                      1.20s ± 0%      0.74s ± 0%  -38.02%  (p=0.008 n=5+5)
Gunzip-8                    190ms ± 0%      130ms ± 0%  -31.73%  (p=0.008 n=5+5)
HTTPClientServer-8          205µs ± 1%      166µs ± 2%  -19.27%  (p=0.008 n=5+5)
JSONEncode-8               50.7ms ± 0%     41.5ms ± 0%  -18.10%  (p=0.008 n=5+5)
JSONDecode-8                201ms ± 0%      155ms ± 1%  -22.93%  (p=0.008 n=5+5)
Mandelbrot200-8            13.0ms ± 0%     10.1ms ± 0%  -22.78%  (p=0.008 n=5+5)
GoParse-8                  11.4ms ± 0%      8.5ms ± 0%  -24.80%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8       271ns ± 0%      225ns ± 0%  -16.97%  (p=0.008 n=5+5)
RegexpMatchEasy0_1K-8      1.69µs ± 0%     1.92µs ± 0%  +13.42%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-8       292ns ± 0%      255ns ± 0%  -12.60%  (p=0.000 n=4+5)
RegexpMatchEasy1_1K-8      2.20µs ± 0%     2.38µs ± 0%   +8.38%  (p=0.008 n=5+5)
RegexpMatchMedium_32-8      411ns ± 0%      360ns ± 0%  -12.41%  (p=0.000 n=5+4)
RegexpMatchMedium_1K-8      118µs ± 0%      104µs ± 0%  -12.07%  (p=0.008 n=5+5)
RegexpMatchHard_32-8       6.83µs ± 0%     5.79µs ± 0%  -15.27%  (p=0.016 n=4+5)
RegexpMatchHard_1K-8        205µs ± 0%      176µs ± 0%  -14.19%  (p=0.008 n=5+5)
Revcomp-8                   2.01s ± 0%      1.43s ± 0%  -29.02%  (p=0.008 n=5+5)
Template-8                  259ms ± 0%      158ms ± 0%  -38.93%  (p=0.008 n=5+5)
TimeParse-8                 874ns ± 1%      733ns ± 1%  -16.16%  (p=0.008 n=5+5)
TimeFormat-8               1.00µs ± 1%     0.86µs ± 1%  -13.88%  (p=0.008 n=5+5)

name                     old speed      new speed       delta
GobDecode-8              31.5MB/s ± 0%   44.3MB/s ± 0%  +40.61%  (p=0.008 n=5+5)
GobEncode-8              41.3MB/s ± 0%   50.7MB/s ± 1%  +22.92%  (p=0.008 n=5+5)
Gzip-8                   16.2MB/s ± 0%   26.1MB/s ± 0%  +61.33%  (p=0.008 n=5+5)
Gunzip-8                  102MB/s ± 0%    150MB/s ± 0%  +46.45%  (p=0.016 n=4+5)
JSONEncode-8             38.3MB/s ± 0%   46.7MB/s ± 0%  +22.10%  (p=0.008 n=5+5)
JSONDecode-8             9.64MB/s ± 0%  12.49MB/s ± 0%  +29.54%  (p=0.016 n=5+4)
GoParse-8                5.09MB/s ± 0%   6.78MB/s ± 0%  +33.02%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8     118MB/s ± 0%    142MB/s ± 0%  +20.29%  (p=0.008 n=5+5)
RegexpMatchEasy0_1K-8     605MB/s ± 0%    534MB/s ± 0%  -11.85%  (p=0.016 n=5+4)
RegexpMatchEasy1_32-8     110MB/s ± 0%    125MB/s ± 0%  +14.23%  (p=0.029 n=4+4)
RegexpMatchEasy1_1K-8     465MB/s ± 0%    430MB/s ± 0%   -7.72%  (p=0.008 n=5+5)
RegexpMatchMedium_32-8   2.43MB/s ± 0%   2.77MB/s ± 0%  +13.99%  (p=0.016 n=5+4)
RegexpMatchMedium_1K-8   8.68MB/s ± 0%   9.87MB/s ± 0%  +13.71%  (p=0.008 n=5+5)
RegexpMatchHard_32-8     4.68MB/s ± 0%   5.53MB/s ± 0%  +18.08%  (p=0.016 n=4+5)
RegexpMatchHard_1K-8     5.00MB/s ± 0%   5.83MB/s ± 0%  +16.60%  (p=0.008 n=5+5)
Revcomp-8                 126MB/s ± 0%    178MB/s ± 0%  +40.88%  (p=0.008 n=5+5)
Template-8               7.48MB/s ± 0%  12.25MB/s ± 0%  +63.74%  (p=0.008 n=5+5)

These are pretty big improvements from just recompiling your binary.

Defer and cgo improvements

The question of whether defer can be used in hot code paths remains open, but during the 1.8 cycle Austin reduced the overhead of using defer by half, according to some benchmarks.

The runtime package benchmarks are a little less rosy.

name         old time/op  new time/op  delta
Defer-4       101ns ± 1%    66ns ± 0%  -34.73%  (p=0.000 n=20+20)
Defer10-4    93.2ns ± 1%  62.5ns ± 8%  -33.02%  (p=0.000 n=20+20)
DeferMany-4   148ns ± 3%   131ns ± 3%  -11.42%  (p=0.000 n=19+19)

According to them, defer improved by a third in the most common circumstances, where the statement closes over no more than a single variable.
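To get a feel for this cost on your own toolchain, a rough sketch along the following lines may help. It is not the runtime package's benchmark; the names withDefer and withoutDefer, the mutex, and the counter are assumptions made up for illustration, and the numbers it prints will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

var (
	mu      sync.Mutex
	counter int
)

// withDefer releases the lock with a deferred call.
func withDefer() {
	mu.Lock()
	defer mu.Unlock()
	counter++
}

// withoutDefer does the same work but unlocks explicitly.
func withoutDefer() {
	mu.Lock()
	counter++
	mu.Unlock()
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	d := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			withDefer()
		}
	})
	n := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			withoutDefer()
		}
	})
	fmt.Println("defer:   ", d)
	fmt.Println("no defer:", n)
}
```

The difference between the two lines is an estimate of the per-call cost of defer for this one pattern only.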

Additionally, an optimisation by David Crawshaw reduced the overhead of defer in the cgo path by nearly half.

name       old time/op  new time/op  delta
CgoNoop-8  93.5ns ± 0%  51.1ns ± 1%  -45.34%  (p=0.016 n=4+5)

One more thing

Go 1.7 supported 64 bit mips platforms, thanks to the work of Minux and Cherry. However, the less powerful but plentiful 32 bit mips platforms were not supported. As a bonus, thanks to the work of Vladimir Stefanovic, Go 1.8 will ship with support for 32 bit mips.

% env GOARCH=mips go build -o godoc.mips golang.org/x/tools/cmd/godoc
% file godoc.mips 
godoc.mips: ELF 32-bit MSB  executable, MIPS, MIPS32 version 1 (SYSV), statically linked, not stripped

While 32 bit mips hosts are probably too small to compile Go programs natively, you can always cross compile from your development workstation for linux/mips.

Do not fear first class functions

This is the text of my dotGo 2016 presentation. A recording and slide deck are also available.



Hello, welcome to dotGo.

Two years ago I stood on a stage, not unlike this one, and told you my opinion for how configuration options should be handled in Go. The cornerstone of my presentation was Rob Pike’s blog post, Self-referential functions and the design of options.

Since then it has been wonderful to watch this idea mature from Rob’s original blog post, to the gRPC project, who in my opinion have continued to evolve this design pattern into its best form to date.

But, when talking to Gophers at a conference in London a few months ago, several of them expressed a concern that while they understood the notion of a function that returns a function, the technique that powers functional options, they worried that other Go programmers—I suspect they meant less experienced Go programmers—wouldn’t be able to understand this style of programming.

And this made me a bit sad because I consider Go’s support of first class functions to be a gift, and something that we should all be able to take advantage of. So I’m here today to show you, that you do not need to fear first class functions.

Functional options recap

To begin, I’ll very quickly recap the functional options pattern:

type Config struct{ ... }

func WithReticulatedSplines(c *Config) { ... }

type Terrain struct {
        config Config
}

func NewTerrain(options ...func(*Config)) *Terrain {
        var t Terrain
        for _, option := range options {
                option(&t.config)
        }
        return &t
}

func main() {
        t := NewTerrain(WithReticulatedSplines)
        // [ simulation intensifies ]
}

We start with some options, expressed as functions which take a pointer to a structure to configure. We pass those functions to a constructor, and inside the body of that constructor each option function is invoked in order, passing in a reference to the Config value. Finally, we call NewTerrain with the options we want, and away we go.

Okay, everyone should be familiar with this pattern. Where I believe the confusion comes from is when you need an option function which takes a parameter. For example, we have WithCities, which lets us add a number of cities to our terrain model.

// WithCities adds n cities to the Terrain model
func WithCities(n int) func(*Config) { ... }

func main() {        
        t := NewTerrain(WithCities(9))      
        // ...
}

Because WithCities takes an argument, we cannot simply pass WithCities to NewTerrain; its signature does not match. Instead we evaluate WithCities, passing in the number of cities to create, and use the result as the value to pass to NewTerrain.
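To make the recap concrete, here is a minimal runnable sketch of both kinds of option. The fields of Config (reticulatedSplines and cities) are assumptions for illustration, since the struct body is elided above.

```go
package main

import "fmt"

// Config's fields are hypothetical; the text leaves them elided.
type Config struct {
	reticulatedSplines bool
	cities             int
}

// WithReticulatedSplines needs no parameter, so the function itself
// already has the type func(*Config) and can be passed directly.
func WithReticulatedSplines(c *Config) { c.reticulatedSplines = true }

// WithCities returns an option; the returned closure captures n.
func WithCities(n int) func(*Config) {
	return func(c *Config) { c.cities = n }
}

type Terrain struct {
	config Config
}

func NewTerrain(options ...func(*Config)) *Terrain {
	var t Terrain
	for _, option := range options {
		option(&t.config)
	}
	return &t
}

func main() {
	t := NewTerrain(WithReticulatedSplines, WithCities(9))
	fmt.Println(t.config.reticulatedSplines, t.config.cities) // true 9
}
```

Note the difference at the call site: WithReticulatedSplines is passed as a value, while WithCities(9) is called and its result is passed.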

Functions as first class values

What’s going on here? Let’s break it down. Fundamentally, evaluating a function returns a value. We have functions that take two numbers and return a number.

package math

func Min(a, b float64) float64

We have functions that take a slice, and return a pointer to a structure.

package bytes

func NewReader(b []byte) *Reader

And now we have a function which returns a function.

func WithCities(n int) func(*Config)

The type of the value that is returned from WithCities is a function which takes a pointer to a Config. This ability to treat functions as regular values leads to their name: first class functions.

interface.Apply

Another way to think about what is going on here is to try to rewrite the functional option pattern using an interface.

type Option interface {
        Apply(*Config)
}

Rather than a function type we declare an interface, we’ll call it Option, and give it a single method, Apply which takes a pointer to a Config.

func NewTerrain(options ...Option) *Terrain {
        var config Config
        for _, option := range options {
                option.Apply(&config)
        }
        // ...
}

Whenever we call NewTerrain we pass in one or more values that implement the Option interface. Inside NewTerrain, just as before, we loop over the slice of options and call the Apply method on each.

This doesn’t look too different to the previous example. Rather than ranging over a slice of functions and calling them, we range over a slice of interface values and call a method on each. Let’s take a look at the other side, declaring the WithReticulatedSplines option.

type splines struct{}

func (s *splines) Apply(c *Config) { ... }

func WithReticulatedSplines() Option {
        return new(splines)
}

Because we’re passing around interface implementations, we need to declare a type to hold the Apply method. We also need to declare a constructor function to return our splines option implementation–you can already see that this is going to be more code.

To write WithCities using our Option interface we need to do a bit more work.

type cities struct {
        cities int
}

func (c *cities) Apply(cfg *Config) { ... }

func WithCities(n int) Option {
        return &cities{
                cities: n,
        }
}

In the previous, functional, version the value of n, the number of cities to create, was captured lexically for us in the declaration of the anonymous function. Because we’re using an interface we need to declare a type to hold the count of cities and we need a constructor to assign the field during construction.

func main() {
        t := NewTerrain(WithReticulatedSplines(), WithCities(9))
        // ...
}

Putting it all together, we call NewTerrain with the results of evaluating WithReticulatedSplines and WithCities.

At GopherCon last year Tomás Senart spoke about the duality of a first class function and an interface with one method. You can see this duality play out in our example; an interface with one method and a function are equivalent.

But, you can also see that using functions as first class values involves much less code.
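One way to see the duality in code is an adapter type that lets an ordinary function satisfy the Option interface, in the same spirit as http.HandlerFunc in net/http. This is a sketch rather than anything from the talk: OptionFunc, NewConfig, and the cities field are names assumed for illustration.

```go
package main

import "fmt"

// Config's single field is hypothetical.
type Config struct{ cities int }

type Option interface {
	Apply(*Config)
}

// OptionFunc adapts a plain function to the Option interface.
type OptionFunc func(*Config)

// Apply calls the underlying function.
func (f OptionFunc) Apply(c *Config) { f(c) }

// WithCities needs no struct type of its own; the closure holds n.
func WithCities(n int) Option {
	return OptionFunc(func(c *Config) { c.cities = n })
}

// NewConfig applies each option in order to a zero Config.
func NewConfig(options ...Option) Config {
	var c Config
	for _, o := range options {
		o.Apply(&c)
	}
	return c
}

func main() {
	c := NewConfig(WithCities(9))
	fmt.Println(c.cities) // 9
}
```

With the adapter, the one-method interface and the function type become interchangeable, which is the equivalence described above.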

Encapsulating behaviour

Let’s leave interfaces for a moment and talk about some other properties of first class functions.

When we invoke a function or a method, we do so passing around data. The job of that function is often to interpret that data and take some action. Function values allow you to pass behaviour to be executed, rather than data to be interpreted. In effect, passing a function value allows you to declare code that will execute later, perhaps in a different context.

To illustrate this, here is a simple calculator.

type Calculator struct {
        acc float64
}

const (
        OP_ADD = 1 << iota
        OP_SUB
        OP_MUL
)

It has a set of operations it understands.

func (c *Calculator) Do(op int, v float64) float64 {
        switch op {
        case OP_ADD:
                c.acc += v
        case OP_SUB:
                c.acc -= v
        case OP_MUL:
                c.acc *= v
        default:
                panic("unhandled operation")
        }
        return c.acc
}

It has one method, Do, which takes an operation and an operand, v. For convenience, Do also returns the value of the accumulator after the operation is applied.

func main() {
        var c Calculator
        fmt.Println(c.Do(OP_ADD, 100))     // 100
        fmt.Println(c.Do(OP_SUB, 50))      // 50
        fmt.Println(c.Do(OP_MUL, 2))       // 100
}

Our calculator only knows how to add, subtract, and multiply. If we wanted to implement division, we’d have to allocate an operation constant, then open up the Do method and add the code to implement division. Sounds reasonable, it’s only a few lines, but what if we wanted to add square root and exponentiation?

Each time we did this, Do would grow longer and become harder to follow, because each time we add an operation we have to encode into Do knowledge of how to interpret that operation.

Let’s rewrite our calculator a little.

type Calculator struct {
        acc float64
}

type opfunc func(float64, float64) float64

func (c *Calculator) Do(op opfunc, v float64) float64 {
        c.acc = op(c.acc, v)
        return c.acc
}

As before we have a Calculator, which manages its own accumulator. The Calculator has a Do method, which this time takes a function as the operation, and a value as the operand. Whenever Do is called, it calls the operation we pass in, using its own accumulator and the operand we provide.

So, how do we use this new Calculator? You guessed it, by writing our operations as functions.

func Add(a, b float64) float64 { return a + b }

This is the code for Add. What about the other operations? It turns out they aren’t too hard either.

func Sub(a, b float64) float64 { return a - b }
func Mul(a, b float64) float64 { return a * b }

func main() {
        var c Calculator
        fmt.Println(c.Do(Add, 5))       // 5
        fmt.Println(c.Do(Sub, 3))       // 2
        fmt.Println(c.Do(Mul, 8))       // 16
}

As before we construct a Calculator and call it passing operations and an operand.
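Returning to the division example from earlier: with this version of the calculator, adding an operation no longer means touching Do. A small sketch, where Div is a name assumed for illustration:

```go
package main

import "fmt"

type Calculator struct {
	acc float64
}

type opfunc func(float64, float64) float64

func (c *Calculator) Do(op opfunc, v float64) float64 {
	c.acc = op(c.acc, v)
	return c.acc
}

func Add(a, b float64) float64 { return a + b }

// Div is new; neither Calculator nor Do had to change to support it.
func Div(a, b float64) float64 { return a / b }

func main() {
	var c Calculator
	fmt.Println(c.Do(Add, 10)) // 10
	fmt.Println(c.Do(Div, 4))  // 2.5

	// Any value of the right function type works, including a literal.
	square := func(a, _ float64) float64 { return a * a }
	fmt.Println(c.Do(square, 0)) // 6.25
}
```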

Extending the calculator

Now that we can describe operations as functions, we can try to extend our calculator to handle square root.

func Sqrt(n, _ float64) float64 {
        return math.Sqrt(n)
}

But, it turns out there is a problem. math.Sqrt takes one argument, not two. However our Calculator’s Do method’s signature requires an operation function that takes two arguments.

func main() {
        var c Calculator
        c.Do(Add, 16)
        c.Do(Sqrt, 0) // operand ignored
}

Maybe we just cheat and ignore the operand. That’s a bit gross, I think we can do better.

Let’s redefine Add from a function that is called with two values and returns a third, to a function which returns a function that takes a value and returns a value.

func Add(n float64) func(float64) float64 {
        return func(acc float64) float64 {
                return acc + n
        }
}

func (c *Calculator) Do(op func(float64) float64) float64 {
        c.acc = op(c.acc)
        return c.acc
}

Do now invokes the operation function passing in its own accumulator and recording the result back in the accumulator.

func main() {
        var c Calculator
        c.Do(Add(10))   // 10
        c.Do(Add(20))   // 30
}

Now in main we call Do not with the Add function itself, but with the result of evaluating Add(10). The type of the result of evaluating Add(10) is a function which takes a value, and returns a value, matching the signature that Do requires.

func Sub(n float64) func(float64) float64 {
        return func(acc float64) float64 {
                return acc - n
        }
}

func Mul(n float64) func(float64) float64 {
        return func(acc float64) float64 {
                return acc * n
        }
}

Subtraction and multiplication are similarly easy to implement. But what about square root?

func Sqrt() func(float64) float64 {
        return func(n float64) float64 {
                return math.Sqrt(n)
        }
}

func main() {
        var c Calculator
        c.Do(Add(2))
        c.Do(Sqrt())   // 1.41421356237
}

This implementation of square root avoids the awkward syntax of the previous calculator’s operation function, as our revised calculator now operates on functions which take and return only one value.

Hopefully you’ve noticed that the function returned by Sqrt has the same signature as math.Sqrt, so we can make this code smaller by passing math.Sqrt directly, along with any other function from the math package that takes a single float64 and returns a float64.

func main() {
        var c Calculator
        c.Do(Add(2))      // 2
        c.Do(math.Sqrt)   // 1.41421356237
        c.Do(math.Cos)    // 0.1559 (approx.; math.Cos works in radians)
}

We started with a model of hard coded, interpreted logic. We moved to a more functional model, where we pass in the behaviour we want. Then, by taking it a step further, we generalised our calculator to work for operations regardless of their number of arguments.
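Putting the final version together, exponentiation, which was mentioned earlier as a likely extension, fits the same shape as Add. This is a sketch: Pow and demo are names assumed for illustration.

```go
package main

import (
	"fmt"
	"math"
)

type Calculator struct {
	acc float64
}

func (c *Calculator) Do(op func(float64) float64) float64 {
	c.acc = op(c.acc)
	return c.acc
}

func Add(n float64) func(float64) float64 {
	return func(acc float64) float64 { return acc + n }
}

// Pow is hypothetical: it wraps the two-argument math.Pow so that the
// exponent is captured and only the accumulator is passed at call time.
func Pow(n float64) func(float64) float64 {
	return func(acc float64) float64 { return math.Pow(acc, n) }
}

// demo runs three operations and returns the accumulator after each.
func demo() []float64 {
	var c Calculator
	return []float64{
		c.Do(Add(16)),   // 16
		c.Do(math.Sqrt), // 4
		c.Do(Pow(3)),    // 64
	}
}

func main() {
	fmt.Println(demo())
}
```

One-argument functions such as math.Sqrt are passed as they are, while anything needing an extra operand is wrapped in a function that returns a closure.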

Let’s talk about actors

Let’s change tracks a little and talk about why most of us are here at a Go conference: concurrency, specifically actors. To give due credit, the examples here are inspired by Bryan Boreham’s talk from GolangUK; you should check it out.

Suppose we’re building a chat server. We plan to be the next Hipchat or Slack, but we’ll start small for the moment.

type Mux struct {
        mu    sync.Mutex
        conns map[net.Addr]net.Conn
}

func (m *Mux) Add(conn net.Conn) {
        m.mu.Lock()
        defer m.mu.Unlock()
        m.conns[conn.RemoteAddr()] = conn
}

We have a way to register new connections.

func (m *Mux) Remove(addr net.Addr) {
        m.mu.Lock()
        defer m.mu.Unlock()
        delete(m.conns, addr)
}

Remove old connections.

func (m *Mux) SendMsg(msg string) error {
        m.mu.Lock()
        defer m.mu.Unlock()
        for _, conn := range m.conns {
                _, err := io.WriteString(conn, msg)
                if err != nil {
                        return err
                }
        }
        return nil
}

And a way to send a message to all the registered connections. Because this is a server, all of these methods will be called concurrently, so we need to use a mutex to protect the conns map and prevent data races. Is this what you’d call idiomatic Go code?

Don’t communicate by sharing memory, share memory by communicating.

Our first proverb–don’t mediate access to shared memory with locks and mutexes, instead share that memory by communicating. So let’s apply this advice to our chat server.

Rather than using a mutex to serialise access to the Mux’s conns map, we can give that job to a goroutine, and communicate with that goroutine via channels.

type Mux struct {
        add     chan net.Conn
        remove  chan net.Addr
        sendMsg chan string
}

func (m *Mux) Add(conn net.Conn) {
        m.add <- conn
}

Add sends the connection to add to the add channel.

func (m *Mux) Remove(addr net.Addr) {
        m.remove <- addr
}

Remove sends the address of the connection to the remove channel.

func (m *Mux) SendMsg(msg string) error {
        m.sendMsg <- msg
        return nil
}

And SendMsg sends the message, which is to be transmitted to each connection, to the sendMsg channel.

func (m *Mux) loop() {
        conns := make(map[net.Addr]net.Conn)
        for {
                select {
                case conn := <-m.add:
                        conns[conn.RemoteAddr()] = conn
                case addr := <-m.remove:
                        delete(conns, addr)
                case msg := <-m.sendMsg:
                        for _, conn := range conns {
                                io.WriteString(conn, msg)
                        }
                }
        }
}

Rather than using a mutex to serialise access to the conns map, loop will wait until it receives an operation in the form of a value sent over one of the add, remove, or sendMsg channels and apply the relevant case. We don’t need a mutex anymore because the shared state, our conns map, is local to the loop function.

But, there’s still a lot of hard coded logic here. loop only knows how to do three things: add, remove and broadcast a message. As with the previous example, adding new features to our Mux type will involve:

  • creating a channel.
  • adding a helper to send the data over the channel.
  • extending the select logic inside loop to process that data.

Just like our Calculator example we can rewrite our Mux to use first class functions to pass around behaviour we want executed, not data to interpret. Now, each method sends an operation to be executed in the context of the loop function, using our single ops channel.

type Mux struct {
        ops chan func(map[net.Addr]net.Conn)
}

func (m *Mux) Add(conn net.Conn) {
        m.ops <- func(m map[net.Addr]net.Conn) {
                m[conn.RemoteAddr()] = conn
        }
}

In this case the signature of the operation is a function which takes a map of net.Addr’s to net.Conn’s. In a real program you’d probably have a much more complicated type to represent a client connection, but it’s sufficient for the purpose of this example.

func (m *Mux) Remove(addr net.Addr) {
        m.ops <- func(m map[net.Addr]net.Conn) {
                delete(m, addr)
        }
}

Remove is similar: we send a function that deletes its connection’s address from the supplied map.

func (m *Mux) SendMsg(msg string) error {
        m.ops <- func(m map[net.Addr]net.Conn) {
                for _, conn := range m {
                        io.WriteString(conn, msg)
                }
        }
        return nil
}

SendMsg sends a function which iterates over all connections in the supplied map and calls io.WriteString to send each a copy of the message.

func (m *Mux) loop() {

        conns := make(map[net.Addr]net.Conn)
        for op := range m.ops {
                op(conns)
        }
}

You can see that we’ve moved the logic from the body of loop into anonymous functions created by our helpers. So the job of loop is now to create a conns map, wait for an operation to be provided on the ops channel, then invoke it, passing in its map of connections.
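The snippets above leave out how a Mux is created and how loop gets started. The following sketch fills that in under some loud assumptions: clients are keyed by a string name and are plain io.Writers rather than net.Conns so that it runs without a network, and NewMux, Close, the done channel, and demo are all names invented for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Mux is a simplified stand-in for the Mux in the text.
type Mux struct {
	ops  chan func(map[string]io.Writer)
	done chan struct{}
}

// NewMux is a hypothetical constructor that starts the loop goroutine.
func NewMux() *Mux {
	m := &Mux{
		ops:  make(chan func(map[string]io.Writer)),
		done: make(chan struct{}),
	}
	go m.loop()
	return m
}

func (m *Mux) loop() {
	conns := make(map[string]io.Writer)
	for op := range m.ops {
		op(conns)
	}
	close(m.done) // ops was closed and every operation has run
}

func (m *Mux) Add(name string, w io.Writer) {
	m.ops <- func(conns map[string]io.Writer) { conns[name] = w }
}

func (m *Mux) Remove(name string) {
	m.ops <- func(conns map[string]io.Writer) { delete(conns, name) }
}

func (m *Mux) SendMsg(msg string) {
	m.ops <- func(conns map[string]io.Writer) {
		for _, w := range conns {
			io.WriteString(w, msg)
		}
	}
}

// Close is also hypothetical: it stops the loop and waits for it to exit.
func (m *Mux) Close() {
	close(m.ops)
	<-m.done
}

// demo registers two in-memory clients and returns what each received.
func demo() (string, string) {
	var a, b bytes.Buffer
	m := NewMux()
	m.Add("a", &a)
	m.Add("b", &b)
	m.SendMsg("hi\n")
	m.Remove("b")
	m.SendMsg("bye\n")
	m.Close()
	return a.String(), b.String()
}

func main() {
	a, b := demo()
	fmt.Printf("a=%q b=%q\n", a, b) // a="hi\nbye\n" b="hi\n"
}
```

Because the ops channel is unbuffered and loop runs one operation at a time, operations sent from a single goroutine are applied in the order they were sent.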

But there are a few problems still to fix. The most pressing is the lack of error handling in SendMsg; an error writing to a connection will not be communicated back to the caller. So let’s fix that now.

func (m *Mux) SendMsg(msg string) error {
        result := make(chan error, 1)
        m.ops <- func(m map[net.Addr]net.Conn) {
                for _, conn := range m {
                        _, err := io.WriteString(conn, msg)
                        if err != nil {
                                result <- err
                                return
                        }
                }
                result <- nil
        }
        return <-result
}

To handle the error being generated inside the anonymous function we pass to loop we need to create a channel to communicate the result of the operation. This also creates a point of synchronisation: the last line of SendMsg blocks until the function we passed into loop has been executed.

func (m *Mux) loop() {
        conns := make(map[net.Addr]net.Conn)
        for op := range m.ops {
                op(conns)
        }
}

Note that we didn’t have to change the body of loop at all to incorporate this error handling. And now that we know how to do this, we can easily add a new function to Mux to send a private message to a single client.

func (m *Mux) PrivateMsg(addr net.Addr, msg string) error {
        result := make(chan net.Conn, 1)
        m.ops <- func(m map[net.Addr]net.Conn) {
                result <- m[addr]
        }
        conn := <-result
        if conn == nil {
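                // errors.Errorf is assumed to come from github.com/pkg/errors;
                // with only the standard library, fmt.Errorf does the same job.
                return errors.Errorf("client %v not registered", addr)
        }
        _, err := io.WriteString(conn, msg)
        return err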
}

To do this we pass a “lookup function” to loop via the ops channel, which will look in the map provided to it (this is loop’s conns map) and return the value for the address we want on the result channel.

In the rest of the function we check to see if the result was nil—the zero value from the map lookup implies that the client is not registered. Otherwise we now have a reference to the client and we can call io.WriteString to send them a message.

And just to reiterate, we did this all without changing the body of loop, or affecting any of the other operations.
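The same result-channel technique works for any operation that needs to hand something back to its caller. The sketch below is a simplified model under stated assumptions: clients are io.Writers keyed by name rather than net.Conns, and NewMux, Len, brokenWriter, errBroken, and demo are hypothetical names added so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

type Mux struct {
	ops chan func(map[string]io.Writer)
}

// NewMux is a hypothetical constructor that starts the loop goroutine.
func NewMux() *Mux {
	m := &Mux{ops: make(chan func(map[string]io.Writer))}
	go m.loop()
	return m
}

// loop is unchanged however many operations are added below.
func (m *Mux) loop() {
	conns := make(map[string]io.Writer)
	for op := range m.ops {
		op(conns)
	}
}

func (m *Mux) Add(name string, w io.Writer) {
	m.ops <- func(conns map[string]io.Writer) { conns[name] = w }
}

// Len is a new operation: it returns the number of registered clients.
func (m *Mux) Len() int {
	result := make(chan int, 1)
	m.ops <- func(conns map[string]io.Writer) { result <- len(conns) }
	return <-result
}

// SendMsg reports the first write error back to the caller.
func (m *Mux) SendMsg(msg string) error {
	result := make(chan error, 1)
	m.ops <- func(conns map[string]io.Writer) {
		for _, w := range conns {
			if _, err := io.WriteString(w, msg); err != nil {
				result <- err
				return
			}
		}
		result <- nil
	}
	return <-result
}

var errBroken = errors.New("broken client")

// brokenWriter is a client whose writes always fail.
type brokenWriter struct{}

func (brokenWriter) Write([]byte) (int, error) { return 0, errBroken }

// demo registers one healthy and one failing client.
func demo() (int, error) {
	m := NewMux()
	m.Add("ok", io.Discard)
	m.Add("bad", brokenWriter{})
	return m.Len(), m.SendMsg("hello")
}

func main() {
	n, err := demo()
	fmt.Println(n, err) // 2 broken client
}
```

Each result channel has a buffer of one, so the function running inside loop never blocks on sending its answer.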

Conclusion

In summary:

  • First class functions bring you tremendous expressive power. They let you pass around behaviour, not just dead data that must be interpreted.
  • First class functions aren’t new or novel. Many older languages have offered them, even C. In fact, it was only somewhere along the way, as pointers were removed, that programmers in the OO stream of languages lost access to first class functions. If you’re a JavaScript programmer, you’ve probably spent the last 15 minutes wondering what the big deal is.
  • First class functions, like the other features Go offers, should be used with restraint. Just as it is possible to make an overcomplicated program with the overuse of channels, it’s possible to make an impenetrable program with an overuse of first class functions. But that does not mean you shouldn’t use them at all; just use them in moderation.
  • First class functions are something that I believe every Go programmer should have in their toolbox. First class functions aren’t unique to Go, and Go programmers shouldn’t be afraid of them.
  • If you can learn to use interfaces, you can learn to use first class functions. They aren’t hard, just a little unfamiliar, and unfamiliarity is something that I believe can be overcome with time and practice.

So next time you define an API that has just one method, ask yourself, shouldn’t it really just be a function?
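As a rough illustration of that question, the sketch below shows a single method interface next to the plain function that could replace it. Notifier, NotifierFunc, Broadcast, and BroadcastFunc are hypothetical names; the adapter type follows the same shape as http.HandlerFunc in the standard library.

```go
package main

import "fmt"

// Notifier is a hypothetical interface with just one method.
type Notifier interface {
	Notify(msg string) error
}

// NotifierFunc adapts an ordinary function to Notifier, in the same way
// that http.HandlerFunc adapts a function to http.Handler.
type NotifierFunc func(msg string) error

// Notify calls f(msg).
func (f NotifierFunc) Notify(msg string) error { return f(msg) }

// Broadcast is written against the interface.
func Broadcast(n Notifier, msgs ...string) error {
	for _, m := range msgs {
		if err := n.Notify(m); err != nil {
			return err
		}
	}
	return nil
}

// BroadcastFunc takes the behaviour directly and needs no interface.
func BroadcastFunc(notify func(string) error, msgs ...string) error {
	for _, m := range msgs {
		if err := notify(m); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var sent []string
	record := func(msg string) error {
		sent = append(sent, msg)
		return nil
	}

	_ = Broadcast(NotifierFunc(record), "hello")
	_ = BroadcastFunc(record, "world")
	fmt.Println(sent) // [hello world]
}
```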

Introducing Go 2.0

Just so we’re clear, this post is a thought experiment, not any form of commitment to deliver Go 2.0 in any time frame. While I personally believe there will be a Go 2.0 in the future, I’m in no position to influence its creation; hence, this post is mere speculation.


Why introduce a new major version of Go?

Go 1.0 was released over 4 years ago, and since then the Go 1 compatibility contract has been a boon to anyone investing in Go as the language to build their product.  So, why introduce a new version of Go?

By the time that Go 1.8 is released at the start of 2017, the standard library will have accumulated cruft and hacks for five years, and if you consider that Go started life in 2007, it’s closer to ten. An opportunity to address this cruft and remove some of the packages which are now understood to be a bad idea would make the standard library more consistent and approachable to newcomers.

It is possible the language itself could become smaller. Rob Pike noted in 2014 that there are too many ways to declare a variable in Go, and this could be rationalised. Similarly the incongruence between make and new might be resolved. Then there is the problem of non latin characters not being considered upper case. So, lots of little cleanups to do.
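For readers who have not run into these overlaps, here is a small sketch. The function names are purely illustrative, and the particular declaration forms shown are an assumption about which ones the remark had in mind.

```go
package main

import "fmt"

// declarations returns four ints that all hold zero, each declared with
// a different but equivalent form.
func declarations() (int, int, int, int) {
	var a int
	var b = 0
	var c int = 0
	d := 0
	return a, b, c, d
}

// allocations contrasts new and make for a slice type.
func allocations() (*[]int, []int) {
	p := new([]int)     // pointer to a zero value, which is a nil slice
	s := make([]int, 0) // initialised, non-nil, empty slice
	return p, s
}

func main() {
	fmt.Println(declarations()) // 0 0 0 0

	p, s := allocations()
	fmt.Println(*p == nil, s == nil) // true false
}
```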

Obviously some kind of solution for templated types would have to be part of any Go 2.0 discussion and, as David Symonds pointed out several years ago, they would have to be used to rewrite the standard library, both causing, and justifying, the compatibility break.

Backward compatibility

Backwards compatibility is not about syntax or features; backwards compatibility is about investment. Investment in the language, both at a technical and career level. Investment in libraries. Investment in backends that generate machine code. Investment in the mid part of the compiler that transforms and optimises code. Investment in build scripts and toolchains that embed one piece of compiled code into another.

Brian Goetz, the Java language architect, describes the commitment to backward compatibility as the “central park effect“. This is something our cousins in the hardware world have long understood–never let the customer unbolt your product from the rack, ‘cos they might take the opportunity to use that space for your competition.

The lessons of Python 3000 are prescient; ignore backward compatibility at your peril. No matter how compelling the new version of your language, if you make it incompatible with the investment in the previous version, you are launching a new product which is in direct competition with itself. And just to make it clear, I’m not picking on Python specifically, there are plenty of other examples; D 2.0, Perl 6, and VB.net also come to mind.

All of these examples show the danger of creating a new version of a language that requires its users to rewrite all the source of their program, including all their dependencies (which may be non trivial), before it will compile and run.

A plausible implementation

So, how to create a new Go 2.0 language, with a new syntax and a new standard library, without making it incompatible with every piece of Go code written to date? How could we avoid the all or nothing stand-off in which other languages place their users?

What if we could combine code written in Go 1.0 and a proposed Go 2.0 in one program using the package level as the boundary between language versions? Go 2.0 would be a new language, with a new standard library built upon a runtime shared between itself and Go 1.0, thereby allowing users to work outwards from their Go 2.0 main package to the limbs of their dependency graph, one package at a time.

A Go 2.0 package would be able to call down to Go 1.0, but not the other way around. Go 2.0 types would be able to interoperate with Go 1.0 types, but Go 1.0 types would be unaware of Go 2.0 constructed code. Perhaps calling from Go 2.0 to Go 1.0 looks conceptually like using cgo to call C code, except without the overhead as both languages would be compiled to the same intermediary form.

The key is that both language versions would be compiled to a single intermediate representation, one that can represent the superset of both syntaxes. This has been done before; in the first few versions of Go, C code and Go code were compiled to an intermediate representation, Ken Thompson’s universal assembly language, then converted to machine code at link time. Now with Keith Randall’s SSA compiler, there is a single low level intermediate representation (similar to gcc’s GIMPLE and LLVM’s IR) that describes all the things that make Go programs Go [1].

There is a strong precedent for this; the ~Sun~ Oracle JVM. For more than a decade the JVM has hosted byte-code that was not compiled from .java source files. Combined with a version of gofix that could automate some of the effort in migrating a package to Go 2.0 syntax, this could be a plausible way to introduce a new version of Go without abrogating the investment in code written for Go 1.0.


  1. This also raises the possibility of developing other language front-ends using the Go toolchain. If you look at what LLVM has done for projects like Pony, Crystal, and Rust, think of what a portable, cross platform, optimising compiler, with user space concurrency built in, and written in Go, not C++, would mean for language experimentation.

IoT p0wnership

The recent total war bombardment of Brian Krebs’ site, and the subsequent allegation that the traffic emanated from compromised home routers, cameras, baby monitors, doorbells, thermostats, and whatnot, got me thinking.

So DDoS is a thing, and as much as I enjoy the lampooning of IoT by everyone’s favourite wat account, @internetofshit, I wonder if the status quo of insecure consumer devices will have an unexpected knock-on effect.

Previously, DDoS traffic was assumed to come from compromised servers (waves hand in the approximate direction of the cloud) or malware infected PCs. For the former, cloud providers have gotten pretty good at rooting out insecure hosts and booting them off their networks, and organised crime have pretty much figured out that deleting stuff off people’s home computers is less profitable than encrypting said stuff and holding it for ransom. There’s always the chance of someone spotting unexpected outbound traffic from a box (that is one thing the host based AV industry does seem to be good at) because there’s usually a human sitting in front of the device whenever it’s awake. But not so with the cable router or ADSL modem sitting under the hall table, or the IP connected baby monitor you installed in the nursery, or the hundreds of other IP connected whatevers produced for the lowest possible cost because that is what we, as consumers, demand; price, the ultimate arbiter of quality.

My home router. What’s it doing? I’ve got no idea.

Homes filled with tiny linux boxes running weak software are a tempting target. Not just because owning them is easy, but as long as the devices continue to work as reliably as they did before compromise, nobody is going to suspect that their excess capacity is being soaked up under someone else’s control. Embedded devices have other attractive properties, they’re usually online 24/7, not sporadically like a laptop trying to conserve battery power, and commonly enjoy a wired ethernet connection, not the whims of a rapidly changing WiFi network. Can any of you reading this post tell me that you know the provenance of every packet that leaves your home network?

But back to Krebs and the IP cameras. With the nose dive in desktop and laptop sales, it’s pretty clear that the botnet action has moved to the embedded space. Assuming the attribution is correct, then DDoS has gone from being manageable to very not manageable, quickly. It’s the early 2000’s spam wars all over again, and companies that make real money on the internet are going to expect a solution. And when I say solution, I mean litigation.

Who’s to blame for shitty insecure consumer devices? Who’s going to receive the summons?

Will it be the local ISPs, or carriers? Unlikely. In the States, carriers enjoy common carrier status which indemnifies them from crimes committed using their service. In Australia the situation is less clear but the message isn’t. ISPs are not interested in policing the behaviour of the users of their service. If you’re Walmart who’s been forced offline by 1Tbps of traffic during the holiday sales, you can forget about suing ISPs.

Ok, what about the device manufacturers themselves? I’ll be honest, I don’t have the stamina to read the EULA paperwork that comes with a device so I cannot assert this as a fact, but I would be amazed if the liability for the damage the device did, if not expressly waived by opening the box, exceeded the purchase price. Looking towards other industries, car manufacturers are not liable for the damage their vehicles do in the case of misuse, which is why in many parts of the world licensing a vehicle to drive on a public road requires compulsory third party insurance.

Let’s cut to the chase, the reason IoT DDoS is a thing is because the security of the software inside those devices is laughable. You can debate about why this is the way it is, but that does not change the fact that all the other members of this supply chain have deftly sidestepped the buck on this one, and so the liability for insecure software rests with us, the authors of said software. Because, as Robert C. Martin likes to remind us, software rules the world, so programmers rule the world.

Serious stuff.

If you look at the history of unsafe products; food, paint, hair spray, electrical goods, governments have forced manufacturers to improve with a combination of regulations and import controls. In Australia, for example, it is illegal to import a product that connects to the mains supply unless its plug has the correct shroud over the live and neutral pins. This is how governments work, they cut off a manufacturer’s air supply by forbidding them from importing their products into the country. That tends to effect change, smartly.

Will IoT be the tipping point that forces the software industry to adopt voluntary, or mandatory, regulation or procedural standardisation?

Go 1.8 performance improvements, one month in

Sunday September the 18th marks a month since the Go 1.8 cycle opened officially. I’m passionate about the performance of Go programs, and of the compiler itself. This post is a brief look at the state of play, roughly half way into the development cycle for Go 1.8 [1].

Note: these results are of course preliminary and represent only a point in time, not the performance of the final Go 1.8 release.

Compile times

Nothing much to report here. Using the methodology from my previous Go 1.7 benchmarks, there is a 3.22%–5.11% improvement in full compile time compared to Go 1.7.

[Chart: full compile time for Go 1.4.3, Go 1.7, and Go tip]

Performance improvements

Intel amd64

Better code generation and small improvements to the runtime and standard library show some small improvements for amd64 [2], but really nothing to write home about yet.

name                       old time/op    new time/op  delta
BinaryTree17-4              3.07s ± 2%     3.06s ± 2%    ~      (p=0.661 n=10+9)
Fannkuch11-4                3.23s ± 1%     3.22s ± 0%  -0.43%   (p=0.008 n=9+10)
FmtFprintfEmpty-4          64.4ns ± 0%    61.8ns ± 4%  -4.17%   (p=0.005 n=9+10)
FmtFprintfString-4          162ns ± 0%     162ns ± 0%    ~      (p=0.065 n=10+9)
FmtFprintfInt-4             142ns ± 0%     142ns ± 0%    ~      (p=0.137 n=8+10)
FmtFprintfIntInt-4          220ns ± 0%     217ns ± 0%  -1.18%   (p=0.000 n=9+10)
FmtFprintfPrefixedInt-4     224ns ± 0%     224ns ± 1%    ~       (p=0.206 n=9+9)
FmtFprintfFloat-4           313ns ± 0%     312ns ± 0%  -0.26%   (p=0.001 n=10+9)
FmtManyArgs-4               906ns ± 0%     894ns ± 0%  -1.32%    (p=0.000 n=7+6)
GobDecode-4                8.88ms ± 1%    8.81ms ± 0%  -0.81%  (p=0.003 n=10+10)
GobEncode-4                7.93ms ± 1%    7.88ms ± 0%  -0.66%   (p=0.008 n=9+10)
Gzip-4                      272ms ± 1%     277ms ± 0%  +1.95%   (p=0.000 n=10+9)
Gunzip-4                   47.4ms ± 0%    47.4ms ± 0%    ~      (p=0.720 n=9+10)
HTTPClientServer-4          201µs ± 4%     202µs ± 2%    ~     (p=0.631 n=10+10)
JSONEncode-4               19.3ms ± 0%    19.3ms ± 0%    ~     (p=0.063 n=10+10)
JSONDecode-4               61.0ms ± 0%    61.2ms ± 0%  +0.33%   (p=0.000 n=10+8)
Mandelbrot200-4            5.20ms ± 0%    5.20ms ± 0%    ~      (p=0.475 n=10+7)
GoParse-4                  3.95ms ± 1%    3.97ms ± 1%  +0.65%    (p=0.003 n=9+9)
RegexpMatchEasy0_32-4      88.4ns ± 0%    88.7ns ± 0%  +0.34%   (p=0.001 n=10+9)
RegexpMatchEasy0_1K-4      1.14µs ± 0%    1.14µs ± 0%    ~       (p=0.369 n=9+6)
RegexpMatchEasy1_32-4      82.6ns ± 0%    82.0ns ± 0%  -0.70%   (p=0.000 n=9+10)
RegexpMatchEasy1_1K-4       469ns ± 0%     463ns ± 0%  -1.23%    (p=0.000 n=6+9)
RegexpMatchMedium_32-4      138ns ± 1%     136ns ± 0%  -1.38%   (p=0.000 n=10+9)
RegexpMatchMedium_1K-4     43.6µs ± 1%    42.0µs ± 0%  -3.74%    (p=0.000 n=9+9)
RegexpMatchHard_32-4       2.25µs ± 1%    2.23µs ± 0%  -0.57%    (p=0.000 n=8+8)
RegexpMatchHard_1K-4       68.8µs ± 0%    68.6µs ± 0%  -0.37%    (p=0.000 n=8+8)
Revcomp-4                   477ms ± 1%     472ms ± 0%  -1.03%    (p=0.000 n=8+8)
Template-4                 76.1ms ± 0%    76.4ms ± 0%  +0.35%    (p=0.000 n=9+9)
TimeParse-4                 367ns ± 0%     366ns ± 0%  -0.16%   (p=0.003 n=10+8)
TimeFormat-4                386ns ± 0%     384ns ± 0%  -0.58%    (p=0.000 n=9+9)

name                     old speed      new speed      delta
GobDecode-4              86.4MB/s ± 1%  87.1MB/s ± 0%  +0.81%  (p=0.003 n=10+10)
GobEncode-4              96.7MB/s ± 1%  97.4MB/s ± 0%  +0.66%   (p=0.007 n=9+10)
Gzip-4                   71.4MB/s ± 1%  70.0MB/s ± 0%  -1.91%   (p=0.000 n=10+9)
Gunzip-4                  409MB/s ± 0%   410MB/s ± 0%    ~      (p=0.703 n=9+10)
JSONEncode-4              101MB/s ± 0%   100MB/s ± 0%    ~     (p=0.084 n=10+10)
JSONDecode-4             31.8MB/s ± 0%  31.7MB/s ± 0%  -0.33%   (p=0.000 n=10+8)
GoParse-4                14.7MB/s ± 1%  14.6MB/s ± 1%  -0.67%    (p=0.002 n=9+9)
RegexpMatchEasy0_32-4     362MB/s ± 0%   361MB/s ± 0%  -0.36%   (p=0.000 n=10+9)
RegexpMatchEasy0_1K-4     898MB/s ± 0%   898MB/s ± 0%    ~       (p=0.762 n=9+8)
RegexpMatchEasy1_32-4     387MB/s ± 0%   390MB/s ± 0%  +0.70%   (p=0.000 n=9+10)
RegexpMatchEasy1_1K-4    2.18GB/s ± 0%  2.21GB/s ± 0%  +1.20%    (p=0.000 n=9+9)
RegexpMatchMedium_32-4   7.23MB/s ± 1%  7.32MB/s ± 0%  +1.19%   (p=0.000 n=10+9)
RegexpMatchMedium_1K-4   23.5MB/s ± 1%  24.4MB/s ± 0%  +3.88%    (p=0.000 n=9+9)
RegexpMatchHard_32-4     14.2MB/s ± 1%  14.3MB/s ± 0%  +0.58%    (p=0.000 n=8+8)
RegexpMatchHard_1K-4     14.9MB/s ± 0%  14.9MB/s ± 0%  +0.34%    (p=0.000 n=8+7)
Revcomp-4                 533MB/s ± 1%   539MB/s ± 0%  +1.04%    (p=0.000 n=8+8)
Template-4               25.5MB/s ± 0%  25.4MB/s ± 0%  -0.36%    (p=0.000 n=9+9)

ARM

The major improvement that landed recently in the development branch is the conversion of the remaining architecture backends to use the compiler’s SSA form. This has brought a substantial improvement in generated code for non Intel architectures, like ARM [3].

name                       old time/op    new time/op    delta
BinaryTree17-4              33.8s ± 1%      27.7s ± 0%  -18.06%  (p=0.000 n=10+10)
Fannkuch11-4                42.0s ± 0%      19.3s ± 0%  -54.10%  (p=0.000 n=10+10)
FmtFprintfEmpty-4           670ns ± 1%      581ns ± 1%  -13.30%  (p=0.000 n=10+10)
FmtFprintfString-4         2.04µs ± 1%     1.65µs ± 0%  -19.09%  (p=0.000 n=10+10)
FmtFprintfInt-4            1.71µs ± 0%     1.21µs ± 0%  -29.39%   (p=0.000 n=10+9)
FmtFprintfIntInt-4         2.69µs ± 1%     1.94µs ± 0%  -27.77%  (p=0.000 n=10+10)
FmtFprintfPrefixedInt-4    2.70µs ± 0%     1.85µs ± 0%  -31.41%   (p=0.000 n=10+9)
FmtFprintfFloat-4          5.15µs ± 0%     3.65µs ± 0%  -29.01%   (p=0.000 n=9+10)
FmtManyArgs-4              11.3µs ± 0%      8.5µs ± 0%  -24.79%   (p=0.000 n=10+9)
GobDecode-4                 112ms ± 0%       77ms ± 1%  -31.04%    (p=0.000 n=9+9)
GobEncode-4                88.5ms ± 1%     77.2ms ± 1%  -12.78%  (p=0.000 n=10+10)
Gzip-4                      4.79s ± 0%      3.34s ± 0%  -30.18%    (p=0.000 n=9+9)
Gunzip-4                    702ms ± 0%      463ms ± 0%  -34.05%  (p=0.000 n=10+10)
HTTPClientServer-4          645µs ± 3%      571µs ± 3%  -11.45%  (p=0.000 n=10+10)
JSONEncode-4                227ms ± 0%      186ms ± 0%  -18.16%  (p=0.000 n=10+10)
JSONDecode-4                845ms ± 0%      618ms ± 0%  -26.81%  (p=0.000 n=10+10)
Mandelbrot200-4            59.3ms ± 0%     40.0ms ± 0%  -32.47%  (p=0.000 n=10+10)
GoParse-4                  45.0ms ± 0%     37.0ms ± 0%  -17.68%    (p=0.000 n=9+9)
RegexpMatchEasy0_32-4       974ns ± 0%      878ns ± 0%   -9.81%   (p=0.000 n=10+9)
RegexpMatchEasy0_1K-4      4.60µs ± 0%     4.48µs ± 0%   -2.57%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-4      1.02µs ± 0%     0.94µs ± 0%   -8.08%   (p=0.000 n=8+10)
RegexpMatchEasy1_1K-4      6.92µs ± 0%     6.08µs ± 0%  -12.10%  (p=0.000 n=10+10)
RegexpMatchMedium_32-4     1.61µs ± 0%     1.27µs ± 0%  -20.98%    (p=0.000 n=9+6)
RegexpMatchMedium_1K-4      447µs ± 0%      317µs ± 0%  -29.05%   (p=0.000 n=10+9)
RegexpMatchHard_32-4       24.9µs ± 0%     18.4µs ± 0%  -25.89%  (p=0.000 n=10+10)
RegexpMatchHard_1K-4        740µs ± 0%      552µs ± 0%  -25.36%  (p=0.000 n=10+10)
Revcomp-4                  81.0ms ± 1%     65.2ms ± 0%  -19.53%    (p=0.000 n=9+9)
Template-4                  1.17s ± 0%      0.81s ± 0%  -31.28%    (p=0.000 n=9+9)
TimeParse-4                5.52µs ± 0%     3.79µs ± 0%  -31.42%   (p=0.000 n=10+9)
TimeFormat-4               10.6µs ± 0%      8.5µs ± 0%  -19.14%  (p=0.000 n=10+10)

name                     old speed      new speed        delta
GobDecode-4              6.86MB/s ± 0%   9.95MB/s ± 1%  +45.00%    (p=0.000 n=9+9)
GobEncode-4              8.67MB/s ± 1%   9.94MB/s ± 1%  +14.69%  (p=0.000 n=10+10)
Gzip-4                   4.05MB/s ± 0%   5.81MB/s ± 0%  +43.32%   (p=0.000 n=10+9)
Gunzip-4                 27.6MB/s ± 0%   41.9MB/s ± 0%  +51.63%  (p=0.000 n=10+10)
JSONEncode-4             8.53MB/s ± 0%  10.43MB/s ± 0%  +22.20%  (p=0.000 n=10+10)
JSONDecode-4             2.30MB/s ± 0%   3.14MB/s ± 0%  +36.39%   (p=0.000 n=9+10)
GoParse-4                1.29MB/s ± 0%   1.56MB/s ± 0%  +20.93%   (p=0.000 n=9+10)
RegexpMatchEasy0_32-4    32.8MB/s ± 0%   36.4MB/s ± 0%  +10.87%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K-4     222MB/s ± 0%    228MB/s ± 0%   +2.64%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-4    31.3MB/s ± 0%   34.0MB/s ± 0%   +8.75%   (p=0.000 n=9+10)
RegexpMatchEasy1_1K-4     148MB/s ± 0%    168MB/s ± 0%  +13.76%  (p=0.000 n=10+10)
RegexpMatchMedium_32-4    620kB/s ± 0%    790kB/s ± 0%  +27.42%   (p=0.000 n=10+8)
RegexpMatchMedium_1K-4   2.29MB/s ± 0%   3.23MB/s ± 0%  +41.05%  (p=0.000 n=10+10)
RegexpMatchHard_32-4     1.29MB/s ± 0%   1.74MB/s ± 0%  +34.88%   (p=0.000 n=9+10)
RegexpMatchHard_1K-4     1.38MB/s ± 0%   1.85MB/s ± 0%  +34.06%  (p=0.000 n=10+10)
Revcomp-4                31.4MB/s ± 1%   39.0MB/s ± 0%  +24.26%    (p=0.000 n=9+9)
Template-4               1.65MB/s ± 0%   2.41MB/s ± 0%  +45.71%   (p=0.000 n=10+9)
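The time/op and speed tables are two views of the same runs. Assuming throughput is inversely proportional to time per operation, a relative change d in time/op corresponds to a change of 1/(1+d) − 1 in speed. The sketch below works through that arithmetic; the function names are hypothetical, and because the published deltas are computed from the raw samples they can differ from this conversion in the last digit or two.

```go
package main

import "fmt"

// timeDelta returns the relative change in time per operation.
func timeDelta(oldTime, newTime float64) float64 {
	return (newTime - oldTime) / oldTime
}

// speedDelta converts a relative change in time per operation, d, into
// the corresponding relative change in throughput.
func speedDelta(d float64) float64 {
	return 1/(1+d) - 1
}

func main() {
	// Halving the time per operation doubles the throughput.
	fmt.Printf("%+.2f%%\n", 100*speedDelta(timeDelta(2, 1))) // +100.00%

	// GobDecode on ARM is reported above as -31.04% time/op and +45.00%
	// speed; the conversion lands close to the reported figure.
	fmt.Printf("%+.2f%%\n", 100*speedDelta(-0.3104))
}
```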

Notes:

  1. Despite the Go 1.8 development cycle opening 18 days late, in order to keep to the 6 month cadence, the feature freeze for this cycle will still occur on the 1st of November.
  2. Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 3.13.0-95-generic #142-Ubuntu
  3. Freescale i.MX6, 3.14.77-1-ARCH

SOLID Go Design

This post is based on the text of my GolangUK keynote delivered on the 18th of August 2016.
A recording of the talk is available on YouTube.

This post has been translated into Traditional Chinese by Haohao Tian. Thanks Haohao!


How many Go programmers are there in the world?

How many Go programmers are there in the world? Think of a number and hold it in your head, we’ll come back to it at the end of this talk.

Code review

Who here does code review as part of their job? [the entire room raised their hand, which was encouraging]. Okay, why do you do code review? [someone shouted out “to stop bad code”]

If code review is there to catch bad code, then how do you know if the code you’re reviewing is good, or bad?

Now it’s fine to say “that code is ugly” or “wow that source code is beautiful”, just as you might say “this painting is beautiful” or “this room is beautiful” but these are subjective terms, and I’m looking for objective ways to talk about the properties of good or bad code.

Bad code

What are some of the properties of bad code that you might pick up on in code review?

  • Rigid. Is the code rigid? Does it have a straitjacket of overbearing types and parameters that makes modification difficult?
  • Fragile. Is the code fragile? Does the slightest change ripple through the code base causing untold havoc?
  • Immobile. Is the code hard to refactor? Is it one keystroke away from an import loop?
  • Complex. Is there code for the sake of having code, are things over-engineered?
  • Verbose. Is it just exhausting to use the code? When you look at it, can you even tell what this code is trying to do?

Are these positive sounding words? Would you be pleased to see these words used in a review of your code?

Probably not.

Good design

But this is an improvement, now we can say things like “I don’t like this because it’s too hard to modify”, or “I don’t like this because I cannot tell what the code is trying to do”, but what about leading with the positive?

Wouldn’t it be great if there were some ways to describe the properties of good design, not just bad design, and to be able to do so in objective terms?

SOLID

In 2002 Robert Martin published his book, Agile Software Development, Principles, Patterns, and Practices. In it he described five principles of reusable software design, which he called the SOLID principles, after the first letters in their names.

  • Single Responsibility Principle
  • Open / Closed Principle
  • Liskov Substitution Principle
  • Interface Segregation Principle
  • Dependency Inversion Principle

This book is a little dated, the languages that it talks about are the ones in use more than a decade ago. But, perhaps there are some aspects of the SOLID principles that may give us a clue about how to talk about well designed Go programs.

So this is what I want to spend some time discussing with you this morning.

Single Responsibility Principle

The first principle of SOLID, the S, is the single responsibility principle.

A class should have one, and only one, reason to change.
–Robert C Martin

Now Go obviously doesn’t have classes—instead we have the far more powerful notion of composition—but if you can look past the use of the word class, I think there is some value here.

Why is it important that a piece of code should have only one reason for change? Well, as distressing as the idea that your own code may change is, it is far more distressing to discover that code your code depends on is changing under your feet. And when your code does have to change, it should do so in response to a direct stimulus; it shouldn’t be a victim of collateral damage.

So code that has a single responsibility therefore has the fewest reasons to change.

Coupling & Cohesion

Two words that describe how easy or difficult it is to change a piece of software are coupling and cohesion.

Coupling is simply a word that describes two things changing together–a movement in one induces a movement in another.

A related, but separate, notion is the idea of cohesion, a force of mutual attraction.

In the context of software, cohesion is the property that describes how pieces of code are naturally attracted to one another.

To describe the units of coupling and cohesion in a Go program, we might talk about functions and methods, as is very common when discussing SRP, but I believe it starts with Go’s package model.

Package names

In Go, all code lives inside a package, and a well designed package starts with its name. A package’s name is both a description of its purpose, and a name space prefix. Some examples of good packages from the Go standard library might be:

  • net/http, which provides http clients and servers.
  • os/exec, which runs external commands.
  • encoding/json, which implements encoding and decoding of JSON documents.

When you use another package’s symbols inside your own this is accomplished by the `import` declaration, which establishes a source level coupling between two packages. They now know about each other.

Bad package names

This focus on names is not just pedantry. A poorly named package misses the opportunity to enumerate its purpose, if indeed it ever had one.

What does package server provide? … well a server, hopefully, but which protocol?

What does package private provide? Things that I should not see? Should it have any public symbols?

And package common, just like its partner in crime, package utils, is often found close by these other offenders.

Catch all packages like these become a dumping ground for miscellany, and because they have many responsibilities they change frequently and without cause.

Go’s UNIX philosophy

In my view, no discussion about decoupled design would be complete without mentioning Doug McIlroy’s Unix philosophy; small, sharp tools which combine to solve larger tasks, oftentimes tasks which were not envisioned by the original authors.

I think that Go packages embody the spirit of the UNIX philosophy. In effect each Go package is itself a small Go program, a single unit of change, with a single responsibility.

Open / Closed Principle

The second principle, the O, is the open closed principle by Bertrand Meyer who in 1988 wrote:

Software entities should be open for extension, but closed for modification.
–Bertrand Meyer, Object-Oriented Software Construction

How does this advice apply to a language written 21 years later?

package main

import "fmt"

type A struct {
        year int
}

func (a A) Greet() { fmt.Println("Hello GolangUK", a.year) }

type B struct {
        A
}

func (b B) Greet() { fmt.Println("Welcome to GolangUK", b.year) }

func main() {
        var a A
        a.year = 2016
        var b B
        b.year = 2016
        a.Greet() // Hello GolangUK 2016
        b.Greet() // Welcome to GolangUK 2016
}

We have a type A, with a field year and a method Greet. We have a second type, B, which embeds an A. Because A is embedded, as a field, within B, callers see B’s methods overlaid on A’s, and B can provide its own Greet method, obscuring that of A.

But embedding isn’t just for methods, it also provides access to an embedded type’s fields. As you see, because both A and B are defined in the same package, B can access A’s private year field as if it were declared inside B.

So embedding is a powerful tool which allows Go’s types to be open for extension.

package main

import "fmt"

type Cat struct {
        Name string
}

func (c Cat) Legs() int { return 4 }

func (c Cat) PrintLegs() {
        fmt.Printf("I have %d legs\n", c.Legs())
}

type OctoCat struct {
        Cat
}

func (o OctoCat) Legs() int { return 5 }

func main() {
        var octo OctoCat
        fmt.Println(octo.Legs()) // 5
        octo.PrintLegs()         // I have 4 legs
}

In this example we have a Cat type, which can count its number of legs with its Legs method. We embed this Cat type into a new type, an OctoCat, and declare that OctoCats have five legs. However, although OctoCat defines its own Legs method, which returns 5, when the PrintLegs method is invoked, it prints 4.

This is because PrintLegs is defined on the Cat type. It takes a Cat as its receiver, and so it dispatches to Cat’s Legs method. Cat has no knowledge of the type it has been embedded into, so its method set cannot be altered by embedding.

Thus, we can say that Go’s types, while being open for extension, are closed for modification.

In truth, methods in Go are little more than syntactic sugar around a function with a predeclared formal parameter, their receiver.

func (c Cat) PrintLegs() {
        fmt.Printf("I have %d legs\n", c.Legs())
}

func PrintLegs(c Cat) {
        fmt.Printf("I have %d legs\n", c.Legs())
}

The receiver is exactly what you pass into it, the first parameter of the function, and because Go does not support function overloading, OctoCats are not substitutable for regular Cats. Which brings me to the next principle.
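Go’s method expressions make this equivalence visible. The following is a sketch that varies the example above slightly: the hypothetical LegsMessage method returns its string instead of printing it, which makes the result easier to inspect.

```go
package main

import "fmt"

type Cat struct {
	Name string
}

func (c Cat) Legs() int { return 4 }

// LegsMessage is declared on Cat, so c.Legs() here always means Cat's Legs.
func (c Cat) LegsMessage() string {
	return fmt.Sprintf("I have %d legs", c.Legs())
}

type OctoCat struct {
	Cat
}

func (o OctoCat) Legs() int { return 5 }

func main() {
	var octo OctoCat

	// A method expression yields an ordinary function whose first
	// parameter is the receiver: f has type func(Cat) string.
	f := Cat.LegsMessage

	// f(octo) would not compile because an OctoCat is not a Cat.
	// The embedded Cat has to be passed explicitly.
	fmt.Println(f(octo.Cat))        // I have 4 legs
	fmt.Println(octo.LegsMessage()) // I have 4 legs
	fmt.Println(octo.Legs())        // 5
}
```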

Liskov Substitution Principle

Coined by Barbara Liskov, the Liskov substitution principle states, roughly, that two types are substitutable if they exhibit behaviour such that the caller is unable to tell the difference.

In a class based language, Liskov’s substitution principle is commonly interpreted as a specification for an abstract base class with various concrete subtypes. But Go does not have classes, or inheritance, so substitution cannot be implemented in terms of an abstract class hierarchy.

Interfaces

Instead, substitution is the purview of Go’s interfaces. In Go, types are not required to nominate that they implement a particular interface, instead any type implements an interface simply provided it has methods whose signature matches the interface declaration.

We say that in Go, interfaces are satisfied implicitly, rather than explicitly, and this has a profound impact on how they are used within the language.

Well designed interfaces are more likely to be small interfaces; the prevailing idiom is that an interface contains only a single method. It follows logically that small interfaces lead to simple implementations, because it is hard to do otherwise. Which leads to packages comprised of simple implementations connected by common behaviour.
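A minimal sketch of implicit satisfaction, using hypothetical names throughout: neither Dog nor Alarm mentions Shouter, yet both can be passed wherever a Shouter is expected.

```go
package main

import "fmt"

// Shouter is a hypothetical single method interface.
type Shouter interface {
	Shout() string
}

// Dog never declares that it implements Shouter. Having a Shout method
// with the matching signature is enough.
type Dog struct{}

func (Dog) Shout() string { return "WOOF" }

// Alarm is an unrelated type that also happens to satisfy Shouter.
type Alarm struct {
	Code int
}

func (a Alarm) Shout() string { return fmt.Sprintf("ALERT %d", a.Code) }

// ShoutAll depends only on the behaviour it needs.
func ShoutAll(ss ...Shouter) []string {
	out := make([]string, 0, len(ss))
	for _, s := range ss {
		out = append(out, s.Shout())
	}
	return out
}

func main() {
	fmt.Println(ShoutAll(Dog{}, Alarm{Code: 7})) // [WOOF ALERT 7]
}
```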

io.Reader

type Reader interface {
        // Read reads up to len(buf) bytes into buf.
        Read(buf []byte) (n int, err error)
}

Which brings me to io.Reader, easily my favourite Go interface.

The io.Reader interface is very simple; Read reads data into the supplied buffer, and returns to the caller the number of bytes that were read, and any error encountered during read. It seems simple but it’s very powerful.

Because io.Readers deal with anything that can be expressed as a stream of bytes, we can construct readers over just about anything; a constant string, a byte array, standard in, a network stream, a gzip’d tar file, the standard out of a command being executed remotely via ssh.

And all of these implementations are substitutable for one another because they fulfil the same simple contract.
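To make the substitution concrete, here is a small sketch. CountBytes and Zeros are hypothetical names introduced for illustration; strings.NewReader and bytes.NewReader are the standard library constructors.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// CountBytes reads r until io.EOF and reports how many bytes it saw.
// It knows nothing about where those bytes come from.
func CountBytes(r io.Reader) (int, error) {
	buf := make([]byte, 8)
	total := 0
	for {
		n, err := r.Read(buf)
		total += n
		if err == io.EOF {
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

// Zeros is a hand written reader that yields N zero bytes.
type Zeros struct {
	N int
}

func (z *Zeros) Read(buf []byte) (int, error) {
	if z.N == 0 {
		return 0, io.EOF
	}
	n := len(buf)
	if n > z.N {
		n = z.N
	}
	for i := 0; i < n; i++ {
		buf[i] = 0
	}
	z.N -= n
	return n, nil
}

func main() {
	fmt.Println(CountBytes(strings.NewReader("hello, world"))) // 12 <nil>
	fmt.Println(CountBytes(bytes.NewReader([]byte{1, 2, 3})))  // 3 <nil>
	fmt.Println(CountBytes(&Zeros{N: 20}))                     // 20 <nil>
}
```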

So the Liskov substitution principle, applied to Go, could be summarised by this lovely aphorism from the late Jim Weirich.

Require no more, promise no less.
–Jim Weirich

And this is a great segue into the fourth SOLID principle.

Interface Segregation Principle

The fourth principle is the interface segregation principle, which reads:

Clients should not be forced to depend on methods they do not use.
–Robert C. Martin

In Go, the application of the interface segregation principle can refer to a process of isolating the behaviour required for a function to do its job. As a concrete example, say I’ve been given a task to write a function that persists a Document structure to disk.

// Save writes the contents of doc to the file f.
func Save(f *os.File, doc *Document) error

I could define this function, let’s call it Save, which takes an *os.File as the destination to write the supplied Document. But this has a few problems.

The signature of Save precludes the option to write the data to a network location. Assuming that network storage is likely to become a requirement later, the signature of this function would have to change, impacting all its callers.

Because Save operates directly with files on disk, it is unpleasant to test. To verify its operation, the test would have to read the contents of the file after being written. Additionally the test would have to ensure that f was written to a temporary location and always removed afterwards.

*os.File also defines a lot of methods which are not relevant to Save, like reading directories and checking to see if a path is a symlink. It would be useful if the signature of our Save function could describe only the parts of *os.File that were relevant.

What can we do about these problems?

// Save writes the contents of doc to the supplied ReadWriteCloser.
func Save(rwc io.ReadWriteCloser, doc *Document) error

Using io.ReadWriteCloser we can apply the Interface Segregation Principle to redefine Save to take an interface that describes more general file-shaped things.

With this change, any type that implements the io.ReadWriteCloser interface can be substituted for the previous *os.File. This makes Save both broader in its application, and clarifies to the caller of Save which methods of the *os.File type are relevant to its operation.

As the author of Save I no longer have the option to call those unrelated methods on *os.File as it is hidden behind the io.ReadWriteCloser interface. But we can take the interface segregation principle a bit further.

Firstly, if Save follows the single responsibility principle, it is unlikely that it will read the file it just wrote to verify its contents; that should be the responsibility of another piece of code. So we can narrow the specification for the interface we pass to Save to just writing and closing.

// Save writes the contents of doc to the supplied WriteCloser.
func Save(wc io.WriteCloser, doc *Document) error

Secondly, providing Save with a mechanism to close its stream, which we inherited from a desire to make it look like a file-shaped thing, raises the question of under what circumstances wc will be closed. Possibly Save will call Close unconditionally, or perhaps Close will be called only in the case of success.

This presents a problem for the caller of Save as it may want to write additional data to the stream after the document is written.

type NopCloser struct {
        io.Writer
}

// Close has no effect on the underlying writer.
func (c *NopCloser) Close() error { return nil }

A crude solution would be to define a new type which embeds an io.Writer and supplies a Close method that does nothing, preventing Save from closing the underlying stream.

But this would probably be a violation of the Liskov Substitution Principle, as NopCloser doesn’t actually close anything.

// Save writes the contents of doc to the supplied Writer.
func Save(w io.Writer, doc *Document) error

A better solution would be to redefine Save to take only an io.Writer, stripping it completely of the responsibility to do anything but write data to a stream.

By applying the interface segregation principle to our Save function, the result is a function which is simultaneously the most specific in terms of its requirements (it only needs a thing that is writable) and the most general in its function: we can now use Save to save our data to anything which implements io.Writer.
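Here is a minimal sketch of what this final form buys us. The fields of Document and the output layout are assumptions made for the example, since the talk never shows them, and SaveString is a hypothetical helper of the kind a test might use.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// Document is a stand-in; the real structure is not shown in the talk.
type Document struct {
	Title string
	Body  string
}

// Save writes the contents of doc to the supplied Writer.
// The "title, blank line, body" layout is an assumption for this sketch.
func Save(w io.Writer, doc *Document) error {
	_, err := fmt.Fprintf(w, "%s\n\n%s\n", doc.Title, doc.Body)
	return err
}

// SaveString is a hypothetical convenience that reuses Save with an
// in-memory writer, so no temporary file is needed to inspect the output.
func SaveString(doc *Document) (string, error) {
	var sb strings.Builder
	err := Save(&sb, doc)
	return sb.String(), err
}

func main() {
	doc := &Document{Title: "SOLID", Body: "Accept interfaces."}

	// The same Save works for a real file descriptor...
	if err := Save(os.Stdout, doc); err != nil {
		fmt.Fprintln(os.Stderr, "save failed:", err)
	}

	// ...and for an in-memory destination.
	s, err := SaveString(doc)
	fmt.Printf("%q %v\n", s, err)
}
```

The testing problem described earlier disappears: a test can hand Save an in-memory writer and compare strings, with nothing to clean up afterwards.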

A great rule of thumb for Go is accept interfaces, return structs.
–Jack Lindamood

Stepping back a few paces, this quote is an interesting meme that has been percolating in the Go zeitgeist over the last few years.

This tweet sized version lacks nuance, and this is not Jack’s fault, but I think it represents one of the first pieces of defensible Go design lore.

Dependency Inversion Principle

The final SOLID principle is the dependency inversion principle, which states:

High-level modules should not depend on low-level modules. Both should depend on abstractions.
Abstractions should not depend on details. Details should depend on abstractions.
–Robert C. Martin

But what does dependency inversion mean, in practice, for Go programmers?

If you’ve applied all the principles we’ve talked about up to this point then your code should already be factored into discrete packages, each with a single well defined responsibility or purpose. Your code should describe its dependencies in terms of interfaces, and those interfaces should be factored to describe only the behaviour those functions require. In other words, there shouldn’t be much left to do.

So what I think Martin is talking about here, certainly in the context of Go, is the structure of your import graph.

In Go, your import graph must be acyclic. A failure to respect this acyclic requirement is grounds for a compilation failure, but more gravely represents a serious error in design.

All things being equal, the import graph of a well designed Go program should be wide and relatively flat, rather than tall and narrow. If you have a package whose functions cannot operate without enlisting the aid of another package, that is perhaps a sign that code is not well factored along package boundaries.

The dependency inversion principle encourages you to push the responsibility for the specifics as high as possible up the import graph, to your main package or top level handler, leaving the lower level code to deal with abstractions: interfaces.
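A single file cannot show a real import graph, so treat the following as a sketch only. Report stands in for lower level code that would normally live in its own package, and memorySink is a hypothetical concrete destination; both names are invented for this example.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// Report is the lower level code. It knows only about the io.Writer
// abstraction; in a real program it would sit in its own package and
// would have no reason to import os.
func Report(w io.Writer, lines []string) error {
	for _, l := range lines {
		if _, err := fmt.Fprintln(w, l); err != nil {
			return err
		}
	}
	return nil
}

// memorySink is a hypothetical concrete destination that keeps
// everything written to it in memory.
type memorySink struct{ data []byte }

func (m *memorySink) Write(p []byte) (int, error) {
	m.data = append(m.data, p...)
	return len(p), nil
}

// main sits at the top of the import graph, so it chooses the specifics.
func main() {
	lines := []string{"first", "second"}

	// Concrete choice one: standard out.
	if err := Report(os.Stdout, lines); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	// Concrete choice two: an in-memory sink.
	sink := &memorySink{}
	if err := Report(sink, lines); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	fmt.Printf("captured %d bytes\n", len(sink.data))
}
```

Only main needs to know that os.Stdout or memorySink exist; swapping one for the other never touches Report.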

SOLID Go Design

To recap, when applied to Go, each of the SOLID principles is a powerful statement about design, but taken together they have a central theme.

The Single Responsibility Principle encourages you to structure the functions, types, and methods into packages that exhibit natural cohesion; the types belong together, the functions serve a single purpose.

The Open / Closed Principle encourages you to compose simple types into more complex ones using embedding.

The Liskov Substitution Principle encourages you to express the dependencies between your packages in terms of interfaces, not concrete types. By defining small interfaces, we can be more confident that implementations will faithfully satisfy their contract.

The Interface Segregation Principle takes that idea further and encourages you to define functions and methods that depend only on the behaviour that they need. If your function only requires a parameter of an interface type with a single method, then it is more likely that this function has only one responsibility.

The Dependency Inversion Principle encourages you to move the knowledge of the things your package depends on from compile time (in Go we see this with a reduction in the number of import statements used by a particular package) to run time.

If you were to summarise this talk it would probably be: interfaces let you apply the SOLID principles to Go programs.

Because interfaces let Go programmers describe what their package provides–not how it does it. This is all just another way of saying “decoupling”, which is indeed the goal, because software that is loosely coupled is software that is easier to change.

As Sandi Metz notes:

Design is the art of arranging code that needs to work today, and to be easy to change forever.
–Sandi Metz

Because if Go is going to be a language that companies invest in for the long term, the maintenance of Go programs, the ease with which they can change, will be a key factor in their decision.

Coda

In closing, let’s return to the question I opened this talk with: how many Go programmers are there in the world? This is my guess:

By 2020, there will be 500,000 Go developers.
-me

What will half a million Go programmers do with their time? Well, obviously, they’ll write a lot of Go code and, if we’re being honest, not all of it will be good, and some will be quite bad.

Please understand that I do not say this to be cruel, but, every one of you in this room with experience with development in other languages–the languages you came from, to Go–knows from your own experience that there is an element of truth to this prediction.

Within C++, there is a much smaller and cleaner language struggling to get out.
–Bjarne Stroustrup, The Design and Evolution of C++

The opportunity for all Go programmers to make our language a success hinges directly on our collective ability to not make such a mess of things that people start to talk about Go the way that they joke about C++ today.

The narrative that derides other languages for being bloated, verbose, and overcomplicated, could one day well be turned upon Go, and I don’t want to see this happen, so I have a request.

Go programmers need to start talking less about frameworks, and start talking more about design. We need to stop focusing on performance at all cost, and focus instead on reuse at all cost.

What I want to see is people talking about how to use the language we have today, whatever its choices and limitations, to design solutions and to solve real problems.

What I want to hear is people talking about how to design Go programs in a way that is well engineered, decoupled, reusable, and above all responsive to change.

… one more thing

Now, it’s great that so many of you are here today to hear from the great lineup of speakers, but the reality is that no matter how large this conference grows, compared to the number of people who will use Go during its lifetime, we’re just a tiny fraction.

So we need to tell the rest of the world how good software should be written. Good software, composable software, software that is amenable to change, and show them how to do it, using Go. And this starts with you.

I want you to start talking about design, maybe use some of the ideas I presented here, hopefully you’ll do your own research, and apply those ideas to your projects. Then I want you to:

  • Write a blog post about it.
  • Teach a workshop about what you did.
  • Write a book about what you learnt.
  • And come back to this conference next year and give a talk about what you achieved.

Because by doing these things we can build a culture of Go developers who care about programs that are designed to last.

Thank you.

Transistor logic fundamentals

Long time readers of this blog will know that when I’m not shilling for the Go language, my hobbies include electronics and retro computing. For me, projects like James Newman’s Megaprocessor, a computer built entirely from discrete components, are about as good as it gets.

James has recently finished construction of the Megaprocessor and has started to document it on YouTube, you should totally check it out. But this post isn’t about the Megaprocessor.

When I subscribed to the Megaprocessor channel on YouTube I discovered James has produced another series of videos focused on the fundamentals of implementing digital logic with transistors. In the three videos embedded below, James lays out the foundations of digital logic.

The first video describes (in James’ wonderfully understated manner) the operation of the simplest digital logic circuit; a voltage controlled inverter built with one transistor [1].

In the second video, James adds a second transistor in series with the first and demonstrates the implementation of the NAND (Not AND) function [2].

In the third video, by reorganising the transistors in parallel, James shows the circuit now implements the logical NOR (Not OR) function.

… and that’s it. There are more videos in James’ Stepping Stones video series, but with these three operations, NOT (inversion), NAND, and NOR, any combination of digital logic of any size can be created, as the Megaprocessor shows [3].
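To see how far those three operations go, here is a small software sketch that models each gate as a function on booleans and builds AND, OR, XOR, and a half adder out of nothing else. It is a model of the logic only, not of the transistor circuits, and the function names are my own.

```go
package main

import "fmt"

// The three primitives from the videos, modelled as boolean functions.
func Not(a bool) bool     { return !a }
func Nand(a, b bool) bool { return !(a && b) }
func Nor(a, b bool) bool  { return !(a || b) }

// And inverts the output of a NAND gate.
func And(a, b bool) bool { return Not(Nand(a, b)) }

// Or inverts the output of a NOR gate.
func Or(a, b bool) bool { return Not(Nor(a, b)) }

// Xor is built from four NAND gates.
func Xor(a, b bool) bool {
	n := Nand(a, b)
	return Nand(Nand(a, n), Nand(b, n))
}

// HalfAdder adds two bits, returning the sum bit and the carry bit.
func HalfAdder(a, b bool) (sum, carry bool) {
	return Xor(a, b), And(a, b)
}

func main() {
	for _, a := range []bool{false, true} {
		for _, b := range []bool{false, true} {
			s, c := HalfAdder(a, b)
			fmt.Printf("a=%-5v b=%-5v sum=%-5v carry=%v\n", a, b, s, c)
		}
	}
}
```

Chain enough half adders together, add some storage, and you are on the road to an arithmetic unit.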

Why is this important?

The circuits described in this set of videos feature far fewer transistors than you would find in a real processor, but they are not simplified. The circuits described in these videos were used in mainframe computers in the 1960’s and formed the basis for the integrated microprocessors of the 1970’s.

In these three videos James describes the entire foundation for contemporary computation. No matter how many layers of operating systems, networks, and source code abstraction you build on top, the fundamentals of computation and digital logic remain as simple as these three videos.

Notes and further reading

If you’re interested in learning more, here are a few suggestions for your own research.

  1. If you have no background in electronics a simple analogy for the relationships between voltage, current, and resistance is water flowing through a pipe. In this analogy, voltage represents water pressure, pushing water through the pipe. Current is the water itself. The volume of water in the pipe is a property of both the diameter of the pipe, and any resistance which may cause segments of the pipe to be less than full. Resistance, the final property, is any constriction or obstruction of the pipe. The higher the resistance, the more the pipe is constricted, reducing the amount of water (current) flowing through it.
  2. James’ tutorials use discrete TTL logic. TTL stands for Transistor to Transistor Logic, introduced in the early 1960’s by Sylvania (yes, the lightbulb makers). Before TTL there were at least two other forms of digital logic, what were they, and why did they succumb to TTL?
  3. James’ tutorials, and the Megaprocessor itself, use NPN transistors. If the Megaprocessor was shrunk down to a single integrated circuit it would most likely be implemented using NMOS logic. NMOS was very popular in the 70’s and early 80’s but has since given way to CMOS logic. What are the differences between NMOS and CMOS and why would James have chosen NMOS to implement the Megaprocessor?

Automatically fetch your project’s dependencies with gb

gb has been in development for just over a year now. Since the announcement in May 2015 the project has received over 1,600 stars, produced 16 releases, and attracted 41 contributors.

Thanks to a committed band of early adopters, gb has grown to be a usable day to day replacement for the go tool. But, there is one area where gb has not lived up to my hopes, and that is dependency management.

gb’s $PROJECT/vendor/ directory was the inspiration for the go tool’s vendor/ directory (although their implementations differ greatly) and has delivered on its goal of reproducible builds for Go projects. However, the success of gb’s project based model, and vendoring code in general, has a few problems. Specifically, wholesale copying (or forking if you prefer) of one code base into another continues to sidestep the issue of adoption of a proper release and versioning culture amongst Go developers.

To be fair, for Go developers using the tools they have access to today–including gb–there is no incentive to release their code. As a Go package author, you get no points for doing proper versioned releases if your build tool just pulls from HEAD anyway. There is similarly limited value in adopting a version numbering policy like SemVer if your tools only memorise the git revision you last copied your code at.

A second problem, equally poorly served by gb or the vendor/ support in the go tool, is that of developers and projects who cannot, usually for legal reasons, or do not wish to, copy code wholesale into their project. Suggestions of using git submodules have been soundly dismissed as unworkable.

With the release of gb 0.4.3, there is a new way to manage dependencies with gb. This new method does not replace gb vendor or $PROJECT/vendor as the recommended method for achieving reproducible builds, but it does acknowledge that vendoring is not appropriate for all use cases.

To be clear, this new mode of managing dependencies does not supersede or deprecate the existing mechanisms of cloning source code into $PROJECT/vendor. The automatic download feature is optional and is activated by the project author creating a file in their project’s root called, $PROJECT/depfile.

If you have a gb project that is currently vendoring code, or you’re using gb vendor restore to actively avoid cloning code into your project, you can try this feature today, with the following caveats:

  1. Currently only GitHub is supported. This is because the new facility uses the GitHub API to download release tarballs via https. Vanity urls that redirect to GitHub are also not supported yet, but will be supported soon.
  2. The repository must have made a release of its code, and that release must be tagged with a tag containing a valid SemVer 2.0.0 version number. The format of the tag is described in this proposal. If a dependency you want to consume in your gb project has not released their code, then please ask them to do so.

Polishing this feature will be the remainder of the 0.4.x development series. After this work is complete gb vendor will be getting some attention. Ultimately both gb vendor and $PROJECT/depfile do the same thing–one copies the source of your dependencies into your project, the other into your home directory.

Gophers, please tag your releases

What do we want? Version management for Go packages! When do we want it? Yesterday!

What does everyone want? We want our Go build tool of choice to fetch the latest stable version when you start using the package in your project. We want them to grab security updates and bug fixes automatically, but not upgrade to a version where the author deleted a method you were using.

But as it stands, today, in 2016, there is no way for a human, or a tool, to look at an arbitrary git (or mercurial, or bzr, etc) repository of Go code and ask questions like:

  • What versions of this project have been released?
  • What is the latest stable release of this software?
  • If I have version 1.2.3, is there a bugfix or security update that I should apply?

The reason for this is Go projects (repositories of Go packages) don’t have versions, at least not in the way that our friends in other languages use that word. Go projects do not have versions because there is no formalised release process.

But there’s vendor/ right?

Arguing about tools to manage your vendor/ directory, or which markup format a manifest file should be written in is eating the elephant from the wrong end.

Before you can argue about the format of a file that records the version of a package, you have to have some way of actually knowing what that version is. A version number has to be sortable, so you can ask, “is there a newer version available than the one you have on disk?” Ideally the version number should give you a clue to how large the jump between versions is, perhaps even give a clue to backwards or forwards compatibility between two versions.

SemVer is no one’s favourite, yet one format is everyone’s favourite.

I recommend that Go projects adopt SemVer 2.0.0. It’s a sound standard, it is well understood by many, not just Go programmers, and semantic versioning will let people write tools to build a dependency management ecosystem on top of a minimal release process.

Following the lead of the big three Go projects, Docker, Kubernetes, and CoreOS (and GitHub’s own releases page), the format of the tag must be:

v<SemVer>

That is, the letter v followed by a string which is SemVer 2.0.0 compliant. Here are some examples:

git tag -a v1.2.3
git tag -a v0.1.0
git tag -a v1.0.0-rc.1

Here are some incorrect examples:

git tag -a 1.2.3        # missing v prefix
git tag -a v1.0         # 1.0 is not SemVer compliant
git tag -a v2.0.0beta3  # also not SemVer compliant

Of course, if you’re using hg, bzr, or another version control system, please adjust as appropriate. This isn’t just for git or GitHub repos.
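If you want to check a tag before pushing it, a tool could do something like the sketch below. ValidTag is a hypothetical helper, and the regular expression is my own approximation of the SemVer 2.0.0 grammar with a v prefix; check it against the specification before relying on it.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern approximates "v" followed by a SemVer 2.0.0 version:
// three numeric components without leading zeros, an optional
// pre-release after "-", and optional build metadata after "+".
var tagPattern = regexp.MustCompile(
	`^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)` +
		`(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)` +
		`(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?` +
		`(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$`)

// ValidTag reports whether tag looks like v<SemVer>.
func ValidTag(tag string) bool {
	return tagPattern.MatchString(tag)
}

func main() {
	tags := []string{
		"v1.2.3", "v0.1.0", "v1.0.0-rc.1", // the correct examples
		"1.2.3", "v1.0", "v2.0.0beta3", // the incorrect examples
	}
	for _, t := range tags {
		fmt.Printf("%-12s %v\n", t, ValidTag(t))
	}
}
```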

What do you get for this?

Imagine if godoc.org could show you the documentation for the version of the package you’re using, not just the latest from HEAD.

Now, imagine if godoc.org could not just show you the documentation, but also serve you a tarball or zip file of the source code of that version. Imagine not having to install mercurial just to go get that one dependency that is still on google code (rest in peace), or bitbucket in hg form.

Establishing a single release process for Go projects and adopting semantic versioning will let your favourite Go package management or vendoring tool provide you things like a real upgrade command. Instead of letting you figure out which revision to switch to, SemVer gives tool writers the ability to do things like upgrade a dependency to the latest patch release of version 1.2.
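As a sketch of what such an upgrade command might do internally, the code below picks the newest patch release within a given major.minor series from a list of tags. ParseTag and LatestPatch are hypothetical names; the parser is deliberately lenient and simply skips pre-release tags and anything else that is not a plain vMAJOR.MINOR.PATCH, so it is not a full SemVer implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Version holds the three numeric SemVer components.
type Version struct{ Major, Minor, Patch int }

// ParseTag parses tags of the form vMAJOR.MINOR.PATCH. Tags carrying
// pre-release or build metadata are rejected to keep the sketch small.
func ParseTag(tag string) (Version, bool) {
	if !strings.HasPrefix(tag, "v") {
		return Version{}, false
	}
	parts := strings.Split(tag[1:], ".")
	if len(parts) != 3 {
		return Version{}, false
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return Version{}, false
		}
		nums[i] = n
	}
	return Version{nums[0], nums[1], nums[2]}, true
}

// LatestPatch returns the tag with the highest patch number in the
// major.minor series, comparing numerically rather than as strings.
func LatestPatch(tags []string, major, minor int) (string, bool) {
	best, bestPatch, found := "", -1, false
	for _, t := range tags {
		v, ok := ParseTag(t)
		if !ok || v.Major != major || v.Minor != minor {
			continue
		}
		if v.Patch > bestPatch {
			best, bestPatch, found = t, v.Patch, true
		}
	}
	return best, found
}

func main() {
	tags := []string{"v1.1.9", "v1.2.3", "v1.2.10", "v1.3.0", "v2.0.0", "v1.2.11-rc.1", "1.2.99"}
	tag, ok := LatestPatch(tags, 1, 2)
	fmt.Println(tag, ok)
}
```

Note that v1.2.10 is newer than v1.2.3 only when the components are compared as numbers, which is exactly the kind of question a bare git hash cannot answer.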

Build it and they will come

Tagging releases is pointless if people don’t write tools to consume the information. Just like writing tools that can, at the moment, only record git hashes is pointless.

Here’s the deal: If you release your Go projects with the correctly formatted tags, then there are a host of developers who are working on dependency management tools for Go packages that want to consume this information.

How can I declare which versions of other packages my project depends on?

If you’ve read this far you are probably wondering how tagging releases in your own repository is going to help specify the versions of your Go project’s dependencies.

The Go import statement doesn’t contain this version information; all it has is the import path. But whether you’re in the camp that wants to add version information to the import statement, a comment inside the source file, or you would prefer to put that information in a metadata file, everyone needs version information, and that starts with tagging the releases of your Go projects.

No version information, no tools, and the situation never improves. It’s that simple.