Tag Archives: testing

How to write benchmarks in Go

This post continues a series on the testing package I started a few weeks back. You can read the previous article on writing table driven tests here. You can find the code mentioned below in the https://github.com/davecheney/fib repository.

Introduction

The Go testing package contains a benchmarking facility that can be used to examine the performance of your Go code. This post explains how to use the testing package to write a simple benchmark.

You should also review the introductory paragraphs of Profiling Go programs, specifically the section on configuring power management on your machine. For better or worse, modern CPUs rely heavily on active thermal management which can add noise to benchmark results.

Writing a benchmark

We’ll reuse the Fib function from the previous article.

func Fib(n int) int {
        if n < 2 {
                return n
        }
        return Fib(n-1) + Fib(n-2)
}

Benchmarks are placed inside _test.go files and follow the rules of their Test counterparts. In this first example we’re going to benchmark the speed of computing the 10th number in the Fibonacci series.

// from fib_test.go
func BenchmarkFib10(b *testing.B) {
        // run the Fib function b.N times
        for n := 0; n < b.N; n++ {
                Fib(10)
        }
}

Writing a benchmark is very similar to writing a test as they share the infrastructure from the testing package. Some of the key differences are

  • Benchmark functions start with Benchmark not Test.
  • Benchmark functions are run several times by the testing package. The value of b.N will increase each time until the benchmark runner is satisfied with the stability of the benchmark. This has some important ramifications which we’ll investigate later in this article.
  • Each benchmark must execute the code under test b.N times. The for loop in BenchmarkFib10 will be present in every benchmark function.

Running benchmarks

Now that we have a benchmark function defined in the tests for the fib package, we can invoke it with go test -bench=.

% go test -bench=.
PASS
BenchmarkFib10   5000000               509 ns/op
ok      github.com/davecheney/fib       3.084s

Breaking down the text above, we pass the -bench flag to go test supplying a regular expression matching everything. You must pass a valid regex to -bench, just passing -bench is a syntax error. You can use this property to run a subset of benchmarks.

The first line of the result, PASS, comes from the testing portion of the test driver, asking go test to run your benchmarks does not disable the tests in the package. If you want to skip the tests, you can do so by passing a regex to the -run flag that will not match anything. I usually use

go test -run=XXX -bench=.

The second line is the average run time of the function under test for the final value of b.N iterations. In this case, my laptop can execute Fib(10) in 509 nanoseconds. If there were additional Benchmark functions that matched the -bench filter, they would be listed here.

Benchmarking various inputs

As the original Fib function is the classic recursive implementation, we’d expect it to exhibit exponential behavior as the input grows. We can explore this by rewriting our benchmark slightly using a pattern that is very common in the Go standard library.

func benchmarkFib(i int, b *testing.B) {
        for n := 0; n < b.N; n++ {
                Fib(i)
        }
}

func BenchmarkFib1(b *testing.B)  { benchmarkFib(1, b) }
func BenchmarkFib2(b *testing.B)  { benchmarkFib(2, b) }
func BenchmarkFib3(b *testing.B)  { benchmarkFib(3, b) }
func BenchmarkFib10(b *testing.B) { benchmarkFib(10, b) }
func BenchmarkFib20(b *testing.B) { benchmarkFib(20, b) }
func BenchmarkFib40(b *testing.B) { benchmarkFib(40, b) }

Making benchmarkFib private avoids the testing driver trying to invoke it directly, which would fail as its signature does not match func(*testing.B). Running this new set of benchmarks gives these results on my machine.

BenchmarkFib1   1000000000               2.84 ns/op
BenchmarkFib2   500000000                7.92 ns/op
BenchmarkFib3   100000000               13.0 ns/op
BenchmarkFib10   5000000               447 ns/op
BenchmarkFib20     50000             55668 ns/op
BenchmarkFib40         2         942888676 ns/op

Apart from confirming the exponential behavior of our simplistic Fib function, there are some other things to observe in this benchmark run.

  • Each benchmark is run for a minimum of 1 second by default. If the second has not elapsed when the Benchmark function returns, the value of b.N is increased in the sequence 1, 2, 5, 10, 20, 50, … and the function run again.
  • The final BenchmarkFib40 only ran two times with the average was just under a second for each run. As the testing package uses a simple average (total time to run the benchmark function over b.N) this result is statistically weak. You can increase the minimum benchmark time using the -benchtime flag to produce a more accurate result.
    % go test -bench=Fib40 -benchtime=20s
    PASS
    BenchmarkFib40        50         944501481 ns/op

Traps for young players

Above I mentioned the for loop is crucial to the operation of the benchmark driver. Here are two examples of a faulty Fib benchmark.

func BenchmarkFibWrong(b *testing.B) {
        for n := 0; n < b.N; n++ {
                Fib(n)
        }
}

func BenchmarkFibWrong2(b *testing.B) {
        Fib(b.N)
}

On my system BenchmarkFibWrong never completes. This is because the run time of the benchmark will increase as b.N grows, never converging on a stable value. BenchmarkFibWrong2 is similarly affected and never completes.

A note on compiler optimisations

Before concluding I wanted to highlight that to be completely accurate, any benchmark should be careful to avoid compiler optimisations eliminating the function under test and artificially lowering the run time of the benchmark.

var result int

func BenchmarkFibComplete(b *testing.B) {
        var r int
        for n := 0; n < b.N; n++ {
                // always record the result of Fib to prevent
                // the compiler eliminating the function call.
                r = Fib(10)
        }
        // always store the result to a package level variable
        // so the compiler cannot eliminate the Benchmark itself.
        result = r
}

Conclusion

The benchmarking facility in Go works well, and is widely accepted as a reliable standard for measuring the performance of Go code. Writing benchmarks in this manner is an excellent way of communicating a performance improvement, or a regression, in a reproducible way.

Stress test your Go packages

This is a short post on stress testing your Go packages. Concurrency or memory correctness errors are more likely to show up at higher concurrency levels (higher values of GOMAXPROCS). I use this script when testing my packages, or when reviewing code that goes into the standard library.

#!/usr/bin/env bash -e

go test -c
# comment above and uncomment below to enable the race builder
# go test -c -race
PKG=$(basename $(pwd))

while true ; do 
        export GOMAXPROCS=$[ 1 + $[ RANDOM % 128 ]]
        ./$PKG.test $@ 2>&1
done

I keep this script in $HOME/bin so usage is

$ cd $SOMEPACKAGE
$ stress.bash
PASS
PASS
PASS

You can pass additional arguments to your test binary on the command line,

stress.bash -test.v -test.run=ThisTestOnly

The goal is to be able to run the stress test for as long as you want without a test failure. Once you achieve that, uncomment go test -c -race and try again.

Writing table driven tests in Go

This article is intended as a short introduction to the mechanics and syntax of writing a table driven test in Go. Supporting this article is a small repository, https://github.com/davecheney/fib, which contains all the code mentioned below.

Introduction

I enjoy writing table driven tests in Go. While not unique to the language, table driven tests leverage several features, composite literals and anonymous structs, to allow you to write related tests in a compact form.

As an example, please consider the case of testing this overused function

package fib

// Fib returns the nth number in the Fibonacci series.
func Fib(n int) int {
        if n < 2 {
                return n
        }
        return Fib(n-1) + Fib(n-2)
}

The table structure

At the heart of all table driven tests is the table itself, which provides the inputs and expected results of the function under test. In most cases the table is a slice of anonymous structs, which allows the table to be written in a compact form.

var fibTests = []struct {
        n        int // input
        expected int // expected result
}{
        {1, 1},
        {2, 1},
        {3, 2},
        {4, 3},
        {5, 5},
        {6, 8},
        {7, 13},
}

If you wished, you could give the struct a name, in which case the table definition would look something like this

type fibTest struct {
        n        int
        expected int
}

var fibTests = []fibTest {
        {1, 1}, {2, 1}, {3, 2}, {4, 3}, {5, 5}, {6, 8}, {7, 13},
}

Hooking it up

Now that we have the table of inputs and results defined, we need to write a driver function to iterate through the inputs and compare the results to their expected value. Rather than one Test function per set of values, we can use the range clause to loop over each test case.

func TestFib(t *testing.T) {
        for _, tt := range fibTests {
                actual := Fib(tt.n)
                if actual != tt.expected {
                        t.Errorf("Fib(%d): expected %d, actual %d", tt.n, tt.expected, actual)
                }
        }
}

In this example we range over all the fibTests defined above, assigning their value in turn to tt. We then call Fib passing in the value of tt.n and compare the result, stored in actual, with the value of tt.expected.

The use of the names actual and expected show my JUnit heritage, others may prefer names like want and got. You should choose something that works for you and gives a clear meaning in your test code.

The use of t.Errorf instead of t.Fatalf is a personal preference. As Fib is a pure function it is safe to continue the loop after a failure. I find this generally reduces test whack-a-mole by returning all the failures at once.

Conclusion

In my introduction I said that table driven tests are one my favorite parts of the Go language. They allow you to write unit tests in a concise fashion, hopefully leading to greater test coverage at a lower line count. If done correctly, adding additional test cases is as simple as a new element in the test table.

This is certainly not the only way that tests could be written in Go, nor the only way to write table driven tests. The Go standard library contains many examples of this form of testing which are worth studying. In particular I suggest the tests for the math and time packages are an excellent starting point.

At the other end of the spectrum is this table driven test of the Juju status command which defines its own language of helpers to populate the table structure. Although this elevates the table driven test to ninja levels, it still contains the same components and concepts, and right at the bottom of the file you’ll find a simple function driving each test.