Tag Archives: compiler

How to dump the GOSSAFUNC graph for a method

The Go compiler’s SSA backend contains a facility to produce HTML debugging output of the compilation phases. This post covers how to print the SSA output for function and methods.

Let’s start with a sample program which contains a function, a value method, and a pointer method:

package main

import (

type Numbers struct {
    vals []int

func (n *Numbers) Add(v int) {
    n.vals = append(n.vals, v)

func (n Numbers) Average() float64 {
    sum := 0.0
    for _, num := range n.vals {
        sum += float64(num)
    return sum / float64(len(n.vals))

func main() {
    var numbers Numbers

Control of the SSA debugging output is via the GOSSAFUNC environment variable. This variable takes the name of the function to dump. This is not the functions fully qualified name. For func main above the name of the function is main not main.main.

% env GOSSAFUNC=main go build
dumped SSA to ../../go/src/runtime/ssa.html
dumped SSA to ./ssa.html

In this example GOSSAFUNC=main matched both main.main and a function called runtime.main.1 This is a little unfortunate, but in practice probably not a big deal as, if you’re performance tuning your code, it won’t be in a giant spaghetti blob in func main.

What is more likely is your code will be in a method, so you’ve probably landed on this post looking for the correct incantation to dump the SSA output for a method.

To print the SSA debug for the pointer method func (n *Numbers) Add, the equivalent function name is(*Numbers).Add:2

% env "GOSSAFUNC=(*Numbers).Add" go build
dumped SSA to ./ssa.html

To print the SSA debug for a value method func (n Numbers) Average, the equivalent function name is (*Numbers).Average even though this is a value method:

% env "GOSSAFUNC=(*Numbers).Average" go build
dumped SSA to ./ssa.html

Diamond interface composition in Go 1.14

Per the overlapping interfaces proposal, Go 1.14 now permits embedding of interfaces with overlapping method sets. This is a brief post explain what this change means:

Let’s start with the definition of the three key interfaces from the io package; io.Reader, io.Writer, and io.Closer:

package io

type Reader interface {
    Read([]byte) (int, error)

type Writer interface {
    Write([]byte) (int, error)

type Closer interface {
    Close() error

Just as embedding a type inside a struct allows the embedded type’s fields and methods to be accessed as if it were declared on the embedding type1, the process is true for interfaces. Thus there is no difference between explicitly declaring

type ReadCloser interface {
    Read([]byte) (int, error)
    Close() error

and using embedding to compose the interface

type ReadCloser interface {

You can even mix and match

type WriteCloser interface {
    Write([]byte) (int, error)

However, prior to Go 1.14, if you continued to compose interface declarations in this manner you would likely find that something like this,

type ReadWriteCloser interface {

would fail to compile

% go build interfaces.go
./interfaces.go:27:2: duplicate method Close

Fortunately, with Go 1.14 this is no longer a limitation, thus solving problems that typically occur with diamond-shaped embedding graphs.

However, there is a catch that I ran into attempting to demonstrate this feature to the local user group–this feature is only enabled when the Go compiler uses the 1.14 (or later) spec.

As near as I can make out the rules for which version of the Go spec is used during compilation appear to be:

  1. If your source code is stored inside GOPATH (or you have disabled modules with GO111MODULE=off) then the version of the Go spec used to compile with matches the version of the compiler you are using. Said another way, if you have Go 1.13 installed, your Go version is 1.13. If you have Go 1.14 installed, your version is 1.14. No surprises here.
  2. If your source code is stored outside GOPATH (or you have forced modules on with GO111MODULE=on) then the go tool will take the Go version from the go.mod file.
  3. If there is no Go version listed in go.mod then the version of the spec will be the version of Go installed. This is identical to point 1.
  4. If you are in module mode, either by being outside GOPATH or with GO111MODULE=on, but there is no go.mod file in the current, or any parent, directory then the version of the Go spec used to compile your code defaults to Go 1.13.

The last point caught me out.

Go 1.7 toolchain improvements

This is a progress report on the Go toolchain improvements during the 1.7 development cycle. All measurements were taken using a Thinkpad x220, Core i5-2520M, running Ubuntu 14.04 linux.

Faster compilation

Since Go 1.5, when the compiler itself was translated from C to Go, compile times are slower than they used to be. Everyone knows it, nobody is happy about it, and we’re working on fixing it.

A huge amount of effort in the 1.7 cycle has gone into reducing the amount of memory and the wall time the compiler uses for various benchmark jobs. The results so far are:

Compile time for full build relative to Go 1.4.3

Compile time for full build relative to Go 1.4.3

Previously I reported the current 1.7 compiler was 2x slower than Go 1.4.3 for the Jujud test. After re-benchmarking everything for this post, the slowdown is closer to 2.2x. Jujud is the largest of the three benchmarks–512 packages, vs 304 and 102 packages respectively–and shows the largest slowdown.

The cmd/go result is a little misleading as the code being compiled changes with every release, vs the fixed codebases of the other benchmarks.

Note: The benchmark scripts for jujud, kube-controller-manager, and gogs are online. Please try them yourself and report your findings.

Improved linker performance

A significant part of the build time improvements observed above come from improvements to the linker. Relative to Go 1.6, the linker is now roughly 66% faster.

Link time relative to Go 1.4.3

Link time relative to Go 1.4.3

Relative to Go 1.4.3, linking is 10% faster for any non trivial binary, and up to 30% faster for large binaries like jujud. These figures are for ELF targets only, Mach-o and PE targets have not improved as much.

This isn’t just useful for building final binaries. A faster linker improves the edit/compile/test cycle as each test is itself a program that must be linked and run. Anecdotally the linker now uses a third less memory, which is valuable when linking large binaries.

Smaller binaries

With code generation and linker improvements, binaries produced by the tip compiler are substantially smaller than Go 1.6. This work has been spearheaded by David Crawshaw.

Binary sizes

Binary size (bytes)

At this point, with the exception of its own cmd/go tool, Go 1.7 produces smaller binaries than Go 1.4.3.

Code generation improvements

The big feature for the Go 1.7 cycle is the new SSA backend for 64bit Intel.

While not the focus of this post, it would be remiss not to include some information about performance improvements in compiled code (not just the compiler’s code). Stressing of course that these are preliminary figures as there is still four months to go before the new backend becomes the default for amd64.

Go 1 benchmarks, Go 1.6 vs 683448a

Go 1 benchmarks, Go 1.6 vs 683448a

These numbers match the figures reported by Keith Randall a few weeks ago, and are in line with his thesis in the SSA design doc.

I think it would be fairly easy to make the generated programs 20% smaller and 10% faster. — khr

These improvements are not just the work of the SSA backend. The standard library and garbage collector continue to see improvements, including a 20% improvement to the fmt package by Martin Möhrmann. These benefits flow to all the platforms that Go supports.

The sole regression above is caused by a current limitation in the register optimiser which manages to registerise one less variable in the Mandelbrot inner loop.

Looking ahead

According to the release schedule, approximately one month remains before the 1.7 change window closes and the dev cycle enters the bug fix phase. There is still lots of work to do, but the improvements so far will easily make Go 1.7 the best Go release to date.

How does the go build command work ?

This post explains how the Go build process works using examples from Go’s standard library.

The gc toolchain

This article focuses on the gc toolchain. The gc toolchain takes its name for the Go compiler frontend, cmd/gc, and is mainly used to distinguish it from the gccgo toolchain. When people talk about the Go compilers, they are likely referring to the gc toolchain. This article won’t be focusing on the gccgo toolchain.

The gc toolchain is a direct descendant of the Plan 9 toolchain. The toolchain consists of a Go compiler, a C compiler, an assembler and a linker. Each of these tools are found in the src/cmd/ subdirectory of the Go source and consist of a frontend, which is shared across all implementations, and a backend which is specific to the processor architecture. The backends are distinguished by their letter, which is again a Plan 9 tradition. The commands are:

  • 5g, 6g, and 8g are the compilers for .go files for arm, amd64 and 386
  • 5c, 6c, and 8c are the compilers for .c files for arm, amd64 and 386
  • 5a, 6a, and 8a are the assemblers for .s files for arm, and64 and 386
  • 5l, 6l, and 8l are the linkers for files produced by the commands above, again for arm, amd64 and 386.

It should be noted that each of these commands can be compiled on any supported platform and this forms the basis of Go’s cross compilation abilities. You can read more about cross compilation in this article.

Building packages

Building a Go package involves at least two steps, compiling the .go files then packing the results into an archive. In this example I’m going to use crypto/hmac as it is a small package, only one source and one test file. Using the -x option I’ve asked go build to print out every step as it executes them

% go build -x crypto/hmac
mkdir -p $WORK/crypto/hmac/_obj/
mkdir -p $WORK/crypto/
cd /home/dfc/go/src/pkg/crypto/hmac
/home/dfc/go/pkg/tool/linux_arm/5g -o $WORK/crypto/hmac/_obj/_go_.5 -p crypto/hmac -complete -D _/home/dfc/go/src/pkg/crypto/hmac -I $WORK ./hmac.go
/home/dfc/go/pkg/tool/linux_arm/pack grcP $WORK $WORK/crypto/hmac.a $WORK/crypto/hmac/_obj/_go_.5

Stepping through each of these steps

mkdir -p $WORK/crypto/hmac/_obj/
mkdir -p $WORK/crypto/

go build creates a temporary directory, /tmp/go-build249279931 and populates it with some skeleton subdirectories to hold the results of the compilation. The second mkdir may be redundant, issue 6538 has been created to track this.

cd /home/dfc/go/src/pkg/crypto/hmac
/home/dfc/go/pkg/tool/linux_arm/5g -o $WORK/crypto/hmac/_obj/_go_.5 -p crypto/hmac -complete -D _/home/dfc/go/src/pkg/crypto/hmac -I $WORK ./hmac.go

The go tool switches the the source directory of crypto/hmac and invokes the go compiler for this architecture, in this case 5g. In reality there is no cd, /home/dfc/go/src/pkg/crypto/hmac is the supplied as the exec.Command.Dir field when 5g is executed. This means the .go source files can be relative to their source directory, making the command line shorter.

The compiler produces a single temporary output file in $WORK/crypto/hmac/_obj/_go_.5 which will be used in the final step.

/home/dfc/go/pkg/tool/linux_arm/pack grcP $WORK $WORK/crypto/hmac.a $WORK/crypto/hmac/_obj/_go_.5

The final step is to pack the object file into an archive file, .a, which the linker and the compiler consume.

Because we invoked go build on a package, the result is discarded as $WORK is deleted after the build completes. If we invoke go install -x two additional lines appear in the output

mkdir -p /home/dfc/go/pkg/linux_arm/crypto/
cp $WORK/crypto/hmac.a /home/dfc/go/pkg/linux_arm/crypto/hmac.a

This demonstrates the difference between go build and install; build builds, install builds then installs the result to be used by other builds.

Building more complex packages

You may be wondering what the pack step in the previous example does. As the compiler and linker only accept a single file representing the contents of the package, if a package contains multiple object files, they must be packed into a single .a archive before they can be used.

A common example of a package producing more than one intermediary object file is cgo, but that is too complicated for this article, instead a simpler example is a package that contains some .s assembly files, like crypto/md5.

% go build -x crypto/md5
mkdir -p $WORK/crypto/md5/_obj/
mkdir -p $WORK/crypto/
cd /home/dfc/go/src/pkg/crypto/md5
/home/dfc/go/pkg/tool/linux_amd64/6g -o $WORK/crypto/md5/_obj/_go_.6 -p crypto/md5 -D _/home/dfc/go/src/pkg/crypto/md5 -I $WORK ./md5.go ./md5block_decl.go
/home/dfc/go/pkg/tool/linux_amd64/6a -I $WORK/crypto/md5/_obj/ -o $WORK/crypto/md5/_obj/md5block_amd64.6 -D GOOS_linux -D GOARCH_amd64 ./md5block_amd64.s
/home/dfc/go/pkg/tool/linux_amd64/pack grcP $WORK $WORK/crypto/md5.a $WORK/crypto/md5/_obj/_go_.6 $WORK/crypto/md5/_obj/md5block_amd64.6

In this example, executed on a linux/amd64 host, 6g is invoked to compile two .go files, md5.go and md5block_decl.go. The latter contains the forward declaration for the functions implemented in assembly.

6a is then invoked to assemble md5block_amd64.s. The logic for choosing which .s to compile is described in my previous article on conditional compilation.

Finally pack is invoked to pack the Go object file, _go_.6, and the assembly object file, md5block_amd64.6, into a single archive.

Building commands

A Go command is a package who’s name is main. Main packages, or commands, are compiled just like other packages, but then undergo several additional steps to be linked into final executable. Let’s investigate this process with cmd/gofmt

% go build -x cmd/gofmt
mkdir -p $WORK/cmd/gofmt/_obj/
mkdir -p $WORK/cmd/gofmt/_obj/exe/
cd /home/dfc/go/src/cmd/gofmt
/home/dfc/go/pkg/tool/linux_amd64/6g -o $WORK/cmd/gofmt/_obj/_go_.6 -p cmd/gofmt -complete -D _/home/dfc/go/src/cmd/gofmt -I $WORK ./doc.go ./gofmt.go ./rewrite.go ./simplify.go
/home/dfc/go/pkg/tool/linux_amd64/pack grcP $WORK $WORK/cmd/gofmt.a $WORK/cmd/gofmt/_obj/_go_.6
cd .
/home/dfc/go/pkg/tool/linux_amd64/6l -o $WORK/cmd/gofmt/_obj/exe/a.out -L $WORK $WORK/cmd/gofmt.a
cp $WORK/cmd/gofmt/_obj/exe/a.out gofmt

The first six lines should be familiar, main packages are compiled like any other Go package, they are even packed like any other package.

The difference is the penultimate line, which invokes the linker to produce a binary executable.

/home/dfc/go/pkg/tool/linux_amd64/6l -o $WORK/cmd/gofmt/_obj/exe/a.out -L $WORK $WORK/cmd/gofmt.a

The final line copies and renames the completed binary to its final name. If you had used go install the binary would be copied to $GOPATH/bin (or $GOBIN if set).

A little history

If you go far enough back in time, back before the go tool, back to the time of Makefiles, you can still find the core of the Go compilation process. This example is taken from the release.r60 documentation

$ cat >hello.go <<EOF
package main

import "fmt"

func main() {
        fmt.Printf("hello, world\n")
$ 6g hello.go
$ 6l hello.6
$ ./6.out
hello, world

It’s all here, 6g compiling a .go file into a .6 object file, 6l linking the object file against the fmt (and runtime) packages to produce a binary, 6.out.

Wrapping up

In this post we’ve talked about how go build works and touched on how go install differs in its treatment of the compilation result.

Now that you know how go build works, and how to investigate the build process with -x, try passing that flag to go test and observe the result.

Additionally, if you have gccgo installed on your system, you can pass -compiler gccgo to go build, and using -x investigate how Go code is built using this compiler.

Notes on exploring the compiler flags in the Go compiler suite

I’ve been doing some work improving the code generation of the 5g compiler, which is the Go compiler for arm. These notes also apply to the 6g and 8g compilers for amd64 and 386 respectively.

For this discussion we’ll use a very simple package.

package addr

func addr(s[]int) *int {
return &s[2]

To see the assembly produced by compiling this package we use the -S flag. -S can be passed directly to the compiler with go tool 5g -S addr.go, but it is simpler (and more portable) to use the -gcflags flag on the go tool itself.

% go build -gcflags=-S addr.go 
# command-line-arguments
--- prog list "addr" ---
0000 (/home/dfc/src/addr.go:3) TEXT addr+0(SB),$0-16
0001 (/home/dfc/src/addr.go:4) MOVW $s+0(FP),R0
0002 (/home/dfc/src/addr.go:4) MOVW 4(R0),R1
0003 (/home/dfc/src/addr.go:4) CMP $2,R1,
0004 (/home/dfc/src/addr.go:4) BHI ,6(APC)
0005 (/home/dfc/src/addr.go:4) BL ,runtime.panicindex+0(SB)
0006 (/home/dfc/src/addr.go:4) MOVW 0(R0),R0
0007 (/home/dfc/src/addr.go:4) ADD $8,R0
0008 (/home/dfc/src/addr.go:4) MOVW R0,.noname+12(FP)
0009 (/home/dfc/src/addr.go:4) RET ,

This is quite a lot of code for a one line function. One of the reasons for this is s is a slice, whose length is not known at compile time, so the compiler must insert a bounds check. We can tell the compiler to not emit bounds checks with the -B flag.

% go build -gcflags=-SB addr.go
# command-line-arguments
--- prog list "addr" ---
0000 (/home/dfc/src/addr.go:3) TEXT addr+0(SB),$0-16
0001 (/home/dfc/src/addr.go:4) MOVW $s+0(FP),R0
0002 (/home/dfc/src/addr.go:4) MOVW 0(R0),R0
0003 (/home/dfc/src/addr.go:4) ADD $8,R0
0004 (/home/dfc/src/addr.go:4) MOVW R0,.noname+12(FP)
0005 (/home/dfc/src/addr.go:4) RET ,

It is important to note that -B is an unsupported flag. The goal of Go is a safe language, one where array subscripts are bounds checked when they are not provably safe. Go already elides bounds checks when you use range loops, and future compilers will improve this. It is also important to note that none of the builders test -B so it might even generate incorrect code. In summary, when the compiler improves, -B will go away, so don’t get too attached.

One other interesting flag is -N, which will disable the optimisation pass in the compiler

% go build -gcflags=-SN addr.go
# command-line-arguments
--- prog list "addr" ---
0000 (/home/dfc/src/addr.go:3) TEXT addr+0(SB),$0-16
0001 (/home/dfc/src/addr.go:4) MOVW $s+0(FP),R0
0002 (/home/dfc/src/addr.go:4) MOVW R0,R0
0003 (/home/dfc/src/addr.go:4) MOVW 4(R0),R1
0004 (/home/dfc/src/addr.go:4) CMP $2,R1,
0005 (/home/dfc/src/addr.go:4) BHI ,8(APC)
0006 (/home/dfc/src/addr.go:4) BL ,runtime.panicindex+0(SB)
0007 (/home/dfc/src/addr.go:4) UNDEF ,
0008 (/home/dfc/src/addr.go:4) MOVW 0(R0),R0
0009 (/home/dfc/src/addr.go:4) ADD $8,R0
0010 (/home/dfc/src/addr.go:4) MOVW R0,.noname+12(FP)
0011 (/home/dfc/src/addr.go:4) RET ,
0012 (/home/dfc/src/addr.go:5) BL ,runtime.throwreturn+0(SB)
0013 (/home/dfc/src/addr.go:5) RET ,

I think the only thing that is useful about this example is, it’s good thing the optimiser is on by default because there are some strange things going on here, for example line 0002, and the unreachable branch at line 0012.

The last thing to talk about is the output of 5g is not the final code that is executed. Aside from the usual work of a linker, 5l does several transformations on the code which are important to understand.

func addr(s[]int) *int {
10c00: e59a1000 ldr r1, [sl]
10c04: e15d0001 cmp sp, r1
10c08: 33a01004 movcc r1, #4
10c0c: 33a02010 movcc r2, #16
10c10: 31a0300e movcc r3, lr
10c14: 3b00668c blcc 2a64c
10c18: e52de004 push {lr} ; (str lr, [sp, #-4]!)
return &s[2]
10c1c: e28d0008 add r0, sp, #8
10c20: e5901004 ldr r1, [r0, #4]
10c24: e3510002 cmp r1, #2
10c28: 8a000000 bhi 10c30
10c2c: eb0035d5 bl 1e388
10c30: e5900000 ldr r0, [r0]
10c34: e2800008 add r0, r0, #8
10c38: e58d0014 str r0, [sp, #20]
10c3c: e49df004 pop {pc} ; (ldr pc, [sp], #4)

Here we use objdump -dS to dump the addr function as it is compiled into the executable. The first six instructions, starting at 10c00, are the function preamble that deals with segmented stacks which is inserted automatically by the 5l.

Taking it further

There are several other compiler flags which are useful when debugging or optimising your Go code.

  • -g will output the steps a the compiler is a taking at a very low level. The discussion of the output format is outside the scope of this article. Personally I find it easier to add a warn statement which will tell me the source line the compiler was working on at the time.
  • -l will disable inlining (but still retain other compiler optimisations). This is very useful if you are investigating small methods, but can’t find them in objdump.
  • -m is mainly a frontend switch and outputs details about escape analysis and inlining choices.