Tag Archives: compiler

Go 1.7 toolchain improvements

This is a progress report on the Go toolchain improvements during the 1.7 development cycle. All measurements were taken using a Thinkpad x220, Core i5-2520M, running Ubuntu 14.04 linux.

Faster compilation

Since Go 1.5, when the compiler itself was translated from C to Go, compile times are slower than they used to be. Everyone knows it, nobody is happy about it, and we’re working on fixing it.

A huge amount of effort in the 1.7 cycle has gone into reducing the amount of memory and the wall time the compiler uses for various benchmark jobs. The results so far are:

Compile time for full build relative to Go 1.4.3

Compile time for full build relative to Go 1.4.3

Previously I reported the current 1.7 compiler was 2x slower than Go 1.4.3 for the Jujud test. After re-benchmarking everything for this post, the slowdown is closer to 2.2x. Jujud is the largest of the three benchmarks–512 packages, vs 304 and 102 packages respectively–and shows the largest slowdown.

The cmd/go result is a little misleading as the code being compiled changes with every release, vs the fixed codebases of the other benchmarks.

Note: The benchmark scripts for jujud, kube-controller-manager, and gogs are online. Please try them yourself and report your findings.

Improved linker performance

A significant part of the build time improvements observed above come from improvements to the linker. Relative to Go 1.6, the linker is now roughly 66% faster.

Link time relative to Go 1.4.3

Link time relative to Go 1.4.3

Relative to Go 1.4.3, linking is 10% faster for any non trivial binary, and up to 30% faster for large binaries like jujud. These figures are for ELF targets only, Mach-o and PE targets have not improved as much.

This isn’t just useful for building final binaries. A faster linker improves the edit/compile/test cycle as each test is itself a program that must be linked and run. Anecdotally the linker now uses a third less memory, which is valuable when linking large binaries.

Smaller binaries

With code generation and linker improvements, binaries produced by the tip compiler are substantially smaller than Go 1.6. This work has been spearheaded by David Crawshaw.

Binary sizes

Binary size (bytes)

At this point, with the exception of its own cmd/go tool, Go 1.7 produces smaller binaries than Go 1.4.3.

Code generation improvements

The big feature for the Go 1.7 cycle is the new SSA backend for 64bit Intel.

While not the focus of this post, it would be remiss not to include some information about performance improvements in compiled code (not just the compiler’s code). Stressing of course that these are preliminary figures as there is still four months to go before the new backend becomes the default for amd64.

Go 1 benchmarks, Go 1.6 vs 683448a

Go 1 benchmarks, Go 1.6 vs 683448a

These numbers match the figures reported by Keith Randall a few weeks ago, and are in line with his thesis in the SSA design doc.

I think it would be fairly easy to make the generated programs 20% smaller and 10% faster. — khr

These improvements are not just the work of the SSA backend. The standard library and garbage collector continue to see improvements, including a 20% improvement to the fmt package by Martin Möhrmann. These benefits flow to all the platforms that Go supports.

The sole regression above is caused by a current limitation in the register optimiser which manages to registerise one less variable in the Mandelbrot inner loop.

Looking ahead

According to the release schedule, approximately one month remains before the 1.7 change window closes and the dev cycle enters the bug fix phase. There is still lots of work to do, but the improvements so far will easily make Go 1.7 the best Go release to date.

How does the go build command work ?

This post explains how the Go build process works using examples from Go’s standard library.

The gc toolchain

This article focuses on the gc toolchain. The gc toolchain takes its name for the Go compiler frontend, cmd/gc, and is mainly used to distinguish it from the gccgo toolchain. When people talk about the Go compilers, they are likely referring to the gc toolchain. This article won’t be focusing on the gccgo toolchain.

The gc toolchain is a direct descendant of the Plan 9 toolchain. The toolchain consists of a Go compiler, a C compiler, an assembler and a linker. Each of these tools are found in the src/cmd/ subdirectory of the Go source and consist of a frontend, which is shared across all implementations, and a backend which is specific to the processor architecture. The backends are distinguished by their letter, which is again a Plan 9 tradition. The commands are:

  • 5g, 6g, and 8g are the compilers for .go files for arm, amd64 and 386
  • 5c, 6c, and 8c are the compilers for .c files for arm, amd64 and 386
  • 5a, 6a, and 8a are the assemblers for .s files for arm, and64 and 386
  • 5l, 6l, and 8l are the linkers for files produced by the commands above, again for arm, amd64 and 386.

It should be noted that each of these commands can be compiled on any supported platform and this forms the basis of Go’s cross compilation abilities. You can read more about cross compilation in this article.

Building packages

Building a Go package involves at least two steps, compiling the .go files then packing the results into an archive. In this example I’m going to use crypto/hmac as it is a small package, only one source and one test file. Using the -x option I’ve asked go build to print out every step as it executes them

% go build -x crypto/hmac
mkdir -p $WORK/crypto/hmac/_obj/
mkdir -p $WORK/crypto/
cd /home/dfc/go/src/pkg/crypto/hmac
/home/dfc/go/pkg/tool/linux_arm/5g -o $WORK/crypto/hmac/_obj/_go_.5 -p crypto/hmac -complete -D _/home/dfc/go/src/pkg/crypto/hmac -I $WORK ./hmac.go
/home/dfc/go/pkg/tool/linux_arm/pack grcP $WORK $WORK/crypto/hmac.a $WORK/crypto/hmac/_obj/_go_.5

Stepping through each of these steps

mkdir -p $WORK/crypto/hmac/_obj/
mkdir -p $WORK/crypto/

go build creates a temporary directory, /tmp/go-build249279931 and populates it with some skeleton subdirectories to hold the results of the compilation. The second mkdir may be redundant, issue 6538 has been created to track this.

cd /home/dfc/go/src/pkg/crypto/hmac
/home/dfc/go/pkg/tool/linux_arm/5g -o $WORK/crypto/hmac/_obj/_go_.5 -p crypto/hmac -complete -D _/home/dfc/go/src/pkg/crypto/hmac -I $WORK ./hmac.go

The go tool switches the the source directory of crypto/hmac and invokes the go compiler for this architecture, in this case 5g. In reality there is no cd, /home/dfc/go/src/pkg/crypto/hmac is the supplied as the exec.Command.Dir field when 5g is executed. This means the .go source files can be relative to their source directory, making the command line shorter.

The compiler produces a single temporary output file in $WORK/crypto/hmac/_obj/_go_.5 which will be used in the final step.

/home/dfc/go/pkg/tool/linux_arm/pack grcP $WORK $WORK/crypto/hmac.a $WORK/crypto/hmac/_obj/_go_.5

The final step is to pack the object file into an archive file, .a, which the linker and the compiler consume.

Because we invoked go build on a package, the result is discarded as $WORK is deleted after the build completes. If we invoke go install -x two additional lines appear in the output

mkdir -p /home/dfc/go/pkg/linux_arm/crypto/
cp $WORK/crypto/hmac.a /home/dfc/go/pkg/linux_arm/crypto/hmac.a

This demonstrates the difference between go build and install; build builds, install builds then installs the result to be used by other builds.

Building more complex packages

You may be wondering what the pack step in the previous example does. As the compiler and linker only accept a single file representing the contents of the package, if a package contains multiple object files, they must be packed into a single .a archive before they can be used.

A common example of a package producing more than one intermediary object file is cgo, but that is too complicated for this article, instead a simpler example is a package that contains some .s assembly files, like crypto/md5.

% go build -x crypto/md5
mkdir -p $WORK/crypto/md5/_obj/
mkdir -p $WORK/crypto/
cd /home/dfc/go/src/pkg/crypto/md5
/home/dfc/go/pkg/tool/linux_amd64/6g -o $WORK/crypto/md5/_obj/_go_.6 -p crypto/md5 -D _/home/dfc/go/src/pkg/crypto/md5 -I $WORK ./md5.go ./md5block_decl.go
/home/dfc/go/pkg/tool/linux_amd64/6a -I $WORK/crypto/md5/_obj/ -o $WORK/crypto/md5/_obj/md5block_amd64.6 -D GOOS_linux -D GOARCH_amd64 ./md5block_amd64.s
/home/dfc/go/pkg/tool/linux_amd64/pack grcP $WORK $WORK/crypto/md5.a $WORK/crypto/md5/_obj/_go_.6 $WORK/crypto/md5/_obj/md5block_amd64.6

In this example, executed on a linux/amd64 host, 6g is invoked to compile two .go files, md5.go and md5block_decl.go. The latter contains the forward declaration for the functions implemented in assembly.

6a is then invoked to assemble md5block_amd64.s. The logic for choosing which .s to compile is described in my previous article on conditional compilation.

Finally pack is invoked to pack the Go object file, _go_.6, and the assembly object file, md5block_amd64.6, into a single archive.

Building commands

A Go command is a package who’s name is main. Main packages, or commands, are compiled just like other packages, but then undergo several additional steps to be linked into final executable. Let’s investigate this process with cmd/gofmt

% go build -x cmd/gofmt
mkdir -p $WORK/cmd/gofmt/_obj/
mkdir -p $WORK/cmd/gofmt/_obj/exe/
cd /home/dfc/go/src/cmd/gofmt
/home/dfc/go/pkg/tool/linux_amd64/6g -o $WORK/cmd/gofmt/_obj/_go_.6 -p cmd/gofmt -complete -D _/home/dfc/go/src/cmd/gofmt -I $WORK ./doc.go ./gofmt.go ./rewrite.go ./simplify.go
/home/dfc/go/pkg/tool/linux_amd64/pack grcP $WORK $WORK/cmd/gofmt.a $WORK/cmd/gofmt/_obj/_go_.6
cd .
/home/dfc/go/pkg/tool/linux_amd64/6l -o $WORK/cmd/gofmt/_obj/exe/a.out -L $WORK $WORK/cmd/gofmt.a
cp $WORK/cmd/gofmt/_obj/exe/a.out gofmt

The first six lines should be familiar, main packages are compiled like any other Go package, they are even packed like any other package.

The difference is the penultimate line, which invokes the linker to produce a binary executable.

/home/dfc/go/pkg/tool/linux_amd64/6l -o $WORK/cmd/gofmt/_obj/exe/a.out -L $WORK $WORK/cmd/gofmt.a

The final line copies and renames the completed binary to its final name. If you had used go install the binary would be copied to $GOPATH/bin (or $GOBIN if set).

A little history

If you go far enough back in time, back before the go tool, back to the time of Makefiles, you can still find the core of the Go compilation process. This example is taken from the release.r60 documentation

$ cat >hello.go <<EOF
package main

import "fmt"

func main() {
        fmt.Printf("hello, world\n")
$ 6g hello.go
$ 6l hello.6
$ ./6.out
hello, world

It’s all here, 6g compiling a .go file into a .6 object file, 6l linking the object file against the fmt (and runtime) packages to produce a binary, 6.out.

Wrapping up

In this post we’ve talked about how go build works and touched on how go install differs in its treatment of the compilation result.

Now that you know how go build works, and how to investigate the build process with -x, try passing that flag to go test and observe the result.

Additionally, if you have gccgo installed on your system, you can pass -compiler gccgo to go build, and using -x investigate how Go code is built using this compiler.

Notes on exploring the compiler flags in the Go compiler suite

I’ve been doing some work improving the code generation of the 5g compiler, which is the Go compiler for arm. These notes also apply to the 6g and 8g compilers for amd64 and 386 respectively.

For this discussion we’ll use a very simple package.

package addr

func addr(s[]int) *int {
        return &s[2]

To see the assembly produced by compiling this package we use the -S flag. -S can be passed directly to the compiler with go tool 5g -S addr.go, but it is simpler (and more portable) to use the -gcflags flag on the go tool itself.

% go build -gcflags=-S addr.go
# command-line-arguments

--- prog list "addr" ---
0000 (/home/dfc/src/addr.go:3) TEXT addr+0(SB),$0-16
0001 (/home/dfc/src/addr.go:4) MOVW $s+0(FP),R0
0002 (/home/dfc/src/addr.go:4) MOVW 4(R0),R1
0003 (/home/dfc/src/addr.go:4) CMP $2,R1,
0004 (/home/dfc/src/addr.go:4) BHI ,6(APC)
0005 (/home/dfc/src/addr.go:4) BL ,runtime.panicindex+0(SB)
0006 (/home/dfc/src/addr.go:4) MOVW 0(R0),R0
0007 (/home/dfc/src/addr.go:4) ADD $8,R0
0008 (/home/dfc/src/addr.go:4) MOVW R0,.noname+12(FP)
0009 (/home/dfc/src/addr.go:4) RET ,

This is quite a lot of code for a one line function. One of the reasons for this is s is a slice, whose length is not known at compile time, so the compiler must insert a bounds check. We can tell the compiler to not emit bounds checks with the -B flag.

% go build -gcflags=-SB addr.go
# command-line-arguments

--- prog list "addr" ---
0000 (/home/dfc/src/addr.go:3) TEXT     addr+0(SB),$0-16
0001 (/home/dfc/src/addr.go:4) MOVW     $s+0(FP),R0
0002 (/home/dfc/src/addr.go:4) MOVW     0(R0),R0
0003 (/home/dfc/src/addr.go:4) ADD      $8,R0
0004 (/home/dfc/src/addr.go:4) MOVW     R0,.noname+12(FP)
0005 (/home/dfc/src/addr.go:4) RET      ,

It is important to note that -B is an unsupported flag. The goal of Go is a safe language, one where array subscripts are bounds checked when they are not provably safe. Go already elides bounds checks when you use range loops, and future compilers will improve this. It is also important to note that none of the builders test -B so it might even generate incorrect code. In summary, when the compiler improves, -B will go away, so don’t get too attached.

One other interesting flag is -N, which will disable the optimisation pass in the compiler

% go build -gcflags=-SN addr.go
# command-line-arguments

--- prog list "addr" ---
0000 (/home/dfc/src/addr.go:3) TEXT     addr+0(SB),$0-16
0001 (/home/dfc/src/addr.go:4) MOVW     $s+0(FP),R0
0002 (/home/dfc/src/addr.go:4) MOVW     R0,R0
0003 (/home/dfc/src/addr.go:4) MOVW     4(R0),R1
0004 (/home/dfc/src/addr.go:4) CMP      $2,R1,
0005 (/home/dfc/src/addr.go:4) BHI      ,8(APC)
0006 (/home/dfc/src/addr.go:4) BL       ,runtime.panicindex+0(SB)
0007 (/home/dfc/src/addr.go:4) UNDEF    ,
0008 (/home/dfc/src/addr.go:4) MOVW     0(R0),R0
0009 (/home/dfc/src/addr.go:4) ADD      $8,R0
0010 (/home/dfc/src/addr.go:4) MOVW     R0,.noname+12(FP)
0011 (/home/dfc/src/addr.go:4) RET      ,
0012 (/home/dfc/src/addr.go:5) BL       ,runtime.throwreturn+0(SB)
0013 (/home/dfc/src/addr.go:5) RET      ,

I think the only thing that is useful about this example is, it’s good thing the optimiser is on by default because there are some strange things going on here, for example line 0002, and the unreachable branch at line 0012.

The last thing to talk about is the output of 5g is not the final code that is executed. Aside from the usual work of a linker, 5l does several transformations on the code which are important to understand.

func addr(s[]int) *int {
   10c00:       e59a1000        ldr     r1, [sl]
   10c04:       e15d0001        cmp     sp, r1
   10c08:       33a01004        movcc   r1, #4
   10c0c:       33a02010        movcc   r2, #16
   10c10:       31a0300e        movcc   r3, lr
   10c14:       3b00668c        blcc    2a64c 
   10c18:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
        return &s[2]
   10c1c:       e28d0008        add     r0, sp, #8
   10c20:       e5901004        ldr     r1, [r0, #4]
   10c24:       e3510002        cmp     r1, #2
   10c28:       8a000000        bhi     10c30 
   10c2c:       eb0035d5        bl      1e388 
   10c30:       e5900000        ldr     r0, [r0]
   10c34:       e2800008        add     r0, r0, #8
   10c38:       e58d0014        str     r0, [sp, #20]
   10c3c:       e49df004        pop     {pc}            ; (ldr pc, [sp], #4)

Here we use objdump -dS to dump the addr function as it is compiled into the executable. The first six instructions, starting at 10c00, are the function preamble that deals with segmented stacks which is inserted automatically by the 5l.

Taking it further

There are several other compiler flags which are useful when debugging or optimising your Go code.

  • -g will output the steps a the compiler is a taking at a very low level. The discussion of the output format is outside the scope of this article. Personally I find it easier to add a warn statement which will tell me the source line the compiler was working on at the time.
  • -l will disable inlining (but still retain other compiler optimisations). This is very useful if you are investigating small methods, but can’t find them in objdump.
  • -m is mainly a frontend switch and outputs details about escape analysis and inlining choices.