
Why is a Goroutine’s stack infinite?

Occasionally new Gophers stumble across a curious property of the Go language related to the amount of stack available to a Goroutine. This typically arises due to the programmer inadvertently creating an infinitely recursive function call. To illustrate this, consider the following (slightly contrived) example.

package main

import "fmt"

type S struct {
        a, b int
}

// String implements the fmt.Stringer interface
func (s *S) String() string {
        return fmt.Sprintf("%s", s) // Sprintf will call s.String()
}

func main() {
        s := &S{a: 1, b: 2}
        fmt.Println(s)
}

Were you to run this program, and I do not suggest that you do, you’d find that your machine would start to swap heavily and would probably become unresponsive unless you’re quick to hit ^C before things become unsalvageable. Because I know the first thing everyone will do is try to run this program in the playground, I’ve saved you the bother.

Most programmers have run into problems with infinite recursion before, and while it is fatal to their program, it isn’t usually fatal to their machine. So, why are Go programs different?

One of the key features of Goroutines is their cost: they are cheap to create in terms of initial memory footprint (as opposed to the 1 to 8 megabytes of a traditional POSIX thread), and their stack grows and shrinks as necessary. This allows a Goroutine to start with a single 4096 byte stack without the risk of ever running out.

To implement this, the linker (5l, 6l, 8l) inserts a small preamble at the start of each function[1], which checks whether the amount of stack required for the function is below the amount currently available. If not, a call is made to runtime⋅morestack, which allocates a new stack page[2], copies the arguments from the caller, then returns control to the original function, which can now execute safely. When that function exits, the process is undone: its return arguments are copied back to the stack frame of the caller and the unneeded stack space is released.

By this process the stack is effectively infinite, and, assuming that you’re not continually straddling the boundary between two stacks, colloquially known as stack splitting, it is very cheap.
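A minimal sketch of the growth described above: the hypothetical depth function below recurses a million times, far more than a fixed 4096 byte stack could hold, and still completes normally because the runtime enlarges the goroutine’s stack as it goes. Exact frame sizes depend on your toolchain and platform, and later Go releases cap the maximum stack size (adjustable with runtime/debug.SetMaxStack).

```go
package main

import "fmt"

// depth is a hypothetical helper that recurses n times; every call
// pushes another frame onto the calling goroutine's stack.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return depth(n-1) + 1
}

func main() {
	// A million frames is far beyond a 4096 byte starting stack, yet this
	// completes because the runtime grows the stack on demand.
	fmt.Println(depth(1000000)) // 1000000
}
```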

There is however one detail I have withheld until now, which links the accidental use of a recursive function to a serious case of memory exhaustion for your operating system, and that is, when new stack pages are needed, they are allocated from the heap.

As your infinite function continues to call itself, new stack pages are allocated from the heap, permitting the function to continue to call itself over and over again. Fairly quickly the size of the heap will exceed the amount of free physical memory in your machine, at which point swapping will soon make your machine unusable.

The size of the heap available to Go programs depends on a lot of things, including the architecture of your CPU and your operating system, but it generally represents an amount of memory that exceeds the physical memory of your machine, so your machine is likely to swap heavily before your program ever exhausts its heap.

In Go 1.1 there was a strong desire to increase the maximum size of the heap for both 32 bit and 64 bit platforms, and this has exacerbated the problem to some extent; it is unlikely that you will have 128GB[3] of physical memory in your system.

As a final comment, there are several open issues regarding this problem, but a solution that does not exact a performance penalty on properly written programs has yet to be found.

Notes
  1. This also applies to methods, but as methods are implemented as functions where the first argument is the method receiver, there is no practical difference when discussing how segmented stacks work in Go.
  2. Using the word page does not imply that only fixed, 4096 byte allocations are possible; if necessary, runtime⋅morestack will allocate a larger amount, probably rounded to a page boundary.
  3. 64 bit Windows platforms only permit a 32GB heap due to a late change in the Go 1.1 release cycle.

Go 1.1 performance improvements, part 3

This is the final article in the series exploring the performance improvements available in the recent Go 1.1 release. You can also read part 1 and part 2 to get the back story for amd64 and 386.

This article focuses on the performance of arm platforms. Go 1.1 was an important release as it raised arm to a level on par with amd64 and 386 and introduced support for additional operating systems. Some highlights that Go 1.1 brings to arm are:

  • Support for cgo.
  • Addition of experimental support for freebsd/arm and netbsd/arm.
  • Better code generation, including a now partially working peephole optimiser, a better register allocator, and many small improvements to reduce code size.
  • Support for ARMv6 hosts, including the Raspberry Pi.
  • The GOARM variable is now optional; its value is chosen automatically based on the host Go is compiled on.
  • The memory allocator is now significantly faster due to the elimination of many 64 bit instructions which were previously emulated at a high cost.
  • A significantly faster software division/modulo facility.

These changes would not have been possible without the efforts of Shenghou Ma, Rémy Oudompheng and Daniel Morsing, who made enormous contributions to the compiler and runtime during the Go 1.1 development cycle.

Again, a huge debt of thanks is owed to Anthony Starks who helped prepare the benchmark data and images for this article.

Go 1 benchmarks on linux/arm

Since its release Go has supported more than one flavor of arm architecture. Presented here are benchmarks from a wide array of hosts to give a representative sample of the performance of Go 1.1 programs on arm hosts, arranged from top left to bottom right.

As always the results presented here are available in the autobench repository.


[Figure: go1 benchmark results on linux/arm hosts]
The speedup in BinaryTree17, and to a lesser extent Fannkuch11, benchmarks is influenced by the performance of the heap allocator. Part of heap allocation involves updating statistics stored in 64 bit quantities, which flow into runtime.MemStats. During the 1.1 cycle, some quick work on the part of the Atom symbol removed many of these 64 bit operations, which shows as decreased run time in these benchmarks.
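To make the link between heap allocation and those statistics concrete, here is a small sketch using runtime.ReadMemStats. The countMallocs helper and the sink variable are hypothetical names for illustration; the point is that every heap allocation bumps counters such as MemStats.Mallocs, which are uint64 fields, so on 32 bit arm each update was an emulated 64 bit operation at the time.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the allocations reachable so they must live on the heap.
var sink []*[64]byte

// countMallocs (a hypothetical helper) allocates n small objects and
// returns how far the cumulative runtime.MemStats.Mallocs counter moved.
func countMallocs(n int) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		sink = append(sink, new([64]byte))
	}
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs // both fields are uint64
}

func main() {
	fmt.Println("heap objects allocated:", countMallocs(1000))
}
```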

net/http

Across all the samples, net/http benchmarks have benefited from the new poller implementation as well as the pure Go improvements to the net/http package through the work of Brad Fitzpatrick and Jeff Allen.
[Figure: net/http benchmark results on linux/arm hosts]

runtime

The results of the runtime benchmarks mirror those from amd64 and 386. The general trend is towards improvement, and in some cases, a large improvement, in areas like map operations.
[Figure: runtime benchmark results on linux/arm hosts]
The improvements to the Append set of benchmarks show the benefit of a change committed by Rob Pike which avoids a call to runtime.memmove when appending small amounts of data to a []byte.

The common theme across all the samples is the regression in some channel operations. This may be attributable to the high cost of performing atomic operations on arm platforms. Currently all atomic operations are implemented by the runtime package, but in the future they may be handled directly in the compiler which could reduce their overhead.

The CompareString benchmarks show a smaller improvement than other platforms because CL 8056043 has not yet been backported to arm.

Conclusion

With the addition of cgo support, throughput improvements in the net package, and improvements to code generation and the garbage collector, Go 1.1 represents a significant milestone for writing Go programs targeting arm.

To wrap up this series of articles, it is clear that Go 1.1 delivers on its promise of a general 30-40% improvement across all three supported architectures. Comparing the compilers, 6g remains the flagship compiler and benefits from the fastest underlying hardware, but 8g and 5g show a greater improvement relative to last year’s Go 1.0 release.

But wait, there is more

If you’ve enjoyed this series of posts and want to follow the progress of Go 1.2, I’ll soon be opening a branch of autobench which will track Go 1.1 vs tip (1.2). I’ll post and tweet the location when it is ready.

Since the Go 1.2 change window was opened on May 14th, the allocator and garbage collector have already received improvements from Dmitry Vyukov and the Atom symbol aimed at further reducing the cost of GC, and Carl Shapiro has started work on precise collection of stack allocated values.

Also for Go 1.2 are proposals for a better memory allocator, and a change to the scheduler to give it the ability to preempt long running goroutines, which is aimed at reducing GC latency.

Finally, Go 1.2 has a release timetable. So while we can’t really say what will or will not make it into 1.2, we can say that it should be done by the end of 2013.

Go 1.1 performance improvements, part 2

This is the second in a three part series exploring the performance improvements in the recent Go 1.1 release.

In part 1 I explored the improvements on amd64 platforms, as well as general improvements available to all via runtime and compiler frontend improvements.

In this article I will focus on the performance of Go 1.1 on 386 machines. The results in this article are taken from linux-386-d5666bad617d-vs-e570c2daeaca.txt.

Go 1 benchmarks on linux/386

When it comes to performance, the 8g compiler is at a disadvantage. The small number of general purpose registers available in the 386 programming model, and the weird restrictions on their use, place a heavy burden on the compiler and optimiser. However, that did not stop Rémy Oudompheng from making several significant contributions to 8g during the 1.1 cycle.

Firstly the odd 387 floating point model was deprecated (it’s still there if you are running very old hardware with the GO386=387 switch) in favor of SSE2 instructions.

Secondly, Rémy put significant effort into porting code generation improvements from 6g into 8g (and 5g, the arm compiler). Where possible code was moved into the compiler frontend, gc, including introducing a framework to rewrite division as simpler shift and multiply operations.
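As an illustration of what rewriting division as shift and multiply means, here is a sketch using the textbook magic number for unsigned 32 bit division by 3, plus the trivial power-of-two case. The helper names are hypothetical, and the constants are the standard ones for this derivation, not necessarily the exact instruction sequence gc emits.

```go
package main

import "fmt"

// div3 computes x/3 for any uint32 without a divide instruction.
// 0xAAAAAAAB is ceil(2^33/3); the rounding error is small enough that
// (x * 0xAAAAAAAB) >> 33 equals x/3 for every x < 2^32.
func div3(x uint32) uint32 {
	return uint32((uint64(x) * 0xAAAAAAAB) >> 33)
}

// div8 shows the simpler case: unsigned division by a power of two
// is a plain right shift.
func div8(x uint32) uint32 {
	return x >> 3
}

func main() {
	for _, x := range []uint32{0, 1, 2, 3, 100, 1<<31 + 7, 0xFFFFFFFF} {
		fmt.Println(x, div3(x) == x/3, div8(x) == x/8)
	}
}
```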

[Figure: go1 benchmark results on linux/386]

In general the results for linux/386 on this host show improvements that are as good, or in some cases, better than linux/amd64. Unlike linux/amd64, there is no slowdown in the Gzip or Gob benchmarks.

The two small regressions, BinaryTree17 and Fannkuch11, are assumed to be attributable to the garbage collector becoming more precise. This involves some additional bookkeeping to track the size and type of objects allocated on the heap, which shows up in these benchmarks.

net/http benchmarks

The improvements in the net package previously demonstrated in the linux/amd64 article carry over to linux/386. The improvements in the ClientServer benchmarks are not as marked as those of its amd64 cousin, but nonetheless show a significant improvement overall due to the tighter integration between the runtime and net package.

[Figure: net/http benchmark results on linux/386]

Runtime microbenchmarks

Like the amd64 benchmarks in part 1, the runtime microbenchmarks show a mixture of results. Some low level operations got a bit slower, while other operations, like map operations, have improved significantly.

[Figure: runtime microbenchmark results on linux/386]

The final two benchmarks, which appear truncated, are actually so large they do not fit on the screen. The improvement is mostly due to this change which introduced a faster low level Equals operation for the strings, bytes and runtime packages. The results speak for themselves.

benchmark                                  old MB/s   new MB/s  speedup
BenchmarkCompareStringBigUnaligned         29.08      1145.48   39.39x
BenchmarkCompareStringBig                  29.09      1253.48   43.09x

Conclusion

Although 8g is not the leading compiler of the gc suite (Ken Thompson himself has said that there are essentially no free registers available on 386), linux/386 shows that it easily meets the 30-40% performance improvement claim. In some benchmarks, compared to Go 1.0, linux/386 beats linux/amd64.

Additionally, due to reductions in memory usage, all the compilers now use around half as much memory when compiling, and as a direct consequence, compile up to 30% faster than their 1.0 predecessors.

I encourage you to review the benchmark data in the autobench repository and if you are able, submit your own results.

In the final article in this series I will investigate the performance improvement Go 1.1 brings to arm platforms. I assure you, I’ve saved the best til last.

Update: thanks to @ajstarks who provided me with higher quality benchviz images.

Go 1.1 performance improvements

This is the first in a series of articles analysing the performance improvements in the Go 1.1 release.

It has been reported (here, and here) that performance improvements of 30-40% are available simply by recompiling your code under Go 1.1. For linux/amd64 this holds true for a wide spectrum of benchmarks. For platforms like linux/386 and linux/arm the results are even more impressive, but I’m putting the cart before the horse.

A note about gccgo. This series focuses on the contributions that the improvements to the gc series of compilers (5g, 6g and 8g) have made to Go 1.1’s performance. gccgo benefits indirectly from these improvements as it shares the same runtime and standard library, but is not the focus of this benchmarking series.

Go 1.1 features several improvements in the compilers, runtime and standard library that are directly responsible for the resulting gains in program speed. Specifically:

  • Code generation improvements across all three gc compilers, including better register allocation, reduction in redundant indirect loads, and reduced code size.
  • Improvements to inlining, including inlining of some builtin function calls and compiler generated stub methods when dealing with interface conversions.
  • Reduction in stack usage, which reduces pressure on stack size, leading to fewer stack splits.
  • Introduction of a parallel garbage collector. The collector remains mark and sweep, but the phases can now utilise all CPUs.
  • More precise garbage collection, which reduces the size of the heap, leading to lower GC pause times.
  • A new runtime scheduler which can make better decisions when scheduling goroutines.
  • Tighter integration of the scheduler with the net package, leading to significantly decreased packet processing latencies and higher throughput.
  • Parts of the runtime and standard library have been rewritten in assembly to take advantage of specific bulk move or crypto instructions.

Introducing autobench

Few things irk me more than unsubstantiated, unrepeatable benchmarks. As this series is going to throw out a lot of numbers, and draw some strong conclusions, it was important for me to provide a way for people to verify my results on their machines.

To this end I have built a simple make based harness which can be run on any platform that Go supports to compare the performance of a set of synthetic benchmarks against Go 1.0 and Go 1.1. While the project is still being developed, it has generated a lot of useful data which is captured in the repository. You can find the project on Github.

https://github.com/davecheney/autobench

I am indebted to Go community members who submitted benchmark data from their machines allowing me to make informed conclusions about the relative performance of Go 1.1.

If you are interested in participating in autobench there will be a branch which tracks the performance of Go 1.1 against tip opening soon.

A picture speaks a thousand words

To better visualise the benchmark results, AJ Starks has produced a wonderful tool, benchviz, which turns the dry text based output of misc/benchcmp into rather nice graphs. You can read all about benchviz on AJ’s blog.

http://mindchunk.blogspot.com.au/2013/05/visualizing-go-benchmarks-with-benchviz.html

Following a tradition set by the misc/benchcmp tool, improvements, be they a reduction in run time or an increase in throughput, are shown as bars extending towards the right. Regressions extend towards the left.

Go 1 benchmarks on linux/amd64

The remainder of this post will focus on linux/amd64 performance. The 6g compiler is considered to be the flagship of the gc compiler suite. In addition to code generation improvements in the front and back ends, performance critical parts of the standard library and runtime have been rewritten in assembly to take advantage of SSE2 instructions.

The data for the remainder of this article is taken from the results file linux-amd64-d5666bad617d-vs-e570c2daeaca.txt.

[Figure: go1 benchmark results on linux/amd64]

The go1 benchmark suite, while being a synthetic benchmark, attempts to capture some real world usages of the main packages in the standard library. In general the results support the hypothesis of a broad 30-40% improvement. Looking at the results submitted to the autobench repository, it is clear that GobEncode and Gzip have regressed, and issues 5165 and 5166 have been raised, respectively. In the latter case, the switch to 64 bit ints is assumed to be at least partially to blame.

net/http benchmarks

This set of benchmarks is extracted from the net/http package and demonstrates the work that Brad Fitzpatrick, Dmitry Vyukov, and many others have put into the net and net/http packages.

[Figure: net/http benchmark results on linux/amd64]

Of note in this benchmark set are the improvements in the ReadRequest benchmarks, which attempt to benchmark the decoding of an HTTP request. The improvements in the ClientServerParallel benchmarks are not currently available across all amd64 platforms, as some of them have no support for the new runtime integration with the net package. Finishing support for the remaining BSD and Windows platforms is a focus for the 1.2 cycle.
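For readers unfamiliar with what decoding an HTTP request involves, here is a sketch of roughly the work the ReadRequest benchmarks exercise: turning raw bytes into an *http.Request with net/http’s ReadRequest. The parse helper and the sample request are illustrative assumptions, not the benchmark’s actual input.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// parse (a hypothetical helper) decodes a raw HTTP/1.1 request.
func parse(raw string) (*http.Request, error) {
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	raw := "GET /index.html HTTP/1.1\r\n" +
		"Host: example.com\r\n" +
		"User-Agent: sketch\r\n" +
		"\r\n"
	req, err := parse(raw)
	if err != nil {
		panic(err)
	}
	// Method, path, Host and User-Agent are all decoded from the text.
	fmt.Println(req.Method, req.URL.Path, req.Host, req.UserAgent())
}
```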

Runtime microbenchmarks

The final set of benchmarks presented here are extracted from the runtime package.

[Figure: runtime microbenchmark results on linux/amd64]

The runtime benchmarks represent micro benchmarks of very low level parts of the runtime package.

The obvious regression is the first Append benchmark. In wall time, the benchmark has increased from 36 ns/op to 100 ns/op, which shows that some append use cases have regressed. This may have already been addressed in tip by CL 9360043.

The big wins in the runtime benchmarks are the amazing new map code by khr, which addresses issue 3886, the reduction in overhead of channel operations (thanks to Dmitry’s new scheduler), improvements in complex128 operations, and speedups in hash and memmove operations, which were rewritten in 64 bit assembly.

Conclusion

For linux/amd64 on modern 64 bit Intel CPUs, the 6g compiler and runtime can generate significantly faster code. Other amd64 platforms share similar speedups, although the specific improvements vary. I encourage you to review the benchmark data in the autobench repository and if you are able, submit your own results.

In subsequent articles I will investigate the performance improvement Go 1.1 brings to 386 and arm platforms.

Update: thanks to @ajstarks who provided me with higher quality benchviz images.

Go 1.1 tarballs for linux/arm

For the time-poor ARM fans in the room, I’ve updated my tarball distributions to Go 1.1. These tarballs are built using the same misc/dist tool that makes the official builds on the golang.org download page.

You can find the link at the Unofficial ARM tarballs for Go item at the top of this page. Please address any bug reports or comments to me directly.

There are also a number of other ways to obtain Go 1.1 appearing on the horizon. For example, if you are using Debian Sid, Go 1.1 is available now. This version has been imported into Ubuntu Saucy (which will become 13.10), although at this time it remains in the proposed channel.

Rest assured I will not be shy in announcing when Go 1.1 has wider availability in Ubuntu.

Go and Juju at Canonical slides posted

This month I had the privilege of presenting a talk at the GoSF meetup to 120 keen Gophers.

I was absolutely blown away by the Iron.io/HeavyBit offices. It was a fantastic presentation space with a professional sound and video crew to stream the meetup straight to G+.

The slides are available on my GitHub account, but a more convenient way to consume them is Gary Burd’s fantastic talks.godoc.org site.

http://talks.godoc.org/github.com/davecheney/gosf/5nines.slide#1

If you’re interested in finding out more about the Juju project itself, you can find us on the project page, https://launchpad.net/juju-core/ or / -dev on IRC.

Curious Channels

Channels are a signature feature of the Go programming language. Channels provide a powerful way to reason about the flow of data from one goroutine to another without the use of locks or critical sections.

Today I want to talk about two important properties of channels that make them useful for controlling not just data flow within your program, but the flow of control as well.

A closed channel never blocks

The first property I want to talk about is a closed channel. Once a channel has been closed, you cannot send a value on it (attempting to do so panics), but you can still receive from the channel.

package main

import "fmt"

func main() {
        ch := make(chan bool, 2)
        ch <- true
        ch <- true
        close(ch)

        for i := 0; i < cap(ch)+1; i++ {
                v, ok := <-ch
                fmt.Println(v, ok)
        }
}

In this example we create a channel with a buffer of two, fill the buffer, then close it.

true true
true true
false false

Running the program shows we retrieve the first two values we sent on the channel, then on our third attempt the channel gives us the values of false and false. The first false is the zero value for that channel’s type, which is false, as the channel is of type chan bool. The second indicates the open state of the channel, which is now false, indicating the channel is closed. The channel will continue to report these values infinitely. As an experiment, alter this example to receive from the channel 100 times.
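Here is the suggested experiment as a sketch. drainClosed is a hypothetical helper that builds the same two-element buffered channel, closes it, and counts how many of n receives report ok == false: after the two buffered values are drained, every further receive succeeds immediately with the zero value.

```go
package main

import "fmt"

// drainClosed receives n times from a closed channel that still holds
// two buffered values, and counts the receives that reported !ok.
func drainClosed(n int) (closedReads int) {
	ch := make(chan bool, 2)
	ch <- true
	ch <- true
	close(ch)
	for i := 0; i < n; i++ {
		if _, ok := <-ch; !ok {
			closedReads++
		}
	}
	return closedReads
}

func main() {
	fmt.Println(drainClosed(100)) // 98
}
```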

Being able to detect if your channel is closed is a useful property. It is used in the range over channel idiom to exit the loop once a channel has been drained.

package main

import "fmt"

func main() {
        ch := make(chan bool, 2)
        ch <- true
        ch <- true
        close(ch)

        for v := range ch {
                fmt.Println(v) // called twice
        }
}

But it really comes into its own when combined with select. Let’s start with this example:

package main

import (
        "fmt"
        "sync"
        "time"
)

func main() {
        finish := make(chan bool)
        var done sync.WaitGroup
        done.Add(1)
        go func() {
                select {
                case <-time.After(1 * time.Hour):
                case <-finish:
                }
                done.Done()
        }()
        t0 := time.Now()
        finish <- true // send the close signal
        done.Wait()    // wait for the goroutine to stop
        fmt.Printf("Waited %v for goroutine to stop\n", time.Since(t0))
}

Running the program on my system gives a low wait duration, so it is clear that the goroutine does not wait the full hour before calling done.Done():

Waited 129.607us for goroutine to stop

But there are a few problems with this program. The first is that the finish channel is not buffered, so the send to finish may block if the receiver forgot to add finish to their select statement. You could solve that problem by wrapping the send in a select block to make it non-blocking, or by making the finish channel buffered. However, what if you had many goroutines listening on the finish channel? You would need to track them and remember to send the correct number of times to the finish channel. This might get tricky if you aren’t in control of creating these goroutines; they may be created in another part of your program, perhaps in response to incoming requests over the network.
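The non-blocking send mentioned above can be sketched with a select that has a default case. trySend is a hypothetical helper: if no receiver is ready and there is no free buffer slot, the default case runs and the send is abandoned instead of blocking.

```go
package main

import "fmt"

// trySend attempts to send v on ch without blocking and reports
// whether the value was delivered or buffered.
func trySend(ch chan bool, v bool) bool {
	select {
	case ch <- v:
		return true
	default: // nobody ready to receive and no buffer space
		return false
	}
}

func main() {
	unbuffered := make(chan bool) // nobody is receiving on this
	buffered := make(chan bool, 1)
	fmt.Println(trySend(unbuffered, true)) // false
	fmt.Println(trySend(buffered, true))   // true: fits in the buffer
	fmt.Println(trySend(buffered, true))   // false: buffer is full
}
```

Note that this only avoids blocking the sender; a goroutine that forgot to listen still never hears the signal, which is why the close based approach that follows is usually preferable.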

A nice solution to this problem is to leverage the property that a closed channel is always ready to receive. Using this property we can rewrite the program, now including 100 goroutines, without having to keep track of the number of goroutines spawned, or correctly size the finish channel:

package main

import (
        "fmt"
        "sync"
        "time"
)

func main() {
        const n = 100
        finish := make(chan bool)
        var done sync.WaitGroup
        for i := 0; i < n; i++ { 
                done.Add(1)
                go func() {
                        select {
                        case <-time.After(1 * time.Hour):
                        case <-finish:
                        }
                        done.Done()
                }()
        }
        t0 := time.Now()
        close(finish)    // closing finish makes it ready to receive
        done.Wait()      // wait for all goroutines to stop
        fmt.Printf("Waited %v for %d goroutines to stop\n", time.Since(t0), n)
}

On my system, this returns

Waited 231.385us for 100 goroutines to stop

So what is going on here? As soon as the finish channel is closed, it becomes ready to receive. As all the goroutines are waiting to receive either from their time.After channel or from finish, each select statement can now complete, and each goroutine exits after calling done.Done() to decrement the WaitGroup counter. This powerful idiom allows you to use a channel to send a signal to an unknown number of goroutines, without having to know anything about them or worrying about deadlock.

Before moving on to the next topic, I want to mention a final simplification that is preferred by many Go programmers. If you look at the sample program above, you’ll note that we never send a value on the finish channel, and the receiver always discards any value received. Because of this it is quite common to see the program written like this:

package main

import (
        "fmt"
        "sync"
        "time"
)

func main() {
        finish := make(chan struct{})
        var done sync.WaitGroup
        done.Add(1)
        go func() {
                select {
                case <-time.After(1 * time.Hour):
                case <-finish:
                }
                done.Done()
        }()
        t0 := time.Now()
        close(finish)
        done.Wait()
        fmt.Printf("Waited %v for goroutine to stop\n", time.Since(t0))
}

As the behaviour of the close(finish) relies on signalling the close of the channel, not the value sent or received, declaring finish to be of type chan struct{} says that the channel contains no value; we’re only interested in its closed property.

A nil channel always blocks

The second property I want to talk about is the polar opposite of the closed channel property. A nil channel, that is, a channel value that has not been initialised or has been set to nil, will always block. For example:

package main

func main() {
        var ch chan bool
        ch <- true // blocks forever
}

will deadlock as ch is nil and will never be ready to send. The same is true for receiving

package main

func main() {
        var ch chan bool
        <-ch // blocks forever
}

This might not seem important, but it is a useful property when you want to use the closed channel idiom to wait for multiple channels to close. For example:

// WaitMany waits for a and b to close.
func WaitMany(a, b chan bool) {
        var aclosed, bclosed bool
        for !aclosed || !bclosed {
                select {
                case <-a:
                        aclosed = true
                case <-b:
                        bclosed = true
                }
        }
}

WaitMany() looks like a good way to wait for channels a and b to close, but it has a problem. Suppose that channel a is closed first; it will then always be ready to receive. Because bclosed is still false, the program can enter an infinite loop, preventing channel b from ever being closed.

A safe way to solve the problem is to leverage the blocking properties of a nil channel and rewrite the program like this:

package main

import (
        "fmt"
        "time"
)

func WaitMany(a, b chan bool) {
        for a != nil || b != nil {
                select {
                case <-a:
                        a = nil 
                case <-b:
                        b = nil
                }
        }
}

func main() {
        a, b := make(chan bool), make(chan bool)
        t0 := time.Now()
        go func() {
                close(a)
                close(b)
        }()
        WaitMany(a, b)
        fmt.Printf("waited %v for WaitMany\n", time.Since(t0))
}

In the rewritten WaitMany() we set a or b to nil once a receive on it has completed, which here happens when it is closed. When a nil channel is part of a select statement, it is effectively ignored, so setting a to nil removes it from selection, leaving only b, which blocks until it is closed, exiting the loop without spinning.

Running this on my system gives

waited 54.912us for WaitMany

In conclusion, the simple properties of closed and nil channels are powerful building blocks that can be used to create highly concurrent programs that are simple to reason about.

What is the zero value, and why is it useful?

Let’s start with the Go language spec on the zero value.

When memory is allocated to store a value, either through a declaration or a call of make or new, and no explicit initialization is provided, the memory is given a default initialization. Each element of such a value is set to the zero value for its type: false for booleans, 0 for integers, 0.0 for floats, "" for strings, and nil for pointers, functions, interfaces, slices, channels, and maps. This initialization is done recursively, so for instance each element of an array of structs will have its fields zeroed if no value is specified.

This property of always setting a value to a known default is important for safety and correctness of your program, but can also make your Go programs simpler and more compact. This is what Go programmers talk about when they say “give your structs a useful zero value”.

Here is an example using sync.Mutex, which is designed to be usable without explicit initialization. The sync.Mutex contains two unexported integer fields. Thanks to the zero value, those fields will be set to 0 whenever a sync.Mutex is declared.

package main

import "sync"

type MyInt struct {
        mu sync.Mutex
        val int
}

func main() {
        var i MyInt

        // i.mu is usable without explicit initialisation.
        i.mu.Lock()      
        i.val++
        i.mu.Unlock()
}
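The same idea applies to your own types. Here is a sketch of a hypothetical Counter whose zero value is ready to use: reading from a nil map is always safe in Go, so only Add needs to allocate the map, and it does so lazily the first time it is called.

```go
package main

import "fmt"

// Counter is a hypothetical type with a useful zero value.
type Counter struct {
	m map[string]int
}

// Add increments key, allocating the map on first use.
func (c *Counter) Add(key string) {
	if c.m == nil {
		c.m = make(map[string]int)
	}
	c.m[key]++
}

// Get returns the count for key; a read from a nil map yields 0.
func (c *Counter) Get(key string) int {
	return c.m[key]
}

func main() {
	var c Counter // no constructor needed
	c.Add("go")
	c.Add("go")
	fmt.Println(c.Get("go"), c.Get("rust")) // 2 0
}
```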

Another example of a type with a useful zero value is bytes.Buffer. You can declare a bytes.Buffer and start reading from or writing to it without explicit initialisation. Note that io.Copy takes an io.Reader as its second argument; because bytes.Buffer’s Read method has a pointer receiver, we need to pass a pointer to b.

package main

import "bytes"
import "io"
import "os"

func main() {
        var b bytes.Buffer
        b.Write([]byte("Hello world"))
        io.Copy(os.Stdout, &b)
}

A useful property of slices is that their zero value is nil. This means you don’t need to explicitly make a slice; you can just declare it.

package main

import "fmt"
import "strings"

func main() {
        // s := make([]string, 0)
        // s := []string{}
        var s []string

        s = append(s, "Hello")
        s = append(s, "world")
        fmt.Println(strings.Join(s, " "))
}

Note: var s []string is similar to the two commented lines above it, but not identical. It is possible to detect the difference between a slice value that is nil and a slice value that has zero length. The following code will output false.

package main

import "fmt"
import "reflect"

func main() {
        var s1 = []string{}
        var s2 []string
        fmt.Println(reflect.DeepEqual(s1, s2))
}
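The difference is visible in more everyday places than reflect.DeepEqual: a direct comparison with nil, and encoding/json, which encodes a nil slice as null and an empty slice as []. A minimal sketch (marshal is a hypothetical helper):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshal returns the JSON encoding of v, panicking on error for brevity.
func marshal(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	var nilSlice []string
	emptySlice := []string{}

	// Both have length zero...
	fmt.Println(len(nilSlice), len(emptySlice)) // 0 0
	// ...but only one of them compares equal to nil.
	fmt.Println(nilSlice == nil, emptySlice == nil) // true false
	// encoding/json also tells them apart.
	fmt.Println(marshal(nilSlice), marshal(emptySlice)) // null []
}
```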

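A surprising, but useful, property of nil pointers is that you can call a method with a pointer receiver on a nil pointer of that type. As long as the method checks for nil before dereferencing the receiver, this can be used to provide default values simply.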

package main

import "fmt"

type Config struct {
        path string
}

func (c *Config) Path() string {
        if c == nil {
                return "/usr/home"
        }
        return c.path
}

func main() {
        var c1 *Config
        var c2 = &Config{
                path: "/export",
        }
        fmt.Println(c1.Path(), c2.Path())
}
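The same nil-receiver technique lets recursive data structures treat nil as the empty case. In this sketch, a hypothetical singly linked list, an empty list is simply a nil *node, and the recursion terminates when it reaches the nil at the end of the list.

```go
package main

import "fmt"

// node is a singly linked list element. Because Sum and Len check for a
// nil receiver, an empty list is simply a nil *node.
type node struct {
	val  int
	next *node
}

// Sum returns the total of the list starting at n; a nil list sums to 0.
func (n *node) Sum() int {
	if n == nil {
		return 0
	}
	return n.val + n.next.Sum()
}

// Len returns the number of elements; a nil list has length 0.
func (n *node) Len() int {
	if n == nil {
		return 0
	}
	return 1 + n.next.Len()
}

func main() {
	var empty *node
	list := &node{1, &node{2, &node{3, nil}}}
	fmt.Println(empty.Sum(), empty.Len()) // 0 0
	fmt.Println(list.Sum(), list.Len())   // 6 3
}
```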

With thanks to Jan Mercl, Doug Landauer, Stefan Nilsson, and Roger Peppe from the wonderful Go+ community for their feedback and suggestions.

Go, the language for emulators

So, I hear you like emulators. It turns out that Go is a great language for writing retro-computing emulators. Here are the ones that I have tried so far:

trs80 by Lawrence Kesteloot

I really liked this one because it avoids the quagmire of OpenGL or SDL dependencies and runs in your web browser. I had a little trouble getting it going, so if you run into problems, remember to execute the trs80 command from the source directory itself. If you’ve used go get github.com/lkesteloot/trs80, that will be $GOPATH/src/github.com/lkesteloot/trs80.

trs80

GoSpeccy by Andrea Fazzi

GoSpeccy was the first emulator written in Go that I am aware of; Andrea has been quietly hacking away on it since well before Go hit 1.0. I’ve even been able to get GoSpeccy running on a Raspberry Pi, with X forwarded back to my laptop. Here is a screenshot running the Fire104b intro by Andrew Gerrand.

GoSpeccy on a Raspberry Pi

Fergulator by Scott Ferguson

Like GoSpeccy, Fergulator shows the power of Go as a language for writing complex emulators, and the power of go get to handle packages with complex dependencies. Here are the two commands that took me from having no NES emulation on my laptop, to full NES emulation on my laptop.

lucky(~) % sudo apt-get install libsdl1.2-dev libsdl-gfx1.2-dev libsdl-image1.2-dev libglew1.6-dev libxrandr-dev
lucky(~) % go get github.com/scottferg/Fergulator
Fergulator

sms by Andrea Fazzi

What’s this? Another emulator from Andrea Fazzi ? Why, yes it is. Again, it is super easy to install with go get -v github.com/remogatto/sms. Sadly there are no sample ROMs included with sms due to copyright restrictions, so no screenshot. Update: Andrea has included an open source ROM so we can have a screenshot.

sms

Update: Several Gophers from the wonderful Go+ community commented that there are still more emulators that I haven’t mentioned.

Testing Go on the Raspberry Pi running FreeBSD

This afternoon Oleksandr Tymoshenko posted an update on the state of FreeBSD on ARMv6 devices. The takeaway for Raspberry Pi fans is that things are working out nicely. A few days ago a usable image was published, allowing me to do some serious testing of the Go freebsd/arm port.

So, what works? Pretty much everything.

[root@raspberry-pi ~]# go run src/hello.go
Hello, 世界

For the moment cgo and hardware floating point are disabled. I disabled cgo support early in testing after some segfaults, but it shouldn’t be too hard to fix. The dist tool is currently failing to auto-detect¹ support for any floating point hardware.

[root@raspberry-pi ~]# go tool dist env
GOROOT="/root/go"
GOBIN="/root/go/bin"
GOARCH="arm"
GOOS="freebsd"
GOHOSTARCH="arm"
GOHOSTOS="freebsd"
GOTOOLDIR="/root/go/pkg/tool/freebsd_arm"
GOCHAR="5"
GOARM="5"

This could be because auto-detection is broken on freebsd/arm, or possibly because this kernel image does not enable the floating point unit. I’ll update this post when I’ve done some more testing.

At the moment performance is not great, even by Pi standards. The SD card runs in 1-bit 25 MHz mode, and I believe the caches are either disabled or set to write-through. The image has been stable for me, allowing me to compile Go and the various ports required by the build scripts.

[root@raspberry-pi ~/go/test/bench/go1]# go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkBinaryTree17    1        166473841000 ns/op
BenchmarkFannkuch11      1        83260837000 ns/op
BenchmarkGobDecode       5         518688800 ns/op           1.48 MB/s
BenchmarkGobEncode      10         225905200 ns/op           3.40 MB/s
BenchmarkGzip            1        16926476000 ns/op          1.15 MB/s
BenchmarkGunzip          1        2849252000 ns/op           6.81 MB/s
BenchmarkJSONEncode      1        3149797000 ns/op           0.62 MB/s
BenchmarkJSONDecode      1        6253162000 ns/op           0.31 MB/s
BenchmarkMandelbrot200   1        20880387000 ns/op
BenchmarkParse          10         250097600 ns/op           0.23 MB/s
BenchmarkRevcomp         5         279384200 ns/op           9.10 MB/s
BenchmarkTemplate        1        7347360000 ns/op           0.26 MB/s
ok      _/root/go/test/bench/go1        380.408s
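The MB/s column only appears for benchmarks that record how many bytes each iteration processes (via b.SetBytes in the testing package); it is throughput derived from that byte count and ns/op, with MB meaning 10^6 bytes. A small sketch of the arithmetic, using hypothetical helper names; since the printed MB/s is rounded, inverting it only gives an approximate byte count.

```go
package main

import "fmt"

// mbPerSec converts a per-operation byte count and time into MB/s,
// where MB means 1e6 bytes:
//   MB/s = (bytesPerOp / 1e6) / (nsPerOp / 1e9)
func mbPerSec(bytesPerOp, nsPerOp int64) float64 {
	if nsPerOp <= 0 {
		return 0
	}
	return (float64(bytesPerOp) / 1e6) / (float64(nsPerOp) / 1e9)
}

// impliedBytes inverts the formula to estimate the bytes processed per op
// from a reported ns/op and MB/s pair.
func impliedBytes(nsPerOp int64, mbs float64) float64 {
	return mbs * 1e6 * float64(nsPerOp) / 1e9
}

func main() {
	fmt.Printf("%.2f\n", mbPerSec(1000000, 1000000000)) // 1.00
	// BenchmarkGunzip above: 2849252000 ns/op at 6.81 MB/s.
	fmt.Printf("%.0f\n", impliedBytes(2849252000, 6.81)) // roughly 19.4 million bytes per op
}
```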

If you are interested in experimenting with FreeBSD on your Pi, or testing Go on freebsd/arm, please get in touch with me.

Update: As of 6th Jan, 2013, benchmarks and IO have improved.

BenchmarkGobDecode       5         482796600 ns/op           1.59 MB/s
BenchmarkGobEncode      10         226637900 ns/op           3.39 MB/s
BenchmarkGzip            1        15986424000 ns/op          1.21 MB/s
BenchmarkGunzip          1        2553481000 ns/op           7.60 MB/s
BenchmarkJSONEncode      1        2967743000 ns/op           0.65 MB/s
BenchmarkJSONDecode      1        6014558000 ns/op           0.32 MB/s
BenchmarkMandelbrot200   1        19312855000 ns/op
BenchmarkParse          10         238778300 ns/op           0.24 MB/s
BenchmarkRevcomp         5         307852000 ns/op           8.26 MB/s
BenchmarkTemplate        1        6767514000 ns/op           0.29 MB/s

1. Did you know that Go automatically detects the floating point capabilities of the machine it is built on ?