Monthly Archives: March 2016

Threads are a strange abstraction

When you think about it, threads are a strange abstraction.

From the programmer’s point of view, threads are great. It’s as if you can create virtual CPUs, on the fly, and the operating system will take care of simulating these virtual CPUs on top of real ones.

But at the implementation level, this simple abstraction can lead you, the programmer, into a trap. You don’t have an infinite number of CPUs, and applying threaded abstractions in an unbounded manner invites you to overwhelm the real CPUs if you try to use all these virtual CPUs simultaneously, or to exhaust the address space if they sit idle, because of the per-thread overhead needed to maintain the illusion.

Whenever threads are used in anger as a concurrency primitive, you need to carefully tune the number of threads your program uses, the amount of memory allocated to each thread, or both. So much for abstraction.

Go’s goroutines sit right in the middle: they offer the same programmer interface as threads and a nice imperative coding model, but also an efficient implementation model based on coroutines.
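To make the contrast concrete, here is a minimal sketch that starts a large number of goroutines, something that would be unwise with one operating system thread per task. The count of 100,000 is an arbitrary illustrative value, and spawn is a hypothetical helper, not part of any library. Each goroutine starts with a small stack that the runtime grows on demand, and the runtime schedules all of them onto a small number of OS threads.

```go
package main

import (
	"fmt"
	"sync"
)

// spawn starts n goroutines, each doing one trivial unit of work, waits
// for all of them to finish, and returns how many units were completed.
// spawn is a hypothetical helper written for this example.
func spawn(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex // guards total
		total int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			total++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return total
}

func main() {
	// 100,000 is an arbitrary figure; one OS thread per task at this
	// scale would typically be far more expensive.
	fmt.Println(spawn(100000)) // 100000
}
```

The programmer writes straight-line code for each task, as with threads; the cost of keeping 100,000 of them alive is the runtime’s problem rather than the operating system’s.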

Go project contributors by the numbers

In December 2014 the Go project moved from Google Code to GitHub. Along with the move to GitHub, the Go project moved from Mercurial to Git, which necessitated a move from Rietveld to Gerrit for code review.

A healthy open source project lives and dies by its contributors. People come and people go as time, circumstance, their jobs, and their interests change. I wanted to investigate whether these moves were a net positive for the project.

Go project contributors

At the time of writing, 744 people have contributed to the Go project. This number is slightly lower than OpenHub’s count, which also includes the golang.org/x sub-repositories that this analysis ignores. It is slightly higher than GitHub’s count, for reasons which are unclear.

Go project contributors

This graph shows contributors over time. As a new contributor lands their first commit, a line representing the current count of contributors is placed on the graph. The graph also records the number of contributors whose email addresses end in golang.org or google.com as a proxy for the Go team and Google employees respectively.

Did the move to Git, Gerrit, and GitHub increase the number of contributors, and thereby contributions? Almost certainly. The graph shows a clear uptick in the number of new contributors to the project from January 2015 onwards. However, with so many factors in play it is not possible to identify a single cause.

It should also be noted that even prior to the move, the project had always attracted a healthy stream of new contributors.

Runtime contributors

Runtime contributors

From June 2014 to December 2014, the Go runtime was rewritten from C to Go, with the help of some automated tools. This graph records the number of contributors to the parts of the standard library considered the runtime. Which paths those are has changed over time as directories have been renamed, but today it is the tree rooted at $GOROOT/src/runtime.
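Attributing a commit to the runtime comes down to matching the paths it touches against every name the runtime has had. A minimal sketch, assuming each commit’s list of changed files is already available; touches is a hypothetical helper, and the two prefixes are illustrative (src/pkg/runtime/ before the Go 1.4 source tree reorganisation removed the pkg level, src/runtime/ afterwards).

```go
package main

import (
	"fmt"
	"strings"
)

// runtimePrefixes lists path prefixes treated as "the runtime".
// These are illustrative assumptions, not an exhaustive history.
var runtimePrefixes = []string{
	"src/pkg/runtime/", // before the Go 1.4 tree reorganisation
	"src/runtime/",     // Go 1.4 and later
}

// touches reports whether any changed file lives under one of prefixes.
func touches(files, prefixes []string) bool {
	for _, f := range files {
		for _, p := range prefixes {
			if strings.HasPrefix(f, p) {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(touches([]string{"src/runtime/proc.go"}, runtimePrefixes))    // true
	fmt.Println(touches([]string{"src/net/http/server.go"}, runtimePrefixes)) // false
}
```

The same approach, with a different prefix list, would serve for the compiler graph.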

Did rewriting the runtime in Go, rather than in a dialect of C compiled with the project’s own C compiler, increase the number of individual contributors willing to work on it? Unfortunately not, although the number of Googlers contributing to the runtime did increase slightly. An increase in individual runtime contributions did not occur until the following January, once the move to GitHub was complete.

Compiler contributors

Compiler contributors

After the move to GitHub, the Go compiler was translated from C to Go for the Go 1.5 release, a process that was completed by May 2015. Did this have an impact on the number of new contributors specifically targeting the compiler? Possibly: there was a short-lived spurt of new contributions around July 2015. That this spurt was short-lived could be attributed to the strong message from the Go team that the 1.5 release was focusing on the runtime, specifically the garbage collector, and that the current compiler was to be replaced with the SSA back end being developed for the 1.6 time frame (since delayed until 1.7).

This latter point is supported by the clear spike in contributors after February 2016, when the 1.7 tree opened for development and the SSA back end attracted a number of talented new contributors.

An open source project

At the moment there is no question that the largest contributors to Go by total commits are the Go team themselves. This stands to reason, as Google sponsors them to develop the language. However, of the current top 16 contributors to the project, the 7th, 9th, 11th, 14th, and 16th are neither members of the Go team nor employed by Google.

The charge is commonly levelled at Google that Go is not an open source project. This analysis shows that claim to be false. The number of contributors from the Go team, or Google, continues to be a dwindling fraction of the total number of contributors to the project.

Must be willing to relocate

The “must be willing to relocate to San Francisco” meme has been doing the rounds on Twitter to great effect. The best jokes have a grain of truth to them. I think it is absurd to expect to draw on an infinite supply of debt-burdened twenty-somethings willing to relocate to the hottest real estate market on the planet.

A long time ago I worked for a company that hired globally, was willing to relocate people to work in Sydney on very generous terms, and worked hard to make the process as painless as possible. Just to be clear, I’m not having a go at this company’s policies. I like living in Sydney, and I’m sure you would too.

What I want to discuss is the potential this creates for a conflict of interest.

Say you’ve up and moved your family to Australia. Your partner has had to leave their job, your kids have left their school, everyone’s left their circle of extended family and friends. It’s a huge upheaval.

Your company is not just your sponsor, but also your spouse’s and your children’s. You’ve got a strong incentive to keep your employer happy with you: your contract stipulates a six-month probationary period. Not to mention the 12-month lease you just signed on a three-bedroom apartment.

You believe in the company, you just moved your family halfway around the planet to prove it, and you want to succeed at this job. But do you really want to take big risks if it could mean finding yourself having to explain to your partner that you have to find another job in the next two weeks or you all have to leave the country?

Employers, in asking people to relocate and placing them in the position of having to make long-term commitments to work at your chosen location, are you putting those employees in a position where they can speak freely and act in your best interests?

Should methods be declared on T or *T

This post is a continuation of a suggestion I made on Twitter a few days ago.

In Go, for any type T, there exists a type *T which is the result of an expression that takes the address of a variable of type T¹. For example:

type T struct { a int; b bool }
var t T    // t's type is T
var p = &t // p's type is *T

These two types, T and *T, are distinct, and *T is not substitutable for T².

You can declare a method on any type that you own; that is, a type that you declare in your package³. It follows that you can declare methods on both the type you declare, T, and its corresponding derived pointer type, *T. Another way to put this is that methods on a type are declared to take either a copy of their receiver’s value or a pointer to their receiver’s value⁴. So the question becomes, which is the most appropriate form to use?
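The difference between the two receiver forms, and the idea that a method is syntactic sugar for a function whose first parameter is the receiver, can be seen in a small sketch. T here is a hypothetical example type with a single field.

```go
package main

import "fmt"

// T is a hypothetical example type.
type T struct{ n int }

// SetByValue receives a copy of its receiver; the assignment is lost
// when the method returns.
func (t T) SetByValue(n int) { t.n = n }

// SetByPointer receives a pointer to its receiver; the assignment is
// visible to the caller.
func (t *T) SetByPointer(n int) { t.n = n }

func main() {
	var t T
	t.SetByValue(1)
	fmt.Println(t.n) // 0

	t.SetByPointer(2) // shorthand for (&t).SetByPointer(2)
	fmt.Println(t.n) // 2

	// Method expressions expose the underlying functions: the receiver
	// becomes an ordinary first parameter.
	(*T).SetByPointer(&t, 3)
	fmt.Println(t.n) // 3

	T.SetByValue(t, 4) // passes a copy of t, so t is unchanged
	fmt.Println(t.n) // 3
}
```

Note that the compiler takes the address of t automatically when calling a pointer method on an addressable value, which is why the two forms look identical at the call site.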

Obviously if your method mutates its receiver, it should be declared on *T. However, if the method does not mutate its receiver, is it safe to declare it on T instead⁵?

It turns out that the cases where it is safe to do so are very limited. For example, it is well known that you should not copy a sync.Mutex value as that breaks the invariants of the mutex. As mutexes control access to other things, they are frequently wrapped up in a struct with the value they control:

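package counter

import "sync"

// Val is a counter that is safe for concurrent use.
// Its zero value is ready to use.
type Val struct {
        mu  sync.Mutex // guards val
        val int
}

// Get returns the current value of the counter.
func (v *Val) Get() int {
        v.mu.Lock()
        defer v.mu.Unlock()
        return v.val
}

// Add adds n to the counter.
func (v *Val) Add(n int) {
        v.mu.Lock()
        defer v.mu.Unlock()
        v.val += n
}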

Most Go programmers know that it is a mistake to forget to declare the Get or Add methods on the pointer receiver *Val. However, any type that contains a Val, as a field or by embedding, to take advantage of its zero value must also declare its methods only on its pointer receiver; otherwise it may inadvertently copy the values it contains, mutex and all.

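// Stats lives in another package that imports counter.
type Stats struct {
        a, b, c counter.Val
}

// Sum has a value receiver, so s is a copy of the caller's Stats,
// taken without holding any of the three locks.
func (s Stats) Sum() int {
        return s.a.Get() + s.b.Get() + s.c.Get() // whoops: locks copies of the mutexes
}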
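The fix is to declare Sum on *Stats, so the counters and their mutexes are used in place rather than copied. Below is a self-contained sketch; Val is inlined from the counter package so the file stands alone, and the concurrent Add calls in main exist only to exercise the locks. The go vet copylocks check reports value receivers like the one on the original Sum, which makes this mistake easy to catch.

```go
package main

import (
	"fmt"
	"sync"
)

// Val mirrors counter.Val, inlined so this file is self-contained.
type Val struct {
	mu  sync.Mutex // guards val
	val int
}

func (v *Val) Get() int {
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.val
}

func (v *Val) Add(n int) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.val += n
}

// Stats holds several counters by value.
type Stats struct {
	a, b, c Val
}

// Sum is declared on *Stats, so s.a, s.b and s.c refer to the caller's
// counters and their mutexes; nothing is copied.
func (s *Stats) Sum() int {
	return s.a.Get() + s.b.Get() + s.c.Get()
}

func main() {
	var s Stats
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.a.Add(1)
			s.b.Add(2)
			s.c.Add(3)
		}()
	}
	wg.Wait()
	fmt.Println(s.Sum()) // 600
}
```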

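A similar pitfall can occur with types that hold slices of values: a copied receiver shares the slice’s backing array with the original, so writes to existing elements are visible to the caller while appends are silently lost. And of course there is also the possibility of an unintended data race.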
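The slice pitfall is subtler than the mutex one, because a copied receiver shares its slice’s backing array but not its length. A minimal sketch with a hypothetical Stack type: an append through a value receiver never reaches the caller, yet a write to an existing element through a value receiver does.

```go
package main

import "fmt"

// Stack is a hypothetical type that holds a slice of values.
type Stack struct{ items []int }

// PushValue has a value receiver: append updates only the copy's slice
// header, so the caller's length never changes.
func (s Stack) PushValue(n int) { s.items = append(s.items, n) }

// SetTopValue also has a value receiver, but the copy shares the backing
// array, so this write is visible to the caller. Assumes a non-empty stack.
func (s Stack) SetTopValue(n int) { s.items[len(s.items)-1] = n }

// Push has a pointer receiver and behaves as expected.
func (s *Stack) Push(n int) { s.items = append(s.items, n) }

func main() {
	var s Stack
	s.Push(1)
	s.PushValue(2)
	fmt.Println(s.items) // [1]
	s.SetTopValue(9)
	fmt.Println(s.items) // [9]
}
```

A type that behaves like this half-shared, half-copied value is hard to reason about, which is another argument for the pointer receiver.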

In short, I think that you should prefer declaring methods on *T unless you have a strong reason to do otherwise.

¹ We say T, but that is just a placeholder for a type that you declare.
² This rule is recursive: taking the address of a variable of type *T returns a result of type **T.
³ This is why nobody can declare methods on primitive types like int.
⁴ Methods in Go are just syntactic sugar for a function which passes the receiver as the first formal parameter.
⁵ If the method does not mutate its receiver, does it need to be a method?

Suggestions for contributing to an Open Source project

Occasionally I am asked for advice on how to get started contributing to an Open Source project. I thought it may be useful to write down my suggestions.

These points were written in the context of the Go programming language, but I think this advice is applicable to the majority of modern Open Source projects.

1. Pick an issue you know how to solve. The best way to get started with a project is to fix a bug. You’ll need to be self-sufficient, so do some research and investigate the history behind a bug. Don’t pick an issue you have no familiarity with and then ask “Who can tell me how to solve this bug?”
2. Ask for more detail. Many bugs lack enough detail to be addressed, so prompting the reporter for more information is in itself a useful service. You may discover that the bug is a duplicate of another, in which case it can be closed. If you can distil the bug report into a reproduction or a test case, that is a valuable contribution in itself.
3. Discuss your change first. When you have chosen a bug, discuss your change before starting to code. You can experiment privately, but do not send a change without discussing it first. You can probably skip this for very trivial changes, like typos or adding a small test case to an existing package, but for anything larger the rule is: discuss, then code.
4. Always include a test. One of the first things a reviewer will do is patch in your test and verify that it fails before even looking at your fix. You should therefore write the failing test case first, then write the fix. It may be that you need to refactor the code to be able to write a failing test, which is fine, but that brings me back to point 3: discuss your change first. If the project does not have a strong testing regime, describe how you verified the fix so that someone reviewing your change can do the same.
5. Change as little as possible. All things being equal, smaller changes are easier to review and are merged faster than large ones, so keep the size of your change as small as possible. Avoid the temptation to include a bunch of unrelated changes.
6. Follow the existing style. Even with tools like gofmt, large projects will commonly exhibit minor stylistic differences. My rule of thumb is: always follow the predominant style of the file in question; if they use long identifiers, use long identifiers; if they use short ones, do so too, and so on. Above all, resist the temptation to include a large stylistic change along with your bug fix.
7. Be polite, but persistent. If you haven’t received feedback on your proposal after a few days, politely ask for a response. It may be that your proposal was overlooked, or that the project is currently in a feature freeze. Assuming you have followed the advice above, you should expect to get actionable advice on how to improve your change so it can be reviewed.

Dave Cheney

The acme of foolishness