
Understand Go pointers in less than 800 words or your money back

This post is for programmers coming to Go who are unfamiliar with the idea of pointers or a pointer type in Go.

What is a pointer?

Simply put, a pointer is a value which points to the address of another value. This is the textbook explanation, but if you’re coming from a language that doesn’t let you talk about the address of a variable, it could very well be written in Cuneiform.

Let’s break this down.

What is memory?

Computer memory, RAM, can be thought of as a sequence of boxes, placed one after another in a line. Each box, or cell, is labeled with a unique number, which increments sequentially; this is the address of the cell, its memory location.

Each cell holds a single value. If you know the memory address of a cell, you can go to that cell and read its contents. You can also place a value in that cell, replacing anything that was in there previously.

That’s all there is to know about memory. Everything the CPU does is expressed as fetching and depositing values into memory cells.

What is a variable?

To write a program that retrieves the value stored in memory location 200, multiplies it by 3 and deposits the result into memory location 201, we could write something like this in pseudocode:

  • retrieve the value stored in address 200 and place it in the CPU.
  • multiply the value stored in the CPU by 3.
  • deposit the value stored in the CPU into memory location 201.


This is exactly how early programs were written; programmers would keep a list of memory locations, who used them, when, and what the value stored there represented.
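The three steps above can be sketched in Go by pretending that memory is an array whose index is the address. This is only a model for illustration: the memory array, its size of 256 cells, and the tripleInto function are assumptions made for this sketch, not how Go programs normally touch memory.

```go
package main

import "fmt"

// memory models RAM as a row of numbered cells; the index is the address.
// 256 cells is an arbitrary size chosen for this sketch.
var memory [256]int

// tripleInto reads the cell at address src, multiplies the value by 3,
// and stores the result in the cell at address dst.
func tripleInto(src, dst int) {
	register := memory[src] // retrieve the value and place it in the "CPU"
	register *= 3           // multiply it by 3
	memory[dst] = register  // deposit the result
}

func main() {
	memory[200] = 6
	tripleInto(200, 201)
	fmt.Println(memory[201]) // prints 18
}
```

Notice that the program only ever talks about the numbers 200 and 201; nothing in it says what those cells mean.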

Obviously this was tedious and error prone, and meant every possible value stored in memory had to be assigned an address during the construction of the program. Worse, this arrangement made it difficult to allocate storage to variables dynamically as the program ran; just imagine if you had to write large programs using only global variables.

To address this, the notion of a variable was created. A variable is just a convenient, alphanumeric pseudonym for a memory location; a label, or nickname.

Now, rather than talking about memory locations, we can talk about variables, which are convenient names we give to memory locations. The previous program can now be expressed as:

  • Retrieve the value stored in variable a and place it in the CPU.
  • multiply it by 3
  • deposit the value into the variable b.

This is the same program, with one crucial improvement–because we no longer need to talk about memory locations directly, we no longer need to keep track of them–that drudgery is left to the compiler.

Now we can write a program like

var a = 6
var b = a * 3

And the compiler will make sure that the variables a and b are assigned unique memory locations to hold their value for as long as needed.

What is a pointer?

Now that we know that memory is just a series of numbered cells, and variables are just nicknames for a memory location assigned by the compiler, what is a pointer?

A pointer is a variable that holds the address of another variable.

The pointer points to the memory address of a variable, just as a variable represents the memory address of a value.

Let’s have a look at this program

package main

import "fmt"

func main() {
	a := 200
	b := &a // b holds the address of a
	*b++    // increment the value that b points to
	fmt.Println(a)
}

On the first line of main we declare a new variable a and assign it the value 200.

Next we declare a variable b and assign it the address of a. Remember that we don’t know the exact memory location where a is stored, but we can still store a’s address in b.

The third line is probably the most confusing, because of the strongly typed nature of Go. b contains the address of variable a, but we want to increment the value stored in a. To do this we must dereference b, that is, follow the pointer from b to a.

Then we add one to the value, and store it back in the memory location stored in b.

The final line prints the value of a, showing that it has increased to 201.
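The same idea is what makes pointers useful across function calls. The sketch below is a small variation on the program above; the increment helper is a name made up for this example. It shows that b and &a are the same address, and that a function given that address changes a itself.

```go
package main

import "fmt"

// increment adds one to the int that p points to.
// It is a hypothetical helper written for this example.
func increment(p *int) {
	*p++
}

func main() {
	a := 200
	b := &a // b holds the address of a

	fmt.Println(b == &a) // true: b and &a are the same address
	fmt.Println(*b)      // 200: following the pointer reads a's value

	increment(b)
	fmt.Println(a) // 201: the change was made to a itself
}
```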

Conclusion

If you are coming from a language with no notion of pointers, or where every variable is implicitly a pointer, don’t panic. Forming a mental model of how variables and pointers relate takes time and practice. Just remember this rule:

A pointer is a variable that holds the memory address of another variable.

Why Slack is inappropriate for open source communications

Full disclosure: my employer makes a Slack alternative. All my concerns about the use of Slack type chat services apply equally to its competitors, including my employer’s.


I’ve tweeted a few times about my frustration with the movement of open source projects from open, asynchronous, communication tools like forums, mailing lists, and issue trackers, to closed, synchronous communication services like Slack. This post is a long form version of my gripe.

What is Slack good for?

Before I stick the boot in, let’s talk about the good things about synchronous chat applications like Slack, HipChat, and so on.

In a work context, chat applications take the place of @staff email blasts about fire system testing, broken lifts, and spontaneous availability of baked goods. This is a good thing as this kind of company spam is often impossible to unsubscribe from.

In the context of an open source project, Slack, HipChat, Gitter, etc, provide a forum for advocacy, gossip, informal discussion, and support. My complaints start when Slack and friends are promoted as the recommended way to communicate with the project.

Why is Slack bad for open source communication?

My complaint about the growing use of chat services like Slack, HipChat, and so on, for communication by open source projects is that these services are not open. As I see it there are two issues:

  1. Slack, et al, are paid services with closed memberships. Sure, there are lots of little apps running on Heroku dynos that automate the “send me an invite” process, but fundamentally these are closed systems.

    This means that the content inside those systems is closed. I cannot link to a discussion in a Slack channel in a tweet. I cannot refer to it in an issue report, and I cannot cite it in a presentation. Knowledge is siloed to those who have the time and ability to participate in chat services in real time.

  2. Slack, et al, are based on synchronous communication, which discriminates against those who do not or cannot take part in the conversation in real time. For example, real time chat discriminates against those who aren’t in the same time zone–you can’t participate fully in an open source project if all the discussion happens while you’re asleep.

    Even if you are in the same time zone, real time chat assumes you have the privilege of spare time–or an employer who doesn’t mind you being constantly distracted–to be virtually present in a chat room. Online chat clients are resource hogs, and presume the availability of a fast computer and an ample, always-on internet connection, again raising the bar for participation.

In my view these issues are inseparable. Calls to use IRC instead miss the point that IRC is similarly real-time, just as efforts to create a post facto log of a Slack channel miss the fact that this is a record of a conversation to which others cannot contribute equally. There is no solution for equitable open source communication that does not address both simultaneously.

Prefer asynchronous communication for open source projects

Instead of closed, synchronous systems, I recommend open source projects stick to asynchronous communication tools that leave a publicly linkable, searchable URL. The tools that fit this requirement best are mailing lists, issue trackers, and forums.

Why Go?

A few weeks ago I was asked by a friend, “why should I care about Go”? They knew that I was passionate about Go, but wanted to know why I thought other people should care. This article contains three salient reasons why I think Go is an important programming language.

Safety

As individuals, you and I may be perfectly capable of writing a program in C that neither leaks memory nor reuses it unsafely. However, with more than 40 years of experience, it is clear that collectively, programmers working in C are unable to reliably do so en masse.

Despite static code analysis, valgrind, tsan, and -Werror being available for decades, there is scant evidence those tools have achieved widespread acknowledgement, let alone widespread adoption. In aggregate, programmers have shown they simply cannot safely manage their own memory. It’s time to move away from C.

Go does not rely on the programmer to manage memory directly; instead all memory allocation is managed by the language runtime, initialized before use, and bounds checked when necessary. It’s certainly not the first mainstream language to offer these safety guarantees; Java (1995) is probably a contender for that crown. The point is that the world has no appetite for unsafe programming languages, thus Go is memory safe by default.

Developer productivity

The point at which developer time became more expensive than hardware time was crossed back in the late 1970s. Developer productivity is a sprawling topic but it boils down to this: how much time do you spend doing useful work vs waiting for the compiler or hopelessly lost in a foreign codebase?

The joke goes that Go was developed while waiting for a C++ program to compile. Fast compilation is a key feature of Go and a key recruiting tool to attract new developers. While compilation speed remains a constant battleground, it is fair to say that compilations which take minutes in other languages take seconds in Go.

More fundamental to the question of developer productivity, Go programmers realise that code is written to be read and so place the act of reading code above the act of writing it. Go goes so far as to enforce, via tooling and custom, that all code be formatted in a specific style. This removes the friction of learning a project specific language sub-dialect and helps spot mistakes because they just look incorrect.

Due to a focus on analysis and mechanical assistance, a growing set of tools that exist to spot common coding errors have been adopted by Go developers in a way that never struck a chord with C programmers—Go developers want tools to help them keep their code clean.

Concurrency

For more than a decade, chip designers have been warning that the free lunch is over. Hardware parallelism, from the lowliest mobile phone to the most power hungry server, in the form of more, slower, CPU cores, is only available if your language can utilise them. Therefore, concurrency needs to be built into the software we write to run on today’s hardware.

Go takes a step beyond languages that expose the operating system’s multi-process or multi-threading parallelism models by offering a lightweight concurrency model based on coroutines, or goroutines as they are known in Go. Goroutines allow the programmer to eschew convoluted callback styles while the language runtime makes sure that there will be just enough threads to keep your cores active.

The rule of three

These were my three reasons for recommending Go to my friend: safety, productivity, and concurrency. Individually, there are languages that cover one, possibly two of these domains, but it is the combination of all three that makes Go an excellent choice for mainstream programmers today.

I’m speaking at GopherChina and GopherCon Singapore

In April and May I’ll be speaking at GopherChina and GopherCon Singapore, respectively. This post is a teaser for the talks that were selected by the organisers. If you’re in the area, I hope you’ll come and hear me speak.

GopherChina

GopherChina is the third event in this conference series and this year will return to Shanghai. I was lucky to attend the event in 2016 and am looking forward to 2017.

The hidden #pragmas of Go

Go isn’t like C. It doesn’t have a preprocessor, it doesn’t have macros, and it certainly doesn’t have #define, but Go does have pragmas.

What are pragmas? The name comes from the #pragma declaration that tells C compilers to alter their interpretation of a piece of code. Now, Go doesn’t have a #pragma directive, but it does have ways of altering the operation of the Go compiler via directive syntax hidden in comments.

This talk will explore the history of these directives, how and why they are used, and how you can, but probably shouldn’t, use them in your own code.
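To give a flavour of what such a directive looks like, here is a minimal sketch using //go:noinline, a directive understood by the gc compiler that asks it not to inline the function that follows. The add function is a made-up example, and which directives the talk covers is not something this snippet claims to show.

```go
package main

import "fmt"

// The directive must be written as a comment with no space after the
// slashes, directly above the function it applies to.
//
//go:noinline
func add(a, b int) int {
	return a + b
}

func main() {
	fmt.Println(add(1, 2)) // prints 3
}
```

The program behaves the same with or without the directive; only the code the compiler generates changes.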

GopherCon Singapore

GopherCon Singapore is the latest in the GopherCon franchise, and as flight times go, relatively close to home. I’m delighted to have the opportunity to present at their inaugural conference in May.

Concurrency made Easy

In my experience, many people who come to Go do so because they have a problem where being able to run more than one task at a time in their program would be beneficial. Ruby and Python programmers come to Go because the concurrency story is much better. The same is true of Node programmers; the event loop is still inherently single threaded.

But, most programmers who stick with Go for a while tend to look back on their early efforts and say things like “wow, I really went overboard with channels” or “I went crazy with goroutines when I started writing Go. It was impossible to understand what the program did”. For people who learn Go formally from an instructor or a book, the concurrency section is always the last section they cover.

So there is a dichotomy here. Go’s headline feature is simple, lightweight concurrency. As a product the language sells itself on that feature alone. On the other hand, there is a narrative that concurrency isn’t actually that easy to use, otherwise people wouldn’t make it the last thing in their books or classes, or perhaps more accurately, concurrency is not the solution to every problem.

With this as a background, I’d like to explore some strategies for using concurrency in Go without the pitfalls of convoluted code, the importance of memory ownership, and the best way to structure a Go program using goroutines.

Context is for cancelation

In my previous post I suggested that the best way to break the compile time coupling between the logger and the loggee was passing in a logger interface when constructing each major type in your program. The suggestion has been floated several times that logging is context specific, so maybe a logger can be passed around via a context.Context. I think this suggestion is flawed (as are most uses of context.Value, but that’s another story). This post explains why.

context.Value() is goroutine thread local storage

Using context.Context to pass a logger into a function is a poor design pattern. In effect context.Context is being used as a conduit to arbitrarily extend the API of any method that takes a context.Context value. It’s like Python’s **kwargs, or whatever the name is for that Ruby pattern of always passing a hash. Using context.Context in this way avoids an API break by smuggling data in the unstructured bag of values attached to the context. It’s thread local storage in a cheap suit.

It’s not just that values are boxed into an interface{} inside context.WithValue that I object to. The far more serious concern is there is no schema to this data, so there is no way for a method that takes a context to ensure that it contains the specific key required to complete the operation. context.Value returns nil if the key is not found, which means any code doing the naïve

log := ctx.Value("logger").(log.Logger)
log.Warn("something you'll ignore later")

will blow up if the "logger" key is not present.

Sure, you can check that the assertion succeeded, but I feel pretty confident that if this pattern were to become popular then people would eschew the two arg form of type assertion and just expect that the key always returned a valid logger. This would be especially true as logging in error paths is rarely tested, so you’ll hit this when you need it the most.

In my opinion passing loggers inside context.Context would be the worst solution to the problem of decoupling loggers from implementations. We’d have gone from an explicit compile time dependency to an implicit run time dependency, one that could not be enforced by the compiler.

To quote @freeformz

Loggers should be injected into dependencies. Full stop.

It’s verbose, but it’s the only way to achieve decoupled design.

The package level logger anti pattern

This post is a spin-off from various conversations around improving (I’m trying not to say standardising, otherwise I’ll have to link to XKCD) the way logging is performed in Go projects.

Consider this familiar pattern for establishing a package level log variable.

package foo

import "mylogger"

var log = mylogger.GetLogger("github.com/project/foo")

What’s wrong with this pattern?

The first problem with declaring a package level log variable is the tight coupling between package foo and package mylogger. Package foo now depends directly on package mylogger at compile time.

The second problem is that the tight coupling between package foo and package mylogger is transitive. Any package that consumes package foo is itself dependent on mylogger at compile time.

This leads to a third problem: Go projects composed of packages using multiple logging libraries, or fiefdoms of projects who can only consume packages that use their particular logging library.

Avoid source level coupling

The solution to this anti pattern is to delay the binding between the type that does the logging, and the type that needs to log, until it is needed. That is, until the variable is declared.

package foo

import "github.com/pkg/log"

type T struct {
        logger log.Logger
        // other fields
}

Now, the consumer of type T supplies a value of type log.Logger when constructing new T’s, and the methods on T use the logger they were provided when they want to log.

Interfaces to the rescue

The eagle eyed reader will note that the previous section removed the package level log variable, but the coupling between package foo and package log remains.

However, this can be remedied by the consumer of the logger type declaring its own interface for the behaviour it expects.

package foo

type logger interface {
        Printf(string, ...interface{})
}

type T struct {
        logger
        // other fields
}

As long as the type assigned to foo.T.logger implements foo.logger, the decision for which specific type to use can be deferred until run time in the same way that io.Copy escapes any knowledge of the io.Reader and io.Writer implementations in use until it is invoked.
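Here is a minimal sketch of what that looks like from the consumer’s side, collapsed into a single package main so it can be run as-is. The New constructor, the Work method, and the recorder type are assumptions made for this example; the only real dependency is the standard library’s *log.Logger, which happens to have a matching Printf method.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// logger is the behaviour T expects; T knows nothing about who provides it.
type logger interface {
	Printf(string, ...interface{})
}

type T struct {
	logger
}

// New is a hypothetical constructor; the caller chooses the logger.
func New(l logger) *T { return &T{logger: l} }

// Work logs through whatever logger T was constructed with.
func (t *T) Work() { t.Printf("starting work") }

// recorder is a second implementation that keeps messages in memory,
// the kind of thing a test might supply.
type recorder struct{ lines []string }

func (r *recorder) Printf(format string, args ...interface{}) {
	r.lines = append(r.lines, fmt.Sprintf(format, args...))
}

func main() {
	// The standard library logger satisfies the interface.
	New(log.New(os.Stdout, "foo: ", 0)).Work() // prints "foo: starting work"

	// So does the in-memory recorder.
	r := &recorder{}
	New(r).Work()
	fmt.Println(len(r.lines), r.lines[0]) // prints "1 starting work"
}
```

Neither T nor its methods import a logging package; the binding happens where T is constructed.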

It’s not just logging

Logging is a cross cutting concern, but the anti patterns associated with it also apply to other common areas like metrics, telemetry, and auditing.

Get involved

The Go 1.9 development window is opening next month. If this topic is important to you, get involved.

Never start a goroutine without knowing how it will stop

In Go, goroutines are cheap to create and efficient to schedule. The Go runtime has been written for programs with tens of thousands of goroutines as the norm; hundreds of thousands are not unexpected. But goroutines do have a finite cost in terms of memory footprint; you cannot create an infinite number of them.

Every time you use the go keyword in your program to launch a goroutine, you must know how, and when, that goroutine will exit. If you don’t know the answer, that’s a potential memory leak.

Consider this trivial code snippet:

ch := somefunction()
go func() {
        for range ch { }
}()

This code obtains a channel of int from somefunction and starts a goroutine to drain it. When will this goroutine exit? It will only exit when ch is closed. When will that occur? It’s hard to say; ch is returned by somefunction. So, depending on the state of somefunction, ch might never be closed, causing the goroutine to quietly leak.

In your design, some goroutines may run until the program exits, for example a background goroutine watching a configuration file, or the main conn.Accept loop in your server. However, these goroutines are rare enough I don’t consider them an exception to this rule.

Every time you write the statement go in a program, you should consider the question of how, and under what conditions, the goroutine you are about to start will end.

Thinking about $GOPATH

This is a short blog post about my thoughts on using Go in anger through several workplaces, as a developer and an advocate.

What is $GOPATH?

Back when Go was first announced we used Makefiles to compile Go code. These Makefiles referenced some shared logic stored in the Go distribution. This is where $GOROOT comes from.

Back then, if you wrote Go code, you’d probably also have used these Makefiles, and while you could check out your source code anywhere, most people would put their own Go code in what today we’d call $GOROOT/src, as you must have compiled Go from source, so this directory was always going to be present.

Towards the 1.0 release goinstall, then go get, solidified the use of domain names in import paths to provide a globally unique namespace. These tools introduced a new location into which Go code would be fetched. This location was separate from $GOROOT to make clear the distinction between code provided by the Go project, and code written by the developer. By the time Go 1.1 was released in 2013, $GOROOT was removed as a fallback option.

Why does $GOPATH exist?

$GOPATH exists for two main reasons:

  1. In Go, the import declaration references a package via its fully qualified import path. $GOPATH exists so that from any directory inside $GOPATH/src the go tool can compute the absolute import path of the package in question (see note 1).
  2. A location to store dependencies fetched by go get.

Having a per user $GOPATH environment variable also means developers could use the go tool from any directory on their system to build, test and install code, but I suspect only a minority utilise this feature.

What’s wrong with $GOPATH?

In my experience, many newcomers to Go are frustrated with the single workspace $GOPATH model. They are confused that $GOPATH doesn’t let them check out the source of a project in a directory of their choice like they are used to with other languages. Additionally, $GOPATH does not let the developer have more than one copy of a project (or its dependencies) checked out at the same time without having to update $GOPATH constantly.

I think it is important to recognise that these issues are legitimate points of confusion for many newcomers (including those on the Go team) and act as a drag on Go adoption. As we’re on the cusp of a blessed dependency management tool for Go, I think it’s equally important to continue to question the base assumptions that this new tool will build on, namely requiring a $GOPATH.

In my opinion, any Go build tool needs to provide (in addition to actually building and testing code) a way for Go code checked out in an arbitrary location on disk to recover its intended fully qualified import path; the path other code will import it as.

The $GOPATH model answers this question by subtracting the prefix of $GOPATH/src from the path to the directory of the current package; the remainder is the package’s fully qualified import path. This is why if you check out a package outside a $GOPATH workspace, the go tool cannot figure out the package’s fully qualified import path and everything falls apart.
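That directory arithmetic can be sketched in a few lines. This is an illustration of the idea, not the go tool’s actual implementation; importPath and the example paths are made up, and the sketch assumes Unix-style absolute paths.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// importPath strips the "$GOPATH/src" prefix from dir. It reports false
// when dir is not inside the workspace, in which case there is no
// remainder to use as an import path.
func importPath(gopath, dir string) (string, bool) {
	src := filepath.Join(gopath, "src")
	rel, err := filepath.Rel(src, dir)
	if err != nil || rel == "." || rel == ".." ||
		strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", false
	}
	return filepath.ToSlash(rel), true
}

func main() {
	gopath := "/home/dave/go" // hypothetical workspace

	// Inside the workspace: the remainder is the import path.
	fmt.Println(importPath(gopath, "/home/dave/go/src/github.com/pkg/errors"))

	// Outside the workspace: nothing can be computed.
	fmt.Println(importPath(gopath, "/tmp/errors"))
}
```

The second call is the case that trips up newcomers: the source is all there, but its location on disk says nothing about how it should be imported.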

What are some alternatives to $GOPATH?

I attempted to address both issues with gb, which gives developers the ability to check out a project anywhere they want, but has no solution for libraries, and gb projects were not go gettable. However, gb showed that writing a new build tool that did not wrap the go tool meant it was not forced to reorganise the world to fit into the $GOPATH model, allowing gb users to include the source of all their dependencies in their project without the pitfalls of Go 1.6’s vendor/ directory.

Recently, on a suggestion from Bill Kennedy, I built an experimental build tool that recorded the expected import prefix in a manifest file. That prefix, rather than one computed by $GOPATH directory arithmetic, is used to determine the fully qualified import path.

I’m working on a similar tool (unfinished) based on a suggestion from Brad Fitzpatrick that uses the .git directory as a sentinel to determine the root of the project and hopefully infer the full import path from the git remote configuration.

While these experiments are unfinished, both demonstrate that you can avoid the $GOPATH restrictions and retain compatibility with the go get ecosystem. In the case of Kodos, you can potentially even avoid a manifest file.

Conclusion

Kang and Kodos use a lot of forked code from gb, which I hope to rectify over the new year’s break. If you are interested in contributing or, better yet, building your own Go tool to explore this problem space, Kang, Kodos, and gb are permissively licensed.


Notes:

  1. This is notably different from the way imports work in scripting languages like Python and Ruby, which work by scanning a global search path and by inserting source code directories onto it.

Declaration scopes in Go

This post is about declaration scopes and shadowing in Go.

package main

import "fmt"

func f(x int) {
	for x := 0; x < 10; x++ {
		fmt.Println(x)
	}
}

var x int

func main() {
	var x = 200
	f(x)
}

This program declares x four times. All four are different variables because they exist in different scopes.

package main

import "fmt"

func f() {
	x := 200
	fmt.Println("inside f: x =", x)
}

func main() {
	x := 100
	fmt.Println("inside main: x =", x)
	f()
	fmt.Println("inside main: x =", x)
}

In Go the scope of a declaration is bound to the closest pair of curly braces, { and }. In this example, we declare x to be 100 inside main, and 200 inside f.

What do you expect this program will print?

package main

import "fmt"

func main() {
	x := 100
	for i := 0; i < 5; i++ {
		x := i
		fmt.Println(x)
	}
	fmt.Println(x)
}

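There are several scopes in a Go program: block scope, function scope, file scope, package scope, and universe scope. Each scope encompasses the previous. What you are seeing is called shadowing: the x declared inside the loop body is a new variable that hides the x declared in main, so the program prints 0 through 4 and then 100.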

var x = 100

func main() {
        var x = 200
        fmt.Println(x)
}

Most developers are comfortable with a function scoped variable shadowing a package scoped variable.

func f() {
        var x = 99
        if x > 90 {
                x := 60
                fmt.Println(x)
        }
}

But a block scoped variable shadowing a function scoped variable may be surprising.

The justification for a declaration in one scope shadowing another is consistency; prohibiting just block scoped declarations from shadowing another scope would be inconsistent.

Simulating minicomputers on microcontrollers

This is a short blog post to reference the slides from my builderscon 2016 presentation.

I had a great time at builderscon; the talks were varied and engaging, from a wide selection of Japanese makers. I’m grateful to the builderscon organisers for accepting my talk and inviting me to present at the inaugural builderscon conference in Tokyo, Japan.

Slides:

Further reading: