Author Archives: Dave Cheney

About Dave Cheney

A chaotic neutral System Administrator with super cow powers. My weapons are: * fear * cynicism * an almost fanatical devotion to the command line

Wednesday pop quiz: spot the race

The following program contains a data race

package main

import (

type RPC struct {
        result int
        done   chan struct{}

func (rpc *RPC) compute() {
        time.Sleep(time.Second) // strenuous computation intensifies
        rpc.result = 42

func (RPC) version() int {
        return 1 // never going to need to change this

func main() {
        rpc := &RPC{done: make(chan struct{})}

        go rpc.compute()         // kick off computation in the background
        version := rpc.version() // grab some other information while we're waiting
        <-rpc.done               // wait for computation to finish
        result := rpc.result

        fmt.Printf("RPC computation complete, result: %d, version: %d\n", result, version)

Where is the data race, and what is the smallest change that will fix it ?


The example above is derived from a larger, and more confusing example. You may be interested in the original race report.

The Legacy of Go

In October this year I had the privilege of speaking at the GothamGo conference in New York City. As I talk quite softly, and there were a few problems with the recording, I decided to write up my slide notes and present them here.

If you want to see the video of this presentation, you can find it on youtube.

The Legacy of Go

I want to open with a question — How will Go be remembered ? 

In posing this question I do not wish to appear morbid, or to suggest that Go’s best days are already behind it. Rather, in the spirit of this conference’s theme, I want to speculate on what will be the biggest mark Go will leave on our profession.

So to restate the question —  What will be the legacy of Go ?

To set the stage to answer this question, let’s look at some historical examples.


The C Programming LanguageC, the progenitor of an entire family of curly brace languages. If you were to briefly summarise C’s legacy, how would you describe it ? 

One of my favourite descriptions of C is a portable high level assembly language. C certainly wasn’t the first high level language, but it was one of the first to take portability seriously. Additionally C has the distinction of being the language used to write the operating system of every computer in this room, and probably many of the micro controllers as well.

Before C there was assembly language, and there is no indication that we are in the period that could be classified as after C.

This is C’s legacy.


The C++ Programming LanguageLet’s try another. How would you describe C++’s legacy ?

I think for many C++ is considered the Rolls Royce of mainstream object orientation.

C++ also codified the ideas of zero cost abstractions. In fact, the only abstractions C++ developers prefer to use are the ones that come with no cost, assuming of course that you don’t consider compile time a cost.

Ruby on Rails

Ruby on RailsFor my third example, something a little different. Maybe you haven’t used Ruby on Rails, but I’d wager you’ve probably heard of it. How would you describe the legacy of Ruby on Rails ? 

My take away from Rails was the standardised project layout. Every Rails project had models in the models directory, controllers in the controllers directory, and views in the views directory. This was Rails’ mantra of convention over configuration

Compare this to the previous generation of web frameworks, all different in ways that are at best described as belligerent.

Now, post Rails, all web frameworks look alike. Words like routing, controllers, middleware, assets, are codified in our lexicon because of Rails.  This is the sort of legacy I’m talking about.

So, now I’ve established a framework, let’s turn to the conference’s namesake.

A simple programming language

At the start of the year I had the opportunity to give a talk in India entitled Simplicity and Collaboration.  As I was lucky enough to be giving the closing keynote this gave me an opportunity to try a real table thumping call to action.

I mean, who doesn’t want to be simple ? And what better way to frame a debate as simple; good, complexity; obviously bad. Could we say then that simplicity will be Go’s lasting legacy ?

History of Programming LanguagesPerhaps, but perhaps our frame of reference is a little skewed. In preparing this talk, I found numerous anecdotes from language designers, who, reflecting back on their own achievements, lamented complexity’s siren song.

Some argued that the solution to complexity was abstraction, others felt abstraction itself was the root of the problem. However, unanimously these luminaries believed that complexity must be avoided, and cautioned others to strive for simplicity in their designs.

Is Go is the language which delivers this long sought after promise ? I think it’s probably too early to say. The signs are positive, but this is probably not going to be the thing that I believe people will remember most about Go.

Fitness for purpose

Go will certainly not be remembered as an academic language, it breaks only the minimum of new ground, preferring instead to consolidate on a corpus of proven ideas.

One aspect which is contributing to our language’s success is what I term its fitness for purpose.

As Rob Pike wrote in 2012, Go is a language designed to integrate with an existing software development ecosystem. I believe Go’s popularity is due in large part to the care its designers gave to every aspect of the language’s interaction with the complete software development life cycle.

But sadly, this is not what I believe what people will remember our language for, because few outside the Go community appear to appreciate the holistic nature of the Go’s design.

However, in discussing the motivations that drove the design of the language, we see a clue to its possible legacy.


I think it is the tooling that has grown, not in spite of the language, but in deliberate symbiosis, which deserves recognition.

In his opening keynote at Gophercon this year Russ Cox spoke about the need for mechanical refactoring and code generation to be indistinguishable from code written by a human. In particular I feel go fmt deserves most of the credit here. While more powerful translations are possible, the low barrier of entry to using gofmt has ensured its ubiquity.

It’s not enough that the code is well formatted according to local custom, but instead that there is precisely one way Go code should be formatted. This cannot be understated.

The result is nowadays all Go code is go formatted, and the little which is not is viewed with deep suspicion.

Just as no web framework will call a controller something different, I believe that no future language will be considered complete without a canonical style, and a tool to enforce it.

Pop Culture

Pop CultureEarlier this year my colleague Katherine Cox-Buday gave a talk at Gophercon where she built upon a conjecture by Alan Kay to illustrate some concerning dogma that she observed amongst Go developers.

I enjoyed this talk very much, especially Kay’s rebuttal that our profession is not a science but a pop culture.

We live in what has been described as the information age. An age of digitisation. An age of the transistor and the computer. These are pervasive forces reshaping our society. Thus I think it is impossible to separate the role of those who program, from the impact of the computer itself.

Accepting Kay’s critique of our industry, pop culture is responsible for some of the most iconic ideas in our society, and in answering the question of Go’s own legacy, it makes good sense to investigate the role of the programmer with a wider social context.

To explain what I mean, permit me to digress for a moment.

Denim, sunglasses and portable music

In 1870 Jacob Davis, a Californian tailor, was asked by one of his customers to create a hard wearing pair of trousers for her husband.

PatentBy reinforcing the weak points on the seams and pockets with copper rivets, Davis created a durable garment that became an overnight sensation.

Within a year he was unable to keep up with demand and approached his supplier of denim, one Levi Strauss, for financial support in patenting his design.

Today we know them simply as Levi jeans. Although Davis’ signature rivet was later removed over concerns that it was damaging the furniture, the signature Levi 501 became a cultural icon.

A symbol of rugged individualism, and smart-casual iconoclasm.

In 1936, Bausch and Lomb, a medical instrument company from Rochester, working under contract for the US army, produced a pair of sunglasses designed to aid pilots suffering from eye strain and migraines.

Initially called the “anti-glare”, but renamed a year later when the eyewear division was spun out into a subsidiary, the “ray ban” company, we now know them, released their signature Ray Ban 3025 Aviator.

With the help of strong promotional efforts, Ray Ban’s Aviators became synonymous with action and adventure. The hero, and the maverick.


In 1979, the Sony Corporation released the TPS-L2. Better known in most markets as the Walkman, Sony had created a new genre of music consumption.


The walkman arrived at the perfect time to ride a wave of lifestyle advertising, allowing everyone to be a music enthusiast, not just the audiophile shut in.

Sony effectively created the idea of personal music, liberating it from the tyranny of the radio disc jockeys, and making it private, personal, and portable. But Sony’s hubris, and a misplaced focus on the music producers, not consumers, caused them to stumble with the mini disc. On that last point, I’ll leave you to draw your own parallels with the software industry.

These are three unrelated tales which weave a story of cultural memes in our society.

Aviator sunglasses, created for a utilitarian purpose have continued to represent an heroic, self confident, ideal. The same could be said for denim jeans. Both have continued to be remixed and riffed on in a way that would have pleased Warhol.

We see similar patterns in language design. Languages are not immune to the whim of popular fashion, unless of course they are Lisp, the perennial tie dye of languages.

A new language has to be sufficiently different

A new language, to be successful, has to be sufficiently different. Being only a minor improvement on a theme makes it too hard to capture mindshare.

Go epitomises this, it is a language that is heavily informed by the past, and at the same time, different enough to justify overcoming the opportunity cost of change.

A trend toward static typing

Types, like denim jeans, are the mainstay of our industry. Types have been decomposed, algebratised, templated, rejected, inferred, pushed to the side, then rediscovered. Types, like kitschy sunglasses or ripped jeans, may go out of style for a time, but never for very long.

Go sits atop the crest of a number of popular waves, one of these is a movement towards static type checking which other languages are now retrofitting. Retrofitting, not for performance, mind you — type inference has pretty much solved that problem, but instead for the productivity of their programmers.

But, I think it is unlikely that Go will be the poster child of this movement, it is merely a participant, and few medals are given out for simply being present.


For me, it is Go’s interfaces, which represent a refinement of C++’s virtual base classes, an evolution through Java’s interface type, and have reached a point where they are divorced completely from both value and hierarchy.

Interfaces represent pure behaviour. In some ways Go’s interfaces are closer to Smalltalk’s vision of objects; defined not by class membership, but instead by behaviour.

I believe interfaces are the iconic feature of Go. They represent a refinement of many previous attempts but are themselves unique among mainstream languages.

So that’s two possible ideas for the legacy of Go, I hope you will indulge me one more.

The things Go took away

In his essay The last programming language, Robert C. Martin asks:

Are languages successful because they offer programmers more choice, or are they successful for the opposite reason, they remove choice ?

And while I’ll have to disagree with Martin’s conclusion that Clojure represents the last programming language, he does make a compelling argument.

In hindsight, the programming languages which have been successful, which have been remembered, and which have established a legacy, are the languages which have successfully removed a commonly accepted tenet of the programming establishment.

Languages like FORTRAN and COBOL removed the requirement to program directly in assembly language, despite howls that efficient programs could never be written in a high level language.

A decade later, languages like Pascal, PL/I and Algol removed direct transfer of control, replacing it with the pillars of structured programming; sequence, selection and iteration, and they did so to cries that a real programmer would have goto prised from their cold dead fingers.

Can we find supporting evidence of Go’s legacy in the things that it chose to remove ?


I think a strong contender could be a lack of inheritance; Go took away subtypes. Everyone knows composition is more powerful than inheritance, Go just makes this non optional.

In the cacophony of hand wringing over a lack of templated types, nobody seems to be complaining that a lack of inheritance is hurting their ability to write programs in Go. It seems that nobody missed inheritance much after all.

However it also seems to me that people are not going to remember Go for taking away something that they never missed in the first place.


Go is a curly braced, block structured language, but with a cute trick of the lexer. The semicolons are still there, we just hide them from the author. But, this is also not a new trick. Javascript made semicolons optional, and sometimes it works.

Go’s implementation is also not an original idea, it was taken directly from Martin Richard’s BCPL.

So while Go finally removed semicolons, I don’t think it can claim credit for the idea.


In my mind, of all the possible candidates that Go has removed, it is the removal of threads that will be its most profound contribution.

This is not to say that Go programs do not use threads, any more than you can say structured programs are not compiled into branch and jump instructions.

But Go programmers no longer have to concern themselves with thread management, or as Uncle Bob would say, Go programmers are restricted from directly controlling the thread their code runs on.

Co-incidentally, the removal of threads from the Go programmers’ model means the removal of a requirement to care about the stack, unlocking the much older technique of recursion as an alternative to state machines or mutable state.

Goroutines are cheap, so cheap that we can structure our programs in a straightforward imperative fashion without having to worry about the overhead of one operating system thread per goroutine.

Go took away many things, but in removing threads and a need to care about the stack, replacing them instead with a more coherent idea of goroutines and channels, I think it has made its most powerful mark.

gofmt, interfaces, goroutines

To recap, go fmt, interfaces, and goroutines. Some of these are new ideas, others are not.

The goal of this presentation was not to identify three new innovations in Go, but rather to attempt to predict how a future generation of programmers will look back upon Go’s legacy.

Go is still young, with a long productive life ahead of it, and that means that almost all of the Go code that will be written, has yet to be written.

Similarly, while the community of Go users is growing, compared to the number of people who will use Go during its lifetime, we are but a tiny fraction. Therefore we should optimise for this larger group of people who have yet to write any Go code.

This means more examples, more blog posts, and more books.

This means more education and more training — I think Go is a fantastic language to teach the theory and the practice of programming, and we’ve barely scratched the surface here.

This means more user groups, more conferences, and more diversity — there is so much more to be done here, and we should look to established language communities, like Python for example, for guidance.

And this is the part where you come in. This is the time for you to lean in. This is the time for you to get involved.

Because this is your opportunity to decide how you want Go to be remembered.

Thank you.

Let’s talk about logging

This is a post inspired by a thread that Nate Finch started on the Go Forum. This post focuses on Go, but if you can see your way past that, I think the ideas presented here are widely applicable.

Why no love ?

Go’s log package doesn’t have leveled logs, you have to manually add prefixes like debug, info, warn, and error, yourself. Also, Go’s logger type doesn’t have a way to turn these various levels on or off on a per package basis. By way of comparison let’s look at a few replacements from third parties.

glog from Google provides the following levels:

  • Info
  • Warning
  • Error
  • Fatal (which terminates the program)

Looking at another library, loggo, which we developed for Juju, provides the following levels:

  • Trace
  • Debug
  • Info
  • Warning
  • Error
  • Critical

Loggo also provides the ability to adjust the verbosity of logging on a per package basis.

So here are two examples, clearly influenced by other logging libraries in other languages. In fact their linage can be traced back to syslog(3), maybe even earlier. And I think they are wrong.

I want to take a contradictory position. I think that all logging libraries are bad because the offer too many features; a bewildering array of choice that dazzles the programmer right at the point they must be thinking clearly about how to communicate with the reader from the future; the one who will be consuming their logs.

I posit that successful logging packages need far less features, and certainly fewer levels.

Let’s talk about warnings

Let’s start with the easiest one. Nobody needs a warning log level.

Nobody reads warnings, because by definition nothing went wrong. Maybe something might go wrong in the future, but that sounds like someone else’s problem.

Furthermore, if you’re using some kind of leveled logging then why would you set the level at warning ? You’d set the level at info, or error. Setting the level to warning is an admission that you’re probably logging errors at warning level.

Eliminate the warning level, it’s either an informational message, or an error condition.

Let’s talk about fatal

Fatal level is effectively logging the message, then calling os.Exit(1). In principal this means:

  • defer statements in other goroutines don’t run.
  • buffers aren’t flushed.
  • temporary files and directories aren’t removed.

In effect, log.Fatal is a less verbose than, but semantically equivalent to, panic.

It is commonly accepted that libraries should not use panic1, but if calling log.Fatal2 has the same effect, surely this should also be outlawed.

Suggestions that this cleanup problem can be solved by registering shutdown handlers with the logging system introduces tight coupling between your logging system and every place where cleanup operations happen; its also violates the separation of concerns.

Don’t log at fatal level, prefer instead to return an error to the caller. If the error bubbles all the way up to main.main then that is the right place to handle any cleanup actions before exiting.

Let’s talk about error

Error handling and logging are closely related, so on the face of it, logging at error level should be easily justifiable. I disagree.

In Go, if a function or method call returns an error value, realistically you have two options:

  • handle the error.
  • return the error to your caller. You may choose to gift wrap the error, but that is not important to this discussion.

If you choose to handle the error by logging it, by definition it’s not an error any more — you handled it. The act of logging an error handles the error, hence it is no longer appropriate to log it as an error.

Let me try to convince you with this code fragment:

err := somethingHard()
if err != nil {
        log.Error("oops, something was too hard", err)
        return err // what is this, Java ?

You should never be logging anything at error level because you should either handle the error, or pass it back to the caller.

To be clear, I am not saying you should not log that a condition occurred

if err := planA(); err != nil {
        log.Infof("could't open the foo file, continuing with plan b: %v", err)

but in effect log.Info and log.Error have the same purpose.

I am not saying DO NOT LOG ERRORS! Instead the question is, what is the smallest possible logging API ? And when it comes to errors, I believe that an overwhelming proportion of items logged at error level are simple done that way because they are related to an error. They are in fact, just informational, hence we can remove logging at error level from our API.

What’s left ?

We’ve ruled out warnings, argued that nothing should be logged at error level, and shown that only the top level of the application should have some kind of log.Fatal behaviour. What’s left ?

I believe that there are only two things you should log:

  1. Things that developers care about when they are developing or debugging software.
  2. Things that users care about when using your software.

Obviously these are debug and info levels, respectively.

log.Info should simply write that line to the log output. There should not be an option to turn it off as the user should only be told things which are useful for them. If an error that cannot be handled occurs, it should bubble up main.main where the program terminates. The minor inconvenience of having to insert the FATAL prefix in front of the final log message, or writing directly to os.Stderr with fmt.Fprintf is not sufficient justification for a logging package growing a log.Fatal method.

log.Debug, is an entirely different matter. It is for the developer or support engineer to control. During development, debugging statements should be plentiful, without resorting to trace or debug2 (you know who you are) level. The log package should support fine grained control to enable or disable debug, and only debug, statements at the package or possibly even finer scope.

Wrapping up

If this were a twitter poll, I’d ask you to choose between

  • logging is important
  • logging is hard

But the fact is, logging is both. The solution to this problem must be to de-construct and ruthlessly pair down unnecessary distraction.

What do you think ? Is this just crazy enough to work, or just plain crazy ?


  1. Some libraries may use panic/recover as an internal control flow mechanism, but the overriding mantra is they must not let these control flow operations leak outside the package boundary.
  2. Ironically while it lacks a debug level output, the Go standard log package has both Fatal and Panic functions. In this package the number of functions that cause a program to exit abruptly outnumber those that do not.

Programming language markets

Last night at the Sydney Go Users’ meetup, Jason Buberel, product manager for the Go project, gave an excellent presentation on a product manager’s perspective on the Go project.

As part of his presentation, Buberel broke down the marketplace for a programming language into seven segments.

Programming Language market for Go, courtesy Jason Buberel

Programming language market for Go, courtesy Jason Buberel

As a thought experiment, I’ve taken Buberel’s market segments and applied them across a bunch of contemporary languages.

Disclaimer: I’m not a product manager, I’ve just seen one on stage.

Language Embedded and
Systems and
Server and
Web and mobile4 Big data and
scientific computing
Desktop applications5 Mobile applications
Go 0 0 3 2 1 16 1
Rust 1 1 0 2 0 26, 11 0
Java 214 0 2 3 3 27 3
Python 1 0 312 3 3 26, 10 0
Ruby 0 0 3 3 0 16 0
Node.js (Javascript / v8) 113 0 0 2 0 0 28
Objective-C / Swift 0 3 2 29 0 3 3
C/C++ 3 3 3 2 3 3 2

Is your favourite language missing ? Feel free to print this table out and draw in the missing row.

Scoring system: 0 – no presence, lack of interest or technical limitation. 1 – emerging presence or proof of concept. 2 – active competitor. 3 – market leader.


If there is a conclusion to be drawn from this rather unscientific study, every language is in competition to be the language of the backend. As for the other market segments, everyone competes with C and C++, even Java.


  1. The internet of things that are too small to run linux; micrcontrollers, arduino, esp8266, etc.
  2. Can you write a kernel, kernel module, or operating system in it ?
  3. Monitoring systems, databases, configuration management systems, that sort of thing.
  4. Web application backends, REST APIs, microservices of all sorts.
  5. Desktop applications, including games, because the mobile applications category would certainly include games.
  6. OpenGL libraries or SDL bindings.
  7. Swing, ugh.
  8. Phonegap, React Native.
  9. Who remembers WebObjects ?
  10. Python is a popular scripting language for games.
  11. Servo, the browser rendering engine is targeting Firefox.
  12. Openstack.
  13. Technically Lua pretending to be Javascript, but who’s counting.
  14. Thanks to @rakyll for reminding me about the Blu Ray drives, and j2me running in everyone’s credit cards.

Bootstrapping Go 1.5 on non Intel platforms

This post is a continuation of my previous post on bootstrapping Go 1.5 on the Raspberry Pi.

Now that Go 1.5 is written entirely in Go there is a bootstrapping problem — you need Go to build Go. For most people running Windows, Mac or Linux, this isn’t a big issue as the Go project provides installers for Go 1.4. However, if you’re using one of the new platforms added in Go 1.5; ARM64 or PPC64, there is no pre built installer as Go 1.4 did not support those platforms.

This post explains how to bootstrap Go 1.5 onto a platform that has no pre built version of Go 1.4. The process assumes you are building on a Darwin or Linux host, if you’re on Windows, sorry you’re out of luck.

We use the names host and target to describe the machine preparing the bootstrap installation and the non Intel machine receiving the bootstrap installation respectively.

As always, please uninstall any version of Go you may have on your workstation before beginning; check your $PATH and do not set $GOROOT.

Build Go 1.4 on your host

After uninstalling any previous version of Go from the host, fetch the source for Go 1.4 and build it locally.

% git clone $HOME/go1.4
% cd $HOME/go1.4/src
% git checkout release-branch.go1.4
% ./make.bash

This process will build Go 1.4 into the directory $HOME/go1.4.

  • This procedure assumes that Go 1.4 is built in $HOME/go1.4, if you choose to use another path, please adjust accordingly.
  • We use ./make.bash to skip running the full unit tests, you can use ./all.bash to run the unit tests if you prefer.
  • Do not add $HOME/go1.4/bin to your $PATH.

Build Go 1.5 on your host

Now you have Go 1.4 on your host, you can use that to bootstrap Go 1.5 on your host.

% git clone $HOME/go
% cd $HOME/go/src
% git checkout release-branch.go1.5
% env GOROOT_BOOTSTRAP=$HOME/go1.4 ./make.bash

This process will build Go 1.5 into the directory $HOME/go.

  • Again, we use ./make.bash to skip running the full unit tests, you can use ./all.bash to run the unit tests if you prefer.
  • You should add $HOME/go/bin to your $PATH to use the version of Go 1.5 you just built as your Go install.

Build a bootstrap distribution for your target

From your Go 1.5 installation, build a bootstrap version of Go for your target.

The process is similar to cross compiling and uses the same GOOS and GOARCH environment variables. In this example we’ll build a bootstrap for linux/ppc64. To build for other architectures, adjust accordingly.

% cd $HOME/go/src
% env GOOS=linux GOARCH=ppc64 ./bootstrap.bash
Bootstrap toolchain for linux/ppc64 installed in /home/dfc/go-linux-ppc64-bootstrap.
Building tbz.
-rw-rw-r-- 1 dfc dfc 46704160 Oct 16 10:39 /home/dfc/go-linux-ppc64-bootstrap.tbz

The bootstrap script is hard coded to place the output tarball two directories above ./bootstrap.bash, which will be $HOME/go-linux-ppc64-bootstrap.tbz in this case.

Now scp go-linux-ppc64-bootstrap.tbz to the target, and unpack it to $HOME.

  • This bootstrap distribution should only be used for bootstrapping Go 1.5 on the target.

Build Go 1.5 on your target

On the target you should have the bootstrap distribution in your home directory, ie $HOME/go-linux-ppc64-bootstrap. We’ll use that as our GOROOT_BOOTSTRAP and build Go 1.5 on the target.

% git clone $HOME/go
% cd $HOME/go/src
% git checkout release-branch.go1.5
% env GOROOT_BOOTSTRAP=$HOME/go-linux-ppc64-bootstrap ./all.bash

Now you’ll have Go 1.5 built natively on your target, in this case linux/ppc64, but this procedure has been tested on linux/arm64 and also linux/ppc64le.

Padding is hard

We all know that the empty struct consumes no storage, right ? Here is a curious case where this turns out to not be true.

Memory Profile

This shows the “after” graph, prior to CL 15522 each allocation was 320 bytes.

This is a story about trying to speed up the Go compiler. Since Go 1.5 we’ve had the great concurrent GC, which reduces the cost of garbage collection, but no matter how efficient a garbage collector is, it doesn’t make allocations free of cost.

The specific allocation I was targeting was the obj.Prog structure which represents a quasi machine code instruction late in the compile cycle. As you can see, for this compilation (which was cmd/compile/internal/gc) obj.Prog values constitute 34.72% of the total allocations for this program—reducing the size of obj.Prog will pay off in real terms.

obj.Prog itself contains a field of type obj.ProgInfo, which is what this blog post will focus on today. Here is the definition for obj.ProgInfo

type ProgInfo struct {
        Flags    uint32   // flag bits
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        _        struct{} // to prevent unkeyed literals

How much space will values of this type consume ? TL;DR, here is a table of the results:

Go 1.4.x 28 bytes 32 bytes
Go 1.5.x 32 bytes 40 bytes

Go 1.4 32 bit says the value is 28 bytes large, but Go 1.4 64 bit says it’s 32 bytes large. Worse, Go 1.5 32 bit says the value is 32 bytes, and Go 1.5 64 bit says it’s 40 bytes! This type uses fixed sized integer types, but its total size changes every time we compile it. What the heck is going on here ?


Alignment and padding

The reason for the difference between the 32 bit and 64 bit answers is alignment. The Go spec says the address of a struct’s fields must be naturally aligned. So on a 64 bit machine, the address of any uint64 field must be a multiple of 8 bytes. In effect the compiler sees this:

type ProgInfo struct {
        Flags    uint32   // flag bits
        _        [4]byte  // padding added by compiler
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        _        struct{} // to prevent unkeyed literals

On 32 bit machines, word boundaries occur every 4 bytes, so there is no requirement to pad for alignment. Note that operations on 64 bit quantities may still require the value to be properly aligned, this is the infamous issue 599.

So, that explains the difference in results between 32 bit and 64 bit machines, Flags (4 bytes) + Reguse (8 bytes) + Regset (8 bytes) + Regindex (8 bytes) == 28 bytes on 32 bit machines, plus another 4 bytes for padding for 64 bit machines.

Well, except that Go 1.5 seems to be consuming another word over and above the Go 1.4 requirements, where is this memory going ?

To give you a hint, I’ll rearrange the definition slightly:

type ProgInfo struct {
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        Flags    uint32   // flag bits
        _        struct{} // to prevent unkeyed literals

Now the results are:

Go 1.4.x 28 bytes 32 bytes
Go 1.5.x 32 bytes 32 bytes

Hmm, so some progress. We’ve managed to reduce the storage in the Go 1.5 64 bit case, but the others appear unchanged, especially Go 1.5’s 32 bit case.

For the Go 1.5 64 bit case, the padding that we saw above is still there, but it has moved. Effectively what the compiler is seeing is this:

type ProgInfo struct {
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        Flags    uint32   // flag bits
        _        [4]byte  // padding for alignment of ProgInfo
        _        struct{} // to prevent unkeyed literals

The reason this padding is still present is to preserve the alignment of the other fields in this structure. Remember that uint64 fields always have to be naturally aligned, every 4 bytes on 32 bit platforms, and every 8 bytes on 64 bit platforms. Consider this small fragment:

var pp [4]ProgInfo	
fmt.Println(&pp[1].Reguse) // address must be a multiple of 8 on 64 bit platforms

The compiler adds padding at the end of the structure to bring its size up to a multiple of the largest element in the structure.

So, that explains everything except for the Go 1.5 32 bit result. For 32 bit platforms, the structure needs to be aligned on a 4 byte boundary, not 8, so no padding should be required. The size under both Go 1.4 and Go 1.5 should be 28 bytes, not 32 bytes … what is going on ?

That empty struct

You’ve probably figured out by now that the only place the additional storage could come from is the unnamed struct{} field. What is it about the presence of an empty struct at the bottom of the type that causes it to increase the size of the struct ?

The answer is that while empty struct{} values consume no storage, you can take their address. That is, if you have a type

type T struct {
      X uint32
      Y struct{}

var t T

It is perfectly valid to take the address of t.Y, and as such, the address of t.Y would point beyond the end of the struct!1

Now, Go code cannot do anything useful with this invalid pointer, but as it is a pointer, the garbage collector has to follow it, and that may lead it to access unmapped memory or dereference an invalid pointer.

The fix for this is detailed in issue 9401 and delivered in Go 1.5. In summary, zero sized values, like struct{} and [0]byte occurring at the end of a structure are assumed to have a size of one byte2. With this knowledge, we can now explain the behaviour of the compiler for the rearranged type.

type ProgInfo struct {
        Reguse   uint64   // registers implicitly used by this instruction
        Regset   uint64   // registers implicitly set by this instruction
        Regindex uint64   // registers used by addressing mode
        Flags    uint32   // flag bits
        _        struct{} // to prevent unkeyed literals
        _        [1]byte  // to prevent unkeyed literals  

For a 64 bit compiler, it would observe that the largest field inside ProgInfo is 8 bytes, so it would add three additional bytes of padding after the [1]byte to round up to 32 bytes. This is fine, it was already planning to add 4 bytes after Flags.

For a 32 bit compiler, it would observe the largest field inside ProgInfo is 8 bytes, however the natural alignment of 32 bit machines is 4 bytes and the sum of the fields inside the type is 28 bytes, which is what Go 1.4 reported. However because the final field is of zero length, the compiler has replaced it with a [1]byte, causing the total size of the structure to be 29 bytes, forcing the compiler to add three more bytes of padding to round up to the next word boundary, 32 bytes.

Fortunately this last issue can be corrected by moving the zero field to the top of the structure, restoring the original sizes we observed in Go 1.4.

Wrapping up

Do Go programmers need to care about this minutiae ? Probably not. This is the stuff of brain teasers.

Certainly if you are looking to optimise the memory usage of your program, you need to be looking closely at the definitions of all your data structures, but it is unlikely that you’ll encounter the perfect storm of all these factors.

Not to be outdone by this morning’s excitement, Matthew Dempsky put together a quick tool to spot this inefficiency, and has found two other candidates.

One more thing

At the start of this piece I mentioned that CL 15522 resulted in a saving of 32 bytes, yet the biggest saving demonstrated above was 8 bytes. How did CL 15522 achieve such a reduction ? The answer is a thing known inside the runtime as size classes.

For small allocations, that is, less than a few kilobytes, the overhead of tracking the allocation becomes a significant percentage of the object itself. Imagine if every allocation had one word of overhead, every time you put an int into an interface3 value, you’d incur 100% overhead!

To reduce this overhead, rather than tracking every object on the heap individually, the runtime allocates things of the same size together. This leads to lower overhead—one bit per object of a certain size, and excellent space efficiency—all objects are of the same size, eliminating fragmentation.

However, as Go types can be any size, potentially the runtime would have to allocate a pool for every possible size (obviously not until they are actually allocated), and this would undo the efficiency of the mechanism as each pool is a number of machine pages, yet most pools would be largely unused.

The solution to this problem is to not create pools for every possible size, but to start to round up the allocation to the next largest size class, for example an allocation of 40 bytes may be rounded up to 48 bytes. The details of this mechanism are left to the curiosity of the reader4, but suffice to say, as allocations get larger, the distance between size classes increases.

Hence, by shaving 8 bytes from obj.ProgInfo, the size of the enclosing obj.Prog type decreased from 296 bytes to 288 bytes, bringing it down to from the previous 320 byte size class, to the 288 byte size class, a saving of 32 bytes on 64 bit platforms.

  1. The explanation for why this is not a violation of Go’s memory safety guarantee is left as an exercise for the reader.
  2. The explanation for why this does not break the semantics of Go programs is left as an exercise for the reader.
  3. The explanation for why storing an int into an interface causes a heap allocation is left as an exercise for the reader.
  4. runtime/msize.go

Building Go 1.5 on the Raspberry Pi

This is a short post to describe my recommended method for building Go on the Raspberry Pi. This method has been tested on the Raspberry Pi 2 Model B (900Mhz, 1Gb ram) and the older Raspberry Pi 1 Model B+ (700Mhz, 512Mb ram).

This method will build Go 1.5 into you home directory, $HOME/go.

As always, please don’t set $GOROOT. You never need to set $GOROOT when building from source.

Step 1. Getting the bootstrap Go 1.4 compiler

Go 1.5 requires an existing Go 1.4 (or later) compiler to build Go 1.5. If you have built Go from source on your host machine you can generate this tarball directly, but to save time I’ve done it for you.

% cd $HOME
% curl | tar xj

Step 2. Fetch the Go 1.5 source

Fetch the Go 1.5 source tarball and unpack it to $HOME/go

% cd $HOME
% curl | tar xz

Step 3. Configure your environment and build

Go 1.5 builds cleanly on arm devices, this is verified by the build dashboard, however if you want to see ./all.bash pass on the Raspberry Pi, some additional configuration is recommended.

Lower the default stack size from 8mb to 1mb.

This is necessary because the runtime tests create many native operating system threads which at 8mb per thread can exhaust the 32bit user mode address space (especially if you are running a recent Raspbian kernel). See issue 11959 for the details.

% ulimit -s 1024     # set the thread stack limit to 1mb
% ulimit -s          # check that it worked

Increase the scaling factor to avoid test timeouts.

The default scaling factor is good for powerful amd64 machines, but is too aggressive for small 32 bit machines. This is done with the GO_TEST_TIMEOUT_SCALE environment variable.

Step 4. Build

% cd $HOME/go/src
% env GO_TEST_TIMEOUT_SCALE=10 GOROOT_BOOTSTRAP=$HOME/go-linux-arm-bootstrap ./all.bash
# Building C bootstrap tool.

# Building compilers and Go bootstrap tool for host, linux/arm.
##### ../test

##### API check
Go version is "go1.5", ignoring -next /home/pi/go/api/next.txt


Installed Go for linux/arm in /home/pi/go
Installed commands in /home/pi/go/bin

On the Raspberry Pi 2 Model B, this should take around an hour, for the older Raspberry Pi 1 Model B+, it takes more than five!

Alternatively, you can substitute ./make.bash above to skip running the tests, which constitutes more than half of the time — or you could cross compile your program from another computer.

As a final step you should add $HOME/go to your $PATH, and to save disk space you can remove $HOME/go-linux-arm-bootstrap.

Cross compilation with Go 1.5

Now that Go 1.5 is out, lots of gophers are excited to try the much improved cross compilation support. For some background on the changes to the cross compilation story you can read my previous post, or Rakyll’s excellent follow up piece.

I’ll assume that you are using the binary version of Go 1.5, as distributed from the Go website. If you are using Go 1.5 from your operating system’s distribution, or homebrew, the process will be the same, however paths may differ slightly.

How to cross compile

To cross compile a Go program using Go 1.5 the process is as follows:

  1. set GOOS and GOARCH to be the values for the target operating system and architecture.
  2. run go build -v YOURPACKAGE

If the compile is successful you’ll have a binary called YOURPACKAGE (possibly with a .exe extension if you’re targeting Windows) in your current working directory.

-o may be used to alter the name and destination of your binary, but remember that go build takes a value that is relative to your $GOPATH/src, not your working directory, so changing directories then executing the go build command is also an option.


I prefer to combine the two steps outlined above into one, like this:

% env GOOS=linux GOARCH=arm go build -v
% file ./gb
./gb: ELF 32-bit LSB  executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped

and that’s all there is to it.

Using go build vs go install

When cross compiling, you should use go build, not go install. This is the one of the few cases where go build is preferable to go install.

The reason for this is go install always caches compiled packages, .a files, into the pkg/ directory that matches the root of the source code.

For example, if you are building $GOPATH/src/, then the compiled package will be installed into $GOPATH/pkg/$GOOS_$GOARCH/

This logic also holds true for the standard library, which lives in /usr/local/go/src, so will be compiled to /usr/local/go/pkg/$GOOS_$GOARCH. This is a problem, because when cross compiling the go tool needs to rebuild the standard library for your target, but the binary distribution expects that /usr/local/go is not writeable.

Using go build rather that go install is the solution here, because go build builds, then throws away most of the result (rather than caching it for later), leaving you with the final binary in the current directory, which is most likely writeable by you.

Ugh, this is really slow!

In the procedure described above, cross compilation always rebuilds that standard library for the target every time. Depending on your workflow this is either not an issue, or a really big issue. If it’s the latter then I recommend you remove the binary distribution and build from source into a path that is writeable by you, then you’ll have the full gamut of go commands available to you.

Cross compilation support in gb is being actively developed and will not have this restriction.

What about GOARM?

The go tool chooses a reasonable value for GOARM by default. You should not change this unless you have a good reason.

But but, what about GOARM=7?

Sure, knock yourself out, but that means your program won’t run on all models of the Raspberry Pi. The difference between GOARM=6 (the default) and GOARM=7 is enabling a few more floating point registers, and a few more operations that allow floating point double values to be passed to and from the ARMv7 (VPFv3) floating point co processor more efficiently. IMO, with the current Go 1.5 arm compiler, it’s not worth the bother.

Using gb as my every day build tool

For the longest time I had this alias in my .bashrc

alias gb='go install -v'

as an homage to John Asmuth’s gb tool which I was very fond of way back before we had the go tool.

Once gb was written, I had to remove that alias and live in a world where I used the go tool and gb concurrently. Now, with the improvements that have landed since GopherCon I’ve decided it’s time to get serious about eating my own dog food and use gb as a full time replacement for the go tool.

I use gb not just to build gb projects, but to work on Juju, my $DAYJOB, as well as using gb to develop gb itself. Because the definition of a gb project is backwards compatible with $GOPATH I don’t even need to rewrite every Go package I want to work on as a gb project, I can work with them with their source happily in my $GOPATH.

gb isn’t perfect yet, there are still some major missing pieces. I hope by aggressively dogfooding I’ll be able to close the gaps before the end of the year. Specifically:

  • gb build works well, but the gb test story is less solid; I hope to reach parity with go test in the next few months by adding support for test flags, -race and coverage options.
  • Support for cross compilation was always high on my list, and is being actively worked on, and will be available to experiment within the next two weeks. At this point I envisage it will be Go 1.5 only, Go 1.4 support may come in time, but is not a high priority.

If others want to join me and make gb their default Go build tool, I’d welcome the company, and of course, your bug reports.

Performance without the event loop

This article is also available in Japanese, イベントループなしでのハイパフォーマンス – C10K問題へのGoの回答

This article is based on a presentation I gave earlier this year at OSCON. It has been edited for brevity and to address some of the points of feedback I received after the talk.

A common refrain when talking about Go is it’s a language that works well on the server; static binaries, powerful concurrency, and high performance.

This article focuses on the last two items, how the language and the runtime transparently let Go programmers write highly scalable network servers, without having to worry about thread management or blocking I/O.

An argument for an efficient programming language

But before I launch into the technical discussion, I want to make two arguments to illustrate the market that Go targets.

Moore’s Law

The oft mis quoted Moore’s law states that the number of transistors per square inch doubles roughly every 18 months.

However clock speeds, which are a function of entirely different properties, topped out a decade ago with the Pentium 4 and have been slipping backwards ever since.

From space constrained to power constrained

Sun e450

Sun Enterprise e450—about the size of a bar fridge, about the same power consumption. Image credit: eBay

This is the Sun e450. When I started my career, these were the workhorses of the industry.

These things were massive. Three of them, stacked one on top of another would consume an entire 19″ Rack. They only consumed about 500 Watts each.

Over the last decade, data centres have moved from space constrained, to power constrained. The last two data centre rollouts I was involved in, we ran out of power when the rack was barely 1/3rd full.

Because compute densities have improved so rapidly, data centre space is no longer a problem. However, modern servers consume significantly more power, in a much smaller area, making cooling harder yet at the same time critical.

Being power constrained has effects at the macro level—you can’t get enough power for a rack of 1200 Watt 1RU servers—and at the micro level, all this power, hundreds of watts, is being dissipated in a tiny silicon die.

Where does this power consumption come from ?


CMOS Inverter. Image credit: Wikipedia

This is an inverter, one of the simplest logic gates possible. If the input, A, is high, then the output, Q, will be low, and vice versa.

All today’s consumer electronics are built with CMOS logic. CMOS stands for Complementary Metal Oxide Semiconductor. The complementary part is the key. Each logic element inside the CPU is implemented with a pair of transistors, as one switches on, another switches off.

When the circuit is on or off, no current flows directly from the source to the drain. However, during transition there is a brief period where both transistors are conducting, creating a direct short.

Power consumption, and thus heat dissipation, is directly proportional to number of transition per second—the cpu clock speed1.

CPU feature size reductions are primarily aimed at reducing power consumption. Reducing power consumption doesn’t just mean “green”. The primary goal is to keep power consumption, and thus heat dissipation, below levels that will damage the CPU.

With clock speeds falling, and in direct conflict with power consumption, performance improvements come mainly from microarchitecture tweaks and esoteric vector instructions, which are not directly useful for the general computation. Added up, each microarchitecture (5 year cycle) change yields at most 10% improvement per generation, and more recently barely 4-6%.

“The free lunch is over”

Hopefully it is clear to you now that hardware is not getting any faster. If performance and scale are important to you, then you’ll agree with me that the days of throwing hardware at the problem are over, at least in the conventional sense. As Herb Sutter put it “The free lunch is over”.

You need a language which is efficient, because inefficient languages just do not justify themselves in production, at scale, on a capital expenditure basis.

An argument for a concurrent programming language

My second argument follows from my first. CPUs are not getting faster, but they are getting wider. This is where the transistors are going and it shouldn’t be a great surprise.

credit intel

Image credit: Intel

Simultaneous multithreading, or as Intel calls it hyper threading allows a single core to execute multiple instruction streams in parallel with the addition of a modest amount of hardware. Intel uses hyper threading to artificially segment the market for processors, Oracle and Fujitsu apply hyper threading more aggressively to their products using 8 or 16 hardware threads per core.

Dual socket has been a reality since the late 1990s with the Pentium Pro and is now mainstream with most servers supporting dual or quad socket designs. Increasing transistor counts have allowed the entire CPU to be co-located with siblings on the same chip. Dual core on mobile parts, quad core on desktop parts, even more cores on server parts are now the reality. You can buy effectively as many cores in a server as your budget will allow.

And to take advantage of these additional cores, you need a language with a solid concurrency story.

Processes, threads and goroutines

Go has goroutines which are the foundation for its concurrency story. I want to step back for a moment and explore the history that leads us to goroutines.


In the beginning, computers ran one job at a time in a batch processing model. In the 60’s a desire for more interactive forms of computing lead to the development of multiprocessing, or time sharing, operating systems. By the 70’s this idea was well established for network servers, ftp, telnet, rlogin, and later Tim Burners-Lee’s CERN httpd, handled each incoming network connections by forking a child process.

In a time-sharing system, the operating systems maintains the illusion of concurrency by rapidly switching the attention of the CPU between active processes by recording the state of the current process, then restoring the state of another. This is called context switching.

Context switching


Image credit: Immae (CC BY-SA 3.0)

There are three main costs of a context switch.

  • The kernel needs to store the contents of all the CPU registers for that process, then restore the values for another process. Because a process switch can occur at any point in a process’ execution, the operating system needs to store the contents of all of these registers because it does not know which are currently in use 2.
  • The kernel needs to flush the CPU’s virtual address to physical address mappings (TLB cache) 3.
  • Overhead of the operating system context switch, and the overhead of the scheduler function to choose the next process to occupy the CPU.

These costs are relatively fixed by the hardware, and depend on the amount of work done between context switches to amortise their cost—rapid context switching tends to overwhelm the amount of work done between context switches.


This lead to the development of threads, which are conceptually the same as processes, but share the same memory space. As threads share address space, they are lighter to schedule than processes, so are faster to create and faster to switch between.

Threads still have an expensive context switch cost; a lot of state must be retained. Goroutines take the idea of threads a step further.


Rather than relying on the kernel to manage their time sharing, goroutines are cooperatively scheduled. The switch between goroutines only happens at well defined points, when an explicit call is made to the Go runtime scheduler. The major points where a goroutine will yield to the scheduler include:

  • Channel send and receive operations, if those operations would block.
  • The go statement, although there is no guarantee that new goroutine will be scheduled immediately.
  • Blocking syscalls like file and network operations.
  • After being stopped for a garbage collection cycle.

In other words, places where the goroutine cannot continue until it has more data, or more space to put data.

Many goroutines are multiplexed onto a single operating system thread by the Go runtime. This makes goroutines cheap to create and cheap to switch between. Tens of thousands of goroutines in a single process are the norm, hundreds of thousands are not unexpected.

From the point of view of the language, scheduling looks like a function call, and has the same semantics. The compiler knows the registers which are in use and saves them automatically. A thread calls into the scheduler holding a specific goroutine stack, but may return with a different goroutine stack. Compare this to threaded applications, where a thread can be preempted at any time, at any instruction.

This results in relatively few operating system threads per Go process, with the Go runtime taking care of assigning a runnable goroutine to a free operating system thread.

Stack management

In the previous section I discussed how goroutines reduce the overhead of managing many, sometimes hundreds of thousands of concurrent threads of execution. There is another side to the goroutine story, and that is stack management.

Process address space

processThis is a diagram of the typical memory layout of a process. The key thing we are interested in is the locations of the heap and the stack.

Inside the address space of a process, traditionally the heap is at the bottom of memory, just above the program code and grows upwards.

The stack is located at the top of the virtual address space, and grows downwards.

guard-pageBecause the heap and stack overwriting each other would be catastrophic, the operating system arranges an area of inaccessible memory between the stack and the heap.

This is called a guard page, and effectively limits the stack size of a process, usually in the order of several megabytes.

Thread stacks

threadsThreads share the same address space, so for each thread, it must have its own stack and its own guard page.

Because it is hard to predict the stack requirements of a particular thread, a large amount of memory must be reserved for each thread’s stack. The hope is that this will be more than needed and the guard page will never be hit.

The downside is that as the number of threads in your program increases, the amount of available address space is reduced.

Goroutine stack management

The early process model allowed the programmer to view the heap and the stack as large enough to not be a concern. The downside was a complicated and expensive subprocess model.

Threads improved the situation a bit, but require the programmer to guess the most appropriate stack size; too small, your program will abort, too large, you run out of virtual address space.

We’ve seen that the Go runtime schedules a large number of goroutines onto a small number of threads, but what about the stack requirements of those goroutines ?

Goroutine stack growth

stack-growth Each goroutine starts with an small stack, allocated from the heap. The size has fluctuated over time, but in Go 1.5 each goroutine starts with a 2k allocation.

Instead of using guard pages, the Go compiler inserts a check as part of every function call to test if there is sufficient stack for the function to run. If there is sufficient stack space, the function runs as normal.

If there is insufficient space, the runtime will allocate a larger stack segment on the heap, copy the contents of the current stack to the new segment, free the old segment, and the function call is restarted.

Because of this check, a goroutine’s initial stack can be made much smaller, which in turn permits Go programmers to treat goroutines as cheap resources. Goroutine stacks can also shrink if a sufficient portion remains unused. This is handled during garbage collection.

Integrated network poller

In 2002 Dan Kegel published what he called the c10k problem. Simply put, how to write server software that can handle at least 10,000 TCP sessions on the commodity hardware of the day. Since that paper was written, conventional wisdom has suggested that high performance servers require native threads, or more recently, event loops.

Threads carry a high overhead in terms of scheduling cost and memory footprint. Event loops ameliorate those costs, but introduce their own requirements for a complex, callback driven style.

Go provides programmers the best of both worlds.

Go’s answer to c10k

In Go, syscalls are usually blocking operations, this includes reading and writing to file descriptors. The Go scheduler handles this by finding a free thread or spawning another to continue to service goroutines while the original thread blocks. In practice this works well for file IO as a small number of blocking threads can quickly exhaust your local IO bandwidth.

However for network sockets, by design at any one time almost all of your goroutines are going to be blocked waiting for network IO. In a naive implementation this would require as many threads as goroutines, all blocked waiting on network traffic. Go’s integrated network poller handles this efficiently due to the cooperation between the runtime and net packages.

In older versions of Go, the network poller was a single goroutine that was responsible for polling for readiness notification using kqueue or epoll. The polling goroutine would communicate back to waiting goroutines via a channel. This achieved the goal of avoiding a thread per syscall overhead, but used a generalised wakeup mechanism of channel sends. This meant the scheduler was not aware of the source or importance of the wakeup.

In current versions of Go, the network poller has been integrated into the runtime itself. As the runtime knows which goroutine is waiting for the socket to become ready it can put the goroutine back on the same CPU as soon as the packet arrives, reducing latency and increasing throughput.

Goroutines, stack management, and an integrated network poller

In conclusion, goroutines provide a powerful abstraction that free the programmer from worrying about thread pools or event loops.

The stack of a goroutine is as big as it needs to be without being concerned about sizing thread stacks or thread pools.

The integrated network poller lets Go programmers avoid convoluted callback styles while still leveraging the most efficient IO completion logic available from the operating system.

The runtime make sure that there will be just enough threads to service all your goroutines and keep your cores active.

And all of these features are transparent to the Go programmer.


  1. CMOS power consumption is not only caused by the short circuit current when the circuit is switching. Additional power consumption comes from charging the output capacitance of the gate, and leakage current through the MOSFET gate increases as the size of the transistor decreases. You can read more about this from in a the lecture materials from CMU’s ECE322 course. Bill Herd has a published a series of articles on how CMOS works.
  2. This is an oversimplification. In some cases the operating system can avoid saving and restoring infrequently used architectural registers by starting the the process in a mode where access to floating point or MMX/SSE registers will cause the program to fault, thereby informing the kernel that the process will now use those registers and it should from then on save and restore them.
  3. Some CPUs have what is known as a tagged TLB. In the case of tagged TLB support the operating system can tell the processor to associate particular TLB cache entries with an identifier, derived from the process ID, rather than treating each cache entry as global. The upside is this avoids flushing out entries on each process switch if the process is placed back on the same CPU in short order.