Monthly Archives: January 2018

Containers versus Operating Systems

What does a distro provide?

The most popular docker base container image is either busybox, or scratch. This is driven by a movement that is equal parts puritanical and pragmatic. The puritan asks “Why do I need to run init(1) just to run my process?” The pragmatist asks “Why do I need a 700 meg base image to deploy my application?” And both, seeking immutable deployment units ask “Is it a good idea that I can ssh into my container?” But let’s step back for a second and look at the history of how we got to the point where questions like this are even a thing.

In the very beginnings, there were no operating systems. Programs ran one at a time with the whole machine at their disposal. While efficient, this created a problem for the keepers of these large and expensive machines. To maximise their investment, the time between one program finishing and another starting must be kept to an absolute minimum; hence monitor programs and batch processing was born.

Monitors started as barely more than watchdog timers. They knew how to load the next program off tape, then set an alarm if the program ran too long. As time went on, monitors became job control–quasi single user operating systems where the operators could schedule batch jobs with slightly more finesse than the previous model of concatenating them in the card reader.¹

In response to the limitations of batch processing, and with the help of increased computing resources, interactive computing was born. Interactive computing allowing multiple users to interact with the computer directly, time slicing, or time sharing, the resources between users to present the illusion of each program having a whole computer to itself.

“The UNIX kernel is an I/O multiplexer more than a complete operating system. This is as it should be.”
Ken Thompson, BSTJ, 1978

interactive computing in raw terms was less efficient than batch, however it recognised that the potential to deliver programs faster outweighed a less than optimal utilisation of the processor; a fact borne out by the realisation that programming time was not benefiting from the same economies of scale that Moore’s law was delivering for hardware. Job control evolved to became what we know as the kernel, a supervisor program which sits above the raw hardware, portioning it out and mediating access to hardware devices.

With interactive users came the shell, a place to start programs, and return once they completed. The shell presented an environment, a virtual work space to organise your work, communicate with others, and of course customise. Customise with programs you wrote, programs you got from others, and programs that you collaborated with your coworkers on.

Interactive computing, multi user systems and then networking gave birth to the first wave of client/server computing–the server was your world, your terminal was just a pane of glass to interact with it. Thus begat userspace, a crowded bazaar of programs, written in many languages, traded, sold, swapped and sometimes stolen. A great inter breeding between the UNIX vendors produced a whole far larger than the sum of its parts.

Each server was an island, lovingly tended by operators, living for years, slowly patched and upgraded, becoming ever more unique through the tide of software updates and personnel changes.

Skip forward to Linux and the GNU generation, a kernel by itself does not serve the market, it needs a user space of tools to attract and nurture users accustomed to the full interactive environment.

But that software was hard, and messy, and spread across a million ftp, tucows, sourceforge, and cvs servers. Their installation procedures are each unique, their dependencies are unknown or unmanaged–in short, a job for an expert. Thus distributions became experts at packaging open source software to work together as a coherent interactive userspace story.

Container sprawl

We used to just have lots of servers, drawing power, running old software, old operating systems, hidden under people’s desks, and sometimes left running behind dry wall. Along came virtualisation to sweep away all the old, slow, flaky, out of warranty hardware. Yet the software remained, and multiplied.

Vmsprawl, it was called. Now free from a purchase order and a network switch port, virtual machines could spawn faster than rabbits. But their lifespan would be much longer.

Back when physical hardware existed, you could put labels on things, assign them to operators, have someone to blame, or at least ask if the operating system was up to date, but virtual machines became ephemeral, multitudinous, and increasingly, redundant,

Now that a virtual machines’ virtual bulk has given way to containers, what does that mean for the security and patching landscape? Surely it’s as bad, if not worse. Containers can multiply even faster than VMs and at such little cost compared to their bloated cousins that the problem could be magnified many times over. Or will it?

The problem is maintaining the software you didn’t write. Before containers that was everything between you and the hardware; obviously a kernel, that is inescapable, but the much larger surface area (in recent years ballooning to a DVD’s girth) was the userland. The gigabytes of software that existed to haul the machine onto the network, initialise its device drivers, scrub its /tmp partition, and so on.

But what if there was no userland? What if the network was handled for you, truly virtualised at layer 3, not layer 1. Your volumes were always mounted and your local storage was fleeting, so nothing to scrub. What would be the purpose of all those decades of lovingly crafted userland cruft?

If interactive software goes unused, was it ever installed at all?

Immutable images

Netflix tells us that immutable images are the path to enlightenment. Built it once, deploy it often. If there is a problem, an update, a software change, a patch, or a kernel fix, then build another image and roll it out. Never change something in place. This mirrors the trend towards immutability writ large by the functional programming tidal wave.

While Netflix use virtual machines, and so need software to configure their (simulated) hardware and software to plumb their (simulated) network interfaces to get to the point of being able to launch the application, containers leave all these concerns to the host. A container is spawned with any block devices or network interfaces required already mounted or plumbed a priori.

So, if you remove the requirement, and increasingly, the ability, to change the contents of the running image, and you remove the requirement to prepare the environment before starting the application, because the container is created with all its facilities already prepared, why do you need a userland inside a container?

Debugging? Possibly.

Today there are many of my generation who would feel helpless without being about to ssh to a host, run their favourite (and disparate) set of inspection tools. But a container is just a process inside a larger host operating system, so do you diagnosis there instead. Unlike virtual machines, these are not black boxes, the host operating system has far more capability to inspect and diagnose a guest than the guest itself–so leave your diagnosis tools on the host. And your ssh daemon, for good measure.

Updates, updates. Updates!

Why do we have operating system distros? In a word, outsourcing.

Sure, every admin could subscribe to the mailing lists of all the software packages installed on the servers they maintain (you do know all the software installed on the machines you are responsible for, right?) and then download, test, certify, upgrade the software promptly after being notified. Sound’s simple. Any admin worth hiring should be able to do this.

Sure, assuming you can find an admin who wants to do this grunt work, and that they can keep up with the workload, and that they can service more than a few machines before they’re hopelessly chasing their tails.

No, of course not, we outsource this to operating system vendor. In return for using outdated versions of software, distros will centralise the triage, testing and preparation of upgrades and patches.

This is the reason that a distro and its package management tool of choice are synonymous. Without a tool to automate the dissemination, installation and upgrade of packaged software, distro vendors would have no value. And without someone to marshal unique snowflake open source software into a unified form, package management software would have no value.

No wonder that the revenue model for all open source distro vendors centers around tooling that automates the distribution of update packages.

The last laugh

Ironically, the last laugh in this tale may be the closed source operating system vendors. It was Linux and open source that destroyed the proprietary UNIX market after the first dot com crash.

Linux rode Moore’s law to become the server operating system for the internet, and made kings of the operating system distributors. But it’s Linux that is driving containers, at least in their current form, and Linux containers, or more specifically a program that communicates directly with the kernel syscall api inside a specially prepared process namespace, is defining the new normal for applications.

Containers are eating the very Linux distribution market which enabled their creation.

OSX and Windows may be relegated to second class citizens–the clients in the client/server or client/container equation–but at least nobody is asking difficult questions about the role of their userspace.

Whither distros

What is the future of operating system distributions? Their services, while mature, scalable, well integrated, and expertly staffed, will unfortunately be priced out of the market. History tells us this.

In the first dot com bust, companies retreated from expensive proprietary software, not because it wasn’t good, not because it wasn’t extensible or changeable, but because it was too expensive. With no money coming in, thousands of dollars of opex walking out the door in licence fees was unsustainable.

The companies that survived the crash, or were born in its wreckage, chose open source software. Software that was arguably less mature, less refined, at the time, but with a price tag that was much more approachable. They kept their investors money in the bank, rode the wave of hardware improvements, and by pulling together in a million loosely organised software projects created a free (as in free puppy) platform to build their services on top–trading opex for some risk that they may have no-one to blame if their free software balloon sprang a leak.

Now, it is the Linux distributors who are chasing the per seat or per cpu licence fees. Offering scaled out professional services in the form of a stream of software updates and patches, well tested and well integrated.

But, with the exception of the kernel–which is actually provided by the host operating system, not the container–all those patches and updates are for software that is not used by the container, and in the case of our opening examples, busybox and scratch. not present. The temptation to go it alone, cut out the distro vendors, backed by the savings in licence fees is overwhelming.

What can distros do?

What would you do if you woke up one day to find that you owned the best butchers shop in a town that had decided to become vegetarian en mass?

Go’s hidden #pragmas

This is an article about compiler directives; or as they are commonly known, pragmas. It’s derived from a talk of a similar name that I gave last year at GopherChina in Shanghai.

But first, a history lesson

Before we talk about Go, let’s talk a little about pragmas, and their history. Many languages have the notion of an attribute, or directive, that changes the way source code is interpreted during compilation. For example, Perl has the use function:

use strict;
use strict "vars";
use strict "refs";
use strict "subs";

use enable features, or makes the compiler interpret the source of the program differently, by making the compiler more pedantic or enabling a new syntax mode.

Javascript has something similar. ECMAScript 5 extended the language with optional modes, such as:

"use strict";

When the Javascript interpreter comes across the words "use strict"; it enables, so called, Strict Mode when parsing your Javascript source. ²

Rust is similar, it uses the attributes syntax to enable unstable features in the compiler or standard library.

#[inline(always)]
 fn super_fast_fn() { ... }

#[cfg(target_os = "macos")]
mod macos_only { ... }

The inline(always) attribute tells the compiler that it must inline super_fast_fn. The target_os attribute tells the compiler to only compile the macos_only module on OS X.

The name pragma comes from ALGOL 68, where they were called pragmats, which was itself shorthand for the word pragmatic. When they were adopted by C in the 1970’s, the name was shortened again to #pragma, and due to the widespread use of C, became fully integrated into the programmer zeitgeist.

#pragma pack(2)
 struct T {
    int i;
    short j;
double k; 
};

This example says to the compiler that the structure should be packed on a two byte boundary; so the double, k, will start at an offset of 6 bytes from the address of T, not the usual 8.

C’s #pragma directive spawned a host of compiler specific extensions, like gcc’s __builtin directive.

Does Go have pragmas?

Now that we know a little bit of the history of pragmas, maybe we can now ask the question, does Go have pragmas?

You saw earlier that #pragma, like #include and #define are implemented in C style languages with a preprocessor, but Go does not have a preprocessor, or macros, so, the question remains, does Go have pragmas?

It turns out that, yes, even though Go does not have macros, or a preprocessor, Go does indeed support pragmas. They are implemented by the compiler as comments.

Just to drive home the point, they’re actually called pragmas in the source of the Go compiler.

So, clearly the name pragma, along with the idea, isn’t going away.

This article focuses on a only a few of the pragmas that the compiler recognises, partly because the list changes frequently, but mostly because not all of them are usable by you as programmers.

Here are some examples to whet your appetite

//go:noescape 
func gettimeofday(tv *Timeval) (err Errno)

This is an example of the noescape directive on the gettimeofday stub from the syscall package.

//go:noinline
 func lshNop1(x uint64) uint64 { 
        // two outer shifts should be removed
         return (((x << 5) >> 2) << 2) 
}

This is an example of the noinline directive from a test fixture in the compiler tests.

//go:nosplit
 func atomicstorep(ptr unsafe.Pointer, new unsafe.Pointer) {
         writebarrierptr_prewrite((*uintptr)(ptr), uintptr(new))
         atomic.StorepNoWB(noescape(ptr), new) 
}

This is an example of the nosplit directive inside the runtime’s atomic support functions.

Don’t worry if this was all a bit quick, we’re going to explore these examples, and more, during the remainder of this article.

A word of caution ?

Before I continue, I want to offer a word of caution.

Pragmas are not part of the language. They might be implemented the gc compiler, but you will not find them in the spec. At a higher level, the idea of adding pragmas to the language caused considerable debate, especially after the first few established a precedent. In a debate about adding the //go:noinline directive Rob Pike opined,

“Useful” is always true for a feature request. The question is, does the usefulness justify the cost? The cost here is continued proliferation of magic comments, which are becoming too numerous already.
–Rob Pike

I’ll leave you to decide if adding pragmas to Go was a good idea or not.

As I mentioned earlier pragma directives are placed in Go comments with a precise syntax. The syntax has the general form:

//go:directive

The go: prefix can be replaced with another, so you can see that the Go team were at least considering future growth, even though they don’t encourage it. It’s also important to note that there is no space between the // and the go keyword. This is partly an accident of history, but it also makes it less likely to conflict with a regular comment.

Lastly, some of these directives require you to do one or more of the following:

import the unsafe package.
compile with the undocumented -+ flag.
be part of the runtime package.

If you get it wrong, your directive might be ignored, and in most cases you code will compile but might be slower or behave incorrectly.

//go:noescape

Enough with the preflight safety checks.

Early in Go’s life, the parts that went into a complete Go program would include Go code (obviously), some C code from the runtime, and some assembly code, again from the runtime or syscall package. The take away is it was expected that inside a package, you’d occasionally find functions which were not implemented in Go.

Now, normally this mixing of languages wouldn’t be a problem, except when it interacts with escape analysis. In Go it’s very common to do something like this,

func NewBook() (*Book) {
        b := Book{ Mice: 12, Men: 9 }
        return &b
}

That is, inside NewBook we declare and initialise a new Book variable b, then return the address of b. We do this so often inside Go it probably doesn’t sink in that if you were to do something like this in C, the result would be pernicious memory corruption as the address returned from NewBook would point to the location on the stack where b was temporarily allocated.

Escape analysis

Escape analysis identifies variables whose lifetimes will live beyond the lifetime of the function in which it is declared, and moves the location where the variable is allocated from the stack to the heap. Technically we say that b escapes to the heap.

Obviously there is a cost; heap allocated variables have to be garbage collected when they are no longer reachable, stack allocated variables are automatically free’d when their function returns. Keep that in mind.

func BuildLibrary() {
        b := Book{Mice: 99: Men: 3} 
        AddToCollection(&b) 
}

Now, lets consider a slightly different version of what we saw above. In this contrived example, BuildLibrary declares a new Book, b, and passes the address of b to AddToCollection. The question is, “does b escape to the heap?”

The answer is, it depends. It depends on what AddToCollection does with the *Book passed to it. If AddToCollection did something like this,

func AddToCollection(b *Book) {
        b.Classification = "fiction" 
}

then that’s fine. AddToCollection can address those fields in Book irrespective of if b points to an address on the stack or on the heap. Escape analysis would conclude that the b declared in BuildLibrary did not escape, because AddToCollection did not retain a copy of the *Book passed to it, and can therefore be allocated cheaply on the stack.

However, if AddToCollection did something like this,

var AvailableForLoan []*Book

func AddToCollection(b *Book) {
        AvailableForLoan = append(AvailableForLoan, b) 
}

that is, keep a copy of b in some long lived slice, then that will have an impact on the b declared in BuildLibrary. b must be allocated on the heap so that it lives beyond the lifetime of AddToCollection and BuildLibrary. Escape analysis has to know what AddToCollection does, what functions it calls, and so on, to know if a value should be heap or stack allocated. This is the essence of escape analysis.

os.File.Read

That was a lot of background, let’s get back to the //go:noescape pragma. Now we know that the call stack of functions affects whether a value escapes or not, consider this very common situation (error handling elided for brevity),

f, _ := os.Open("/tmp/foo")
 buf := make([]byte, 4096)
 n, _ := f.Read(buf)

We open a file, make a buffer, and we read into that buffer. Is buf allocated on the stack, or on the heap?

As we saw above, it depends on what happens inside os.File.Read. os.File.Read calls down through a few layers to syscall.Read, and this is where it gets complicated. syscall.Read calls down into syscall.Syscall to do the operating system call. syscall.Syscall is implemented in assembly. Because syscall.Syscall is implemented in assembly, the compiler, which works on Go code, cannot “see” into that function, so it cannot see if the values passed to syscall.Syscall escape or not. Because the compiler cannot know if the value might escape, it must assume it will escape.

This was the situation in issue 4099. If you wanted to write a small bit of glue code in assembly, like the bytes, md5, or syscall package, anything you passed to it would be forced to allocated on the heap even if you knew that it doesn’t.

package bytes
//go:noescape
 // IndexByte returns the index of the first instance of c in s, 
// or -1 if c is not present in s. 
func IndexByte(s []byte, c byte) int // ../runtime/asm_$GOARCH.s

So this is precisely what the //go:noescape pragma does. It says to the compiler, “the next function declaration you see, assume that none of the arguments escape.” We’ve said to the compiler; trust us, IndexByte and its children do not keep a reference to the byte slice.

In this example from Go 1.5 you can see that bytes.IndexByte is implemented in assembly ³. By marking this function //go:noescape, it will avoid stack allocated []byte slices escaping to the heap unnecessarily.

Can you use //go:noescape in your code?

Can you use //go:noescape in your own code? Yes, but it can only be used on the forward declarations.

package main

import "fmt"

//go:noescape
func length(s string) int // implemented in an .s file

func main() {
        s := "hello world"
        l := length(s)
        fmt.Println(l)
}

Note, you’re bypassing the checks of the compiler, if you get this wrong you’ll corrupt memory and no tool will be able to spot this.

//go:norace

Forking in a multithreaded program is complicated. The child process gets a complete, independent, copy of the parent’s memory, so things like locks, implemented as values in memory can become corrupt when suddenly two copies of the same program see locks in different state.

Fork/exec in the Go runtime is handled with care by the syscall package which coordinates to make sure that the runtime is in quiescent state during the brief fork period. However, when the race runtime is in effect, this becomes harder.

To spot races, when compiling in race mode, the program is rewritten so every read and write goes via the race detector framework to detect unsafe memory access. I’ll let the commit explain.

// TODO(rsc): Remove. Put //go:norace on forkAndExecInChild instead.
 func isforkfunc(fn *Node) bool {
         // Special case for syscall.forkAndExecInChild.
         // In the child, this function must not acquire any locks, because 
        // they might have been locked at the time of the fork. This means 
        // no rescheduling, no malloc calls, and no new stack segments. 
        // Race instrumentation does all of the above. 
        return myimportpath != "" && myimportpath == "syscall" &&
                fn.Func.Nname.Sym.Name == "forkAndExecInChild" 
}

As Russ’s comment shows above, the special casing in the compiler was removed in favour of a directive on the syscall.forkAndExecInChild functions in the syscall package.

// Fork, dup fd onto 0..len(fd), and exec(argv0, argvv, envv) in child. 
// If a dup or exec fails, write the errno error to pipe. 
// (Pipe is close-on-exec so if exec succeeds, it will be closed.)
 // In the child, this function must not acquire any locks, because 
// they might have been locked at the time of the fork. This means
 // no rescheduling, no malloc calls, and no new stack segments.
 // For the same reason compiler does not race instrument it. 
// The calls to RawSyscall are okay because they are assembly
 // functions that do not grow the stack.
 //go:norace 
func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir 
        *byte, attr *ProcAttr, sys *SysProcAttr, pipe int)
        (pid int, err Errno) {

This was replaced by the annotation //go:norace by Ian Lance Taylor in Go 1.6, which removed the special case in the compiler, however //go:norace is still only used in one place in the standard library.

Should you use //go:norace in your own code?

Should you use //go:norace in your own code? Using //go:norace will instruct the compiler to not annotate the function, thus will not detect any data races if they exist. This program contains a data race, which will not be reported by the race detector because of the //go:norace annotation.

package main

var v int

//go:norace
func add() {
        v++
}

func main() {
        for i := 0; i < 5; i++ {
                go add()
        }
}

Given the race detector has no known false positives, there should be very little reason to exclude a function from its scope.

//go:nosplit

Hopefully by now everyone knows that a goroutine’s stack is not a static allocation. Instead each goroutine starts with a few kilobytes of stack and, if necessary, will grow.

The technique that the runtime uses to manage a goroutine’s stack relies on each goroutine keeping track of its current stack usage. During the function preamble, a check is made to ensure there is enough stack space for the function to run. If not, the code traps into the runtime to grow the current stack allocation.

"".fn t=1 size=120 args=0x0 locals=0x80 
        0x0000 00000 (main.go:5)  TEXT    "".fn(SB), $128-0
         0x0000 00000 (main.go:5)  MOVQ    (TLS), CX
         0x0009 00009 (main.go:5)  CMPQ    SP, 16(CX)
         0x000d 00013 (main.go:5)  JLS     113

Now, this preamble is quite small, as we see it’s only a few instructions on x86.

A load from an offset of the current g register, which holds a pointer to the current goroutine.
A compare against the stack usage for this function, which is a constant known at compile time.
And a branch to the slow path, which is rare and easily predictable.

But sometimes even this overhead is unacceptable, and occasionally, unsafe, if you’re the runtime package itself. So a mechanism exists, via an annotation in the compiled form of the function to skip the stack check preamble. It should also be noted that the stack check is inserted by the linker, not the compiler, so it applies to assembly functions and, while they existed, C functions.

Up until Go 1.4, the runtime was implemented in a mix of Go, C and assembly.

// All reads and writes of g's status go through readgstatus, casgstatus 
// castogscanstatus, casfromgscanstatus.
 #pragma textflag NOSPLIT
 uint32 runtime·readgstatus(G *gp) {
         return runtime·atomicload(&gp->atomicstatus); 
}

In this example, runtime.readgstatus, we can see the C style #pragma textflag NOSPLIT. ⁴

When the runtime was rewritten in Go, a way to say that a particular function should not have the stack split check was still required. This was often needed as taking a stack split inside the runtime was forbidden because a stack split implicitly needs to allocate memory, which would lead to recursive behaviour. Hence #pragma textflag NOSPLIT became //go:nosplit.

// All reads and writes of g's status go through 
// readgstatus, casgstatus, castogscanstatus,
 // casfrom_Gscanstatus.
 //go:nosplit 
funcreadgstatus(gp *g) uint32 {
         return atomic.Load(&gp.atomicstatus)
 }

But this leads to a problem, what happens if you run out of stack with //go:nosplit?

If a function, written in Go or otherwise, uses //go:nosplit to say “I don’t want to grow the stack at this point”, the compiler still has to ensure it’s safe to run the function. Go is a memory safe language, we cannot let functions use more stack than they are allowed just because they want to avoid the overhead of the stack check. They will almost certainly corrupt the heap or another goroutine’s memory.

To do this, the compiler maintains a buffer called the redzone, a 768 byte allocation ⁵ at the bottom of each goroutines’ stack frame which is guaranteed to be available.

The compiler keeps track of the stack requirements of each function. When it encounters a nosplit function it accumulates that function’s stack allocation against the redzone. In this way, carefully written nosplit functions can execute safely against the redzone buffer while avoiding stack growth at inconvenient times.

This program uses nosplit to attempt to avoid stack splitting,

package main

type T [256]byte // a large stack allocated type

//go:nosplit
func A(t T) {
        B(t)
}

//go:nosplit
func B(t T) {
        C(t)
}

//go:nosplit
func C(t T) {
        D(t)
}

//go:nosplit
//go:noinline
func D(t T) {}

func main() {
        var t T
        A(t)
}

But will not compile because the compiler detects the redzone would be exhausted.

# command-line-arguments
main.C: nosplit stack overflow
        744     assumed on entry to main.A (nosplit)
        480     after main.A (nosplit) uses 264
        472     on entry to main.B (nosplit)
        208     after main.B (nosplit) uses 264
        200     on entry to main.C (nosplit)
        -64     after main.C (nosplit) uses 264

We occasionally hit this in the -N (no optimisation) build on the dashboard as the redzone is sufficient when optimisations are on, generally inlining small functions, but when inlining is disabled, stack frames are deeper and contain more allocations which are not optimised away.

Can you use //go:nosplit in your own code?

Can you use //go:nosplit in your own functions? Yes, I just showed you that you can. But it’s probably not necessary. Small functions would benefit most from this optimisation are already good candidates for inlining, and inlining is far more effective at eliminating the overhead of function calls than //go:nosplit.

You’ll note in the example above I showed I had to use //go:noinline to disable inlining which otherwise would have detected that D() actually did nothing, so the compiler would optimise away the entire call tree.

Of all the pragmas this one is the safest to use, as it will get spotted at compile time, and should generally not affect the correctness of your program, only the performance.

//go:noinline

This leads us to inlining.

Inlining ameliorates the cost of the stack check preamble, and in fact all the overheads of a function call, by copying the code of the inlined function into its caller. It’s a small trade off of possibly increased program size against reduced runtime by avoiding the function call overhead. Inlining is the key compiler optimisation because it unlocks many other optimisations.

Inlining is most effective with small, simple, functions as they do relatively little work compared to their overhead. For large functions, inlining offers less benefit as the overhead of the function call is small compared to the time spent doing work. However, what if you don’t want a function inlined? It turned out this was the case when developing the new SSA backend, as inlining would cause the nascent compiler to crash. I’ll let Keith Randall explain.

We particularly need this feature on the SSA branch because if a function is inlined, the code contained in that function might switch from being SSA-compiled to old-compiler-compiled. Without some sort of noinline mark the SSA-specific tests might not be testing the SSA backend at all.

The decision to control what can be inlined is made by a function inside the compiler called, ishairy. Hairy statements are things like closures, for loops, range loops, select, switch, and defer. If you wanted to write a small function that you do not want to be inlined, and don’t want the to add any overhead to the function, which of those would you use? It turns out, the answer is switch.

Prior to the SSA compiler, switch {} would prevent a function being inlined, whilst also optimising to nothing, and this was used heavily in compiler test fixtures to isolate individual operations.

func f3a_ssa(x int) *int {
         switch {
         }
         return &x
 }

With the introduction of the SSA compiler, switch was no longer considered hairy as switch is logically the same as a list of if ... else if statements, so switch{} stopped being a placeholder to prevent inlining. The compiler developers debated how to represent the construct “please don’t inline this function, ever”, and settled on a new pragma, //go:noinline.

Can you use //go:noinline in your own code?

Absolutely, although I cannot think of any reason to do so off hand, save silly examples like this article.

But what about …

But wait, there are many more pragmas that Go supports that aren’t part of this set we’re discussing.

+build is implemented by the Go tool, not the compiler, to filter files passed to the compiler for build or test

//go:generate uses the same syntax as a pragma, but is only recognised by the generate tool.

package pdf // import "rsc.io/pdf"

What about the canonical import pragma added in Go 1.4, to force the go tool to refuse to compile packages not imported by their “canonical” name

//line /foo/bar.go:123

What about the //line directive that can renumber the line numbers in stack traces?

Wrapping up

Pragmas in Go have a rich history. I hope the retelling of this history has been interesting to you.

The wider arc of Go’s pragmas is they are used inside the standard library to gain a foothold to implement the runtime, including the garbage collector, in Go itself. Pragmas allowed the runtime developers to extend, the language just enough to meet the requirements of the problem. You’ll find pragmas used, sparingly, inside the standard library, although you’ll never find them listed in godoc.

Should you use these pragmas in your own programs? Possibly //go:noescape is useful when writing assembly glue, which is done quite often in the crypto packages. For the other pragmas, outside demos and presentations like this, I don’t think there is much call for using them.

But please remember, magic comments are not part of the language spec, if you use GopherJS, or llgo, or gccgo, your code will still compile, but may operate differently. So please use this advice sparingly.

Caveat emptor.

If aligned memory writes are atomic, why do we need the sync/atomic package?

This is a post inspired by a question on the Go Forum. The question, paraphrased, was “If properly aligned writes are guaranteed to be atomic by the processor, why does the race detector complain?”

The answer is, there are two uses of the word atomic in play here. The first, the one the OP references, is a property of most microprocessors that, as long as the address of the write is naturally aligned–if it’s a 32-bit value, say, then it is always written to an address which is a multiple of four–then nothing will observe a half written value.

To explain what that means, consider the opposite, an unaligned write where a 32-bit value is written to an address whose bottom two bits are not zero. In this case the processor has to split the write into two, spanning the boundary. This is known as a torn write as an observer on the bus could see this partially updated value.⁶

These words comes from a time before multiple processors were common. At that time the observers of a torn read or write would most likely be other agents on the ISA, VESA, or PCI bus like disk controllers or video cards. However, we now live in the multi-core age so we need to talk about caches and visibility.

Since almost the beginning of computing, the CPU has run faster than main memory. That is to say, the performance of a computer is strongly related to the performance of its memory. This is known as the processor/memory gap. To bridge this gap processors have adopted caches which store recently accessed memory in a small, fast, store, closer to the processor.⁷ Because caches also buffer writes back to main memory, while the property that an aligned address will be atomic remains, when that write occurs has become less deterministic.⁸ This is the domain of second use of the word atomic, the one implemented by the sync/atomic package.

In a modern multiprocessor system, a write to main memory will be buffered in multiple levels of caches before hitting main memory. This is done to to hide the latency of main memory, but in doing so it means that communicating between processors using main memory is now imprecise; a value read from memory may have already been overwritten by one processor, however the new value has not made its way through the various caches yet.

To solve this ambiguity you need to use a memory fence, also known as a memory barrier. A memory write barrier operation tells the processor that it has to wait until all the outstanding operations in its pipeline, specifically writes, have been flushed to main memory. This operation also invalidates the caches

⁹

held by other processors, forcing them to retrieve the new value directly from memory. The same is true for reads, you use a memory read barrier to tell the processor to stop and synchronise with any outstanding writes to memory.

In terms of Go, read and write memory barrier operations are handled by the sync/atomic package, specifically the family of atomic.Load and atomic.Store functions respectively.

¹⁰

In answer to the OP’s question: to safely use a value in memory as a communication channel between two goroutines, the race detector will complain unless the sync/atomic package is used.

Dave Cheney

The acme of foolishness