Author Archives: Dave Cheney

About Dave Cheney

A chaotic neutral System Administrator with super cow powers. My weapons are fear, cynicism, and an almost fanatical devotion to the command line. twitter.com/davecheney

How to install multiple versions of Go

Introduction

This post presents one technique for installing and using multiple versions of Go on a machine. This is a technique I use often as we have standardised on Go 1.2.1 for developing Juju, but develop on the tip of Go itself.

You may find this technique useful for doing comparisons between various Go versions for performance or validation.

This procedure mainly applies to Unix installations of Go; however, assuming you have the correct toolchain, Windows users can apply it as well.

Prerequisites

There are two prerequisites before you begin

  1. Do not set GOROOT.
    You must not set $GOROOT. Unset it in your environment.
  2. Remove any existing versions of Go on your system.
    If you have installed Go via your operating system’s package manager, or via a tool like Homebrew, uninstall it before proceeding.

Installation

In this example I’m going to build the latest release of Go, 1.2.1, and the previous stable release 1.1.2. You can extend this pattern to handle other versions.

  1. Clone the Go sources.
    hg clone https://code.google.com/p/go $HOME/go
  2. Clone release working copies.
    Using the clone from the previous step, clone each version of Go using its specific release tag.

    hg clone $HOME/go -r go1.1.2 $HOME/go-1.1.2
    hg clone $HOME/go -r go1.2.1 $HOME/go-1.2.1
  3. Build each version of Go.
    cd $HOME/go-1.1.2/src && ./make.bash
    cd $HOME/go-1.2.1/src && ./make.bash

    If you prefer you can use ./all.bash to run the test suite.

  4. Set up aliases.
    Now that you have built Go 1.2.1 and Go 1.1.2, you need to make the go tool for each version available on your $PATH. A good way to do this is with an alias

    alias go-1.1.2=$HOME/go-1.1.2/bin/go
    alias go-1.2.1=$HOME/go-1.2.1/bin/go

    As an alternative, you could use ln -s to set up a symlink.

Usage

Now you can use these versions of the go tool anywhere you would use the normal go tool. For example

$ go-1.2.1 test $PACKAGE # compile and test $PACKAGE with Go 1.2.1
$ go-1.1.2 build $PACKAGE # build $PACKAGE with Go 1.1.2 (results in $CWD)

Caveats

There are several caveats when using this method.

  • Older versions of Go may not work with your system compiler.
    As you are building older software with newer tools you may encounter compilation failures.
    For example, Go versions less than 1.2 probably won’t work with XCode 5 due to the switch to clang.
  • Older versions of Go may not work with the Go subrepositories, or may not support newer features.
    For example the code.google.com/p/go.net/crypto/ssh package requires additional cipher suites added in Go 1.2 and won’t compile with Go 1.1.

Associative commentary follow up

This post is a follow up to Friday’s post on comments in Go.

Keith Rarick and Nate Finch pointed out that I had neglected to include two important practical use cases.

Build tags

I’ve previously written about how to use // +build tags to perform conditional compilation. In light of the previous post it’s probably worth recapping them here.

  • Build tags must use the // form.
    // +build right
    /* +build wrong */
  • Build tags must be their own comment, they must not be associated with a declaration.
    // Copyright Microsoft 1981
    
    // +build !darwin
    
    // Package basic implements Dartmouth's BASIC interpreter.  
    package basic
  • Build tags must occur early in the file. Only the first few lines of the file are scanned when filtering files by build tags.
    package wrong
    
    import "io"
    
    // +build whoops too,late

Copyright headers

The second is managing procedural issues if your licence requires you to include a copyright block at the top of each source file.

This was also briefly covered in the conditional compilation article. To recap

  • Most licences that recommend copyright headers require them to be at the top of the file. This means they must come before the package declaration and its comment.
  • You probably don’t want the copyright header being part of your godoc, so the comment block holding the copyright header and the package declaration should be separated by a newline.
  • If you have any build tags, they should also appear between the copyright block and the package declaration. As all three are separate comments, they should each be separated by a newline.
    // Copyright Commodore Inc, 1982
    
    // +build 6502
    
    // Package c64 is the computer for the masses, not the classes.
    package c64
  • If this leads to a verbose combination of copyright header, build tag, and package comment for godoc, consider moving the comment on the package declaration to a separate file. This is traditionally named doc.go and contains only the package declaration and its commentary.
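For example, the doc.go arrangement suggested in the last point might look something like this (a sketch, reusing the hypothetical c64 package from the example above; the file name is a convention, not a requirement). doc.go holds only the package comment and the package declaration

// Copyright Commodore Inc, 1982

// Package c64 is the computer for the masses, not the classes.
package c64

while the implementation files carry the copyright header and any build tags, but no package comment

// Copyright Commodore Inc, 1982

// +build 6502

package c64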

Associative commentary

This is a quick post to discuss the rules of comments in Go.

To quickly recap, Go comments come in two forms

// everything from the double slash to the end of line is a comment
/* everything from the opening slash star, to the closing one is a comment */

As the first form declares that the remainder of the line is a comment, if you want to comment out more than one line, you need to do this

// this comment form is useful for
// commenting out sections of your code
// as you are working

The second form is also useful, and generally preferred, for large blocks of commentary

/*
The generally accepted rule when writing large
comment blocks in this form is to leave a newline
at the start and the end of your comment
*/

One important thing to note is that comments do not nest, thus

// // This is fine because everything from the double 
// // slash to the end of line is ignored

/* 
But, if you were to start a new
/* comment inside an old one 
the closing star slash would close the comment block and */
this line would generate a syntax error 
*/

Association

A feature of the tools that consume Go source code, not the language, is the convention that a comment which directly precedes a declaration is associated with that declaration.

// A Foo is a Fooid in the class of Endofoos.
func Foo() { .... }

Conversely, a comment followed by a newline stands alone; it’s just a comment.

package foo

// utility foos

func Quxx() { ... }

godoc allows comments to be associated with any of the top level declarations: package, var, const, type, and func.
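For example (a sketch of my own, not from the original post), each of these comments will be picked up by godoc

// Package stack implements a trivial LIFO stack of ints.
package stack

// MaxDepth is the deepest a Stack may grow.
const MaxDepth = 1024

// Depth records the greatest depth seen by any Stack.
var Depth int

// Stack is a last in, first out collection of ints.
type Stack struct {
        items []int
}

// New returns an empty Stack.
func New() *Stack { return &Stack{} }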

import "C"

The trap that catches people out when they are using cgo is they don’t realise the significance of the newline, or more correctly, the lack of newline between their block of C code and the import "C" declaration.

package main

/*
#include "stdio.h"
*/
import "C"

func main() {
        C.printf(C.CString("Hello world\n"))
}

In this example the import "C" declaration is immediately preceded by the comment block containing our C code, in this case including stdio.h to obtain a reference to the printf function.

If there was a newline between the comment block and import "C" then the cgo preprocessor would not associate the comment with the import declaration and act as if #include "stdio.h" was never there.
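For example, this variant (a sketch of my own; only the blank line before import "C" differs from the program above) triggers exactly that behaviour

package main

/*
#include "stdio.h"
*/

import "C"

func main() {
        C.printf(C.CString("Hello world\n"))
}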

% go run cgo.go
# command-line-arguments
37: error: use of undeclared identifier 'printf'

Update don’t forget to read the follow up to this post.

The empty struct

Introduction

This post explores the properties of my favourite Go data type, the empty struct.

The empty struct is a struct type that has no fields. Here are a few examples in named and anonymous forms

type Q struct{}
var q struct{}

So, if an empty struct contains no fields, contains no data, what use is it ? What can we do with it ?

Width

Before diving into the empty struct itself, I wanted to take a brief detour to talk about width.

The term width comes, like most terms, from the gc compiler, although its etymology probably goes back decades.

Width describes the number of bytes of storage an instance of a type occupies. As a process’s address space is one dimensional, I think width is a more apt term than size.

Width is a property of a type. As every value in a Go program has a type, the width of the value is defined by its type and is always a multiple of 8 bits.

We can discover the width of any value, and thus the width of its type using the unsafe.Sizeof() function.

var s string
var c complex128
fmt.Println(unsafe.Sizeof(s))	 // prints 8
fmt.Println(unsafe.Sizeof(c))	 // prints 16

http://play.golang.org/p/4mzdOKW6uQ

The width of an array type is the width of its element type multiplied by the number of elements.

var a [3]uint32
fmt.Println(unsafe.Sizeof(a)) // prints 12

http://play.golang.org/p/YC97xsGG73

Structs provide a more flexible way of defining composite types, whose width is the sum of the widths of the constituent types, plus padding

type S struct {
        a uint16
        b uint32
}
var s S
fmt.Println(unsafe.Sizeof(s)) // prints 8, not 6

The example above demonstrates one aspect of padding, that a value must be aligned in memory to a multiple of its width. In this case there are two bytes of padding added by the compiler between a and b.
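We can see where that padding falls using unsafe.Offsetof (a small sketch of my own, not part of the original example)

package main

import (
        "fmt"
        "unsafe"
)

type S struct {
        a uint16
        b uint32
}

func main() {
        var s S
        fmt.Println(unsafe.Offsetof(s.a)) // prints 0
        fmt.Println(unsafe.Offsetof(s.b)) // prints 4; b must be 4 byte aligned, so 2 bytes of padding follow a
        fmt.Println(unsafe.Sizeof(s))     // prints 8
}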

Update Russ Cox has kindly written to explain that width is unrelated to alignment. You can read his comment below.

An empty struct

Now that we’ve explored width it should be evident that the empty struct has a width of zero. It occupies zero bytes of storage.

var s struct{}
fmt.Println(unsafe.Sizeof(s)) // prints 0

As the empty struct consumes zero bytes, it follows that it needs no padding. Thus a struct comprised of empty structs also consumes no storage.

type S struct {
        A struct{}
        B struct{}
}
var s S
fmt.Println(unsafe.Sizeof(s)) // prints 0

http://play.golang.org/p/PyGYFmPmMt

What can you do with an empty struct

True to Go’s orthogonality, an empty struct is a struct type like any other. All the properties you are used to with normal structs apply equally to the empty struct.

You can declare an array of struct{}s, but it of course consumes no storage.

var x [1000000000]struct{}
fmt.Println(unsafe.Sizeof(x)) // prints 0

http://play.golang.org/p/0lWjhSQmkc

Slices of struct{}s consume only the space for their slice header. As demonstrated above, their backing array consumes no space.

var x = make([]struct{}, 1000000000)
fmt.Println(unsafe.Sizeof(x)) // prints 12 in the playground

http://play.golang.org/p/vBKP8VQpd8

Of course the normal slicing operations and the len and cap builtins work as expected.

var x = make([]struct{}, 100)
var y = x[:50]
fmt.Println(len(y), cap(y)) // prints 50 100

http://play.golang.org/p/8cO4SbrWVP

You can take the address of a struct{} value, when it is addressable, just like any other value.

var a struct{}
var b = &a

Interestingly, the address of two struct{} values may be the same.

var a, b struct{}
fmt.Println(&a == &b) // true

http://play.golang.org/p/uMjQpOOkX1

This property is also observable for []struct{}s.

a := make([]struct{}, 10)
b := make([]struct{}, 20)
fmt.Println(&a == &b)       // false, a and b are different slices
fmt.Println(&a[0] == &b[0]) // true, their backing arrays are the same

http://play.golang.org/p/oehdExdd96

Why is this? Well, if you think about it, empty structs contain no fields, so they can hold no data. If empty structs hold no data, it is not possible to determine if two struct{} values are different. They are, in effect, fungible.

a := struct{}{} // not the zero value, a real new struct{} instance
b := struct{}{}
fmt.Println(a == b) // true

http://play.golang.org/p/K9qjnPiwM8

note: this property is not required by the spec, but it does note that "Two distinct zero-size variables may have the same address in memory."

struct{} as a method receiver

Now we’ve demonstrated that empty structs behave just like any other type, it follows that we may use them as method receivers.

type S struct{}

func (s *S) addr() { fmt.Printf("%p\n", s) }

func main() {
        var a, b S
        a.addr() // 0x1beeb0
        b.addr() // 0x1beeb0
}

http://play.golang.org/p/YSQCczP-Pt

This example shows that the address of all zero sized values is 0x1beeb0. The exact address will probably vary between versions of Go.

Wrapping up

Thank you for reading this far. At close to 800 words this article turned out to be longer than expected, and there was still more I was planning to write.

While this article concentrated on language obscura, there is one important practical use of empty structs, and that is the chan struct{} construct used for signaling between goroutines.

I’ve talked about the use of chan struct{} in my Curious Channels article.
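By way of illustration, here is a minimal sketch of that pattern (my example, not code from that article): the value carries no information, closing the channel is the signal.

package main

import "fmt"

func main() {
        done := make(chan struct{})
        go func() {
                fmt.Println("working...")
                close(done) // closing is the signal; nothing is ever sent
        }()
        <-done // blocks until the worker closes done
        fmt.Println("finished")
}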


Update Damian Gryski pointed out that I had omitted Brad Fitzpatrick’s iter package. I’ll leave it as an exercise to the reader to explore the profound implications of Brad’s contribution.

Thoughts on Go package management six months on

It has been roughly six months since I wrote about the problems I saw with package management in Go.

In the intervening months there has been lots of discussion; the issue continues to be one of the two most frequently and hotly debated topics on the golang-nuts and go-pm mailing lists. No prizes for guessing what the other topic is.

In the current climate of mistrust there appears to be little support for delegating the problem of package management to a central repository. Informed by the soap box drama of the npm repository it looks like the days of standing up a central repository for a new language are over. Perl, Python, Java; enjoy it while it lasts.

In lieu of this, two camps have formed around complementary ideas.

The first camp, popularised by tools like Gustavo Niemeyer’s gopkg.in redirector, places stability of an import path, a versioned API if you like, as paramount. However this arrangement does not adequately address issues of build reproducibility or multiple revisions of a package being present in your final executable.

This camp also expresses a profound dislike for any sort of manifest file or other metadata in their repo. I find this position odd as most Go repos I find on GitHub are sprayed with Dockerfiles, Makefiles, Gruntfiles, Travisfiles, and all manner of CI or build metadata.

The second camp, informed by the statements of the Go team themselves, choose to vendor, or bring into their own repo the source of packages they depend on. The leading tool in this area is Keith Rarick’s godep.

This model should be very familiar to anyone in the Java community who used Ant. It is hard to argue that it was not a success for Java, at a cost of jar files committed to your repo (Hi SVN!). At least with godep you always carry around the source of your package, not some binary jar.

If this then is the current state of Go package management, so be it.

Channel Axioms

Most new Go programmers quickly grasp the idea of a channel as a queue of values and are comfortable with the notion that channel operations may block when full or empty.

This post explores four of the less common properties of channels:

  • A send to a nil channel blocks forever
  • A receive from a nil channel blocks forever
  • A send to a closed channel panics
  • A receive from a closed channel returns the zero value immediately

A send to a nil channel blocks forever

The first case, which is a little surprising to newcomers, is that a send on a nil channel blocks forever.

This example program will deadlock on line 5 because the zero value for an uninitialised channel is nil.

package main

func main() {
        var c chan string
        c <- "let's get started" // deadlock
}

http://play.golang.org/p/1i4SjNSDWS

A receive from a nil channel blocks forever

Similarly receiving from a nil channel blocks the receiver forever.

package main

import "fmt"

func main() {
        var c chan string
        fmt.Println(<-c) // deadlock
}

http://play.golang.org/p/tjwSfLi7x0

So why does this happen ? Here is one possible explanation

  • The size of a channel’s buffer is not part of its type declaration, so it must be part of the channel’s value.
  • If the channel is not initialised then its buffer size will be zero.
  • If the size of the channel’s buffer is zero, then the channel is unbuffered.
  • If the channel is unbuffered, then a send will block until another goroutine is ready to receive.
  • If the channel is nil then the sender and receiver have no reference to each other; they are both blocked waiting on independent channels and will never unblock.

A send to a closed channel panics

The following program will likely panic as the first goroutine to finish sending its 10 values will close the channel before its siblings have had time to finish sending theirs.

package main

import "fmt"

func main() {
        var c = make(chan int, 100)
        for i := 0; i < 10; i++ {
                go func() {
                        for j := 0; j < 10; j++ {
                                c <- j
                        }
                        close(c)
                }()
        }
        for i := range c {
                fmt.Println(i)
        }
}

http://play.golang.org/p/hxUVqy7Qj-

So why isn’t there a version of close() that lets you check if a channel is closed ?

if !isClosed(c) {
        // c isn't closed, send the value
        c <- v
}

But this function would have an inherent race. Someone may close the channel after we checked isClosed(c) but before the code gets to c <- v.

Solutions for dealing with this fan-in problem are discussed in the second article linked at the bottom of this post.
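One common approach (a sketch of my own, not necessarily the solution discussed there) is to make a single goroutine responsible for closing the channel once every sender has finished, using sync.WaitGroup

package main

import (
        "fmt"
        "sync"
)

func main() {
        c := make(chan int, 100)
        var wg sync.WaitGroup
        for i := 0; i < 10; i++ {
                wg.Add(1)
                go func() {
                        defer wg.Done()
                        for j := 0; j < 10; j++ {
                                c <- j
                        }
                }()
        }
        // close the channel exactly once, after every sender has finished
        go func() {
                wg.Wait()
                close(c)
        }()
        for v := range c {
                fmt.Println(v)
        }
}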

A receive from a closed channel returns the zero value immediately

The final case is the inverse of the previous. Once a channel is closed, and all values drained from its buffer, the channel will always return zero values immediately.

package main

import "fmt"

func main() {
        c := make(chan int, 3)
        c <- 1
        c <- 2
        c <- 3
        close(c)
        for i := 0; i < 4; i++ {
                fmt.Printf("%d ", <-c) // prints 1 2 3 0
        }
}

http://play.golang.org/p/ajtVMsu8EO

The correct solution to this problem is to use a for range loop.

for v := range c {
        // do something with v
}

for v, ok := <-c; ok; v, ok = <-c {
        // do something with v
}

These two statements are equivalent in function, and demonstrate what for range is doing under the hood.

Further reading

Pointers in Go

This blog post was originally a comment on a Google Plus page, but apparently one cannot link to an individual comment, so it was suggested I rewrite it as a blog post.


Go pointers, like C pointers, are values that, uh, point to other values. This is a tremendously important concept and shouldn’t be considered dangerous or something to get hung up on.

Here are several ways that Go improves over C pointers, and C++, for that matter.

  1. There is no pointer arithmetic. You cannot write in Go
    var p *int
    p++

    That is, you cannot alter the address p points to unless you assign another address to it.

  2. This means there is no pointer/array duality in Go. If you don’t know what I’m talking about, read this book. Even if you have no intention of programming in C or Go, it will enrich your life.
  3. Once a value is assigned to a pointer, with the exception of nil which I’ll cover in the next point, Go guarantees that the thing being pointed to will continue to be valid for the lifetime of the pointer. So
    func f() *int { 
            i := 1
            return &i
    }

    is totally safe to do in Go. The compiler will arrange for the memory location holding the value of i to be valid after f() returns.

  4. Nil pointers. Yes, you can still have nil pointers and panics because of them, however in my experience the general level of hysteria generated by nil pointer errors, and the amount of defensive programming present in other languages like Java is not present in Go.

    I believe this is for three reasons

    1. With multiple return values, nil is not used as a sentinel for "something went wrong". Obviously this leaves the question of programmers not checking their errors, but this is simply a matter of education (see the sketch after this list).
    2. Strings are value types, not pointers; string references are, IMO, the number one cause of null pointer exceptions in languages like Java and C++.
      var s string // the zero value of s is "", not nil
    3. In fact, most of the built-in data types (maps, slices, channels, and arrays) have a sensible default if they are left uninitialized. Thanks to Dustin Sallings for pointing this out.
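Here is a minimal sketch (my own, not from the original post) of points 1 and 3: errors travel as a second return value rather than as a nil sentinel, and the zero values of the built-in types are immediately usable.

package main

import (
        "errors"
        "fmt"
)

// div returns an error as a second value instead of signalling failure with nil.
func div(a, b int) (int, error) {
        if b == 0 {
                return 0, errors.New("division by zero")
        }
        return a / b, nil
}

func main() {
        if q, err := div(10, 2); err != nil {
                fmt.Println("error:", err)
        } else {
                fmt.Println("quotient:", q) // quotient: 5
        }

        var s string         // zero value is "", not nil
        var m map[string]int // zero value is a nil map, safe to read from
        var ch chan int      // zero value is a nil channel
        fmt.Println(len(s), m["absent"], ch == nil) // prints 0 0 true
}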

avr11: performance measurements

Mea culpa

In my first post I said that I believed the simulator performance was 10x slower than a real PDP-11/40; sadly it looks like that estimate was off by at least another factor of 10. Yup, 100x slower than the machine I tried to simulate. At least.

More accurate profiling

avr11 and home brew frequency counter

After my last post a commenter suggested that my counter based approach could be improved. It had a high overhead, and, as I discovered, was overstating the performance of the simulator.

Adapting Joey’s approach a little, I built a simple frequency counter based on this Instructable.

Doing some calibration at the local hacker space with some other frequency counters and generators I believe the counter is accurate in the hundreds of kilohertz range, so certainly good enough for the job at hand.

The results

As I mentioned in a previous post there are two important timing points in the avr11 bootup cycle. The first is sitting at the

@

prompt, waiting for someone to type unix. At this stage avr11 running on the Atmega 2560 was processing 15,477 instructions/second. At this point the program is executing from a low area of memory and the MMU is not enabled.

Once unix is entered and the kernel has booted to the

#

prompt, the simulation rate drops to around 13,337 instructions/second. Executing a simple command like DATE, the simulation drops again to between 10,500 and 11,000 instructions/second.

Bringing a knife to a gun fight

Arduino Due

As much as I love the minimalist idea of building a ’70s era mini computer on an 8 bit microcontroller, it looks like it just isn’t going to be practical to build a usable simulator on the 16MHz Atmega 2560.

So, it was time to bring out the big guns. A quick visit to the Little Bird Electronics store and I had an Arduino Due on order.

The SAM3X chip at the heart of the Arduino Due is a full 32bit ARM processor which runs the Thumb2 instruction set. It also runs at a much higher clock rate, 84Mhz, vs the 16Mhz of the Atmega parts1.

avr11 running on an Arduino Due with a Bus Pirate frequency counter.

The night the Arduino Due arrived I modified avr11 to run on it. The result, with just a recompilation of the code for the SAM3X processor: 88,000 instructions/second.

Depending on how you cut it, this is between 5 and 8 times faster.

So just how fast was a PDP-11/40

I recently came across Appendix C in the 1972 PDP-11/40 processor handbook, which provides formulas for calculating instruction timings, taking into account the time to fetch the operands and process the instruction.

Source and destination operand times depending on the mode (register, indirect, register indirect, absolute, etc)


Sample instruction timings, these times are in addition to the time to fetch the source and destination operand.

So, now we can compute how long a PDP-11/40 took to execute an instruction; maybe this could be used to give some idea of how well avr11 was performing in simulation.

Take the instruction

ADD R0, R1

which adds the value in R0 to R1 and stores the result back in R1. This should take 0.99us as R0 and R1 are registers (mode 0). For this simple instruction, assuming ideal conditions (no interrupts, no contention on the UNIBUS, etc), the PDP-11/40 could have executed 1 million 16bit ADDs per second.

So, what can avr11 running on an 84MHz Arduino Due do ?

I modified avr11 to execute ADD R0, R1 over and over again (effectively disabling the program counter increment) and timed the results.

Freq: 85344

Well, that isn’t great; only 8.5% of the speed of the real machine. However, that was for a best case instruction with no operand overhead. What if the instruction was more complex, for example ADD (R0), (R1)2, which adds the value at the address stored in R0 to the value at the address stored in R1? Using the tables above, the timing on a real PDP-11/40 would have been 3.32 microseconds, 3.32 times slower, or just over 300,000 instructions a second.

Altering avr11 to execute this new instruction sequence results in 63,492 instructions/second. Not exactly the result we were looking for, but putting the results into a table reveals something interesting.

Instruction        PDP-11/40       avr11 (Arduino Due)   Relative performance
ADD R0, R1         1,000,000 Hz    85,344 Hz             8.5%
ADD (R0), (R1)     301,204 Hz      63,493 Hz             21%

So, perhaps all is not lost. Maybe with a more realistic instruction stream the performance of avr11 is not in the single digits anymore. Being able to deliver 25%, 30% or even 40% of a real PDP-11/40 would be a significant milestone, and maybe one that is possible.

Next steps

Now that I have switched to the Arduino Due I’m going to have to revisit several solved issues.

The first is memory. The Due only has 96kb of SRAM, and while I can boot V6 UNIX in that tiny amount of memory, there is roughly 10.2 kilobytes of memory free for user programs once you get to the shell. For the short term I’ll have to revert to my SPI SRAM shield, modifying it to use the Arduino R3 spec’s IOREF pin rather than blindly dumping 5v across the input pins.

The second problem is the micro SD card. This was a question I had dodged originally by using the Freetronics EtherMega, but as the Arduino Due has no onboard microSD card adapter I’m going to use something like the Sparkfun microSD shield3.


  1. I did briefly consider the Freetronics Goldilocks which is clocked at 24Mhz in a more 5v friendly format, but they aren’t easily available.
  2. In the 1970s this instruction was written as ADD @R0, @R1 but I’ve chosen to use the more familiar GNU as form.
  3. The Sparkfun shield has to be used in ‘soft SPI’ mode as the board itself expects the Arduino Uno style SPI interface broken out on pins D9 – D12, which is not available on any of the boards in the Due/Mega extended form factor.

avr11: profiling on the Atmega

This is a post about the performance of my avr11 simulator. Specifically about the performance improvements I’ve made since my first post, and the surprises I’ve encountered during the process.

Old school profiling

Because avr11 runs directly on the Atmega 2560 microcontroller, there is no simple way to measure the performance of various pieces of code externally.

I am aware that Atmel studio contains a fairly accurate simulator, but that package only runs on Windows. It also wasn’t clear if it can simulate the microSD card and xmem boards that avr11 requires. That left me needing to improvise some way of measuring the relative performance of avr11 while I made changes to the code.

The solution I came up with was a counter that increments every time cpu::step(), the function that processes one instruction, is called. The counter is defined as uint16_t so it rolls over every 2^16 instructions. Combined with the built-in millis() function, which returns the number of milliseconds since reset, I had a crude way of timing how long avr11 takes to dispatch instructions.

    cpu::step();
    if (INSTR_TIMING && (++instcounter == 0)) {
      Serial.println(millis());
    }

From there the process became very iterative. Each evening I would spend a few hours playing with various tweaks in the Go version of avr11, then I would transpose them over to the C++ code a piece at a time, testing as I went and watching the cycle count.

Some promising results

TL;DR - Instruction dispatch was an average of 144 microseconds with the mmu disabled1, 160 with the mmu enabled. It is now 60 microseconds with the mmu disabled, and a few microseconds more with the mmu enabled. 2.4x improvement.

The first big improvement came from switching from the Arduino SD classes to the SdFat library. SdFat gives you more control over the interactions with the card, and also lets you set the speed of the SPI bus, on 16MHz Atmels, to full speed, rather than the previous 1/2 speed maximum. This gave me an 8-10% improvement in memory access times to the SPI SRAM shield.

The next big improvement came from switching from the SPI SRAM shield to a Rugged Circuits’ QuadRAM board. This eliminates the SPI bus entirely by extending the address space of the Atmega 2560 to the full 64 kilobytes and adding banking to access up to 512 kilobytes. This gave another 20% improvement.

After that things got harder. The remaining 30 microsecond improvements came from careful rewriting of all the hot paths and reducing the data types involved to their smallest possible type.

A surprising discovery

The most surprising discovery of all was made as I started to comment out pieces of the code to get a baseline for the inner loop of the simulator.

After whittling it down to simply fetching the instruction at the current PC I’d arrived at a baseline of 21 microseconds. That is just under 50 kilohertz simulated performance; not great, especially considering this isn’t processing the instruction.

Digging a little further I discovered that this one shift to set the correct memory bank costs 4-5 microseconds. Out of a total time of 21 microseconds, that is close to 25% in just one line.

  if (a < 0760000 ) {     
    // a >> 15 costs nearly 5 usec !!
    uint8_t bank = a >> 15;
    xmem::setMemoryBank(bank, false);
    return intptr[(a & 0x7fff) >> 1];
  }

In retrospect this shouldn’t have been a surprise. The Atmega processor is an 8 bit processor. It has some provisions for 16 bit quantities, but they are expensive. 32 bit quantities probably receive no hardware support, and I think in this instance avr-gcc is calling a helper function for the unaligned shift.

A quick hack using a cast and some shifts shaved 4 microseconds off the inner loop, clearly attributable to this inefficient shift. The proper fix will probably involve more radical surgery using a union datatype.

Conclusion

If this post has a moral, it would have to be

Don’t guess, always profile your code.

As for the performance of avr11, it stands at a 16 kilohertz simulated clock speed. Possibly with some extreme C surgery this can be improved to 20 kilohertz. Past that, the possibilities running on the Atmega 2560 look grim.

I’d be interested in hearing from other Atmel users about their profiling methods.


  1. The PDP-11/40 I am simulating has an 18 bit address space. However the CPU is only 16 bit and cannot directly generate 18 bit addresses so a memory management unit is used to rewrite addresses as they leave for the UNIBUS. The MMU adds a small overhead to memory reads and writes when enabled. In the original hardware that was somewhere on the order of 90 nanoseconds. In simulation it’s probably under 5 microseconds.