avr11: how to add 256 kilobytes of RAM to an Arduino

18 bits of core memory

In Schmidt’s original JavaScript simulator, and my port to Go, the 128 kilowords (256 kilobytes) of memory connected to the PDP-11 is modeled using an array. This is a very common technique, as most simulators execute on machines that have many more resources than the machines they impersonate.

However, when I started to port my Go-based simulator to the Arduino, I faced two problems: the Atmel does not support an address space larger than 64 kilobytes and, more immediately, all the 8-bit ATmega models ship with somewhere between 2 KB and 8 KB of addressable memory.

Version 0, use the Arduino itself

I decided to put that problem to one side until I knew whether the rewrite (and dusting off my long-obsolete C coding skills) was achievable, so the first version of the simulator used a simple array for UNIBUS memory.

#define MEMSIZE 2048
uint16_t memory[MEMSIZE];

Using an ATmega2560 I was able to create a memory of 4096 bytes (2048 words), which was enough to bring up the simulator and run the short 29-word bootstrap program which loaded the V6 Unix bootloader into memory.

Sadly the bootloader would fault the simulated CPU almost immediately, as the first thing the bootloader does is zero the entire address space, quickly running past the end of the array and overwriting something important.[1]

However, this did let me get to the point that the CPU and RK11 drive simulators were working well, not to mention figuring out how to write a large multi-file program using the Arduino IDE.

Memory lives somewhere else

A revelation I have recently arrived at is that, from the point of view of a CPU, memory is not part of the processor. Data in a real CPU moves into and out of the device in a very orchestrated manner and in avr11 this is no different.

Any instruction that references memory, either directly loading data into a register via the MOV instruction, or indirectly using one of the PDP-11’s addressing modes, always boils down to a read or write function which links the CPU to the simulated UNIBUS.

For example, in the Go version of the simulator, memory []uint16 belongs to the unibus struct. In the C++ version for Atmel this is enforced further by there being no extern uint16_t memory[MEMSIZE]; declaration exposed in unibus.h.

In short, there is no way for the CPU to observe memory directly; it has to ask the UNIBUS to read or write data on its behalf. This gave me the opportunity to solve the problem of limited memory space available on the Atmel devices I had access to.
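A minimal sketch of that boundary, assuming hypothetical names (`bus_sketch`, `Unibus`, `read16`, `write16`) rather than the actual avr11 source. The array is private, so the only way in or out is through the bus functions, which also gives one obvious place to range-check an address. Returning 0 or false for an unmapped address is a simplification made for this example; it is not a claim about how the real hardware or avr11 reports the error.

```cpp
#include <cassert>
#include <cstdint>

namespace bus_sketch {  // hypothetical names, for illustration only

constexpr uint32_t MEMSIZE = 2048;     // words, matching the Version 0 array
constexpr uint32_t IOPAGE  = 0760000;  // first byte address of the I/O page

class Unibus {
 public:
  // Read the 16-bit word at an even byte address.
  uint16_t read16(uint32_t a) const {
    assert((a & 1) == 0);        // word accesses must be word aligned
    if (a >= IOPAGE) return 0;   // device registers would be dispatched here
    uint32_t idx = a >> 1;       // byte address -> word index
    return idx < MEMSIZE ? memory_[idx] : 0;  // unmapped reads as 0 (simplification)
  }

  // Write a 16-bit word; returns false if the address is not backed by memory.
  bool write16(uint32_t a, uint16_t v) {
    assert((a & 1) == 0);
    if (a >= IOPAGE) return false;  // no devices modeled in this sketch
    uint32_t idx = a >> 1;
    if (idx >= MEMSIZE) return false;  // refuse to run past the end of the array
    memory_[idx] = v;
    return true;
  }

 private:
  uint16_t memory_[MEMSIZE] = {};  // invisible to the CPU code
};

}  // namespace bus_sketch
```

Because the CPU code only ever sees `read16` and `write16`, the array behind them can later be swapped for any other store without touching the instruction decoder.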

Version 1, I am a bad person

At this point I’m sort of telling the story backwards. I had found a product which would give me far more memory than I needed for this project, but it took several weeks to arrive and comes as a kit, which will involve some tricky SMD soldering.

In the interim I found myself, during the Christmas to New Year’s break, with a simulator that I felt was working well enough to try something more adventurous, if I could only find some way to emulate the backing array for the core memory. I didn’t really care about speed; I just wanted to see if the simulator could handle the more complicated instructions of the Unix kernel.

“Why not use the SD card?” I said to myself. I was, after all, already loading some of the blocks of the RK05 disk pack image from the card, so why not just make another image file and have that back the core memory? The mini SD card probably wouldn’t last very long, but I have a pile of cheap cards, so why not try it?

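void pdp11::unibus::write8(uint32_t a, uint16_t v) {
    if (a < 0760000) {            // below the I/O page: core memory
        if (a & 1) {              // odd address: high byte of the word
            core.write(v & 0xff); // one byte goes to the SD-backed core file
            //memory[a >> 1] &= 0xFF;
            //memory[a >> 1] |= (v & 0xFF) << 8;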

All it took was setting up a new SD::File, called core, and replacing each access to the memory array with a seek and a write to the backing file (obviously doing the same for the read paths).
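To make the seek-then-write pattern concrete, here is a host-side sketch. `FakeFile` is a hypothetical in-memory stand-in whose `seek`, `write` and `read` mirror the shape of the Arduino SD `File` methods of the same names, and `FileCore` is an invented wrapper, not the avr11 code. It assumes the backing file is byte addressed with the low byte of each word first.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sdcore_sketch {  // hypothetical names, for illustration only

// In-memory stand-in for an SD card file of fixed size.
class FakeFile {
 public:
  explicit FakeFile(size_t size) : data_(size, 0), pos_(0) {}
  bool seek(uint32_t pos) {
    if (pos > data_.size()) return false;
    pos_ = pos;
    return true;
  }
  size_t write(uint8_t b) {  // returns the number of bytes written
    if (pos_ >= data_.size()) return 0;
    data_[pos_++] = b;
    return 1;
  }
  int read() {  // returns the next byte, or -1 at end of file
    if (pos_ >= data_.size()) return -1;
    return data_[pos_++];
  }

 private:
  std::vector<uint8_t> data_;
  size_t pos_;
};

constexpr uint32_t IOPAGE = 0760000;  // 253,952 bytes of core below the I/O page

class FileCore {
 public:
  FileCore() : core_(IOPAGE) {}

  // Every access is a seek to the byte address followed by a read or write.
  bool write8(uint32_t a, uint8_t v) {
    if (a >= IOPAGE) return false;
    return core_.seek(a) && core_.write(v) == 1;
  }
  int read8(uint32_t a) {
    if (a >= IOPAGE || !core_.seek(a)) return -1;
    return core_.read();
  }
  bool write16(uint32_t a, uint16_t v) {
    if ((a & 1) || a >= IOPAGE) return false;
    return core_.seek(a) &&
           core_.write(static_cast<uint8_t>(v & 0xFF)) == 1 &&  // low byte
           core_.write(static_cast<uint8_t>(v >> 8)) == 1;      // high byte
  }
  int32_t read16(uint32_t a) {
    if ((a & 1) || a >= IOPAGE || !core_.seek(a)) return -1;
    int lo = core_.read();
    int hi = core_.read();
    if (lo < 0 || hi < 0) return -1;
    return (hi << 8) | lo;
  }

 private:
  FakeFile core_;
};

}  // namespace sdcore_sketch
```

On real hardware the stand-in would be replaced by the file opened from the card; everything above the `FakeFile` class stays the same shape.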

Amazingly it worked, on the second or third attempt, and although it was very slow I was able to use this technique to boot the simulator a very long way into the Unix boot process. I posted a video of the bootup to Instagram.

Even more amazingly I didn’t wear out the mini SD card, and still haven’t. This is probably mostly due to the wear leveling built into the card[2], but I also stumbled into a fortuitous property of the SD card itself, and the Arduino drivers on top.

All SD cards, well certainly SD and mini SD cards, mandate that you read and write to them in units of pages. Pages happen to be 512 bytes, a unit which clearly descends from the days of CF cards which emulated IDE drives.

This means the Arduino SD class maintains a buffer of 512 bytes (which comes out of your precious SRAM allotment) that in effect operated as a cache for my horrible all-swap memory system. For example, when the bootloader zeros all the memory in the machine, rather than writing to the SD card 253,952 times[3], the number of writes was probably much smaller: about 500, or one per 512-byte page (253,952 / 512 = 496).

Obviously, as it was not designed for this purpose, the cache would fail badly during a later part of the bootup where the kernel code is copied (about 90 kilowords of it) from one memory area to another. Each read or write would land on a different SD card page from the one before, causing the driver to flush the old buffer, read in the new buffer, then reverse the process.
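The difference between the two access patterns can be counted with a toy model. This is a sketch of the behaviour described above, not the SD library's actual code: `OneBlockCache` and the two helper functions are hypothetical names, the model holds exactly one 512-byte page, and it ignores any filesystem bookkeeping the real driver performs.

```cpp
#include <cstdint>

namespace cache_sketch {  // hypothetical names, for illustration only

constexpr uint32_t BLOCK = 512;              // bytes per SD card page
constexpr uint32_t NO_BLOCK = 0xFFFFFFFFu;   // marker for "nothing cached yet"

struct OneBlockCache {
  uint32_t current = NO_BLOCK;
  bool dirty = false;
  uint32_t loads = 0;    // pages read from the card
  uint32_t flushes = 0;  // pages written back to the card

  void touch(uint32_t addr, bool isWrite) {
    uint32_t block = addr / BLOCK;
    if (block != current) {  // miss: write back the old page, load the new one
      if (dirty) ++flushes;
      ++loads;
      current = block;
      dirty = false;
    }
    if (isWrite) dirty = true;
  }
  void sync() {  // final write-back of a dirty page
    if (dirty) { ++flushes; dirty = false; }
  }
};

// Zero `bytes` bytes starting at address 0, one byte at a time.
uint32_t flushesForZeroing(uint32_t bytes) {
  OneBlockCache c;
  for (uint32_t a = 0; a < bytes; ++a) c.touch(a, true);
  c.sync();
  return c.flushes;
}

// Copy `words` 16-bit words from src to dst, alternating read and write.
uint32_t loadsForCopy(uint32_t src, uint32_t dst, uint32_t words) {
  OneBlockCache c;
  for (uint32_t i = 0; i < words; ++i) {
    c.touch(src + 2 * i, false);
    c.touch(dst + 2 * i, true);
  }
  c.sync();
  return c.loads;
}

}  // namespace cache_sketch
```

Under these assumptions, `flushesForZeroing(253952)` comes to 496 page writes, while copying a single page's worth of data (256 words) between two different pages with `loadsForCopy(0, 0100000, 256)` costs 512 page loads, one for every read and every write.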

But it worked, and gave me confidence to investigate some more ambitious designs for a memory solution.

In my next blog post I’ll talk about version 2 of my memory system, the one that finally got me booting to the # prompt.

  1. I considered using a SAM3X atmel32 style board, like the Arduino Due as they have both a more powerful CPU and close to 96 kilobytes of addressable memory, but that is only 48 kilowords, less than half of what I need to simulate the full 128 kiloword address space of the PDP-11.
  2. The internet is divided on the question of “Do cheap mini SD cards have wear leveling?”. Part of the problem is the definition of cheap changes rapidly over time, making advice written 12 months ago inaccurate. My view is that cards of any capacity you can buy today require so much error correction logic that you get the wear leveling logic for free.
  3. On the PDP the top 4 kilowords of memory (8 KB) are reserved for the IO devices, so while the UNIBUS talks in 18 bit addresses, the top 4096 words are not mapped to memory, and don’t need to be cleared. In fact clearing the IO page memory would be catastrophic.