In my first post I said that I believed the simulator was 10x slower than a real PDP-11/40. Sadly, it looks like that estimate was off by at least another factor of 10. Yup, 100x slower than the machine I tried to simulate. At least.
More accurate profiling
After my last post, a commenter suggested that my counter-based approach could be improved. It had high overhead and, as I discovered, was overstating the simulator’s performance.
Adapting Joey’s approach a little, I built a simple frequency counter based on this Instructable.
After calibrating it at the local hackerspace against some other frequency counters and signal generators, I believe the counter is accurate into the hundreds of kilohertz, which is certainly good enough for the job at hand.
As I mentioned in a previous post, there are two important timing points in the avr11 boot cycle. The first is sitting at the boot prompt, waiting for someone to type `unix`. At this stage avr11, running on the Atmega 2560, was processing 15,477 instructions/second. At this point the program is executing from a low area of memory and the MMU is not enabled.
Once `unix` is entered and the kernel has booted to the shell prompt, the simulation rate drops to around 13,337 instructions/second. Executing a simple command like `date` drops it again, to between 10,500 and 11,000 instructions/second.
Bringing a knife to a gun fight
As much as I love the minimalist idea of building a ’70s-era minicomputer on an 8-bit microcontroller, it looks like it just isn’t practical to build a usable simulator on the 16MHz Atmega 2560.
So, it was time to bring out the big guns. A quick visit to the Little Bird Electronics store and I had an Arduino Due on order.
The SAM3X chip at the heart of the Arduino Due is a full 32-bit ARM processor which runs the Thumb-2 instruction set. It also runs at a much higher clock rate: 84MHz, vs the 16MHz of the Atmega parts[1].
The night the Arduino Due arrived I modified avr11 to run on it. The result, with nothing more than a recompile for the SAM3X: 88,000 instructions/second.
Depending on how you cut it, this is roughly 5 to 8 times faster than the Atmega 2560.
So just how fast was a PDP-11/40?
I recently came across Appendix C of the 1972 PDP-11/40 Processor Handbook, which provides formulas for calculating instruction timings, taking into account the time to fetch the operands and execute the instruction.
With these we can compute how long a PDP-11/40 took to execute an instruction, which gives a yardstick for how well avr11 is performing in simulation.
Take the instruction `ADD R0, R1`, which adds the value in `R0` to the value in `R1` and stores the result back in `R1`. It should take 0.99 microseconds, as both `R0` and `R1` are registers (mode 0). Assuming ideal conditions (no interrupts, no contention on the UNIBUS, etc.), this means the PDP-11/40 could have executed roughly 1 million 16-bit ADDs per second.
So, what can avr11 running on an 84MHz Arduino Due do?
I modified avr11 to execute `ADD R0, R1` over and over again (effectively disabling the program counter increment) and timed the results: 85,344 instructions/second.
Well, that isn’t great: 8.5% of the speed of a real PDP-11/40. However, that was a best-case instruction with no operand overhead. What if the instruction were more complex, for example `ADD (R0), (R1)`[2], which adds the value at the address stored in `R0` to the value at the address stored in `R1`? Using the Appendix C tables, that instruction would have taken 3.32 microseconds on a real PDP-11/40, roughly 3.3 times slower, or just over 300,000 instructions a second.
Altering avr11 to execute this new instruction sequence results in 63,492 instructions/second. Not exactly the result we were looking for, but putting the results into a table reveals something interesting.
| Instruction | PDP-11/40 (instructions/second) | avr11 on Arduino Due (instructions/second) | Relative performance |
|---|---|---|---|
| `ADD R0, R1` | 1,000,000 | 85,344 | 8.5% |
| `ADD (R0), (R1)` | 301,204 | 63,493 | 21% |
So, perhaps all is not lost. Going from `ADD R0, R1` to `ADD (R0), (R1)` made the real PDP-11/40 about 3.3 times slower, but avr11 only about 1.34 times slower, so at least for these two instructions, the more work an instruction does, the better avr11 compares. Maybe with a more realistic instruction stream, avr11’s relative performance won’t be stuck in the single digits anymore. Being able to deliver 25%, 30% or even 40% of the speed of a real PDP-11/40 would be a significant milestone, and maybe one that is possible.
Now that I have switched to the Arduino Due, I’m going to have to revisit a couple of issues I had already solved.
The first is memory. The Due only has 96KB of SRAM, and while I can boot V6 UNIX in that tiny amount of memory, only about 10.2KB is left free for user programs once you get to the shell. In the short term I’ll have to go back to my SPI SRAM shield, modifying it to use the Arduino R3 IOREF pin rather than blindly dumping 5V across the Due’s 3.3V input pins.
The second problem is the microSD card. I had originally dodged this question by using the Freetronics EtherMega, but as the Arduino Due has no onboard microSD card slot, I’m going to use something like the SparkFun microSD shield[3].
- I did briefly consider the Freetronics Goldilocks, which is clocked at 24MHz in a more 5V-friendly format, but they aren’t easily available.
- In the 1970s this instruction was written as `ADD @R0, @R1`, but I’ve chosen to use the more familiar GNU as form.
- The SparkFun shield has to be used in ‘soft SPI’ mode, as the board expects the Arduino Uno-style SPI interface on pins D10–D13, which is not available on any of the boards in the Due/Mega extended form factor.