Saturday, 5 July 2014

BootJacker: The Amazing AVR Bootloader Hack!

There's an old adage that says if you don't know it's impossible you could end up achieving it. BootJacker is that kind of hack: a way for ordinary firmware on an AVR to reprogram its bootloader. It's something Atmel's manual for AVR microcontrollers with Bootloaders says is impossible (Note the italics):

27.3.1 Application Section.
The Application section is the section of the Flash that is used for storing the application code. The protection level for the Application section can be selected by the application Boot Lock bits (Boot Lock bits 0), see Table 27-2 on page 284. The Application section can never store any Boot Loader code since the SPM instruction is disabled when executed from the Application section.

Here's the background: I'm the designer of FIGnition, the definitive DIY 8-bit computer. It's not cobbled together from hackware from around the web; instead, three years of sweat and bare-metal development have gone into this tiny 8-bitter. I've been working on firmware version 1.0.0 for a few months: the culmination of my claim that I'd put audio data transfer on the machine (along with a fast, tiny floating-point library about 60% of the size of the avr-libc one).

Firmware 1.0.0 uses the space previously occupied by its 2Kb USB bootloader, and so needs its own migration firmware image to copy the V1.0.0 firmware to external flash. The last stage is to reprogram the bootloader with a tiny 128-byte bootloader which reads the new image from external flash. Just as I got to the last stage I came across section 27.3.1, which let me know in no uncertain terms that I was wasting my time.

I sat around dumbstruck for a while ("How could I have not read that?") before wondering[1], craziest of crazies, whether imagining a solution to the impossible might actually lead me there. And it turns out it does.

The solution is actually conceptually fairly simple. A bootloader, by its very nature, is designed to download new firmware to the device, so it must contain at least one spm instruction. Because the spm configuration register must be written no more than 4 cycles before the spm instruction itself, very few sequences practically occur: just sts, spm or out, spm pairs. So all you need to do is find such a sequence in the bootloader section, set up the right registers and call it.
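To make the idea concrete, here's a rough sketch (in Python, not part of BootJacker itself) of how you might scan a bootloader image for such a sequence. The opcode encodings are the standard AVR ones (spm is 0x95E8; out matches 0xB800 under mask 0xF800; the first word of a two-word sts matches 0x9200 under mask 0xFE0F); the image layout and helper names are my own illustrative assumptions.

```python
import struct

SPM_OPCODE = 0x95E8  # spm

def is_out(w):
    return (w & 0xF800) == 0xB800   # out A,Rr

def is_sts(w):
    return (w & 0xFE0F) == 0x9200   # sts k,Rr (first word of two)

def find_spm_sequences(image):
    """Return (kind, byte offset) for each out/sts immediately preceding an spm."""
    n = len(image) // 2
    words = struct.unpack("<%dH" % n, image[:2 * n])  # AVR flash is little-endian
    hits = []
    for i, w in enumerate(words):
        if w == SPM_OPCODE and i > 0:
            if is_out(words[i - 1]):
                hits.append(("out", (i - 1) * 2))
            elif i > 1 and is_sts(words[i - 2]):    # sts is two words long
                hits.append(("sts", (i - 2) * 2))
    return hits
```

In practice you'd run this over a dump of the bootloader section and use the reported offset as the address to call.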

However, it turned out there was a major problem with that too. The V-USB self-programming bootloader's spm instructions aren't a neat little routine, but are inlined into the main code; so calling it would just cause the AVR to crash as it tried to execute the rest of the V-USB bootloader.

Nasty, but again there's a solution. By using a timer clocked at the CPU frequency (which is easy on an AVR), you can write an assembler routine which sets up the registers for the bootloader's out, spm sequence and calls it; just at the moment the first cycle of the spm itself executes, the timer interrupt goes off and the AVR jumps to your interrupt routine (in application space). The interrupt routine pops the bootloader return address and then returns to the previous code - the routine that set up the out, spm sequence. This works because when an spm instruction targets the bootloader section, the CPU is halted until the operation completes.

Here's the key part of BootJacker:

The code uses the bootloader's spm to first write a page of flash which also contains a usable out, spm sequence, and then uses that routine to write the rest (because, of course, you might end up overwriting the bootloader with your own new bootloader!)

BootJacker involves cycle counting: I used a test routine to figure out the actual number of instructions executed after you set the timer for x cycles in the future (it's x-2). In addition I found one other oddity: erase and write operations always have a one-cycle latency after the spm in a bootloader. I fixed this with a nop instruction in my mini bootloader.

This algorithm, I think, is pretty amazing. It means that most bootloaders can in fact be overwritten using application firmware containing a version of BootJacker!

[1] As a Christian, I also have to fess' up that I prayed about it too. Not some kind of desperation thing, but some pretty calm prayer, trusting it'll get sorted out :-) 

Saturday, 31 May 2014

Gini Sim

The Gini Index is a measure of inequality in a wealth distribution: 0 means everyone has the same, and values approaching 1 mean one person has everything.
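For reference, the index can be computed directly from a list of wealths: it's the mean absolute difference between every pair of people, normalised by twice the mean. A minimal Python sketch (mine, not part of GiniSim):

```python
def gini(wealth):
    """Gini index of a list of wealths: 0 = perfect equality, ->1 = total inequality."""
    n = len(wealth)
    mean = sum(wealth) / n
    # sum of |difference| over all ordered pairs, normalised by 2 * n^2 * mean
    diff_sum = sum(abs(a - b) for a in wealth for b in wealth)
    return diff_sum / (2 * n * n * mean)
```

With everyone on $10 it returns 0; with one person holding all the money it returns (n-1)/n, which tends to 1 as the population grows.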

GiniSim is a simple JavaScript program which demonstrates, in simple terms, a flaw in free-market economics by showing that trading freely leads to gross inequality. Copy the program into a .html file, save it and then open it in a browser; you can stop it by pressing the Stop button. Alternatively, you can download and run a simple Java version, GiniSim.jar, from here.

Each bar is the wealth of a person, and the simulation starts with everyone having $10 (or £10, or 10€).

Each step simulates a free-trade transaction: two monetary notes are picked at random and the person the first one belongs to pays the person the second one belongs to. Intuitively, you’d think this would average out: sometimes some people would win and sometimes others would.

In reality, what happens is that whenever a person accumulates wealth, it becomes more likely that someone poorer will give money to them. This is because the chance of being paid in a transaction is proportional to a person's wealth - so if someone loses money in a transaction, they become less likely to gain in a future one.
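That mechanism - pick a random note, and its owner pays the owner of another randomly picked note - boils down to a few lines. Here's a stripped-down, non-graphical Python rewrite of the simulation (names and defaults are mine, not the JavaScript below):

```python
import random

def simulate(num_people=100, start_cash=10, steps=20000, seed=1):
    """Run random note-transfer transactions; return each person's final wealth."""
    random.seed(seed)
    cash = [p for p in range(num_people) for _ in range(start_cash)]  # note -> owner
    people = [start_cash] * num_people                                # person -> wealth
    for _ in range(steps):
        note = random.randrange(len(cash))          # pick a note; its owner pays...
        payee = cash[random.randrange(len(cash))]   # ...the owner of another random note
        payer = cash[note]
        if people[payer] > 0:                       # can't take cash from the penniless
            people[payer] -= 1
            people[payee] += 1
            cash[note] = payee                      # the note changes hands
    return people
```

Run it and the wealth distribution spreads out from the flat $10 start, exactly as the bars in the browser version do.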





How does this correspond to an idealism of the free market? It corresponds because exchanges take place on the basis of being indifferent towards cash. When people gain wealth, their wealth acts as a bigger footprint, and people (who want to buy things) notice the cash more. If you’re poorer, that doesn’t happen: you’re not noticed, because your visible presence is the cash you hold - you literally disappear.

The important thing to note is that GiniSim demonstrates inequality without the agents involved behaving maliciously. All it does is play fairly towards cash (rather than people). It's a demonstration of a power law.

How does GiniSim not correspond to classical economics? GiniSim corresponds to free trade under mercantilism, which is a zero-sum economic theory.

Here’s GiniSim, copy everything in yellow:


<!DOCTYPE html>
<html>
<body>
<button onclick="clearInterval(timer)">Stop</button>
<canvas id="myCanvas" width="800" height="600" style="border:1px solid #c3c3c3;">
Your browser does not support the HTML5 canvas tag.
</canvas>

<script>
  var c=document.getElementById("myCanvas");
  var ctx=c.getContext("2d");
  var timer;
  var rects=0;
  var cash=[0],people=[0];
  var kStartCash=10;
  var kNumPeople=100;
  function initGini() {
    var ix,iy;
    for(ix=0;ix<kNumPeople;ix++) {
      people[ix]=kStartCash; // everyone starts with $10.
      for(iy=0;iy<kStartCash;iy++) {
        cash[ix*kStartCash+iy]=ix; // each $1 is owned by a person.
      }
    }
  }
 
  function incomeSwap(from,too) {
    var ix;
    for(ix=0;ix<kNumPeople*kStartCash;ix++) {
      if(cash[ix]==from)
        cash[ix]=too;
      else if(cash[ix]==too)
        cash[ix]=from;
    }
    ix=people[from];
    people[from]=people[too];
    people[too]=ix;
  }
 
  function exchange() {
    var aNote=Math.floor(Math.random()*kStartCash*kNumPeople);
    var aOwner=cash[aNote];
    var aNewNote=Math.floor(Math.random()*kStartCash*kNumPeople);
    var aNewOwner=cash[aNewNote];
    if(people[aOwner]>0 ) {
      // can't take cash from people who have nothing.
      people[aOwner]-=1;
      people[aNewOwner]+=1;
      cash[aNote]=aNewOwner;
      while(aOwner>0 && people[aOwner]<people[aOwner-1]) {
          incomeSwap(aOwner,aOwner-1);
          aOwner--;
          if(aOwner==aNewOwner)
            aNewOwner++;
      }
      while(aNewOwner<kNumPeople-1 && people[aNewOwner]>people[aNewOwner+1]) {
          incomeSwap(aNewOwner,aNewOwner+1);
          aNewOwner++;
      }
    }
  }
 
  function drawImage() {
    var ix;
    for(ix=0;ix<kNumPeople;ix++) {
      ctx.fillStyle="#c0c0c0";
      ctx.fillRect(ix*8,0,8,600-people[ix]);
      ctx.fillStyle="#000000";
      ctx.fillRect(ix*8,600-people[ix],8,people[ix]); // bar height = wealth
    }
    exchange();
  }
  initGini();
  timer=setInterval(drawImage,1);
</script>
</body>
</html>

[Edit: Added link to GiniSim.jar on 20150321. Edit: I knew from the simulations that wealth always gravitates to the rich, but my reasoning for the mechanism was incorrect, because I was trying to derive it from the probabilities of a single transaction. In reality, the mechanism is due to how the probabilities change between one transaction and the next. 20150626.]

Tuesday, 1 April 2014

The Royal Fracking Society

This blog post takes a slightly provocative title to raise questions about the role of the UK's Royal Society with respect to hydraulic fracturing approvals.

It is estimated that there are 11,000 km³ of extractable methane gas reserves in the United Kingdom. In view of increasing questions about energy security worldwide, and the reported economic boom for shale gas/oil in the United States, the British government is keen to exploit these reserves as much as possible.

This gas will be extracted using a technique called hydraulic fracturing - or fracking - which is proving controversial, not least because of concerns about water supplies being contaminated by fracking chemicals, but also because of health-related issues, questionable economics and geological issues such as induced earthquakes. As a result there have been widespread protests around the world, notably in the USA and more recently in Europe and here in the United Kingdom.

Large-scale fracking in the UK is in its early stages, and activists have been able to draw attention to the cause by protesting at the first few sites, most notably in Balcombe in West Sussex and Barton Moss in Manchester. However, the government has already approved over 600 fracking permits and it will not be possible to protest at anything more than a small fraction of them.

Protests really work by massing public conscience against (or in favour of) a cause, rather than by physically forcing organisations to comply. The key thing is being able to raise a person's conscience enough to act. It is therefore politically important for the government to argue the case for fracking: positively, by citing potential economic and employment benefits, energy security, cost and safety regulations; and negatively, by portraying protestors as disruptive or irrelevant. At the moment there is little consensus on fracking amongst the British public.

Today I came across this report on Fracking approvals at Barton Moss in Manchester and one paragraph particularly caught my attention:

This 3-D seismic will also fulfil UKOOG Shale Gas guidelines, the recommendations of the Royal Society and Royal Academy of Engineers and is a requirement of the Department of Energy and Climate Change consent process prior to any shale gas hydraulic fracturing and flow testing operations being undertaken.

It looks to me as though fracking consent can only be granted following the recommendations of the Royal Society (amongst other organisations). My thinking is that the Royal Society's brief only allows it to object to a fracking operation on scientific grounds, and the grounds it would be asked to review would be the operation's viability and potential safety.

However, because it is a scientific body, its responsibility is broader than the narrow remit given by the government; it can (and in fact has a duty to) object on any scientific ground if the scientific consensus warrants it. And, given that the consensus amongst (albeit climate) scientists is around 97% or more that the continued use of fossil fuels will be globally catastrophic, it seems to me that there are good grounds to make the case to the Royal Society itself and get it to act on the basis of its scientific conscience. It's important to state that the extra-remit grounds for objection would be carbon emissions, not health, economic or geological concerns. The objective basis is that we cannot burn more than a tiny fraction of fossil fuel reserves without causing dangerous, essentially irreversible climate change.

Without the Royal Society's approval the DECC can't currently give consent, and there's already a high degree of scientific consensus. I imagine that if the Society were persuaded to object publicly, the government would change the law so as not to require its recommendations, but that would have potentially severe negative consequences for British public opinion, as well as actual safety implications. Either way, this would be a major step forward towards eliminating fracking in the UK.
 

[Note: additional links and labels to be added in a later update]

Tuesday, 18 March 2014

Caller Convention Calamities

Hello AVR people, let's talk about interrupts and the mess calling conventions have made of them!

Background

Back in the early 1990s I had my first long-term job working at a place called Micro-Control Systems developing early solid-state media drivers. These were long ISA cards for PCs stuffed with either battery-backed Static RAM, EPROM or Intel Flash chips that gave you a gargantuan 3Mb per card up to 12Mb of storage with 4 cards.

These cards were bootable (they emulated hard disks) and the firmware was written entirely in 16-bit 8086 assembler with a pure caller-save convention. The thinking behind caller-save conventions is that a subroutine doesn't save registers on entry; instead, the caller saves any registers it's using that are also used by the callee before making the call, restoring them as necessary afterwards. Assume, for example, we have ProcTop, which calls ProcMid a few times, which calls ProcLeaf a few times, which doesn't call anything. Caller-save conventions aim to improve performance because leaf procedures, like ProcLeaf here, don't need to save registers.

However, I found that caller-saving led to a large number of hard-to-trace bugs. This happens because every time you change ProcLeaf you have the potential to use new registers, and this can affect the registers ProcMid needs to save, or potentially the registers ProcTop needs to save. But also, if you change ProcMid and use new registers, you might find you need to save them whenever you call ProcLeaf (if ProcLeaf uses them), as well as having to check ProcTop for conflicts.

This means you need to check an entire call tree whenever you change a subroutine and if you need to save additional registers in ProcMid or ProcTop you might end up restructuring that code etc (which means more testing).

Nasty, nasty, nasty and all because a caller-save convention is used. In the assembler code I wrote (and still write), I use a pure callee-register saving convention. Ironically, caller-saving doesn't even save much performance because pushing and popping registers at the beginning and the end usually occupies only a small fraction of the time spent within a routine.

AVR Interrupts

GCC's 'C' calling convention for the AVR uses a mixture of caller-saving and callee-saving. Registers from r18 upwards are mostly caller-saved; most of the rest are callee-saved. This, I think, is seen as a compromise between performance and code density. I personally wouldn't use caller-saving at all, even in a compiler, but for interrupts it's an absolute disaster on the AVR.

That's because every time you call a subroutine from within an interrupt, the interrupt routine itself must save absolutely every caller-saved register, just in case something in the interrupt's call tree uses one; when dealing with interrupts, the compiler can't make assumptions about which registers are safe to use. As a result, interrupt latency on an AVR shifts from being excellent (potentially as little as around 12 clock cycles, under 1µs at 20MHz) to three times as long (around 36 clock cycles, roughly 1.8µs at 20MHz).

This kind of nonsense isn't reserved for AVR CPUs: the rather neat Cortex M0/M3 architectures save, as standard, 8x32-bit registers on entry to every interrupt for the same reason - to make it easy for compilers to target the Cortex M0 for real-time applications.

What I really want when I write interrupt routines is some control over the performance degradation. I want additional registers to be saved only when they need to be, and only as many as are actually needed. In short, I want callee-saving, and avr-gcc (amongst its zillions of options) doesn't provide that.

For the up-and-coming FIGnition Firmware 1.0.0 I decided to create a tool which would do just that. You use it by first getting GCC to generate assembler code using a compile command such as:

avr-gcc -Wall -Os -DF_CPU=20000000 -mmcu=atmega168 -fno-inline -Iinc/ -x assembler-with-cpp -S InterruptSubroutineCode.c -o InterruptSubroutineCode.s

The interrupt code should be structured so that the top-level interrupt subroutine (called IntSubProc below) is listed last and the entire call tree required by IntSubProc is contained within that file. Then you apply the command-line tool:

$./IntWrap IntSubProc InterruptSubroutineCode.s src/InterruptSubroutineCodeModified.s

Where IntSubProc is the name of the interrupt subroutine that's called by your primary interrupt routine. The interrupt routine itself has an assembler call somewhere in it, e.g.

asm volatile("call IntSubProc");

That way, GCC won't undermine your efforts by saving and restoring all the caller-saved registers.

IntWrap analyses the assembler code in InterruptSubroutineCode.s and works out which caller-saved registers actually need to be saved according to the call-tree in the code in InterruptSubroutineCode.s. The analysis stops after the ret command for IntSubProc.

The current version of IntWrap is written using only the standard C library and is currently, I would say, Alpha quality. It works for FIGnition, the DIY 8-bit computer from nichemachines :-)



Download from Here.

How Does It Work?

IntWrap trawls through the assembler code looking for subroutines and determining which registers have been modified by the subroutine. The registers that need to be saved by IntSubProc are all the caller-saved registers that have been modified by IntSubProc's call tree, but haven't been saved. To make it work properly, IntWrap must eliminate registers that were saved mid-way down the call-tree. Consider: IntSubProc saves/restores r20 and calls ProcB which saves/restores r18, but modifies r19 and ProcB calls ProcC which modifies r18, r20 and r21. IntWrap should save/restore r19 because ProcB modifies it and should save/restore r21 because ProcC modifies it. But it doesn't need to save r18, because even though ProcC modified it, ProcB save/restored it.

The algorithm works by using bit masks for the registers. For every procedure, it marks which registers have been save/restored and which have been modified; the subroutine's modified registers are then modifiedRegs & ~saveRestoredRegs. Call instructions can be treated the same way as normal assembler instructions.
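The bitmask bookkeeping for the ProcB/ProcC example above can be sketched in a few lines of Python (the call tree, masks and names here are that hypothetical example, not IntWrap's actual code):

```python
def bit(r):
    return 1 << r   # one bit per register number

# Hypothetical call tree from the example: IntSubProc -> ProcB -> ProcC
procs = {
    "ProcC":      {"saved": 0,       "modified": bit(18) | bit(20) | bit(21), "calls": []},
    "ProcB":      {"saved": bit(18), "modified": bit(19),                     "calls": ["ProcC"]},
    "IntSubProc": {"saved": bit(20), "modified": 0,                           "calls": ["ProcB"]},
}

def clobbers(name):
    """Registers a procedure modifies, as seen by its caller."""
    p = procs[name]
    mods = p["modified"]
    for callee in p["calls"]:
        mods |= clobbers(callee)   # callee clobbers bubble up the call tree
    return mods & ~p["saved"]      # save/restored registers don't escape
```

Evaluating clobbers("IntSubProc") yields just r19 and r21: r18 is hidden by ProcB's save/restore and r20 by IntSubProc's own, matching the analysis in the text.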

IntWrap avoids having to construct a proper call-tree graph by re-analyzing the code if it finds that it can't fully evaluate a call to a subroutine. In this way the modified register bitmasks bubble up through the call-tree with repeated analysis until it's all been solved.


Thursday, 30 January 2014

Rain, Rain Won't Go Away

A couple of weeks ago I thought the UK was starting to turn a corner in recognizing the possibility that our weather is being affected by climate change. Reporting connecting extreme weather to climate change had declined from 25% in 2009 to about 11% in 2012, despite the extensive floods we had that year.

2013 had century-level floods in Eastern Europe, India, China, Russia, Canada and Oregon, but we were largely spared. In October, however, we had the worst storm since 1987; followed by the worst storm surge in 60 years; followed by persistent flooding in Scotland and Southern England over December, along with a second storm surge that destroyed Aberystwyth's sea front and caused extensive damage elsewhere.

Since then parts of the country have had continual flooding, to the extent that by early January David Cameron was admitting this could be due to climate change; which was backed up by the MET office, which called for attribution studies to prove it.

But then at the end of January it was suddenly all put down to a failure to dredge rivers. If that's true, then failing to dredge the River Severn has led to Jet Stream blocking patterns and our wettest January on record.

So, I decided to take a look at MET office rainfall anomaly images for both 2012 and the end of 2013. I'm picking selected months. Let's see them:
April 2012 vs 1961-1990 April 2012 vs 1981-2010
June 2012 vs 1961-1990 June 2012 vs 1981-2010
July 2012 vs 1961-1990 July 2012 vs 1981-2010
August 2012 vs 1961-1990 August 2012 vs 1981-2010
October 2012 vs 1961-1990 October 2012 vs 1981-2010
November 2012 vs 1961-1990 November 2012 vs 1981-2010
December 2012 vs 1961-1990 December 2012 vs 1981-2010
The above images are for 2012 and tell us some interesting things. Firstly, the three months April, June and July were exceptionally wet. You can see how blue the country is. Secondly, the comparison with 1961-1990 is almost always bluer than 1981-2010. This gives us an indication that the UK was wetter over these months in 1981-2010 compared with 1961-1990. That's because the corresponding months in 2012 are less wet when compared against the more recent range. Now let's look at the flooding in 2013:
October 2013 vs 1961-1990 October 2013 vs 1981-2010
November 2013 vs 1961-1990 November 2013 vs 1981-2010
December 2013 vs 1961-1990 December 2013 vs 1981-2010
Again, we see the same sorts of patterns. We can see how extremely wet October 2013 was (compared with October 2012). We can also see how much more damaging the rainfall pattern was in December 2013 compared with 2012, even though December 2012 looks generally bluer. Finally, note that November has been getting wetter, since November 2013 is relatively drier compared against the 1981-2010 range than against 1961-1990, i.e. 1981-2010 was a wetter period.

Conclusion.

These images could tell us a couple of important aspects about climate change in the UK:
  • It's generally getting wetter for certain months in the year since the range 1981-2010 is wetter than 1961-1990.
  • We've been seeing some pretty bad weather: all those blue regions tell us it really has been getting worse.
  • Flooding can't just be due to a lack of dredging in the river Severn, because we're looking at pictures of rainfall, not flooding and these images easily explain why it's been so bad.

Edit

At the time of publication it wasn't possible to report the images for January 2014 as they hadn't been published by the MET. It is possible now. You can see the same trends are in effect: the anomaly for January 2014 is astonishing in both cases, but less so compared with the average rainfall over 1981 to 2010 (which implies that that period was a bit wetter than 1961 to 1990). In early March it should be possible to add the graphs for February rainfall (which won't be as extreme).

January 2014 vs 1961-1990 January 2014 vs 1981-2010

 Edit 2

And a day later the February 2014 data became available. Again, the same trends are evident. Firstly, rainfall for the month is extreme - in fact more extreme than January, and more extreme than I anticipated just yesterday, as it covers Northern Ireland and Great Britain with the exception of the east coast and the North West of England. Secondly, rainfall is less extreme relative to the 1981 to 2010 period, which means that that period was wetter. Truly astounding.


February 2014 vs 1961-1990 February 2014 vs 1981-2010

Monday, 30 December 2013

Slowtake QuickTake!


A Digital Preservation Story.

About 7 years ago a Manchester friend, Sam Rees, gave me an Apple QuickTake 150, one of the earliest colour digital cameras, from around 1995; but he didn't have the right drivers, so I've never known whether it works or is just junk. A few months ago I tracked down the drivers on the Macintosh Garden website, so yay, in theory I could test it!

But obtaining the drivers is only a small part of the problem. The QuickTake only works with Macs from before 1998, and even if you have one, you have to find compatible media to transfer the downloaded drivers in the right data format. All this is challenging. The download itself comes as a .sit (StuffIt) file, which modern Macs don't support as standard. When you decompress it you find that the actual software and drivers are disk image files, but not in a disk image format understood by the older Mac I have (a Mac running Mac OS 9 could work, but my LCII only runs up to Mac OS 7.5.3).

In the end I used a 2002 iMac to decompress the .sit, because at least that was possible. The plan was to connect a USB Zip 250 drive to the iMac, copy the images to a Zip 100 disk, then use a SCSI Zip 100 drive on the LCII to load in the drivers.

However, I couldn't convert the floppy disk images to Disk Copy 4.2 format for my LCII, so I took a chance that simply saving the files in each floppy disk image as a set of folders might work.

Even getting an old circa 1993 Macintosh to work is a challenge. I'm fortunate in that I have an old, but still working SCSI Hard Disk. But, I still needed a special Mac to VGA monitor adapter to see its output (which I connected to a small LCD TV) and still had to spend some time hunting down the right kind of SCSI cable (D-type to D-type rather than D-type to Centronics) to hook up the Zip 100 drive.

After all this, and the 30 minutes it took to install all the QuickTake software (yes, just putting all the files in folders worked!), I was finally able to test it (no manuals, had to guess) and with a bit of fiddling was able to load wonderful fixed-focus VGA images from the camera in mere seconds (each image approx 60Kb). Opening and decompressing them took about 90s each on my slow LCII though!

Here's a picture of my family and our cats taken with the QuickTake 150 December 28, 2013. I used the 10s timer mode to take the photo, with the camera balanced on a book on an armchair - so apologies for the framing :-)
 


As you can see, the clarity of the image is actually pretty good. The native image required roughly 64Kb, which given an original 24-bit image means the QuickTake camera must have compressed images by about 14x.
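The 14x figure follows directly from the arithmetic, taking "roughly 64Kb" at face value against a raw 24-bit VGA frame:

```python
raw_bytes = 640 * 480 * 3        # an uncompressed 24-bit VGA image
quicktake_bytes = 64 * 1024      # the ~64Kb native QuickTake file from the text
ratio = raw_bytes / quicktake_bytes   # ~14x compression
```

That works out at just over 14:1, which is roughly JPEG-class compression for the mid-90s.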

When viewed on the LCII, the images appeared rather speckled due to the PhotoFlash software implementing a fairly crude dithering algorithm (simulated here using GIMP).

Thus ends a 7 year quest to test an Apple QuickTake 150 digital camera, thanks Sam!

Tuesday, 10 December 2013

Z80 Dhrystones

In the early 80s, my schoolmate David Allery's dad's workplace had a pdp-11/34, a minicomputer designed in the 1970s. All the reports at the time implied that a pdp-11 anything had absolutely awesome performance compared with the humble 8-bit computers of our day.

Yet decades later, when you look at the actual performance of a pdp-11/34, it seems pretty bad in theory. You can download the pdp-11 handbook from 1979 which covers it.

First, a brief introduction to a computer's processor - the CPU - which executes the commands that make up programs. I'll assume you understand something of early-80s BASIC. CPUs execute code by reading a series of numbers from memory, each of which is looked up and translated into switching operations which perform relatively simple instructions. These instructions are at the level of regn=PEEK(addr), POKE(addr,regn), GOTO/GOSUB addr, RETURN; regn = regn+/-/*/divide/and/or/xor/shift regm; compare regn,regm/number. And not much else.

The pdp-11 was a family of uniform 16-bit computers with 8x16-bit registers, 16-bit instructions and a 16-bit (64Kb) address space (though the 11/34 had built-in bank switching to extend it to 18-bits). The "/number" refers to the particular model.

On the pdp-11/34, an add rn,rm took 2µs; add rn,literalNumber took 3.33µs and an add rn,PEEK(addr) took 5.8µs. Branches took 2.2µs and Subroutines+Return took 3.3µs+3.3µs.
That's not too different to the Z80 in a ZX Spectrum, which can perform a (16-bit) add in 3µs; load literal then add in 6µs, load address then add in 7.7µs; Branch in 3.4µs and subroutine/return in 4.3µs+2.8µs.

So, let's check this.

A 'classic' and simple benchmarking test is the Dhrystone test, a simple synthetic benchmark written in 'C'. A VAX 11/780 was defined as having 1 dhrystone MIP and other computers are calculated according to that.

If you do a search, you'll find the pdp-11/34 managed 0.25 dhrystone MIPs. To compare with a ZX Spectrum I used a modern Z80 'C' compiler: SDCC; compiled a modern version of dhrystone (changed only to comply with modern 'C' syntax) and then ran it on a Z80 emulator. I had to modify the function declarations a little to get it to compile as an ANSI 'C' program, but once it did I was able to ascertain that it could run 1000 dhrystones in 13 959 168 TStates.

The result was that if the emulator was running at 3.5MHz, it would execute 0.142 dhrystone MIPs, or about 57% of the speed of a pdp-11/34. Of course, a more modern pdp-11 compiler might generate a better result for the pdp-11, but at least these results largely correlate with my sense that the /34 isn't that much faster :-) !
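Those figures check out if you run the arithmetic, using the commonly quoted 1757 dhrystones/s for the VAX 11/780's 1 dhrystone MIP (a standard baseline figure, not something measured here):

```python
tstates = 13_959_168            # measured T-states for 1000 dhrystones
clock_hz = 3_500_000            # ZX Spectrum-class Z80 clock
vax_dps = 1757                  # dhrystones/s of a VAX 11/780 = 1 dhrystone MIP

dhrystones_per_sec = 1000 / (tstates / clock_hz)   # ~250.7 dhrystones/s
dmips = dhrystones_per_sec / vax_dps               # ~0.142 dhrystone MIPs
vs_pdp11_34 = dmips / 0.25                         # vs. the /34's 0.25 MIPs: ~57%
```

So the 3.5MHz Z80 lands at about 0.143 dhrystone MIPs, around 57% of the pdp-11/34.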

Compiling SDCC Dhrystone

SDCC supports a default 64Kb RAM Z80 target, basically a Z80 attached to some RAM. I could compile Dhrystone 2.0 with this command line:

/Developer/sdcc/bin/sdcc -mz80 --opt-code-speed -DNOSTRUCTASSIGN -DREG=register dhry.c -o dhry.hex

The object file is in an Intel Hex format, so I had to convert it to a binary format first (using an AVR tool):

avr-objcopy -I ihex dhry.hex -O binary

SDCC also provides a z80 ucSim simulator, but unfortunately it's not cycle-accurate (every instruction executes in one 'cycle'). So, I wrote a simulated environment for libz80, which turned out to be quite easy. I used the following command line to run the test:

./rawz80 dhry.bin 0x47d


The command line simply provides the binary file and a breakpoint address. The total number of TStates is listed at the end.

The entire source code is available from the Libby8 Google site (where you can also find out about the FIGnition DIY 8-bit computer).

So Why Did People Feel The Pdp-11 Was So Fast Then?


By rights the pdp-11 shouldn't have been fast at all.
  1. The pdp-11 was typical for the minicomputer generation: the CPU (and everything else) was built from chains of simple, standard TTL logic chips, which weren't very fast.
  2. It was designed for magnetic core memory and that was slow, with a complete read-write cycle taking around 1µs.
  3. It was part of DEC's trend towards more sophisticated processors, which took a lot of logic and slowed it down further. 
But (3) is also what made it a winner. Its sophistication meant that it was a joy to program and develop high-quality programming tools for. That's probably why both the language 'C' and the Unix OS started out on a pdp-11.

By contrast, although early 8-bit microprocessors were built from custom logic and faster semiconductor memory, the sophistication of their CPUs was limited by the fabrication technology of the day. So a Z80 had only 7000 transistors and an architecture geared for assembler programming rather than compiled languages.

And there's one other reason. The pdp-11 supported a fairly fast floating-point processor and could execute, for example, a floating point multiply in typically 5.5µs, something a Z80 certainly can't compete with.