Tuesday 18 March 2014

Caller Convention Calamities

Hello AVR people, let's talk about interrupts and the mess calling conventions have made of them!

Background

Back in the early 1990s I had my first long-term job working at a place called Micro-Control Systems developing early solid-state media drivers. These were long ISA cards for PCs stuffed with either battery-backed Static RAM, EPROM or Intel Flash chips that gave you a gargantuan 3Mb per card up to 12Mb of storage with 4 cards.

These cards were bootable (they emulated hard disks) and the firmware was written entirely in 16-bit 8086 assembler with a pure caller-save convention. The thinking behind caller-save conventions is that a subroutine doesn't save registers on entry; instead, the caller of a subroutine saves any registers it's using that are also being used by the callee before doing the call and then restoring them as necessary later. Let's assume for example, we have ProcTop which calls ProcMid a few times which calls ProcLeaf a few times, which doesn't call anything. Caller-save conventions aim to improve performance because leaf procedures, like ProcLeaf here don't need to save registers.

However, I found that caller-saving lead to a large number of hard to trace bugs. This happens because every time you change ProcLeaf you have the potential to use new registers and this can have an effect on the registers ProcMid needs to save or potentially the registers ProcTop needs to save. But also, if you change ProcMid and use new registers you might find you need to save them whenever you call ProcLeaf (if ProcLeaf uses them) as well as having to check ProcTop for conflicts.

This means you need to check an entire call tree whenever you change a subroutine and if you need to save additional registers in ProcMid or ProcTop you might end up restructuring that code etc (which means more testing).

Nasty, nasty, nasty and all because a caller-save convention is used. In the assembler code I wrote (and still write), I use a pure callee-register saving convention. Ironically, caller-saving doesn't even save much performance because pushing and popping registers at the beginning and the end usually occupies only a small fraction of the time spent within a routine.

AVR Interrupts

GCC 'C' calling conventions use a mixture of caller-saving and callee-saving conventions. Most registers below Reg 18 are caller-saved; most of the rest are callee-saved. This, I think, is seen to be a compromise between performance and code-density. I personally wouldn't use caller saving at all, even in a compiler, but for interrupts it's an absolute disaster for the AVR.

That's because every time you need to use a subroutine within an interrupt the interrupt routine itself must then save absolutely every caller-saving register, just in case something used by any of the interrupt's call tree uses them; because of course when dealing with interrupts the compiler can't make assumptions about what registers are safe to use. As a result interrupt latency on an AVR shifts from being excellent (potentially as little as around 12 clock cycles, under 1µs at 20MHz, to 3 times as long, 36 clock cycles, around 1.8µs at 20MHz).

This kind of nonsense isn't just reserved for AVR cpus, the rather neat Cortex M0/M3 etc architectures save, as standard, 8x32-bit registers on entry to every subroutine for the same reason to make it easy for compilers to target Cortex M0 for real-time applications.

What I really want when I write interrupt routines, is to have some control over performance degradation. I want additional registers to be saved only when they need to be as only as much as is actually needed. In short, I want callee-saving and avr-gcc (amongst its zillions of options) doesn't provide that.

For the up-and-coming FIGnition Firmware 1.0.0 I decided to create a tool which would do just that. You use it by first getting GCC to generate assembler code using a compile command such as:

avr-gcc -Wall -Os -DF_CPU=20000000 -mmcu=atmega168 -fno-inline -Iinc/ -x assembler-with-cpp -S InterruptSubroutineCode.c -o InterruptSubroutineCode.s

The interrupt code should be structured so that the top-level interrupt subroutine ( called IntSubProc below) is listed last and all the entire call-tree required by IntSubProc is contained within that file. Then you apply the command line tool:

$./IntWrap IntSubProc InterruptSubroutineCode.s src/InterruptSubroutineCodeModified.s

Where IntSubProc is the name of the interrupt subroutine that's called by your primary interrupt routine. The interrupt routine itself has an assembler call somewhere in it, e.g

asm volatile("call IntSubProc");

That way, GCC won't undermine your efforts by saving and restoring all the caller-saved registers.

IntWrap analyses the assembler code in InterruptSubroutineCode.s and works out which caller-saved registers actually need to be saved according to the call-tree in the code in InterruptSubroutineCode.s. The analysis stops after the ret command for IntSubProc.

The current version of IntWrap is written using only the standard C library and is currently, I would say, Alpha quality. It works for FIGnition, the DIY 8-bit computer from nichemachines :-)



Download from Here.

How Does It Work?

IntWrap trawls through the assembler code looking for subroutines and determining which registers have been modified by the subroutine. The registers that need to be saved by IntSubProc are all the caller-saved registers that have been modified by IntSubProc's call tree, but haven't been saved. To make it work properly, IntWrap must eliminate registers that were saved mid-way down the call-tree. Consider: IntSubProc saves/restores r20 and calls ProcB which saves/restores r18, but modifies r19 and ProcB calls ProcC which modifies r18, r20 and r21. IntWrap should save/restore r19 because ProcB modifies it and should save/restore r21 because ProcC modifies it. But it doesn't need to save r18, because even though ProcC modified it, ProcB save/restored it.

The algorithm works by using bit masks for the registers. For every procedure, it marks which registers have been save/restored and which have been modified and the subroutine's modified registers are modifiedRegs&~saveRestoredRegs . Call instructions can be treated the same way as normal assembler instructions.

IntWrap avoids having to construct a proper call-tree graph by re-analyzing the code if it finds that it can't fully evaluate a call to a subroutine. In this way the modified register bitmasks bubble up through the call-tree with repeated analysis until it's all been solved.