In this section we look at some other popular processors: the MSP430 (another RISC-but-in-many-ways-not MCU); the venerable 8051; and the current king of MCUs, the ARM (Cortex-M0).
The MSP430 is like a stripped-down PDP-11 with 16 x 16-bit registers and single-cycle instructions, but it's not great for computing 16-bit CRCs. This routine weighs in at 58 bytes and 28 cycles per byte, making it the same length as the H8 version and roughly as fast. Only the 6502 is longer!
The MSP430 has some features going for it, like its ability to swap bytes and shift whole words in a single cycle. The main reasons it's not as fast as a PIC or AVR are that it has no nybble operations and that its byte operations (which modify the whole word of the destination register, by zero-extending the result) sometimes get in the way. The MSP430's straightforward <<5 is faster than the rotate-right-3-times technique used on some 8-bit CPUs, because it doesn't need to save an intermediate and apply mask operations.
But the neatest trick here is the >>4. By zero-extending r4.b when moving into r7, we clear the upper byte, so '0' bits get shifted into bits 4..7 of r7. Then by zero-extending r7 again, we clear r7's original bits 3..0, and so we get a byte shift in 1 cycle less than applying a mask.
swpb r5 ;crc=(crc>>8)|(crc<<8).
mov.b @r4+,r7 ;r7=new byte.
xor r7,r5 ;r5=crc^=newByte.
mov.b r5,r7 ;r7=crc and 255;
mov.b r7,r7 ; clears bits 8 to 15 giving the >>4
mov.b r5,r7 ;Zero-extend again.
swpb r7 ;<<8
add r7,r7 ;<<12
mov r5,r7 ;crc and 255 again.
add r7,r7 ;<<5
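For reference, the per-byte update that these listings implement can be sketched in C. This is a minimal sketch, assuming the usual CRC16-CCITT polynomial 0x1021, which is what the >>4, <<12 and <<5 comments in the listings suggest. The function names `crc16_update`, `crc16_bitwise` and `crc16_buf` are made up for illustration, and the bit-at-a-time version is there only as a cross-check.

```c
#include <stdint.h>
#include <stddef.h>

/* Byte-at-a-time update (hypothetical name), following the comments in the
   assembly listings: swap bytes, xor in the new byte, then >>4, <<12, <<5. */
uint16_t crc16_update(uint16_t crc, uint8_t data)
{
    crc = (uint16_t)((crc >> 8) | (crc << 8));   /* swap hi and lo bytes */
    crc ^= data;                                 /* crc ^= newByte */
    crc ^= (uint16_t)((crc & 0xff) >> 4);        /* crc ^= (crc & 0xff) >> 4 */
    crc ^= (uint16_t)(crc << 12);                /* crc ^= crc << 12 */
    crc ^= (uint16_t)((crc & 0xff) << 5);        /* crc ^= (crc & 0xff) << 5 */
    return crc;
}

/* Bit-at-a-time reference, assuming polynomial 0x1021, MSB first. */
uint16_t crc16_bitwise(uint16_t crc, uint8_t data)
{
    crc ^= (uint16_t)(data << 8);
    for (int i = 0; i < 8; i++) {
        if (crc & 0x8000)
            crc = (uint16_t)((crc << 1) ^ 0x1021);
        else
            crc = (uint16_t)(crc << 1);
    }
    return crc;
}

/* Run the byte-at-a-time update over a whole buffer. */
uint16_t crc16_buf(uint16_t crc, const uint8_t *p, size_t len)
{
    while (len--)
        crc = crc16_update(crc, *p++);
    return crc;
}
```

Each line of `crc16_update` corresponds to one commented step in the assembly, so it can help when reading the abbreviated listings side by side.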
The 8051 version takes 31 bytes and 27 instruction cycles per byte, making it the shortest implementation so far (the PIC version uses at least 12-bit opcodes). On an original 12MHz 8051, this would have meant 27µs per byte, about 76% faster than a PDP-11.
anl a,#240 ;<<12
rl a ;<<5
mov r0,a ;save
xrl a, r1
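The "rotate right 3 times" technique for <<5 mentioned earlier can be sketched in C. This is an illustration under assumptions rather than a transcription of the routine above: it assumes an 8-bit machine that rotates a byte one bit at a time, and the helper names `ror8` and `shl5_by_rotate` are hypothetical. Rotating the byte right by 3 leaves bits 3..7 of the original in the low five bits and bits 0..2 in the top three, so two masks split it into the high and low bytes of the 16-bit result.

```c
#include <stdint.h>

/* Rotate an 8-bit value right by one bit (hypothetical helper). */
uint8_t ror8(uint8_t x)
{
    return (uint8_t)((x >> 1) | (x << 7));
}

/* Compute (uint16_t)x << 5 using three byte rotates and two masks. */
uint16_t shl5_by_rotate(uint8_t x)
{
    uint8_t r  = ror8(ror8(ror8(x)));  /* r = (x >> 3) | (x << 5), in 8 bits */
    uint8_t hi = r & 0x1f;             /* bits 3..7 of x: bits 8..12 of result */
    uint8_t lo = r & 0xe0;             /* bits 0..2 of x: bits 5..7 of result */
    return (uint16_t)((hi << 8) | lo);
}
```

The need to keep the rotated intermediate around and mask it twice is the overhead that a CPU with 16-bit shifts avoids.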
The ARM Thumb version. Using just the original Thumb instruction set, it can perform a CRC16 in just 18 cycles per byte (46 bytes), making it about 5.1x more efficient than a 6502 and nearly 7.4x more efficient than a 68000. This is mostly because ARM Thumb can perform multi-bit shifts in a single cycle. The processor is only let down when swapping the CRC16 high and low bytes, which is done just like in C: crc16=(crc16 >> 8) | (crc16 << 8)&0xffff. Later versions of Thumb provided zero-extend and byte-swap instructions.
c65535: .word 65535
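The byte swap described above can be sketched in C for a CPU with 32-bit registers. This is only a sketch; the function name `swap16_in_reg32` is an assumption made for illustration. The mask matters because the 16-bit CRC lives in a 32-bit register, so the left shift pushes the old high byte up into bits 16..23, where it has to be cleared.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value held in a 32-bit register. */
uint32_t swap16_in_reg32(uint32_t crc16)
{
    uint32_t hi_to_lo = crc16 >> 8;              /* old high byte -> bits 0..7 */
    uint32_t lo_to_hi = (crc16 << 8) & 0xffff;   /* old low byte -> bits 8..15,
                                                    stray bits 16..23 masked off */
    return hi_to_lo | lo_to_hi;
}
```

For example, an input of 0x1234 gives 0x3412; without the mask the result would be 0x123412.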
Finally, an AVR version. The webpage I based all of these on uses CRC code that operates on a byte at a time, so you wouldn't think there's any point in publishing one here, but my DIY FIGnition computer uses a more efficient one in its audio loading firmware:
In this version (designed for a Mega AVR), we can save a cycle from the mov, swap, andi sequence by making use of the mul instruction, and we can save 5 cycles in the <<5 code. This gives us 19 cycles per byte, just 1 cycle longer than the ARM Thumb version (and the same length). Yet it's 7 cycles shorter than the other AVR version (53% faster!).
;z^Data, r25:r24=Crc16, r26=Len.
push r31 ;used to hold buff.
mov Crc16Hi, temp ; Swap hi and lo.
eor Crc16Lo, temp ; crc^=ser_data;
mul r16,Crc16Lo ; Faster than executing lsr 4 times.
eor Crc16Lo, r1 ; crc^=(crc&0xff)>>4
mul r16,Crc16Lo ;(crc&0xff)<<4
eor Crc16Hi,r0 ;crc^=(crc<<8)<<4
;(r0 wonderfully contains the other 4 bits)
mul r27,Crc16Lo ; (crc&0xff)<<5
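The mul trick can be sketched in C. The AVR's mul leaves its 16-bit product in r1:r0, so multiplying a byte by 16 yields both x>>4 (in r1) and the low byte of x<<4 (in r0) in one instruction. The sketch below assumes r16 holds 16 and r27 holds 32, which the abbreviated listing doesn't show; the type `mul_result` and the function `avr_mul` are hypothetical names used to model the register pair.

```c
#include <stdint.h>

/* Models the AVR register pair r1:r0 after a mul (hypothetical type). */
typedef struct {
    uint8_t r1;   /* high byte of the product */
    uint8_t r0;   /* low byte of the product */
} mul_result;

/* Unsigned 8x8 -> 16 multiply: high byte in r1, low byte in r0. */
mul_result avr_mul(uint8_t a, uint8_t b)
{
    uint16_t product = (uint16_t)(a * b);   /* at most 255*255, fits in 16 bits */
    mul_result res;
    res.r1 = (uint8_t)(product >> 8);
    res.r0 = (uint8_t)(product & 0xff);
    return res;
}
```

With a multiplier of 16, r1 equals x>>4 and r0 equals (x<<4)&0xff; with a multiplier of 32, r1:r0 together hold the 16-bit value of x<<5.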