Showing posts with label recursion. Show all posts
Showing posts with label recursion. Show all posts

Wednesday, 26 October 2022

ZX81, 1K Hanoi

 In my previous blog-post I described a coding convention for the Intel 8048 and used an embedded implementation of the Towers of Hanoi program as an example. It had a rudimentary user interface; outputting merely the column numbers between each step, which advanced on a button press.

Towers of Hanoi is an interesting toy program, and here we explore a different implementation, targeted at a 1K ZX81.

A standard ZX81 has 8K of built-in ROM + 1K of RAM which is shared between the user program (+variables); the 8K ROM's system variables; the 32x24 screen (which expands from 25 bytes to 793 bytes depending on how much is displayed on it); the 33b printer buffer and the machine stack used by the ROM. All this is described in Chapter 27 of the user manual.

So, in some respects it has less usable memory than an Intel 8048 (because 1K + 64b > 1K-system variables-screen...), but in others it has more (because the ROM is useful library code that would have to be included in the Intel 8048 equivalent). In addition, BASIC is generally less compact than assembler (and ZX81 BASIC is even worse).

The Listing


It doesn't look like there's much here, but a surprising amount thought went into it. You can run a javascript version of a zx81 from here, but many other emulators are available. To prove it will work in 1K, you will need to type POKE 16389,68 [Newline] NEW [Newline] to set the RAMTOP top of BASIC to the end of memory on a 1kB ZX81.

It's not obvious how to type the graphics, but the top line is 2 spaces, then ▗ █ ▙ 6 spaces Graphic Shift 5, 6 spaces Graphic Shift 8. The rest can be deduced fairly easily then.

The application will use 634 bytes. An earlier version used a mere 41 bytes more and this was enough to cause an out of memory error.

Using The Screen

The most significant change to the ZX81 version is to use the screen to provide a graphical representation of the puzzle. Because we still need to save space, it's essential to use the ZX81 block graphics (rather than whole characters) and I chose to move a ring by unplotting a ring's old position while plotting its new position rather than, e.g. animating the movement (which would have been very slow on the ZX81 anyway).

In the first version I used loops and plot commands to generate the puzzle, but it uses less memory to print the puzzle out directly. I could save another 4 bytes by assigning the graphics for the spaces and the poles to R$ and substituting R$ at the end of lines 10 to 30.

I also save a bit of space by calculating the minimum pole spacing. It looks like there isn't enough room between poles, but this isn't correct,  because at maximum we only ever need space for a 7 pixel wide ring and a 6 pixel wide ring. Therefore 7+1+6+1=15 pixels is enough between the poles.

This means the graphics take up: (8+15+15+7)/2=23 chars for row 4, 22 chars for row 3; 21 chars for row 2 and 20 chars for row 1(because the end of the higher rows are only ever filled with shorter rings). That's 86 bytes in total. The Column->Column moves are also displayed and this takes 4 bytes.

Moving The Rings

This is relatively simple: we have a string, R$, which is used as a byte array (to save space) to hold the number of occupied rings on each column. The width of each ring to move is determined by the current level: 1 for level 0 up to 7 for level 6. S and D determine the start and end coordinate. We plot to the ring we move to, while unplotting the ring we move from except for when x=0 in order to leave the pole. At the end we adjust the number of occupied rings, -1 for the source ring and +1 for the destination ring.

Memory-Saving Techniques

This program uses the normal ZX81 BASIC memory saving techniques. '0', '1' and '3' are replaced by NOT PI, SGN PI and INT PI to save 5 bytes each time. Values in the printable character range are replaced by CODE "x", saving 3 or 4 bytes each time; while other values use VAL "nnn" to save 3 bytes each time. This also applies to line numbers, so that placing the Hanoi function itself below line 1000 saves several bytes.

Using R$ as a byte array involves a number of clumsy VAL, CODE and CHR$ conversions, but replacing R$ with a DIM R(D) array would end up costing another 9 bytes, so it's a net saving to use R$.

Hanoi Algorithm Change

It turns out we can make the Hanoi algorithm itself less recursive than in the Intel 8048 version. In that version we pushed the source and destination columns on each call, but in fact that was done to demonstrate how to manage the data structure.

It's not necessary to do that. The original source and destination values can be restored after each recursive call, because 6-S-D is a reversible operation. Similarly, because L is decremented at the beginning of each call (instead of passing L-1 as a parameter to each call), then by incrementing it at the end of the function, it too, doesn't need to be saved.

Conclusion

The constraints and capabilities of running Hanoi on a different platform and language present challenges and opportunities which this ZX81 implementation amply demonstrates, not in the least by the 4:30 minutes of patience needed to fully run it for 7 rings (vs <1s for the Intel 8048 version). Finally, this implementation circumvents the ZX81 BASIC's lack of support for stack data structures to reduce the amount of recursive memory needed, which begs the question: how much recursion is really needed to implement the algorithm?

Saturday, 22 October 2022

Intel 8048 Coding Guide

This is a short guide on programming the largely obsolete Intel 8048 series of Microcontrollers from the viewpoint of a conventional coding paradigm. It describes how to essentially hand compile programs notionally written in a high-level language into assembler using a consistent methodology.

We shall chose a relatively simple toy program: a towers of Hanoi application. It uses 4 LEDs for output at each step (the source column and the destination column); a single button; some simple expressions; recursion (only up to 6 levels) and a simple interrupt routine to read the button.

Register Usage

The datasheets for the 8048 series shows it is an accumulator architecture, with direct access to 8 registers [r0 to r7] with which the accumulator can perform arithmetic / logic operations and indirect access to the rest of RAM via r0 and r1, with which it can also perform a similar set of ALU operations.

So, the convention here is to use r2 to r7 for parameters, locals and temporaries; while r0 and r1 point to globals. Thus, accessing a RAM location (e.g. x) takes 2 instructions:

    mov r0,#x
    mov a,@r0

Because r0 and r1 can both be used as pointers, we can cache up to 2 globals at any one time, e.g. b^=c =>
    mov r0,#b
    mov a,@r0
    mov r1,#c
    xrl a,@r1
    mov @r0,a ;7b
Becomes a straight-forward translation. If b and c had been in r2 and r3 it would have become just
    mov a,r2
    xrl a,r3
    mov r2,a ;3 bytes instead of 7b.

The 8048 has a second bank of registers, these will only be used within interrupts. That way, we never have to save context and we can be sure that nothing we do with r0' to r7' will affect the main program.

This means we can now write the interrupt-handling code, which debounces a button press. If we assume the 8048 runs at 8MHz, then the timer will overflow every 8MHz/15/32/256 = 65Hz, which is a reasonable period for debounce. The algorithm is fairly simple, on every overflow interrupt we simply shift in the button press value to a holding register and if the bottom 4 bits are 1100, then this means that the button was seen as being debounced and off ('11') then debounced and on ('00') so a button press has occurred and the button press bit is set. The key input routine waits for that bit to be set, then clears it.

    org 7 ;start tmr interrupt at its address.
TmrInt:
    sel rb1
    in A,p1 ;button in bit 0.
    roc A ;now in carry.
    mov A,r2 ;button press history
    rlc A ;now bottom
    mov r2,A
    anl #12 ;
    xrl #12 ;zero => button!
    jnz Tmr10
    clr f0
    cpl f0 ;F0= button result.
Tmr10:
    sel rb0 ;back to main reg set
    retr ;return from int. 16b.

Key: ;wait for key.
    jf0 Key10
    jump Key ;still clear
Key10:
    clr f0 ;ready for next keypress.
    ret ;Return. 6b.

TmrInit:
    mov a,#1
    orl a,p1
    out p1,a
    sel rb1
    mov r2,#0 ;button had been 'pressed'
    sel rb0
    en tcnti
    strt cnt ;65Hz.
    ret ;11b.

;16+6+11 = 33b.

Data Structures And Stack Frames

The Tower of Hanoi recursive program is fairly simple. If we want to move all the rings from column a to column c; we first move all the rings above from column a to column b; then move a ring from column a to column c, then move all the rings above from column b to column c.

Normally, because a 8048 program is poor at handling indexed data structures, it will be best to map stack frames to static stack frames. However, for this application We'll use r2 to r7 as locals and r0 as a stack pointer. The stack pointer will push down from the top of RAM and we'll assume it's an 8048 with 64b of RAM, so r0 starts at 64. The function is equivalent to the following 'C' function:

void Hanoi(uint8_t aLevel, uint8_t aFrom, uint8_t aToo)
{
    if(aLevel>0) {
        Hanoi(aLevel-1, aFrom, aToo^aFrom);
    }
    Out(p1,(((aFrom<<2)|aToo)<<1)|1);
    Key();
    if(aLevel>0) {
        Hanoi(aLevel-1, aFrom^aToo, aToo);
    }
}

I tend to choose caller-saved conventions, so the first thing Hanoi will do is save from and too; and restore them at the end. We'll assume a 6 level Hanoi, with columns numbered as 0, to 2. We can always compute the via as 3^from^to. e.g. 0 to 2 => 3^0^2 => 1. 0 =>1 is 3^1^0 => 2. 1 to 2 => 3^1^2 => 0.

Hanoi ;r2=aLevel, r3=aFrom, r4=aToo.
    mov a,r3
    dec r0
    mov @r0,a ;push aFrom
    mov a,r4
    dec r0
    mov @r0,a ;push aToo.
    dec r2 ;if the result is 0
    mov a,r2 ;
    JZ Hanoi10
    mov a, #3
    xrl a,r3
    xrl a,r4
    mov r4,a ;from source to via.
    call Hanoi ;recurse.
Hanoi10:
    mov a,r3
    rl a
    rl a
    orl a,r4
    clr c
    cpl c ;set c.
    rlc a ;because bit 0=button input.
    out p1,a ;output the move
    call Key ;wait for button
    mov a,r2 ;level==0?
    jz Hanoi20
Hanoi20:
    mov a,#3
    xrl a,r3
    xrl a,r4
    mov r3,a ;move via to dest.
    call Hanoi
    inc r2 ;restore level above.
    mov a,@r0 ;restore from and too at the end.
    inc r0
    mov r4,a
    mov a,@r0
    inc r0
    mov r3,a
    ret ;45b

Thus it can be seen that the implementation is very simple. Initialisation involves setting up the timer interrupt and the initial Hanoi call:

    org 0;reset
    jmp Main
Main:
    call TmrInit
    mov r0,#64 ;sp
    mov r2,#6 ;6 levels
    mov r3,#0
    mov r4,#2
    call Hanoi
Main10:
    jmp Main10;14b

Thus we now have a complete implementation with user interaction, display, interrupts, expressions, data structures, locals, and recursion. It takes only 7+33+45+14 bytes = 103b in total. Note, at the time of writing, the Hanoi application hasn't been tested.

Static Stack Frame Algorithm.

The Static stack frame algorithm needs to be computed by hand. First we work out the call tree for the application. Secondly, for each function we allocate a call level to it based on the deepest call to that function. Thirdly, for each call level we allocate the number of bytes = the maximum number of stack bytes needed over the set of functions at that call level. Fourthly, we set the start address for the higher call level to the start of spare RAM and each lower call level to the start address of the previous call level + the stack bytes calculated in step 3.

Conclusion

Although the instruction set for the 8048 family is fairly comprehensive for an early 8-bit MCU and its multiple source interrupt feature makes it far better than the contemporary PIC 1655, the limited and clumsy access to memory outside of a local set of 8 registers makes coding challenging. Many of these problems were fixed by its successor, the Intel 8051.

Nevertheless, some fairly simple coding techniques provide for a fairly straight-forward and efficient coding convention.