Thursday, 28 March 2024

Mouse Detechification! A PS/2 To Archimedes Quadrature Mouse conversion!

After an embarrassing amount of work I managed to convert a PS/2 mouse into a quadrature encoding mouse for my Acorn Archimedes A3020.


A Mini-Mouse History

Doug Englebart designed the first mouse, for his (at the time) very futuristic NLS demonstration in 1968 (it was built by a colleague, Bill English). It had one button, was brown (like real mice :-) ) and used two wheels, which must have made it scrape quite badly when moved diagonally:


Xerox picked up the concept of a Mouse for their pioneering Alto workstation project. They shrunk the wheels and then added a ball at the bottom which could drive the wheels easily in any direction.


Other early mice from Apple, Microsoft, Smaky, Logitech, AMS, Atari, Sun, ETH, Commodore (Amiga) and Acorn were variations on this kind of design: Quadrature-Encoded mice, using a ball to rotate two, orthogonal wheels and minimal electronics to convert the signals (often just a simple TTL Schmitt-trigger buffer).

Each wheel (one for left/right and the other for up/down) had about 100+ spokes in it and an infra-red LED shone light between the spokes to a pair of sensors (per axis) at the other side. As the ball turned and moved a wheel, a spoke would block the light to the sensors and then let the light through when it passed. However, because the sensors were offset from each other, a spoke would block light to Sensor 1 before blocking light to Sensor 2 and then let light through to sensor 1 before sensor 2. So, the sensors would see this if you moved right (or up):


So, you can tell if you’re moving right (or up), because Sensor 1 gets blocked first each time and left (or down) if Sensor 2 gets blocked first.

The problem is that it takes a lot of wires for a mouse to work this way and requires a surprising amount of computer power to handle all the interrupts, so later mice designs offloaded that to a microcontroller inside the mouse which then communicated with the computer using a simpler, 1 or 2 wire serial interface: these were called PS/2 mice on PCs (Apple had a design called ADB, Apple Desktop Bus). Eventually, USB mice replaced them and optical mice replaced the ball and wheels.

I know I could have bought a USB to quadrature encoded mouse adapter (from here on, QE mouse, because I don't like the term Bus-mouse), but that seemed like a cop-out, so instead I decided to make things as painful as possible. The first stage was to cut out the blobtronics that made it a PS/2 mouse.


Then I added some pull-down / level-shifting resistors to see if the A3020 would recognise it as valid values but it didn't. Then I went through a long process of getting an Arduino to recognise the analog voltages from the phototransistor (yep, I also know I could use a Schmitt-trigger buffer chip) and write a small program to actually process them into quadrature encoded X and Y grey code (Ref: Dir are 00, 01, 11, 10, 00... for forward/up and 00, 10, 11, 01, 00.. for backward/down). It turned out it wasn't very digital!


I figured I could solve the problem in software using a small MCU that would fit in the mouse, so I chose an ATTINY24, a 14-pin AVR MCU with 8 ADC 10-bit channels and 4 other digital IO pins. I used a simple hysteresis algorithm: it first figures out the range of values it can get from the phototransistors; then allows a bit more time to see if the range is bigger; then once it's interpreted any given state, the ADC values have to change by 2/3 of the range to flip into the next state.

I went through quite a bit of a debugging process, because obviously when you go from a fairly lush environment like an Arduino (plenty of libraries, plenty of IO pins, lots of code space and relatively abundant RAM (2kB ;) )) to an ATTINY24 (2kB flash, 128 bytes of RAM, just 12 IO pins) and then stick it in a mouse, the fewer opportunities you have for debugging - and you can't even use an LED inside a trad, ball mouse, because ta-dah, you won't see it! So it's best to debug as much as possible in simulation, before you take the final step. In fact the program only ended up being about 906 bytes, because 'C' compiles pretty well on an 8-bit AVR.

I made a lot of mistakes at pretty much every stage - but amazingly getting the mouse directions inverted wasn't one of them :) . I started with 10K resistors for the phototransistors, but then ended up deducing 15K was best (22K || 47K) so that was 8 resistors! When soldering the analogue connections to the AVR I was out by one pin all the way along, because from the underside, I mistook the decoupling cap pins for the Pin1 and Pin 14 of the AVR. I had to desolder the phototransistor connections to the main cable - which I'd been using when analysing on the Arduino; then solder up the photo transistors to the analog inputs and then the digital outputs to the original wires. I tried to keep the wires no longer than they needed to be because it was cramped in there, but I ended up making the Y axis analog wires just 1mm too short (because, d'uh they get a bit shorter when you strip them and feed them through a PCB) so they had to be redone. Because there were a lot of wires I needed to glue them down as otherwise the casing wouldn't close, and I was particularly concerned about the phototransistor wires getting caught in the Y axis wheel, but then I glued them down directly underneath that wheel so it couldn't clip into place! I also glued the grey and green wires running up the right so that they got in the way of the case closing - so all of these had to be very carefully cut out and moved. Near the end I remembered I had to add a 10K resistor between reset and power so that the MCU would actually come out of reset and execute code! I also had to add an input signal, to software switch the Y axis inputs to the X axis grey code, because the only header I could find to fit the mouse cable plug for testing didn't leave room to test both X and Y grey codes! Hence what looks like an extra button!

Finally I connected it all up, glued the board and wires down (after correcting the locations) and got: A BIG FAT NOTHING! I thought maybe I'd messed up the analogue ranges and retried it with a different range and that didn't work. Then I realised I could output debug serial out of one of the grey code outputs to the Arduino and see what the ATTINY was reading! Bit-banged serial code can be very compact!

void PutCh(uint8_t ch)

{ // (ch<<1) adds a start bit in bit 0 and |0x200 adds a

    uint16_t frame=((uint16_t)ch<<1)|0x200; // stop bit.

    do {

        if(frame&1) {

            PORTB|=kSerTxPin;

        }

        else {

            PORTB&=~kSerTxPin;

        }

        frame>>=1;

        _delay_us(52.083333-11/8); // 19200 baud.

    }while(frame); // & after the stop's been shifted out,

   // the frame is 0 and we're done.

}


Before I tried it, I did a sanity check for power and ground only to find I hadn't actually connected up VCC properly!!!! I'd forgotten the final stage of solder-bridging the decoupling cap's +ve lead to Pin1 of the AVR and I ended up ungluing everything so I could see underneath.

But when I fixed this and packed it all back in the case: It Worked! I tried reading the digital outputs at the end of the cable on the Arduino and when I was satisfied (which only took a few minutes of testing) I decided to hook it up to my A3020 and hey-presto! I have a working QE Mouse!


I had been a bit concerned that doing analog sampling wouldn't be fast enough and so my algorithm has a heuristic whereby if it sees a jump of 2 transitions (00 <=> 11 or 01 <=> 10) it assumes it's a single step in the same direction as before. I could manage about 20K x 4 samples per second, but a little maths shows this will be fine, because a full 640 pixels on the Arc's screen can be sampled OK if you cover it in about 3.2ms and clearly we don't do that.

!Paint, which is like, 96kB I think! It's terribly unintuitive though! Note on the video that a 12MHz ARM250 can drag whole, 4-bpp x 640x480 windows around rather than just outlines! Note also the Taskbar, which was standard on the Archimedes before Windows 95, or Mac OS X (NeXT step had a floating Dock).

Here, again is the link to the project and source code.



Sunday, 24 March 2024

Colour Me Stupid - An Early Archimedes 8-bit Colour Quest!

I'm assuming, naïvely again, that I can write a short blog post, but on past performance this isn't likely. I recently managed to get my Acorn Archimedes A3020 working again by converting a PS/2 mouse to a Quadrature mouse as early Archimedes' expect (see this blog post) and this has given me more of an interest in the system, particularly from a hardware viewpoint.

Wonderfully, I'm currently writing this on my MacBook Air M2, a 32-year later descendent of the ARM250 that powers that original A3020, so things have come full circle in a sense.

These early Arcs had a video chip called VIDC which supports 1, 2, 4 and 8-bit colour video modes at different resolutions, but for decades (probably even since the first review I saw in Personal Computer World August 1987), I was confused as to why the 8-bit colour mode was described as merely having a 64 colour palette with 4 tints instead of proper 8-bit RGB colours.


Why create something complex, which doesn't even do the job? What kind of terminology is 'tints'? How do you really use them?

The confusion deepened when I started to look into the VIDC hardware, because it never supported a 64-colour palette and some tints, instead it always supported a 16 x 12-bit colour palette for all modes, including the 8-bit colour mode. So, how did that work?

Standard VIDC Palette Entry

Sup Blue Green Red
S B3 B2 B1 B0 G3 G2 G1 G0 R3 R2 R1 R0

The standard VIDC palette entry contains 4-bits for each component, Blue, Green and Red, oddly in that order than the conventional Red, Green, Blue order. In addition, it has a sort-of single alpha bit which can be used for GenLocking. There are just 16 palette entries, so any one of them can be selected in 4-bits per pixel modes, but fewer of them are used in 1-bit and 2-bits per pixel video modes.

In 8-bit colour mode each 8-bit pixel is composed of a 4-bit palette entry and 4-bits which replace the four palette entry bits in bold above.

Direct Palette
B3 G3 G2 R3 Palette 0..15

Although I've heard several claims that these ARM computers couldn't do proper 8-bit colour RGB, with 3 bits for Red, 3 bits for Green and 2 bits for Blue (human vision is less sensitive to blue). In fact we can immediately see that by defining the 16 palette entries so that they literally provide the other bits, we will get the equivalent of RGB332 (really BGR233). This gives us:

DirectPalette
B3 G3 G2 R3B2 G1 R2 R1

Now we have 3 bits for Green and Red, and 2 bits for Blue. This means that in theory we have a proper 8-bit RGB mode; where we can freely select any one of the full range of 256 colours such a mode could describe. Note, we don't have a palette of 64 colours + 4 tints, we have a palette of 16 colours + 16 tints each and the palette can be assigned to provide the missing RGB bits.

How To Mess Up Palette Settings

A simplistic implementation of this would be to set all the remaining bits of the palette entries to 0, i.e. B1, B0, G0 and R0. This gives us the following palette:

B2G1\R2R1 R2R1=0 R2R1=2 R2R1=4 R2R1=6
B2G1=00 0x000 0x002 0x004 0x006
B2G1=01 0x020 0x022 0x024 0x026
B2G1=10 0x400 0x402 0x404 0x406
B2G1=11 0x420 0x422 0x424 0x426

To re-emphasise, by itself this would give a very dull palette, because B3, G3, G2 and R3 are never set, but as described earlier, these bits are provided directly by the upper 4 bits of each 8-bit pixel. Consider three pixels: 0x03, 0x63 and 0x83. They all use the palette entry 3, which provides a medium-level red 0x006, but the second pixel would add a green component of 0xc making the colour: 0x0c6 and the third would add a blue component of 0x8 making the colour 0x806 (purple-ish). Combining the palette entries and modifiers then gives this 256 colour range:


Here, the colour range has been generated by a BBC Basic program running on an Arculator, an excellent Archimedes Javascript emulator. It looks pretty decent. It's shown in two formats, a literal RrrGGgBb view is on the left where each column represents the bottom 2 bits for green and both bits for blue while subsequent rows increment the top bit for green and the red component. However, it's easier to block out by splitting the full range into 4 quadrants of RrrGGg where each quadrant increments Bb. I also tried the same program on my real A3020, but this time in mode 28 as my monitor couldn't really cope with the non-VGA mode, mode 13 and got this:


I made a programming typo with the linear representation, but the quadrant version looks better than the emulator! This shows that the palettes really can generate a reasonable RGB332 colour range.

There is a minor problem in that the colours don't span the full colour range. Red and Green can only go to 87.5% of the maximum (0xe) and blue can only go to 75% of the maximum (0xc). The conventional way to address this is to replicate bits in the component, so the 8 proper levels for red and green would be: 0b0000, 0b0010, 0b0100, 0b0110, 0b1001, 0b1011, 0b1101, 0b1111. And for blue it'd be: 0b0000, 0b0101, 0b1010, 0b1111. And if the Arc had a full 256 colour palette, like every colour Mac from the Mac II onwards had, that's exactly what's done. Unfortunately, if you try to approximate this, you get a worse arrangement:


There are two obvious problem: on the right-hand spectrum, I've outlined how the Red=3 and Red=4 values look almost the same (as as do the Blue=1 and Blue=2, outlined on the left). This is because the difference is only 1 in each case: Red=3 translates as 0x7 (from the palette), and Red=4 translates as 0x8 (from R3); while Blue=1 translates as 0x7 (from the palette) and Blue=2 translates as 0x8 (from B3).

It turns out, then that the naïve palette assignment is the most evenly distributed one. And this brings us to the next observation:

A Hint On Tints: Exact 8-bit RGB Is Impossible

On a real 8-bit RGB palette, the blue values 00 to 11 scale into the full range of blue: 0x00 to 0xff, matching the same range as green and red where 000 to 111 scale to 0x00 to 0xff. However, the Archimedes palette (as alluded to earlier) scales unevenly: blue scales to 0xc0 while red and green scale to 0xe0. Also since neither scale to 0xff, the colours will be more dull.

Instead what we get is an effective 64-entry palette where we only consider the top two bits of each component: BbGGxRry + 4 green/red tints for each one: xy = 00, 01, 10, 11. And this explains why the Archimedes manual always describes 8-bit colours in those terms, but the choice of their default tints is different.

One of the other major problems with this palette is that you can only have 4 proper BGR233 greys: 0x00, 0x52, 0xa4 and 0xf6. The slightly brighter colours: 0xf7, 0xfe and 0xff are off-white, pink, green and yellow tints. Ironically, proper RGB332 can only manage two greys! Consider RGB332 represented by 256 x 12-bit BGR palette entries or 256 x 24-bit BGR palette entries. Black is RGB332=0x00 which maps to 0x000 in the 12-bit palette and {0x00, 0x00, 0x00}. The next closest is Red=Green=2, Blue=1. This is 0x544 in the 12-bit palette and {0x55, 0x49, 0x49} in the 24-bit palette - both slightly blue. Then Red=Green=4, Blue=2, which is 0xa99 in the 12-bit palette and {0xaa, 0x92, 0x92} in the 24-bit palette, again, both slightly blue; then Red=Green=7, Blue=3 which is 0xfff in the 12-bit palette and {0xff, 0xff, 0xff} in the 24-bit palette.

In both of those cases, even the 24-bit palette is 16% out from a grey which is distinguishable to the human eye.

The Alternative Palette Approach

The conventional VIDC 8-bit palette with 64 base colours and 4 tints I now understand will look something more like this:

Direct Palette
B3 G3 G2 R3 B2 R2 T1 T0

Where T1 and T0 represents the tints of white, which get added to all of B1B0, G1G0 and R1R0. This kind of palette would contain these entries:

B2R2\Tint 0 1 2 3
B2R2=00 0x000 0x111 0x222 0x333
B2R2=01 0x004 0x115 0x226 0x337
B2R2=10 0x400 0x511 0x622 0x733
B2R2=11 0x404 0x515 0x626 0x737

This time we essentially have 4 dimensions to consider: Blue<2>, Green<2>, Red<2> and Tint<2>, I thought a recursive arrangement would be clearest, but it turns out that a semi-linear arrangement is:


Here, 4 horizontally adjacent blocks are the tints and then each horizontal block of 4 is the red component; while 4 vertically adjacent blocks are the blue component and each vertical block of 4 is green.

Trying to process images in this convention is challenging, because (as we'll see in the next section) it's hard to calculate how to dither each component, because they aren't truly independent and can't be, because there are really only 3 primaries (Red, Green, Blue), but four components. For example, it's easy to see that the top left and bottom right tint groups are actual greys, but harder to see that there's two other grey blocks (which I've outlined). This means we can't independently adjust tints against RGB.

Nevertheless, this convention has a several advantages over RGB332:
  • The RGB primaries are all evenly distributed, they get 2-bits each.
  • There are 16 levels of grey, which means that anti-aliased black text on a white background (or its inverse) can be done pretty well.
  • There's a brighter range because it goes up to 0xfff.
  • Human vision is more attuned to brightness than colour, which is why the HSV colour model is effective, so it's possible that this convention can represent images that perceive better to us.

Dithering

Even though we only have 256, imperfect base colours in our quasi-BGR332 palette, we can employ a standard technique to mask this, called dithering. I wanted to generate the inner surface of an RGB cube (where black is the bottom, left, front corner and white is the top, right, back corner) to show how we can generate fairly smooth colour transitions, by applying a fairly standard Floyd-Steinberg dither.

There's a really good blog post on dithering here, which covers far more forms of dithering that the ordered and FS dithering I knew about. FS dithering works by scanning the original image in raster order and then computing the error from the closest colour we can generate and then propagating the error to the immediate pixels on the right and below (or right and above).

* 7/16
3/16 5/16 1/16

In fact we can compute all of this by just maintaining a full-colour error array for a single row + the single pixel to the right.

3D Projection

So, dithering is fairly simple, but the 3D projection was actually fairly complex, because I couldn't just draw an image in normal 3D space, I had to scan the image; translating row and column pixels into (x,y,z) coordinates for the cube, culling pixels outside the cube and then calculating the furthest pixel at each of these points. Then x corresponds to red, y corresponds to green and z corresponds to blue. This involved quite a number of mistakes! To make the 3D calculations simple I generated an orthogonal projection where z=column*4/5+row*3/5; which is essentially a 3:4:5 triangle and avoids having to compute floating point maths nor square roots. The hacky calculations work as follows:

First we want to transform from (column, row) space (the screen coordinate) to (x,y,z), where y is down/up and z is depth. (0,0) is easy, it's (0,0,0) and any coordinate along the c axis is easy it's (c,0) => (c,0,0). As we go up rows, the beginning of the cube starts slightly further to the right and because of the projection, we know that (c, r)=(4,3) is also easy, it's (0,0,5). Similarly, any ratio where r=3c/4 is also (0, 0, 5c/4). When we're to the left of that axis, we're part of the left plane, so we need to draw that, and then the column calculation is the smallest, because e.g. (0, r) => (0, 0, 5r/3) > (0,0,5c/4) since 5c/4 is 0, but when we're to the right of that axis, the row calculation is the smallest. The back face is determined by the maximum depth of 255, so we simply limit the depth to that to generate it. (x,y,z) then map directly to (r, g, b).

In the end, I generated this Colour cube on the emulator:


It looks fairly smooth, but we can see some banding. Real-life images aren't smooth, so they don't tend to exhibit the same kind of artefacts.

Conclusion

Early, colour microcomputers had many compromises because of speed and memory limitations. Many of them used Palettes (Atari 8 and 16-bit, Acorn BBC and Archimedes, Macintosh II, Apple ||gs, PC EGA and VGA modes, the Amiga..) to provide a wider colour range than possible given the number of allocated bits per pixel and some used other tricks such as colour attributes which allowed a full colour range across the screen with a low colour boundary resolution (e.g. VIC-20, ZX Spectrum, Commodore 64, also Amiga HAM mode). As frame buffer memory approached 64kB or above, during the late 80s, it became possible to provide passable 8-bit true colour video in home computers. The early colour Macintosh II, PC MCGA and Archimedes computers fall into this category. They all use palettes, except that the Mac II and MCGA mode have 256 entries, each of 24-bits (or 18-bits in the case of MCGA).

The Archimedes A3020 inherits its graphics from its Atari ST / Commodore Amiga era incarnation with a limited 16 entry x 12-bit palette and a cheap hack to support a 'true' colour mode. The alternative, a proper 256 entry palette would have required a relatively costly 384 bytes of Palette RAM (+18K transistors) or a late chip redesign and a later or more expensive release for the integrated ARM250 chip[1].

Acorn's tendency to be technically pedantic I think, is what lead them to claim this mode is really 64-colours + 4 tints rather than a decent approximation to RGB332 from 16-base colours + 16 palette entries. RGBT2222 has some advantages, but RGB332 (really BGR233) makes most sense as a colour range, because all the others lead to either greater banding or a less coherent relationship between pixel bits and primary components. It turns out that it's possible to achieve a reasonable approximation to RGB332 on an Archimedes.

Notes [1]: ARM250 die from the Microprocessor Report linked earlier. The ARM CPU itself requires 29K transistors, so adding 18K transistors to VIDC would have resulted in a notable increase in size and cost for the 100K transistor, $25 chip.