Tuesday, 31 December 2024

Wobbly-Blue: An Optical Illusion on a ZX Spectrum And VIC-20

 Introduction

I recently came across an interesting optical illusion whereby a speckled-blue sphere on a random checkerboard pattern will appear to wobble relative to the checkerboard if you move your head. When viewed on a mobile device the effect is even more pronounced, as you can move the device itself.

I figured I could write a simple version of a program that generated it for the ZX Spectrum, and this is the result (you need to make the image occupy a fair amount of your field-of-view):

The program is fairly small:

The ZX Spectrum has character-level colour resolution, but because the blue sphere is surrounded by a black border, it doesn't cause any clashes. I originally wanted to produce a sphere where the blue coverage in the centre was obviously larger than at the edges, but it turned out that simply taking the sine of a random angle creates a distribution that looks spherical, because the rate of change is greatest near 0º so the dots are spread out more there and concentrated near the edges. If I let it run for about 2000 points it'd probably look more prominent.
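As a rough check of that claim (this is not the Spectrum listing, just a Python sketch that assumes the radial position is simply the sine of a uniformly random angle), one can count how many points land in the outer half of the radius. The function name `edge_fraction` and the seed are my own choices for illustration.

```python
import math
import random

def edge_fraction(n_points, seed=1):
    """Fraction of samples whose |sin(angle)| falls in the outer half of the radius."""
    rng = random.Random(seed)
    outer = 0
    for _ in range(n_points):
        angle = rng.uniform(0.0, 2.0 * math.pi)  # uniformly random angle
        r = math.sin(angle)                      # signed distance from the centre
        if abs(r) > 0.5:
            outer += 1
    return outer / n_points

# Roughly two thirds of the points end up in the outer half,
# because |sin| is below 0.5 for only 120 of the 360 degrees.
print(edge_fraction(20000))
```

So the dots really are sparser near the centre and denser towards the rim, which is what gives the disc its rounded look.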

Unexpanded VIC-20 Version

Quite frequently I like to create Unexpanded VIC-20 conversions of ZX Spectrum programs, because they're fairly contemporary machines with some similar characteristics, but the VIC-20 is more challenging to program due to a lack of support in its version of BASIC.

Here's the VIC-20 version:

The VIC-20 version is full of POKEs to do what the ZX Spectrum version can do with PLOT, INK and PAPER. Also, the character set is squished and there are only 22 characters per line. But it's the techniques needed to perform hi-res graphics on a VIC-20 (particularly an unexpanded VIC-20) that are the real challenge.

Firstly, the VIC-20 can only really do hi-res graphics by modifying a character set of up to 256 characters. So, if you fill the screen with unique characters and update the pixels in each character then a full bitmap display is possible. However, on a standard VIC-20 screen there are 22 x 23 characters = 506 character positions which is far more than the number of characters in the character set! The VIC-20 'fixes' this by supporting double-height characters of 16-rows each, which means you only need 253 characters to fill the screen.

The second problem is that an unexpanded VIC-20 only has 3583 bytes free when you turn it on, and 253 double-height characters + the 506 screen bytes would need 4554 bytes, which is clearly more than what's available.

However, in this case, we don't need to fill the whole screen with bitmapped graphics, only the sphere in the centre! And in fact I would only need 172 single-height characters if I also reduced the screen size to 20x20 characters! 172 characters needs just 1882 bytes including the screen bytes. This leaves just over 2kB for my program!
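The arithmetic above can be checked with a few lines of Python (a sketch of my own, not part of either BASIC listing; `bitmap_bytes` is a made-up helper). Note that the 1882 figure corresponds to counting all 506 bytes of the standard screen; counting only the 400 cells of a 20x20 screen would give 1776.

```python
def bitmap_bytes(n_chars, rows_per_char, screen_cells):
    """Bytes for the character bitmaps plus one screen byte per cell."""
    return n_chars * rows_per_char + screen_cells

full_screen = bitmap_bytes(253, 16, 22 * 23)  # double-height chars, whole screen
sphere_only = bitmap_bytes(172, 8, 22 * 23)   # single-height chars, 506 screen bytes
reduced = bitmap_bytes(172, 8, 20 * 20)       # same chars, only 400 screen cells

print(full_screen, sphere_only, reduced)  # 4554 1882 1776
```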

How is this done? Well, I could work out which characters in the centre will be filled with the sphere's pixels and print unique characters for them, but it's easier to use the kind of tile-allocation technique you might use for a video game. You use some characters as background (in this case we only need one: character 255, which is filled with 8x $ff's).

Then whenever you want to plot a pixel on the screen you find out which character is being used at the character location (INT(x/8),INT(y/8)). If it's not 255, then you can look up the character in the character set memory, select the right row (Y & 7), then set the right pixel (128>>(X & 7)). Otherwise you allocate the next character code (denoted by UG% in the listing) and then fill in the pixel as before. It doesn't matter if the characters on the screen aren't allocated in order, because one simply gets the correct bitmap address from the character code itself.

The resultant program is similar to the ZX Spectrum version, except that a couple of subroutines are added. Line 1 allocates space for the new character set by setting the end of BASIC (and string stack) to 6143. Then 6144 onwards can be used. Lines 500 to 540 create the initial graphics setup. The screen size is set to 20x20 instead of 22x23. PAPER is set to black with the screen in non-inverse mode; the new character set is filled with 0s; the screen is filled with character code 255 and character 255 is filled with 255s too (all done with one loop). Finally, UG% is set to 0, as that's the first character code we'll allocate.

Lines 1000 to 1040 are very similar to the ZX Spectrum version, except that they work by setting the INK colour of each character to white or black for each checkerboard location.

Lines 200 to 220 are the plot routine discussed above. It also has to POKE colour memory (from 38400 onwards) to make the INK colour at that location Blue (colour 6). Finally we can run the program:


Pixels on a ZX Spectrum are square, but on a VIC-20 they're squished too, so the sphere is oblate. Total video memory is 2048 bytes, and including 400 bytes of screen memory that's room for 206 bitmap chars. So, I could have increased the checkerboard resolution by allocating 16 characters, taking the total up to 188.

What Causes The Effect?

I did a bit of searching for how the illusion works, but all I found were articles on how chromatic aberration can cause red or green colours to stand out in a sort-of 3D effect. But here's a simple theory. Human eyes are 10x less sensitive to blue than other primary colours and much more sensitive to luminance than colour. So, it's likely that the brain needs to do more processing for colour than for monochrome images; more processing for blue and more processing for sparse images (like this sphere).

That would make sense: the least sensitive blue cones might take longer to fire than the more sensitive rods (because it takes longer for enough energy to make them fire). There might be more neurone layers for making sense of a colour image; more neurone layers for making sense of a sphere than a set of blocks and finally more neurone layers for making sense of a sparse image than a solid one.

All the extra processing causes a lag, which means that when you move your head (or move the screen), the checkerboard pattern moves quickly in your field of view, but your other neurones take time to reconstruct the sparse, blue sphere.



Tuesday, 19 November 2024

The Mythical Mac 256K

Introduction

This MacGUI blog post covers the boot process for the early System Software on a Mac 512K (or Mac 128K). In part of it, the author talks about the Mythical Mac 256K.

At the beginning of the boot blocks are several stored parameters. The version number is two bytes. Another two bytes hold flags for the secondary sound and video pages. Then come a series of seven, 16-byte long file names. These are the names of the System resource file, Finder (or shell), debugger, disassembler, startup screen, initial application to run, and clipboard.

Following the names is a value for how many open files there may be at once, and the event queue size. Then come three, 4-byte values which are system heap size for a Mac with 128K of RAM, 256K, and 512K of RAM, respectively. If you could find me a Mac with 256K of RAM, I'd sure be impressed. ;-)

<snip>

If you landed from planet Krypton with a Mac 256K, your system heap size would be $8000 (decimal 32,768).

Disappointingly, as he also covers in this blog post, the 64K ROM only checks for the 128K or 512K Mac models and the 128kB model memory size is defined explicitly by the line:

4002E4: LEA $0001FFFC,A1 ;128K RAM

The tests work by exploiting incomplete memory decoding on early Macs. A Mac 128K's memory appears to be duplicated 4 times (actually it's duplicated 32 times all the way up to 4MB); whereas a Mac 256K is duplicated twice and a Mac 512K's memory doesn't get duplicated at all (up to 512K).

First (Failed) Attempt

I took a 64K Mac ROM and changed the address in that LEA instruction to $0003FFFC, then adjusted the ROM checksum (held in the first long word of the ROM) by adding 2 to its 3rd byte. Then I added a miniVMac option for a Mac 256K, hacking about with a few scripts and some of the source code. I thought I'd correctly achieved a 256kB Mac, but I hadn't: for my 19kB MacWrite test document I had misread the MacWrite splash screen, which says how much memory is free and how much is used. I'd mistaken 'free' and 'used' for each other, because the values were fairly plausible for a Mac 256K.



PICO-Mac

I left the problem for quite a while after that, not knowing how to test it, but also because I lost the HDMI cable adapter for the Raspberry Pi 5 I was developing it on. Then Matt Evans referenced my post on his Axio.ms blog when creating a Raspberry Pi Pico based Mac emulator. I myself have another design for a PICO-based Mac, which would be able to fit the entire 68K emulator in its 16kB of SPI flash cache and still run at full-speed, allowing for a full 256kB of RAM on a standard PICO. This is for the future, if ever.

PCE-Mac Emulator

The key to being able to solve the problem is to use the PCE-Mac emulator, which has a step-time debugger, allowing me to figure out what really went wrong. 

What it looks like is this: Apple originally wanted to be able to support a 128KB Mac, a 512KB Mac or, at least for development purposes, a 256KB Mac. Perhaps some of the engineers wanted to be able to bolt on another 128KB just for a bit more space. That made sense in 1983, because 256kBit RAM chips didn't yet exist and a 512kB Mac would need 64 RAM chips; whereas a 256kB Mac would only need 32.

However, they appear to have made a mistake in the ROM, which they tried, but failed, to correct in the disk Boot block, and this prevented them from doing that. The ROM itself is at fault, because a 256kB Mac will look like a 512kB Mac to the ROM. Here's the ROM code that works it out:

FindMemSize:
4002DE: LEA $7FFFC,A0   ; 512K RAM
4002E4: LEA $1FFFC,A1   ; 128K RAM
4002EA: CLR.L (A0)+
4002EC: TST.L (A1)+
4002EE: BNE.S 4002F2    ; branch if 512K
4002F0: MOVE.L A1,A0    ; this is a 128K machine
4002F2: MOVE.L A0,$0108 ; set MemTop

A0 is set to point to the last word of a Mac 512K and A1, the last word of a Mac 128K. Initially all the RAM is set to $ffffffff. At $4002EA, the last word of a 512K machine is cleared and then the last word of the Mac 128K is tested. On a Mac 512K it won't be clear, so it won't be 0, so it'll branch and A0 will be used as is to define MemTop. On a Mac 128K it will be clear so it will be 0. It won't branch and so A1 (=128K) will be copied to A0 and then used for MemTop.
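To make the mirroring argument concrete, here is a small Python model of that routine. It is a sketch under two assumptions of mine: that incomplete address decoding can be modelled as "address modulo RAM size", and that RAM starts out filled with $FF bytes as described above. The class and function names are invented for illustration.

```python
class MirroredRam:
    """RAM whose contents repeat through the address space (incomplete decoding)."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(b"\xff" * size)  # RAM assumed to start as $FF bytes

    def write_long(self, addr, value):
        a = addr % self.size
        self.data[a:a + 4] = value.to_bytes(4, "big")

    def read_long(self, addr):
        a = addr % self.size
        return int.from_bytes(self.data[a:a + 4], "big")

def find_mem_size(ram):
    """Model of the stock ROM test; returns the value stored in MemTop."""
    a0, a1 = 0x7FFFC, 0x1FFFC
    ram.write_long(a0, 0)      # CLR.L (A0)+
    a0 += 4
    value = ram.read_long(a1)  # TST.L (A1)+
    a1 += 4
    if value != 0:             # BNE.S: taken on a 512K machine
        return a0
    return a1                  # MOVE.L A1,A0: this is a 128K machine

print(hex(find_mem_size(MirroredRam(128 * 1024))))  # 0x20000
print(hex(find_mem_size(MirroredRam(512 * 1024))))  # 0x80000
```

The same model can be run with any other power-of-two RAM size to see what the stock ROM would make of it.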

However, on a Mac 256K, the long word at $1FFFC also won't be cleared, so the ROM will think it's a Mac 512K. Nevertheless, it won't work to hack the ROM to make line 4002E4 LEA $3FFFC,A1 for a 256kB Mac, because:
  • It would mess up the ROM test for a 128KB Mac.
  • It turns out the Boot block on the Floppy Disk does its own test, explicitly for a Mac 256K and does the wrong thing if MemTop is set to 256K.
This is the critical part of the boot block:

0001008A: 2238 0108       MOVE.L $0108, D1
0001008E: 4841            SWAP D1
00010090: 0C41 0004       CMPI.W #$0004, D1
00010094: 6E2C            BGT.S $000100C2
00010096: 7200            MOVEQ #$00000000, D1
00010098: 50F9 0001 FFF0  ST $0001FFF0
0001009E: 42B9 0003 FFF0  CLR.L $0003FFF0
000100A4: 4AB9 0001 FFF0  TST.L $0001FFF0
000100AA: 6716            BEQ.S $000100C2
000100AC: 7002            MOVEQ #$00000002, D0
000100AE: 4840            SWAP D0
000100B0: D1B8 0108       ADD.L D0, $0108
000100B4: D1B8 0824       ADD.L D0, $0824
000100B8: D1B8 0266       ADD.L D0, $0266
000100BC: D1B8 010C       ADD.L D0, $010C
000100C0: 7204            MOVEQ #$00000004, D1

Initially it checks for a 512kB Mac at $10090 and, if it isn't one, falls through to the clever, but weird, section at $10098 to $100A4. This code sets the top of RAM for a 128KB Mac to -1, then explicitly clears the top of RAM for a 256KB Mac. If the address at the top of a 128kB Mac has become 0, then it knows that the address range from $20000 to $3FFFF mirrored $00000 to $1FFFF and therefore it's a 128KB Mac. In that case it will execute the BEQ, skipping the code from $100AC to $100C0, which adds $20000 to all the key system variables, starting with MemTop.
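The boot block logic can be modelled in a few lines of Python. This is only a sketch: it treats mirroring as "address modulo RAM size", tracks only MemTop rather than all four system variables, and the function name `boot_block_memtop` is my own.

```python
def boot_block_memtop(ram_size, memtop):
    """Model of the boot-block check; returns the (possibly adjusted) MemTop."""
    mem = {}  # sparse RAM keyed by physical byte offset

    def phys(addr):
        return addr % ram_size  # mirroring from incomplete address decoding

    if ((memtop >> 16) & 0xFFFF) > 4:  # SWAP D1 / CMPI.W #4,D1 / BGT.S
        return memtop                  # treated as a 512K Mac: nothing to do

    mem[phys(0x1FFF0)] = 0xFF          # ST $0001FFF0 (one byte set to $FF)
    for i in range(4):                 # CLR.L $0003FFF0
        mem[phys(0x3FFF0 + i)] = 0x00
    # TST.L $0001FFF0; bytes never written default to 0 here, which
    # doesn't change the outcome because the first byte decides it.
    tested = [mem.get(phys(0x1FFF0 + i), 0) for i in range(4)]
    if all(b == 0 for b in tested):    # BEQ.S: the clear was mirrored
        return memtop                  # genuine 128K Mac
    return memtop + 0x20000            # ADD.L D0,$0108 with D0 = $20000

print(hex(boot_block_memtop(0x20000, 0x20000)))  # 0x20000: a 128K Mac is left alone
print(hex(boot_block_memtop(0x40000, 0x20000)))  # 0x40000: 256K RAM, ROM reported 128K
print(hex(boot_block_memtop(0x40000, 0x80000)))  # 0x80000: 256K RAM, ROM reported 512K
```

The second case is the one the boot block was evidently written for; the third is what actually happens with the stock ROM.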

The upshot is that the boot block expects a Mac 256 to be set up as a Mac 128 in the ROM; and then the system variables are adjusted. And of course the ROM won't do that: a Mac 256 is set up as a Mac 512 in the ROM so the BGT instruction at $10094 skips the whole routine. They therefore could have fixed the boot block to check MemTop for a 512kB Mac and if it's really a Mac 256, subtract 256K from all the system variables.

Because they didn't do that, the ROM has to be changed.

A Correct ROM Fix

The answer is to save bytes on the LEA instructions. It's possible in some cases to generate large values in data registers by using the sequence MOVEQ, then SWAP. It takes just 2 words instead of 3 and because 128kB, 256kB and 512kB all have 0s in the bottom 16-bits, we can use this technique (the boot block uses this technique to test for 256kB too!). However, we need a data register to do this and I can't be sure it won't corrupt a useful data register value unless the execution path later overwrites a data register.

Fortunately, it does. At $322, there's a MOVEQ #2,d1. So, now we know we can use D1. The trick then is to set up both A1 and A0 to point to 256kB and 512kB respectively. My sequence is 1 word shorter than the original. Then, instead of clearing and testing the last words and post-incrementing to get the sizes; we pre-decrement from the sizes to get to the last words of RAM and test them. This will return the same result for both 128kB and 256kB Macs, but a different result for the 512kB Mac.

We don't need A0 and A1 now, because the actual size (for a 512kB Mac) is already in D1 so we can use our word of ROM space saved to move D1 to A0. And for the 256kB or 128kB cases, we can take the 512kB value; divide by 4 (by shifting right twice) to get 128kB, which is also a 1 word instruction substituting the MOVEA A1,A0 in the original code. This means that ROM routine now looks like this:

000002DE: 7204  MOVEQ #$00000004, D1
000002E0: 4841  SWAP D1
000002E2: 2241  MOVEA.L D1, A1
000002E4: D281  ADD.L D1, D1
000002E6: 2041  MOVEA.L D1, A0
000002E8: 42A0  CLR.L -(A0)
000002EA: 4AA1  TST.L -(A1)
000002EC: 6602  BNE.S $000002F0
000002EE: E489  LSR.L #$2, D1
000002F0: 2041  MOVEA.L D1, A0
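Stepping through the patched routine in Python shows what it reports for each RAM size. As before this is a sketch, assuming mirroring behaves as "address modulo RAM size" and that untouched RAM reads as $FFFFFFFF; the function name is my own.

```python
def patched_find_mem_size(ram_size):
    """Model of the patched ROM routine; returns the value stored in MemTop."""
    mem = {}  # sparse RAM keyed by physical (long-aligned) address

    def phys(addr):
        return addr % ram_size

    d1 = 4                                       # MOVEQ #4,D1
    d1 = ((d1 << 16) | (d1 >> 16)) & 0xFFFFFFFF  # SWAP D1  -> $00040000
    a1 = d1                                      # MOVEA.L D1,A1
    d1 = (d1 + d1) & 0xFFFFFFFF                  # ADD.L D1,D1 -> $00080000
    a0 = d1                                      # MOVEA.L D1,A0
    a0 -= 4
    mem[phys(a0)] = 0                            # CLR.L -(A0)
    a1 -= 4
    value = mem.get(phys(a1), 0xFFFFFFFF)        # TST.L -(A1)
    if value == 0:                               # BNE.S not taken
        d1 >>= 2                                 # LSR.L #2,D1 -> $00020000
    return d1                                    # MOVEA.L D1,A0, then set MemTop

for size in (0x20000, 0x40000, 0x80000):
    print(hex(size), "->", hex(patched_find_mem_size(size)))
# 0x20000 -> 0x20000
# 0x40000 -> 0x20000
# 0x80000 -> 0x80000
```

A 256K machine is reported as 128K, which is exactly the state the boot block expects before it adds $20000 to the system variables.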

We also need to make a change to the checksum again, which is a sum of the 16-bit words in the ROM. It now needs to be $28BA8FB5 instead of $28BA61CE.
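The new checksum can be cross-checked without summing the whole ROM, because only ten words changed. In this sketch `NEW_WORDS` is taken from the patched routine's opcodes, while `OLD_WORDS` is my own hand-assembly of the original FindMemSize listing (an assumption, since the original opcode bytes aren't shown here).

```python
OLD_WORDS = [
    0x41F9, 0x0007, 0xFFFC,  # LEA $7FFFC,A0
    0x43F9, 0x0001, 0xFFFC,  # LEA $1FFFC,A1
    0x4298,                  # CLR.L (A0)+
    0x4A99,                  # TST.L (A1)+
    0x6602,                  # BNE.S
    0x2049,                  # MOVE.L A1,A0
]
NEW_WORDS = [
    0x7204, 0x4841, 0x2241, 0xD281, 0x2041,
    0x42A0, 0x4AA1, 0x6602, 0xE489, 0x2041,
]

def patched_checksum(old_checksum, old_words, new_words):
    """Adjust a sum-of-16-bit-words checksum for a block of replaced words."""
    return (old_checksum - sum(old_words) + sum(new_words)) & 0xFFFFFFFF

print(hex(patched_checksum(0x28BA61CE, OLD_WORDS, NEW_WORDS)))  # 0x28ba8fb5
```

The difference between the two word sums is $2DE7, which accounts for the whole change in the checksum.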

Results

My 19kB MacWrite document now reports a reasonable value for a 256KB Mac:
On a Mac 512K it'd be 95%:5%. For testing purposes I wrote another program called FreeMem, which allocates 1kB blocks until it runs out of memory. The app itself uses about 260 bytes. There is a proper API for this, but I found that it wasn't reporting the value I was expecting. On a Mac 128 FreeMem goes up to 75kB and on a Mac 512 it's about 430kB or more. On this Mac 256 it's 188kB:

This is only possible for a Mac 256K

In Matt Evans' blog post he said that it wasn't possible to run MacPaint on a Mac 256 disk, because it writes data to the boot blocks. I have tested this out and found that in fact it does run OK. Here's a screenshot, even though it's not actual proof since MacPaint can't tell you how much RAM is free AFAIK.

Conclusion

The Mythical Mac 256K has been talked about for a number of years and perhaps a few even existed during the early Macintosh development period. I tried, but failed, to create one under the miniVMac emulator in December 2023, but after it was mentioned in Matt Evans' Pico-Mac blog and a couple more times on the 68KMLA when discussing a Pico-Mac based on the new Raspberry Pi PICO 2, I decided to explore it again. It turns out it really is possible and as far as I know, this is the first Mac 256 that actually runs, albeit in emulation!

But what's the point? Well, there's a real digital archeological purpose behind the Mac 256K, simply because the floppy disk Boot block code for System 1 up to (I believe) System 4.1 contains code for it. It seems unlikely that someone would make the effort to create such an arcane hack if no-one considered the possibility of a Mac 256K. Nevertheless, the ROM won't allow a Mac 256K to boot properly. It is possible that a Mac 256K could have been built, though. All an engineer would have to do is build a toggle-switch logically OR'd to address line A17; start up a Mac with the toggle switch connected to 5V (the Mac would then look like a Mac 128K); then flip the switch (which would enable A17, which would enable the second bank of 128K RAM) and insert a boot disk. The Boot disk would then see 256K and adjust the system variables.

The second reason is that in the 1983 to 1985 period, microcomputers with 256K of RAM were at the upper end of the market and the Mac was intended to be released in 1983 (and even before if it had been possible). The IBM PC/AT, for example, was released in 1984 with 256kB as standard. A few early Atari STs were sold with 256kB of RAM in 1985.

The third reason is that Apple already knew that the Mac 128K's RAM was limited to the point of near impracticality. MacWrite could only store a document with a few pages in it. However, a Mac 256K is far more usable. The extra 128kB of RAM can store between 16,000 and 18,000 more words in a document; enough for about 32 to 36 pages, and easily enough to store an undergraduate dissertation or a whole chapter of a book. 256K is big enough for a proper, small Pascal compiler (note: MacPascal was an interpreter), making development much more feasible.

On the downside, a Mac 256K would have needed 32 of the 64kBit chips the Mac was supplied with and this would have made the motherboard bigger, more costly and perhaps affected the shape and design of the Mac itself. The Mac 512K on the other hand, still only needed 16 RAM chips - in fact the motherboard was designed to accommodate them by simply fitting the new chips and cutting a track.

Next Steps

The 64K ROM for the Mac 256K can be downloaded from the 68KMLA thread. The next two major steps really are to use it with PICO-Mac running on a Raspberry Pi PICO 2 (which gets it closer to the hardware); and perhaps at the same time, get someone to burn the ROM and install it on a genuine Mac 128K, fitted with 256K of RAM (this is a big ask).

In the meantime I'll see if I can get it to work with miniVMac and continue to work on M0̸Bius, my super-fast 68K emulator in M0+ assembler.

Sunday, 17 November 2024

Saving Europe-ASAP (Ally State Access Privileges)

 In 1939, 85 years ago at the time of writing, the UK government declared war on the Nazi régime in order to defend the sovereignty of Poland, but ultimately to rescue Europe from fascism.

It was unsuccessful in the first aim, but successful in the second with the help of allies across the globe, in particular, the United States under President Roosevelt, and the set of countries that became the Commonwealth (which at the time were part of the British Empire).

Today Europe is at a similar crisis point following the insurgence of far-right political groups in Europe and the recent re-election of Donald Trump as the president of the United States.

It is already the case that his administration is demanding vassalage from European countries. Vassalage is an appropriate term, since Trump is acting like a medieval king. In this context, a number of far-right European leaders are already pledging support at a time when the most dominant EU countries, Germany and France are facing a democratic crisis.

Meanwhile, Britain, as in 1939, has never been more alone: cut off from the EU since the 2016 Brexit referendum and now deeply at odds with a United States that considers the new Labour government to be a socialist enemy. Falling into line with Trump's demands would certainly destabilise the UK, given that high-profile Trump supporters such as Elon Musk regularly portray the country as a Police State or close to civil war.

What we need to do at this time is demonstrate solidarity with the EU, but this too is not easily possible, because the Labour government is determined to honour Brexit by not rejoining the EU. Perhaps though, given the urgency, there is another way: Ally State Access Privileges (ASAP), a temporary mechanism for supporting Europe in a new time of need.

This blog post is a rough (and very ignorant) draft proposal of the ASAP, how it operates, who is involved and its underlying purpose.

Purpose

ASAP exists specifically to prevent the EU and Europe in general from falling into the hands of the far-right. European countries have been making increasingly desperate attempts to exclude far-right parties from government as they grow in power, following the 2008 crash and subsequent economic austerity policies. These attempts could fail within the early part of 2025, a few months from the time of writing.

ASAP dissolves immediately when the threat from the far-right is over.

The ASAP serves to stabilise the economy and political composition of the EU; accelerate its environmental programme; maintain rights while radically reforming economic competitiveness.

Operation

  1. ASAP is a temporary political alliance between member states of the EU and nations willing to support European stability. It applies to an ASAP alliance member for a 1 year duration and must be renewed every 12 months.
  2. It confers the temporary elimination of trade barriers on conditions of EU regulation compliance, as per EU membership and includes temporary access to the single market and customs union.
  3. It confers Freedom of movement to individuals who are part of ASAP member states.
  4. It provides a subset of access to European Institutions:
    1. Voting privileges to ASAP alliance members in the European Parliament. The representative block is exactly the same size as would be the case if the ASAP member was an actual EU member, but the composition of an ASAP contingent is in proportion to the composition of the legislature in the respective member state. ASAP MEPs are chosen by decree by their member state, using any mechanism it sees fit.
    2. The head of state from an ASAP member is allowed on the European Council, as per EU membership.
    3. One ASAP member per ASAP state is allowed on the Council of the European Union and European Commission. However, no ASAP member can be a European President.
    4. One ASAP member per ASAP state is allowed as part of the Eurogroup specifically in order to temporarily influence and help align economic policy between ASAP members and the EU.
    5. Others?
In order to combat the threat of the far-right, which is the primary purpose of EU-ASAP, media regulation is placed on a war-footing:
  • Access to social media dominated by the far-right is prohibited by law, where 'dominated' means that more than 3/5 of the content consists of right-wing posts or reposts, or that more than 2% consists of far-right posts. (Is this reasonable? What are the criteria on Bluesky?)
  • Editorial guidelines for Printed media (and their online counterparts) are to be placed in the hands of a trust, analogous to the Guardian Scott Trust, in order to eliminate editorial influence from their owners; while retaining the same rough remit for their political flavour. The intent is to ensure freedom of expression while restricting extremism.
  • Extended powers for media regulators (such as OfCom in the UK) will ensure that factual errors in articles must be corrected with the same prominence given to the original article. Multiple regulatory breaches will lead to a process of mentoring by journalists from randomly chosen media outlets meeting higher regulatory standards. The purpose of this is to slow down the publication of objectively misleading articles while avoiding political bias.
Finally, the EU-ASAP commits to stripping the purpose of World War 2 from Nationalistic interpretations. Specifically, as described in the first paragraph, the Allies' purpose was not to wage war on Europe and Indo-China (since they were not the enemy), but to free Europe from fascist control.

Members

In the first instance, ASAP membership should be open to non-EU European states. This includes the UK; EEA members and EFTA members. Because of the already close alignment with EU regulations this should be a rapid process. If these states fall to the far-right, their ASAP membership is automatically cancelled.

In the second instance, ASAP membership should be extended to stable states that share democratic values close to those of the EU, namely Australia, New Zealand, Japan and Canada. If these states fall to the far-right, their ASAP membership is automatically cancelled.

In the third instance a more limited ASAP membership should be extended to states that share common environmental goals, if their membership strategically serves the purpose of the EU and the country itself. This may include India, South Africa (and any other stable African country) and controversially, China. In the case of China, it shall include delegates from both the Communist Party of China (CPC) and Hong Kong Legislative council to provide a wider range of representation. Why China? Because the US under Trump is also determined to wage an economic war with China, so a closer association between the EU and China; with a corresponding weakening of links between China and Russia would benefit the goal of weakening the far-right.

Conclusion

The re-election of Donald Trump and encroachment of the far-right across Europe has brought us to the point of an emergency comparable with the lead-up to World War 2. Therefore, radical, emergency actions must be taken in order to stabilise the continent while there is still time. EU-ASAP provides a draft of a suitable programme for a set of suitable alliance members who are not able to be full EU members for a number of practical and political reasons.

Note: Further edits will include suitable links.

Wednesday, 24 July 2024

Why does Sin(a)≈a for small a in radians?


 Introduction

When we were introduced to calculus during 'A' levels (actually 'AO' levels), an investigation into differentiation was used to show that sin(a)≈a for small angles, if the angles were in radians.

At the time, the observation seemed fairly magical: that somehow picking radians as the units for angles made it work out. To me, radians always seemed an arbitrary unit, because a complete circle wasn't a whole number of them, but 2π. And the method didn't help to enlighten us, because it involved repeated calculations for sin(a+d)-sin(a) for a→0.

Actually, though the reason is very simple. Let's consider a full circle with a unit radius:

The distance around the circle (the circumference) is 2π and any part of that distance a around the circle is its angle in radians (since 2π = 360º). Now let's look at a right-angled triangle embedded in a small part of the arc:


The height of the triangle, h, is nearly the same as the length a of the arc around the circle for angle x. And the height of the triangle is the actual meaning of sin(a), since the length a of the arc is the same as the angle in radians, because the radius=1. If we just enlarge that part, we can see that it's very close, but not quite the same.


Therefore, understanding the equivalence doesn't require any advanced notion of calculus; it can be seen directly. Of course, understanding that ∂h=∂a, i.e. when a is infinitesimal, is calculus.
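A quick numerical check makes the same point. This Python sketch (the helper `relative_error` is just a name I've picked) compares the arc length a with the height sin(a) as the angle shrinks.

```python
import math

def relative_error(a):
    """Relative difference between the arc length a and the height sin(a)."""
    return abs(a - math.sin(a)) / a

# The gap between the arc and the height shrinks rapidly as a gets smaller.
for a in (0.5, 0.1, 0.01):
    print(a, math.sin(a), relative_error(a))
```

The relative error falls roughly with the square of the angle, so halving a makes the approximation about four times better.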

Saturday, 6 April 2024

Space Shuttle CPU MMU, It's Not Rocket Science

The NASA Space Shuttle used a cut-down IBM System/360 influenced CPU called the AP-101S aimed mostly at military aeronautical applications. It's fairly weird, but then again, many architectures from the 1960s and even 1970s are weird.

I've known for a long time that the Space Shuttle initially had an addressing range of about 176K and because one of the weird things is that it's 16-bit word addressed (what they call half-words), this means 352kB. Later this was expanded to 1024kB (i.e. 512k half-words). How did they do this?

You might imagine that, being jolly clever people at NASA, they'd come up with a super-impressive extended memory technique, but in fact it's a simple bank-switched Harvard architecture where the address range for both code and data is split in two and 10 x 4-bit bank registers are used to map the upper half to 64kB (32k half-word) banks.

So, the scheme is simple and can be summarised as:

Two of the 10 bank registers are placed in the 64-bit PSW in the BSR and DSR fields. The other 8 DSE registers are used only when the effective address of an instruction uses a Base Register that's not 0 (which is when the effective address is of the form [BaseReg+offset] or [BaseReg] or [BaseReg+IndexReg+Offset]).
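In code, the mapping might look something like the Python sketch below. It is a simplified model, not taken from the NASA document: I'm assuming the lower half of the address range is fixed and unbanked, that a 4-bit bank value simply supplies the top bits of the physical address, and that the choice among the 8 DSE registers is made by some index I've called `dse_index` (how that index is derived isn't covered here). All function names are hypothetical.

```python
HALF = 0x8000  # 32K half-words: size of the lower half and of each bank

def translate(addr16, bank):
    """Map a 16-bit half-word address to a physical half-word address."""
    if not 0 <= bank <= 0xF:
        raise ValueError("bank registers are 4 bits wide")
    if addr16 < HALF:
        return addr16                        # lower half: assumed unbanked
    return (bank << 15) | (addr16 & 0x7FFF)  # upper half: selected 32K bank

def pick_bank(is_code, base_reg, bsr, dsr, dse, dse_index=0):
    """Choose which bank register applies to an access (simplified)."""
    if is_code:
        return bsr             # instruction fetches use BSR from the PSW
    if base_reg == 0:
        return dsr             # data without a non-zero base register uses DSR
    return dse[dse_index]      # otherwise one of the 8 DSE registers

print(hex(translate(0x8000, 3)))    # 0x18000
print(hex(translate(0xFFFF, 0xF)))  # 0x7ffff, the top of 512k half-words
```

With 16 possible banks of 32k half-words each, the highest reachable address is 512k half-words, which matches the 1024kB figure.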

Documentation

The documentation for this, however, is overly convoluted, wordy, and repetitive. The best source I could find is here, which seems to be the same document twice where the second version is the better one, but written in 1987 using an early 80s word processor (WordStar?) instead of being typeset.

There's a bizarre memory diagram on page 2-19, which can only be easily understood by people who already understand the system. It precedes a less incomprehensible, but difficult to visualise, flow-diagram description of the MMU:

Both of these diagrams are trying to say the same thing.

Surrounding this stuff are several paragraphs of badly-worded text explaining "banked-memory". It's obviously badly-written, because the official text needed hand-written corrections! I had to trawl through it to check if the bank-switching worked as it appeared to, and it does. It's all very odd.

Bank-switching was a standard technique for expanding memory in late 1960s minicomputers (like the Data General Nova series or the DEC pdp-11) as well as a plethora of 8-bit computers (including CP/M, BBC Micro, Apple ][, Apple ///, Sinclair and Amstrad) and then again in the late 1980s on IBM PCs and clones as affordable RAM started to exceed 1MB. People knew how to both expand memory with bank-switched techniques and just as importantly, describe it. Most bank-switching diagrams look a lot like mine (mine are inspired by DEC pdp-11 user manuals).

So, why is NASA's documentation so poor here? My most plausible explanations are:
  1. Budget cuts and pressure in the 1970s and 1980s led to poor documentation. This can be checked by reading the quality of the documentation prior to this mid-80s document and/or later if standards improved.
  2. Justification: all NASA hardware, including the AP-101S was expensive, so convoluted documentation helps convey the idea that the public and NASA were getting their money's worth: if you can't comprehend it, you won't grasp how simple and standard it was.
  3. Small developer-base: documentation tends to suffer when not many people are developing for a given product. That's partly because there's a lack of resources dedicated to documentation, but it's also because documentation is frequently passed on verbally (what I sarcastically call Cognitive documentation, i.e. no documentation 😉 ). I don't know the size of the Shuttle software developer team, but I guess it was in the low hundreds at any one time, because although the memory space was only a few hundred kB; I believe they loaded in various programs to handle different mission stages (e.g. ascent, docking, orbital corrections, satellite deployment, landing) and that means if there's a few MB of code, that's about 300,000 lines and given the safety requirements, perhaps only a few thousand lines per developer.
Nevertheless, I don't think the poor documentation is excusable - it implies the software quality is worse than is often claimed.

Conclusion

NASA engineers are often lauded for their genius. There's no doubt that NASA has to employ clever people, but like all computer engineers they make the same kinds of mistakes as the rest of us and draw upon the same kinds of solutions. The AP-101S was a 32-bit architecture cut-down into a hybrid 16-bit / 32-bit architecture. Part of the initial simplification was to limit addressing to 16-bits, but like every architecture (including NASA's AGC from a decade earlier), requirements exceeded that addressing capability. And even though the index registers were 32-bits, prior decisions (like the 24-bit addressing on early 32-bit Macintosh computers) forced more complex solutions than merely using more bits from the registers.

There are not many techniques designers can use to extend memory addressing, and NASA picked a common (and relatively conventional) bank-switching one. For reasons I'm not sure of, their documentation for it was pretty awful, and despite their best efforts at software standards, both the bank-switching mechanism and the documentation would have, sadly, reduced the quality of Space Shuttle software.



Thursday, 28 March 2024

Mouse Detechification! A PS/2 To Archimedes Quadrature Mouse conversion!

After an embarrassing amount of work I managed to convert a PS/2 mouse into a quadrature encoding mouse for my Acorn Archimedes A3020.


A Mini-Mouse History

Doug Engelbart designed the first mouse for his (at the time) very futuristic NLS demonstration in 1968 (it was built by a colleague, Bill English). It had one button, was brown (like real mice :-) ) and used two wheels, which must have made it scrape quite badly when moved diagonally:


Xerox picked up the concept of a Mouse for their pioneering Alto workstation project. They shrank the wheels and then added a ball at the bottom which could drive the wheels easily in any direction.


Other early mice from Apple, Microsoft, Smaky, Logitech, AMS, Atari, Sun, ETH, Commodore (Amiga) and Acorn were variations on this kind of design: Quadrature-Encoded mice, using a ball to rotate two, orthogonal wheels and minimal electronics to convert the signals (often just a simple TTL Schmitt-trigger buffer).

Each wheel (one for left/right and the other for up/down) had about 100+ spokes in it and an infra-red LED shone light between the spokes to a pair of sensors (per axis) at the other side. As the ball turned and moved a wheel, a spoke would block the light to the sensors and then let the light through when it passed. However, because the sensors were offset from each other, a spoke would block light to Sensor 1 before blocking light to Sensor 2 and then let light through to sensor 1 before sensor 2. So, the sensors would see this if you moved right (or up):


So, you can tell you're moving right (or up) if Sensor 1 gets blocked first each time, and left (or down) if Sensor 2 gets blocked first.

The problem is that it takes a lot of wires for a mouse to work this way and requires a surprising amount of computer power to handle all the interrupts, so later mouse designs offloaded that to a microcontroller inside the mouse, which then communicated with the computer using a simpler, 1 or 2 wire serial interface: these were called PS/2 mice on PCs (Apple had a design called ADB, Apple Desktop Bus). Eventually, USB mice replaced them and optical mice replaced the ball and wheels.

I know I could have bought a USB to quadrature encoded mouse adapter (from here on, QE mouse, because I don't like the term Bus-mouse), but that seemed like a cop-out, so instead I decided to make things as painful as possible. The first stage was to cut out the blobtronics that made it a PS/2 mouse.


Then I added some pull-down / level-shifting resistors to see if the A3020 would recognise the outputs as valid values, but it didn't. Then I went through a long process of getting an Arduino to read the analog voltages from the phototransistors (yep, I also know I could use a Schmitt-trigger buffer chip) and writing a small program to actually process them into quadrature encoded X and Y grey code (for reference, the states run 00, 01, 11, 10, 00... for forward/up and 00, 10, 11, 01, 00... for backward/down). It turned out the signal wasn't very digital!


I figured I could solve the problem in software using a small MCU that would fit in the mouse, so I chose an ATTINY24, a 14-pin AVR MCU with eight 10-bit ADC channels and 4 other digital IO pins. I used a simple hysteresis algorithm: it first figures out the range of values it can get from the phototransistors; then allows a bit more time to see if the range is bigger; then, once it's interpreted any given state, the ADC values have to change by 2/3 of the range to flip into the next state.
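The hysteresis idea can be sketched for a single phototransistor channel like this. It's only my reading of the description above, not the actual firmware: the Channel struct, the function names and the MIN_RANGE guard (which stops a barely-calibrated range from causing flips) are all assumptions, and it's written to run on a desktop compiler rather than the ATTINY.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_RANGE 64u  /* assumed: ignore the range until it is this wide */

typedef struct {
    uint16_t lo, hi;   /* smallest and largest ADC readings seen so far */
    uint8_t  state;    /* current digital interpretation: 0 or 1 */
} Channel;

void channel_init(Channel *c)
{
    c->lo = 1023;      /* 10-bit ADC: start with an empty (inverted) range */
    c->hi = 0;
    c->state = 0;
}

/* Feed one ADC sample; returns the (possibly updated) digital state. */
uint8_t channel_update(Channel *c, uint16_t adc)
{
    assert(adc <= 1023);
    if (adc < c->lo) c->lo = adc;          /* keep widening the known range */
    if (adc > c->hi) c->hi = adc;
    uint16_t range = (uint16_t)(c->hi - c->lo);
    if (range < MIN_RANGE) return c->state; /* not calibrated yet */
    uint16_t swing = (uint16_t)(range * 2u / 3u);
    if (!c->state && adc >= c->lo + swing) c->state = 1;      /* risen 2/3 of range */
    else if (c->state && adc <= c->hi - swing) c->state = 0;  /* fallen 2/3 of range */
    return c->state;
}
```

With a range of 100..900, the state only goes high at 633 or above and only drops again at 367 or below, so a noisy mid-level reading can't chatter between states.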

I went through quite a bit of a debugging process, because obviously when you go from a fairly lush environment like an Arduino (plenty of libraries, plenty of IO pins, lots of code space and relatively abundant RAM (2kB ;) )) to an ATTINY24 (2kB flash, 128 bytes of RAM, just 12 IO pins) and then stick it in a mouse, you have far fewer opportunities for debugging - and you can't even use an LED inside a trad, ball mouse, because ta-dah, you won't see it! So it's best to debug as much as possible in simulation, before you take the final step. In fact the program only ended up being about 906 bytes, because 'C' compiles pretty well on an 8-bit AVR.

I made a lot of mistakes at pretty much every stage - but amazingly getting the mouse directions inverted wasn't one of them :) . I started with 10K resistors for the phototransistors, but then ended up deducing 15K was best (22K || 47K) so that was 8 resistors! When soldering the analogue connections to the AVR I was out by one pin all the way along, because from the underside, I mistook the decoupling cap pins for the Pin1 and Pin 14 of the AVR. I had to desolder the phototransistor connections to the main cable - which I'd been using when analysing on the Arduino; then solder up the photo transistors to the analog inputs and then the digital outputs to the original wires. I tried to keep the wires no longer than they needed to be because it was cramped in there, but I ended up making the Y axis analog wires just 1mm too short (because, d'uh they get a bit shorter when you strip them and feed them through a PCB) so they had to be redone. Because there were a lot of wires I needed to glue them down as otherwise the casing wouldn't close, and I was particularly concerned about the phototransistor wires getting caught in the Y axis wheel, but then I glued them down directly underneath that wheel so it couldn't clip into place! I also glued the grey and green wires running up the right so that they got in the way of the case closing - so all of these had to be very carefully cut out and moved. Near the end I remembered I had to add a 10K resistor between reset and power so that the MCU would actually come out of reset and execute code! I also had to add an input signal, to software switch the Y axis inputs to the X axis grey code, because the only header I could find to fit the mouse cable plug for testing didn't leave room to test both X and Y grey codes! Hence what looks like an extra button!

Finally I connected it all up, glued the board and wires down (after correcting the locations) and got: A BIG FAT NOTHING! I thought maybe I'd messed up the analogue ranges and retried it with a different range and that didn't work. Then I realised I could output debug serial out of one of the grey code outputs to the Arduino and see what the ATTINY was reading! Bit-banged serial code can be very compact!

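#include <avr/io.h>     // PORTB
#include <util/delay.h> // _delay_us(); needs F_CPU defined first (8MHz assumed from the 11/8 term)
// kSerTxPin, the PORTB bit mask for the serial output pin, is defined elsewhere in the project.
void PutCh(uint8_t ch)
{   // (ch<<1) adds a start bit in bit 0 and |0x200 adds a stop bit.
    uint16_t frame=((uint16_t)ch<<1)|0x200;
    do {
        if(frame&1) {
            PORTB|=kSerTxPin;
        }
        else {
            PORTB&=~kSerTxPin;
        }
        frame>>=1;
        // 19200 baud, less (presumably) 11 cycles of loop overhead at 8MHz.
        // 11.0/8, because an integer 11/8 would evaluate to 1.
        _delay_us(52.083333-11.0/8);
    }while(frame); // & after the stop's been shifted out,
                   // the frame is 0 and we're done.
}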


Before I tried it, I did a sanity check for power and ground only to find I hadn't actually connected up VCC properly!!!! I'd forgotten the final stage of solder-bridging the decoupling cap's +ve lead to Pin1 of the AVR and I ended up ungluing everything so I could see underneath.

But when I fixed this and packed it all back in the case: It Worked! I tried reading the digital outputs at the end of the cable on the Arduino and when I was satisfied (which only took a few minutes of testing) I decided to hook it up to my A3020 and hey-presto! I have a working QE Mouse!


I had been a bit concerned that doing analog sampling wouldn't be fast enough and so my algorithm has a heuristic whereby if it sees a jump of 2 transitions (00 <=> 11 or 01 <=> 10) it assumes it's a single step in the same direction as before. I could manage about 20K x 4 samples per second, but a little maths shows this will be fine: at one transition per sample, a full 640 pixels on the Arc's screen can still be sampled OK if you cover it in about 32ms (640 / 20,000 per second), and clearly we don't do that.
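Here's a minimal sketch of that decoding step for one axis, using the 00, 01, 11, 10 forward sequence from earlier. The function names are mine, not the project's, and the double jump is counted as one step in the last known direction, as described above (a variant could just as reasonably count it as two).

```c
#include <assert.h>
#include <stdint.h>

/* Position (0..3) of a 2-bit state in the forward sequence 00, 01, 11, 10. */
static uint8_t seq_index(uint8_t state)
{
    return (uint8_t)((state ^ (state >> 1)) & 3);
}

/* Returns +1 (forward/up), -1 (backward/down) or 0 (no movement).
   *last_dir remembers the most recent unambiguous direction. */
int quad_step(uint8_t prev, uint8_t cur, int *last_dir)
{
    assert(prev < 4 && cur < 4);
    switch ((seq_index(cur) - seq_index(prev)) & 3) {
    case 1: *last_dir = +1; return +1;  /* next state in the sequence */
    case 3: *last_dir = -1; return -1;  /* previous state in the sequence */
    case 2: return *last_dir;           /* skipped a state: assume same direction */
    default: return 0;                  /* unchanged */
    }
}
```

Called once per sample with the previous and current states, the return values can simply be summed to track position.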

The video shows !Paint, which is like, 96kB I think! It's terribly unintuitive though! Note on the video that a 12MHz ARM250 can drag whole, 4-bpp x 640x480 windows around rather than just outlines! Note also the Taskbar, which was standard on the Archimedes before Windows 95, or Mac OS X (NeXTSTEP had a floating Dock).

Here, again is the link to the project and source code.



Sunday, 24 March 2024

Colour Me Stupid - An Early Archimedes 8-bit Colour Quest!

I'm assuming, naïvely again, that I can write a short blog post, but on past performance this isn't likely. I recently managed to get my Acorn Archimedes A3020 working again by converting a PS/2 mouse to a Quadrature mouse, as early Archimedes machines expect (see this blog post), and this has given me more of an interest in the system, particularly from a hardware viewpoint.

Wonderfully, I'm currently writing this on my MacBook Air M2, a descendant (32 years on) of the ARM250 that powers that original A3020, so things have come full circle in a sense.

These early Arcs had a video chip called VIDC which supports 1, 2, 4 and 8-bit colour video modes at different resolutions, but for decades (probably even since the first review I saw in Personal Computer World August 1987), I was confused as to why the 8-bit colour mode was described as merely having a 64 colour palette with 4 tints instead of proper 8-bit RGB colours.


Why create something complex, which doesn't even do the job? What kind of terminology is 'tints'? How do you really use them?

The confusion deepened when I started to look into the VIDC hardware, because it never supported a 64-colour palette and some tints; instead it always supported a 16 x 12-bit colour palette for all modes, including the 8-bit colour mode. So, how did that work?

Standard VIDC Palette Entry

Sup   Blue          Green         Red
S     B3 B2 B1 B0   G3 G2 G1 G0   R3 R2 R1 R0

The standard VIDC palette entry contains 4 bits for each component: Blue, Green and Red, oddly in that order rather than the conventional Red, Green, Blue order. In addition, it has a sort-of single alpha bit which can be used for GenLocking. There are just 16 palette entries, so any one of them can be selected in 4-bits per pixel modes, but fewer of them are used in 1-bit and 2-bits per pixel video modes.

In 8-bit colour mode each 8-bit pixel is composed of a 4-bit palette entry number and 4 bits which directly replace four of the palette entry's bits, namely B3, G3, G2 and R3.

Direct         Palette
B3 G3 G2 R3    Palette 0..15

I've heard several claims that these ARM computers couldn't do proper 8-bit RGB colour, with 3 bits for Red, 3 bits for Green and 2 bits for Blue (human vision is less sensitive to blue). In fact, we can immediately see that by defining the 16 palette entries so that they literally provide the other bits, we will get the equivalent of RGB332 (really BGR233). This gives us:

Direct         Palette
B3 G3 G2 R3    B2 G1 R2 R1

Now we have 3 bits for Green and Red, and 2 bits for Blue. This means that in theory we have a proper 8-bit RGB mode, where we can freely select any one of the full range of 256 colours such a mode could describe. Note, we don't have a palette of 64 colours + 4 tints, we have a palette of 16 colours + 16 tints each and the palette can be assigned to provide the missing RGB bits.

How To Mess Up Palette Settings

A simplistic implementation of this would be to set all the remaining bits of the palette entries to 0, i.e. B1, B0, G0 and R0. This gives us the following palette:

B2G1\R2R1 R2R1=0 R2R1=2 R2R1=4 R2R1=6
B2G1=00 0x000 0x002 0x004 0x006
B2G1=01 0x020 0x022 0x024 0x026
B2G1=10 0x400 0x402 0x404 0x406
B2G1=11 0x420 0x422 0x424 0x426

To re-emphasise, by itself this would give a very dull palette, because B3, G3, G2 and R3 are never set, but as described earlier, these bits are provided directly by the upper 4 bits of each 8-bit pixel. Consider three pixels: 0x03, 0x63 and 0x83. They all use the palette entry 3, which provides a medium-level red 0x006, but the second pixel would add a green component of 0xc making the colour: 0x0c6 and the third would add a blue component of 0x8 making the colour 0x806 (purple-ish). Combining the palette entries and modifiers then gives this 256 colour range:


Here, the colour range has been generated by a BBC Basic program running on Arculator, an excellent Archimedes Javascript emulator. It looks pretty decent. It's shown in two formats: a literal RrrGGgBb view is on the left, where each column represents the bottom 2 bits for green and both bits for blue while subsequent rows increment the top bit for green and the red component. However, it's easier to block out by splitting the full range into 4 quadrants of RrrGGg where each quadrant increments Bb. I also tried the same program on my real A3020, but this time in mode 28, as my monitor couldn't really cope with the non-VGA mode 13, and got this:


I made a programming typo with the linear representation, but the quadrant version looks better than the emulator! This shows that the palettes really can generate a reasonable RGB332 colour range.

There is a minor problem in that the colours don't span the full colour range. Red and Green can only go to 87.5% of the maximum (0xe) and blue can only go to 75% of the maximum (0xc). The conventional way to address this is to replicate bits in the component, so the 8 proper levels for red and green would be: 0b0000, 0b0010, 0b0100, 0b0110, 0b1001, 0b1011, 0b1101, 0b1111. And for blue it'd be: 0b0000, 0b0101, 0b1010, 0b1111. And if the Arc had a full 256 colour palette, like every colour Mac from the Mac II onwards had, that's exactly what would be done. Unfortunately, if you try to approximate this, you get a worse arrangement:


There are two obvious problems: on the right-hand spectrum, I've outlined how the Red=3 and Red=4 values look almost the same (as do the Blue=1 and Blue=2 values, outlined on the left). This is because the difference is only 1 in each case: Red=3 translates as 0x7 (from the palette), and Red=4 translates as 0x8 (from R3); while Blue=1 translates as 0x7 (from the palette) and Blue=2 translates as 0x8 (from B3).
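A short sketch shows why the approximation bunches up in the middle. Bit replication needs the bottom bit to copy the top bit, but on VIDC the bottom three red bits come from a palette entry that can't see R3. The kLowRed table here is hypothetical (I've picked values that spread 0..7 and agree with the Red=3 and Red=4 results above); the real experiment may have used different numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Ideal 3-bit to 4-bit expansion by bit replication: abc -> abca. */
uint8_t replicate3(uint8_t level)
{
    assert(level < 8);
    return (uint8_t)((level << 1) | (level >> 2));
}

/* Hypothetical palette values for the low three red bits, chosen by R2R1 alone. */
static const uint8_t kLowRed[4] = { 0, 2, 5, 7 };

/* What VIDC can produce: R3 comes from the pixel, R2..R0 from the palette. */
uint8_t vidc_red(uint8_t level)
{
    assert(level < 8);
    return (uint8_t)(((level >> 2) << 3) | kLowRed[level & 3]);
}
```

Ideal replication takes Red=3 and Red=4 to 6 and 9, a step of 3, whereas the palette-constrained version gives 7 and 8, a step of just 1.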

It turns out, then, that the naïve palette assignment is the most evenly distributed one. And this brings us to the next observation:

A Hint On Tints: Exact 8-bit RGB Is Impossible

On a real 8-bit RGB palette, the blue values 00 to 11 scale into the full range of blue: 0x00 to 0xff, matching the same range as green and red where 000 to 111 scale to 0x00 to 0xff. However, the Archimedes palette (as alluded to earlier) scales unevenly: blue scales to 0xc0 while red and green scale to 0xe0. Also, since neither scales to 0xff, the colours will be duller.

Instead what we get is an effective 64-entry palette where we only consider the top two bits of each component: BbGGxRry + 4 green/red tints for each one: xy = 00, 01, 10, 11. And this explains why the Archimedes manual always describes 8-bit colours in those terms, but the choice of their default tints is different.

One of the other major problems with this palette is that you can only have 4 proper BGR233 greys: 0x00, 0x52, 0xa4 and 0xf6. The slightly brighter colours 0xf7, 0xfe and 0xff are off-white pink, green and yellow tints respectively. Ironically, proper RGB332 can only manage two greys! Consider RGB332 represented by 256 x 12-bit BGR palette entries or 256 x 24-bit BGR palette entries. Black is RGB332=0x00, which maps to 0x000 in the 12-bit palette and {0x00, 0x00, 0x00} in the 24-bit palette. The next closest is Red=Green=2, Blue=1. This is 0x544 in the 12-bit palette and {0x55, 0x49, 0x49} in the 24-bit palette - both slightly blue. Then Red=Green=4, Blue=2, which is 0xa99 in the 12-bit palette and {0xaa, 0x92, 0x92} in the 24-bit palette, again, both slightly blue; then Red=Green=7, Blue=3, which is 0xfff in the 12-bit palette and {0xff, 0xff, 0xff} in the 24-bit palette.

In both of those cases, even the 24-bit palette is 16% out from a grey which is distinguishable to the human eye.
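The grey counts are easy to check by brute force. This sketch walks all 256 logical BGR233 values and counts those where the three 4-bit components come out equal, either with the naive scaling (low bits left at 0) or with bit replication as on an ideal 12-bit palette. The function names are just for illustration.

```c
#include <assert.h>
#include <stdint.h>

static int rep3(int v) { return (v << 1) | (v >> 2); }  /* abc -> abca */
static int rep2(int v) { return (v << 2) | v; }         /* ab  -> abab */

/* Count greys among the 256 logical BGR233 values (BbGGgRrr).
   replicate = 0: naive scaling; replicate = 1: bit replication. */
int count_greys(int replicate)
{
    assert(replicate == 0 || replicate == 1);
    int n = 0;
    for (int p = 0; p < 256; p++) {
        int b = p >> 6, g = (p >> 3) & 7, r = p & 7;
        int B = replicate ? rep2(b) : b << 2;
        int G = replicate ? rep3(g) : g << 1;
        int R = replicate ? rep3(r) : r << 1;
        if (B == G && G == R) n++;   /* all three components equal: a true grey */
    }
    return n;
}
```

count_greys(0) comes out as 4 (the 0x00, 0x52, 0xa4 and 0xf6 listed above) and count_greys(1) as 2 (just black and white).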

The Alternative Palette Approach

I now understand that the conventional VIDC 8-bit palette, with 64 base colours and 4 tints, will look something more like this:

Direct         Palette
B3 G3 G2 R3    B2 R2 T1 T0

Here T1 and T0 represent the tints of white, which get added to all of B1B0, G1G0 and R1R0. This kind of palette would contain these entries:

B2R2\Tint 0 1 2 3
B2R2=00 0x000 0x111 0x222 0x333
B2R2=01 0x004 0x115 0x226 0x337
B2R2=10 0x400 0x511 0x622 0x733
B2R2=11 0x404 0x515 0x626 0x737
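The table above can be generated, and the pixel decoding modelled, with a few lines of C. As before this is a sketch of my understanding of the convention rather than Acorn's actual default palette, and the function names are invented. The last function counts how many of the 256 pixel values are true greys under this scheme.

```c
#include <assert.h>
#include <stdint.h>

/* Palette entry for the low 4 pixel bits: B2 R2 T1 T0, as 12-bit BGR. */
uint16_t tint_palette(uint8_t idx)
{
    assert(idx < 16);
    uint16_t c = (uint16_t)((idx & 3) * 0x111);  /* tint goes into all three low pairs */
    if (idx & 8) c |= 0x400;                     /* B2 */
    if (idx & 4) c |= 0x004;                     /* R2 */
    return c;
}

/* Full pixel: B3 G3 G2 R3 come straight from the upper 4 bits. */
uint16_t tint_pixel_to_bgr12(uint8_t pixel)
{
    uint16_t c = tint_palette(pixel & 0x0f);
    if (pixel & 0x80) c |= 0x800;  /* B3 */
    if (pixel & 0x40) c |= 0x080;  /* G3 */
    if (pixel & 0x20) c |= 0x040;  /* G2 */
    if (pixel & 0x10) c |= 0x008;  /* R3 */
    return c;
}

/* Number of pixel values whose blue, green and red components are all equal. */
int count_tint_greys(void)
{
    int n = 0;
    for (int p = 0; p < 256; p++) {
        uint16_t c = tint_pixel_to_bgr12((uint8_t)p);
        if ((c >> 8) == ((c >> 4) & 0xf) && (c >> 8) == (c & 0xf)) n++;
    }
    return n;
}
```

Pixel 0xff decodes to 0xfff, so this convention does reach full white, and count_tint_greys() finds 16 greys.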

This time we essentially have 4 dimensions to consider: Blue<2>, Green<2>, Red<2> and Tint<2>. I thought a recursive arrangement would be clearest, but it turns out that a semi-linear arrangement is:


Here, 4 horizontally adjacent blocks are the tints and then each horizontal block of 4 is the red component; while 4 vertically adjacent blocks are the blue component and each vertical block of 4 is green.

Trying to process images in this convention is challenging because (as we'll see in the next section) it's hard to calculate how to dither each component: they aren't truly independent, and can't be, because there are really only 3 primaries (Red, Green, Blue) but four components. For example, it's easy to see that the top left and bottom right tint groups are actual greys, but harder to see that there are two other grey blocks (which I've outlined). This means we can't independently adjust tints against RGB.

Nevertheless, this convention has several advantages over RGB332:
  • The RGB primaries are all evenly distributed, they get 2-bits each.
  • There are 16 levels of grey, which means that anti-aliased black text on a white background (or its inverse) can be done pretty well.
  • There's a brighter range because it goes up to 0xfff.
  • Human vision is more attuned to brightness than colour, which is why the HSV colour model is effective, so it's possible that this convention can represent images in a way that looks better to us.

Dithering

Even though we only have 256 imperfect base colours in our quasi-BGR233 palette, we can employ a standard technique to mask this, called dithering. I wanted to generate the inner surface of an RGB cube (where black is the bottom, left, front corner and white is the top, right, back corner) to show how we can generate fairly smooth colour transitions by applying a fairly standard Floyd-Steinberg dither.

There's a really good blog post on dithering here, which covers far more forms of dithering than the ordered and FS dithering I knew about. FS dithering works by scanning the original image in raster order, computing the error between each pixel and the closest colour we can generate, and then propagating that error to the neighbouring pixels on the right and below (or right and above).

         *      7/16
3/16    5/16    1/16

In fact we can compute all of this by just maintaining a full-colour error array for a single row + the single pixel to the right.
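Here's a sketch of that single-row trick in C, for one 8-bit channel only (the real thing keeps a full-colour error per pixel, but the bookkeeping is the same for each component). It isn't the BBC Basic program used for the images; fs_dither, nearest_level and MAX_W are names and limits I've assumed for the example. Errors are held in sixteenths so the 7, 3, 5 and 1 weights stay as integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_W 640  /* assumed maximum row width */

/* Nearest level when only the top `bits` bits of a channel are available. */
static int nearest_level(int v, int bits)
{
    int step = 256 >> bits;                  /* e.g. 32 for 3 bits */
    int q = ((v + step / 2) / step) * step;
    return q > 256 - step ? 256 - step : q;  /* brightest level is 256-step */
}

void fs_dither(const uint8_t *src, uint8_t *dst, int w, int h, int bits)
{
    int below[MAX_W + 1];  /* errors (x16) for the next row; slot x+1 is pixel x */
    assert(w > 0 && w <= MAX_W && bits >= 1 && bits <= 8);
    memset(below, 0, sizeof below);
    for (int y = 0; y < h; y++) {
        int right = 0;     /* 7/16 share for the pixel to the right */
        int diag = 0;      /* 1/16 share, held until its slot has been read */
        for (int x = 0; x < w; x++) {
            int want = src[y * w + x] + (below[x + 1] + right) / 16;
            if (want < 0) want = 0;
            if (want > 255) want = 255;
            int q = nearest_level(want, bits);
            int e = want - q;
            dst[y * w + x] = (uint8_t)q;
            below[x + 1] = diag + 5 * e;  /* slot now read: reuse it for the row below */
            below[x] += 3 * e;            /* below-left */
            diag = e;                     /* below-right, stored on the next pixel */
            right = 7 * e;
        }
    }
}
```

A flat area that sits exactly on a level passes through untouched, while one halfway between two levels comes out as a mix of the two neighbouring levels.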

3D Projection

So, dithering is fairly simple, but the 3D projection was actually fairly complex, because I couldn't just draw an image in normal 3D space. I had to scan the image, translating row and column pixels into (x,y,z) coordinates for the cube, culling pixels outside the cube and then calculating the furthest pixel at each of these points. Then x corresponds to red, y corresponds to green and z corresponds to blue. This involved quite a number of mistakes! To make the 3D calculations simple I generated an orthogonal projection where z=column*4/5+row*3/5, which is essentially a 3:4:5 triangle and avoids having to compute floating point maths or square roots. The hacky calculations work as follows:

First we want to transform from (column, row) space (the screen coordinate) to (x,y,z), where y is down/up and z is depth. (0,0) is easy: it's (0,0,0), and any coordinate along the c axis is easy too: (c,0) => (c,0,0). As we go up rows, the beginning of the cube starts slightly further to the right and, because of the projection, we know that (c, r)=(4,3) is also easy: it's (0,0,5). Similarly, any ratio where r=3c/4 is also (0, 0, 5c/4). When we're to the left of that axis, we're part of the left plane, so we need to draw that, and then the column calculation is the smallest, because e.g. (0, r) => (0, 0, 5r/3) > (0,0,5c/4) since 5c/4 is 0; but when we're to the right of that axis, the row calculation is the smallest. The back face is determined by the maximum depth of 255, so we simply limit the depth to that to generate it. (x,y,z) then map directly to (r, g, b).

In the end, I generated this Colour cube on the emulator:


It looks fairly smooth, but we can see some banding. Real-life images aren't smooth, so they don't tend to exhibit the same kind of artefacts.

Conclusion

Early colour microcomputers had many compromises because of speed and memory limitations. Many of them used Palettes (Atari 8 and 16-bit, Acorn BBC and Archimedes, Macintosh II, Apple ||gs, PC EGA and VGA modes, the Amiga..) to provide a wider colour range than possible given the number of allocated bits per pixel, and some used other tricks such as colour attributes, which allowed a full colour range across the screen with a low colour boundary resolution (e.g. VIC-20, ZX Spectrum, Commodore 64, also Amiga HAM mode). As frame buffer memory approached 64kB or above, during the late 80s, it became possible to provide passable 8-bit true colour video in home computers. The early colour Macintosh II, PC MCGA and Archimedes computers fall into this category. They all use palettes, except that the Mac II and MCGA mode have 256 entries, each of 24-bits (or 18-bits in the case of MCGA).

The Archimedes A3020 inherits its graphics from its Atari ST / Commodore Amiga era incarnation with a limited 16 entry x 12-bit palette and a cheap hack to support a 'true' colour mode. The alternative, a proper 256 entry palette would have required a relatively costly 384 bytes of Palette RAM (+18K transistors) or a late chip redesign and a later or more expensive release for the integrated ARM250 chip[1].

Acorn's tendency to be technically pedantic is, I think, what led them to claim this mode is really 64-colours + 4 tints rather than a decent approximation to RGB332 from 16-base colours + 16 palette entries. RGBT2222 has some advantages, but RGB332 (really BGR233) makes most sense as a colour range, because all the others lead to either greater banding or a less coherent relationship between pixel bits and primary components. It turns out that it's possible to achieve a reasonable approximation to RGB332 on an Archimedes.

Notes [1]: ARM250 die from the Microprocessor Report linked earlier. The ARM CPU itself requires 29K transistors, so adding 18K transistors to VIDC would have resulted in a notable increase in size and cost for the 100K transistor, $25 chip.