Saturday, 17 May 2025

SIBlings! 80386 Addressing Modes Simplified

I can't believe it's taken me over 30 years to get around to actually learning the 80386 addressing modes. So long, in fact, that the x86 architecture has now been obsolete for about 20 years (slightly earlier for the original AMD64).

This is a blog post about how to make sense of the x86 addressing modes, because it's really not that complex if it's presented in a clear way. You can skip ahead to the end, but first a bit of background on my 8086 coding experiences!

I used to do a significant amount of 8086 programming in the early-mid 1990s, when I was working at Micro Control Systems in Sandiacre, Nottingham. I was assigned to the solid-state storage products (called Silicon Disk) which implemented battery-backed SRAM, EPROM and Flash expansion cards for PC compatibles that emulated bootable hard disks via an Int13h interface and a boot ROM. They were versatile in one sense, because you could combine different storage media on a single card or combine multiple cards to make a larger disk - even so, the disks were small by modern standards: a maximum of 3MB per full-length ISA card.

The EPROM and Flash disks were fairly rudimentary though: the user would have to first erase all the storage (using UV in the case of EPROM and using a software erasing tool for Flash); and then copy the data they needed to the Silicon Disk. The firmware hijacked the MSDOS INT23h(?) interface so that writes to the Silicon Disk used our code. Files were written directly to the non-volatile storage, but the firmware kept the FAT table in the PC's RAM so that it could be updated multiple times. Finally, the user was expected to close the disk, which caused the FAT table to be written to the disk properly, along with a boot ROM image so the PC could boot up from the device.

This meant that people were able to develop a full, solid-state PC. Of course, a PC with a read-only disk isn't terribly useful, so most EPROM or Flash disks would also have at least one bank allocated to battery-backed RAM. And in this sense the PC became a large microcontroller with up to 2MB of EPROM/Flash and 1MB of a RAM disk.

All the firmware was written in 8086 code which meant I ended up being pretty familiar with 8086 coding. Coming from a 68000 background that was fairly disappointing, but I learned to make decent use of the CPU. Eventually I persuaded the company that we could improve development time by only having to write the core routines in assembly, while the rest of it could be written in 'C'.

I rarely had to write in 80286 assembly - it was basically the same as 8086 programming with a few more instructions, pusha and popa being the most useful. We never had to write 80386 code, so I never had to learn that and my embedded programming jobs after Micro Control Systems never required it either.

But occasionally I'd come across 80386 code and realise that some of the basic stuff had changed: in i386 mode, addressing modes are more flexible and you can scale index registers so that the CPU can directly address 16-bit or 32-bit arrays. It's possible to read the code OK, but not write it unless you know what the constraints on the addressing modes really are.

So, finally I've had a go at trying to understand them and it turns out, it's not very complex.

Recap: 8086 Addressing Modes

The x86 series has a byte-oriented instruction set, which means it's just a series of 1 to however many bytes regardless of whether you're dealing with the original 16-bit 8086 or a 64-bit Core i7. Many instructions consisted of a specific initial byte followed by an effective address byte which told the CPU where to find the memory location (or register) to obtain the source or destination data. This was called the MOD:REG:R/M byte. In some cases this byte is sufficient, but in other cases, the MOD bits would indicate 8 or 16-bit literal offsets would then follow and these values were added to whatever memory location was indicated by the R/M bits. The meaning of the R/M bits themselves depended on the MOD bit value too, and could either be one of the 8, 8-bit registers (or 16-bit registers if it was operating on 16-bit data); one of the 4, 16-bit registers that could be used for indexing: BX, BP, SI and DI or a restrictive combination of a pair of those registers: BX+SI (or DI) / BP+SI (or DI).

In addition, although SP addressed memory (because it was the stack pointer) it wasn't possible to index via SP; instead the convention was to use BP to point to a frame of data in the stack and it was possible to use that with an offset. Intel did that, because frame pointers were a fairly common convention for Pascal in the 1970s when the 8086 was designed.

So, it was all pretty restrictive: only 3 address registers were generally available and indexing data with a computed offset was even more limited: only BX+SI (or DI) being the useful mode (because normally you wouldn't stick an entire array on the stack). Still, experienced 8086 programmers were used to juggling registers in functions so that the right address registers just so happened to be available when the programmer needed them. It was slow, tedious, but surprisingly efficient.

By comparison, the 68000 was a delight, because you could use any one of 8 address registers (A0..A7) and index them directly; or with a post-increment / pre-decrement (directly implementing 'C's ++ and --); or with a 16-bit displacement; or with an 8-bit displacement and a second register (which could be any of the 16 registers D0..D7, A0..A7, treated as a 16-bit or 32-bit offset). Very flexible.

The whole set of addressing modes can be summarised below:

MOD	R/M:	000	001	010	011	100	101	110	111
00		[BX+SI]	[BX+DI]	[BP+SI]	[BP+DI]	[SI]	[DI]	disp16	[BX]
01	disp8	[BX+SI]	[BX+DI]	[BP+SI]	[BP+DI]	[SI]	[DI]	[BP]	[BX]
10	disp16	[BX+SI]	[BX+DI]	[BP+SI]	[BP+DI]	[SI]	[DI]	[BP]	[BX]
11	(reg:)	AL/AX	CL/CX	DL/DX	BL/BX	AH/SP	CH/BP	DH/SI	BH/DI

This means there are 24 unique addressing modes on the 8086 (8 for each of the three memory MOD values), ignoring the register mode, as it's not addressing memory.
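To make the table a bit more concrete, here's a minimal sketch in Python of how the MOD and R/M fields of that byte pick an operand. The function name decode_modrm16 and the wide flag are my own inventions for illustration, it ignores the middle REG field and segment overrides, and it only names the displacement size rather than reading the displacement bytes.

```python
# Hypothetical helper: decode the MOD and R/M fields of a 16-bit
# MOD:REG:R/M byte into a readable operand description.
RM16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_modrm16(modrm, wide=True):
    """Return the operand selected by MOD (bits 7..6) and R/M (bits 2..0)."""
    mod = modrm >> 6
    rm = modrm & 7
    if mod == 3:
        # Register operand: 16-bit or 8-bit depending on the data size.
        return (REG16 if wide else REG8)[rm]
    if mod == 0 and rm == 6:
        # The slot that would have been plain [BP] is a bare 16-bit address.
        return "[disp16]"
    disp = ["", "+disp8", "+disp16"][mod]
    return "[" + RM16[rm] + disp + "]"


print(decode_modrm16(0x00))  # MOD=00, R/M=000
print(decode_modrm16(0x46))  # MOD=01, R/M=110
print(decode_modrm16(0x06))  # MOD=00, R/M=110
```

Looping MOD over 0..2 and R/M over 0..7 with this function is an easy way to list every memory mode in the table.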

80386 Addressing Modes

Intel could have kept the same set of addressing modes for the 32-bit 80386, but computer architecture design had advanced between 1978 and 1985 when the 80386 was released. Firstly, the 68000 CPU with its more flexible addressing modes, represented a significant amount of competition, primarily because it was already 32-bit and used for high-end workstations and secondly, RISC processor designs were starting to emerge and these showed that simple addressing modes were used most of the time.

Therefore, the i386 took the drastic step of changing the addressing modes in its 32-bit mode. Instead of the double-index register modes and single-index register modes being allocated to the R/M field, only a wider set of single-index register modes were allocated; and where ESP would have been used as an index register (R/M=100), a second address extension byte was added instead: the SIB byte. All the SIB byte does is provide a 3-bit index register field, a 2-bit scaling field for that index register and a 3-bit base register field. These fields can basically be mixed and matched.

MOD	R/M:	000	001	010	011	100	101	110	111
00		[EAX]	[ECX]	[EDX]	[EBX]	[Ix*n+Base]	disp32	[ESI]	[EDI]
01	disp8	[EAX]	[ECX]	[EDX]	[EBX]	[Ix*n+Base]	[EBP]	[ESI]	[EDI]
10	disp32	[EAX]	[ECX]	[EDX]	[EBX]	[Ix*n+Base]	[EBP]	[ESI]	[EDI]
11	(reg:)	AL/AX/EAX	CL/CX/ECX	DL/DX/EDX	BL/BX/EBX	AH/SP/ESP	CH/BP/EBP	DH/SI/ESI	BH/DI/EDI

SIB

Ix:3	Scale:2	Base:3
EAX	*1	EAX
ECX	*2	ECX
EDX	*4	EDX
EBX	*8	EBX
(none)		ESP
EBP		*
ESI		ESI
EDI		EDI

(* if Mod=00 and Base=* then the addressing mode is disp32[Ix*n], i.e. a 32-bit displacement and a scaled index register without a base register. With Mod=01 or Mod=10, Base=* selects EBP as the base register).
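The same kind of sketch works for the 32-bit modes. This is only an illustration in Python, with made-up function names (decode_sib, decode_modrm32); it assumes 32-bit operand size for the register mode and relies on the SIB byte being laid out as scale in bits 7..6, index in bits 5..3 and base in bits 2..0.

```python
# Hypothetical helpers: decode a 32-bit MOD:REG:R/M byte plus optional SIB byte.
REGS32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_sib(mod, sib):
    """Return (address terms, displacement forced by the SIB byte or None)."""
    scale = 1 << (sib >> 6)      # 2-bit field: *1, *2, *4 or *8
    index = (sib >> 3) & 7
    base = sib & 7
    terms = []
    if index != 4:               # index 100 means "no index register"
        terms.append(REGS32[index] + "*" + str(scale))
    if base == 5 and mod == 0:   # the '*' case: disp32 and no base register
        return terms, "disp32"
    terms.append(REGS32[base])
    return terms, None


def decode_modrm32(modrm, sib=None):
    mod = modrm >> 6
    rm = modrm & 7
    if mod == 3:
        return REGS32[rm]        # register operand, 32-bit size assumed
    disp = [None, "disp8", "disp32"][mod]
    if rm == 4:                  # R/M=100: a SIB byte follows
        if sib is None:
            raise ValueError("R/M=100 needs a SIB byte")
        terms, forced = decode_sib(mod, sib)
        disp = forced or disp
    elif rm == 5 and mod == 0:
        terms, disp = [], "disp32"
    else:
        terms = [REGS32[rm]]
    if disp:
        terms.append(disp)
    return "[" + "+".join(terms) + "]"


print(decode_modrm32(0x04, 0x88))  # SIB = scale 4, index ECX, base EAX
print(decode_modrm32(0x44, 0x24))  # SIB = no index, base ESP, plus disp8
```

Note that ESP can only ever appear as a base, never as an index, which is why the stack pointer always needs the longer SIB encoding.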

Some of the SIB encodings overlap with existing addressing modes. E.g. Ix=No Index and Base=EAX..EBX will overlap with R/M= the same Base register. Also, Mod=00, Ix=Index, n=1, Base=* overlaps with disp32[R/M=Ix].

This means that if we count addressing modes on the 80386 the same way we count them on the 8086, we have 7*3 = 21 main addressing modes + the SIB addressing modes * 3. There are 8*4*8=256 SIB encodings (with the special '*' mode taking the place of an EBP base when Mod=00), giving another 768 and a grand total of 789 addressing mode encodings, although some of them duplicate each other as described above (most of them are 2 bytes, yet these are also the least frequently used ones).

A programmer, of course, could restrict themselves to only using the same subset of addressing modes available on the 16-bit x86 CPUs, by merely substituting 32-bit index and base registers. This would have the same syntax, but a longer encoding for all the Index + Base addressing modes.

Conclusion

I did a lot of 8086 assembly programming in the 1990s, because developers had to do more assembly, because CPUs were slower and compilers were poor. However, the 16-bit x86 CPUs were also simpler devices.

I have never had to do i386+ programming, as embedded programming moved away from x86 and shifted away from assembly, but I was always intrigued by it. The hardest bit would always be the new addressing modes, but every time I read up on them, it just seemed like more effort than it was worth. Finally, after about 30 years I took it seriously and discovered they're simpler than the descriptions I've seen. So, this blog post covers my new understanding.

Of course, the 32-bit Intel era has been over for more than 10 years, even though Intel is unable to delete it from its CPUs! Soon the Intel era itself might be over as ARM CPUs (and perhaps RISC-V CPUs) overtake its performance at all scales.

Tuesday, 21 January 2025

Burns Night Is The Coldest

In winter I usually mark off 4 dates as the yearly cycle starts transforming into a more positive outlook. I call these 'Milestones'.

Milestone 1

This is the earliest evening. Most people are unaware that evenings start getting later, before the shortest day. In 2024, evenings in the UK started getting later from December 12.

Milestone 2

This is the shortest day, the winter solstice. Everyone knows about this; however, even after this day, the mornings are still getting later. It's just that the evenings are getting later quicker than the mornings are, so the days start getting longer.

Milestone 3

This is the latest morning. Again most people are unaware of this, but it happens right at the end of the year. In 2024 it happened around December 30 or 31.

Milestone 4

This is the coldest day of the year on average and the topic for this post. It's difficult to calculate this date, because daily temperatures vary wildly from day to day and also across years for equivalent days. Nevertheless, it's fairly easy to see that after the days start getting longer, they continue to also get colder for a while. This is because other environmental factors such as cloud cover, heat loss from the ground, air temperatures or the Jet Stream can continue to drive temperatures on average down faster than the sun adds energy to the atmosphere and land.

Anecdotally, I used to figure the coldest time of the year was at the end of January / beginning of February, so I set Milestone 4 on January 31. Later, however, I thought to myself that perhaps it's mid-way between the winter solstice and spring equinox, because all of these diurnal patterns tend to follow year-long sine waves.

Winter solstice is on December 21, and spring equinox is on March 21. So, that's 10 days in December + January (31 days) + February (28.24 days on average) + 21 days of March. This is 10+31+28.24+21=90.24 days. Milestone 4 would then fall after 90.24/2=45.12 days, which, given 10 days at the end of December + 31 days in January, leaves 4.12 days. So for the past several years I've been setting it on February 4.

But neither of these techniques is based on actual evidence. What if it's not symmetrical as I've assumed? What if temperatures simply aren't shifted mid-way? To figure that out I need real data.
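The mid-point arithmetic is easy to check with a few lines of Python. This is just a sketch of the sum above: the 28.24-day February is the average figure from the text, and the 2024 start year is only picked so there's a real calendar to count on.

```python
from datetime import date, timedelta

# Days from the winter solstice (Dec 21) to the spring equinox (Mar 21),
# using the average length of February from the text.
total_days = 10 + 31 + 28.24 + 21      # Dec remainder + Jan + Feb + Mar
half_way = total_days / 2              # about 45.12 days

# date + timedelta ignores the fractional part of a day.
midpoint = date(2024, 12, 21) + timedelta(days=half_way)
print(round(total_days, 2), round(half_way, 2), midpoint)
```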

Real Data

I was involved in an on-line, climate discussion trying to work out how temperatures had changed in the UK over the past decade or so and found an open Statistica page on it:

You can hover over the months to get the actual figures, but downloading the raw data requires a subscription. It turns out that for nearly all the months in the year there's an upward trend, but for January there's no observable trend.

But as I was looking at it, I realised that I could use my new understanding of Fourier transforms to obtain a better approximation for the coldest day.

The Winding Principle

At University we covered quite a lot of math in the first year including Fourier Transforms (and Laplace Transforms). I was able to do the math, but I didn't remotely understand how one can isolate the set of harmonic frequencies from waveform data. It took a Hackaday article to help me. I can't do justice to the article, nor the associated animated video explainer, but I can précis the idea as far as the fundamental harmonic goes, which is all we care about here.

Every complex, repeating, sampled waveform can be constructed from a set of sine waves at 1x, 2x, 3x.. the fundamental frequency, up to half the sample rate, just added together. However, if I want to isolate the fundamental frequency that turns out to be pretty easy. All you do is multiply each sample by the sine of the corresponding angle within the waveform and add the results together. If the fundamental is present, then its amplitude at any point will cohere with the sine wave itself, but higher frequencies will 'disappear', because their positive phases will end up getting multiplied by both the positive and negative phases of the reference sine wave. For example, consider an 8-sample wave containing a fundamental and 1st harmonic (samples 0 and 4 are left out of the table because the reference sine is 0 there):

Sample#	1	2	3	5	6	7	Total
Ref Sine	0.707	1.000	0.707	-0.707	-1.000	-0.707	0.000
Fundamental	0.573	0.810	0.573	-0.573	-0.810	-0.573	0.000
^^^ x Ref Sine	0.405	0.810	0.405	0.405	0.810	0.405	3.240
1st Harmonic	0.210	0.000	-0.210	0.210	0.000	-0.210	0.000
^^^ x Ref Sine	0.148	0.000	-0.148	-0.148	0.000	0.148	0.000

To fully calculate each harmonic you need to consider the phase of each harmonic. That's because a sine wave at any given phase can be generated by a pair of sine + cosine waves with two respective amplitudes; thus the above technique will only recover the sine wave component. For example, if the Fundamental was shifted by +90º, then the Fundamental * the Ref Sine would end up with a total of 0, but here the wave * a Reference cosine wave would have a total of 3.240.
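Here's a minimal Python sketch of the multiply-and-add idea, using the same 8-sample wave as the table (a fundamental of amplitude 0.81 plus a component at twice the frequency of amplitude 0.21). The function name correlate is mine; nothing here comes from the Hackaday article.

```python
import math

N = 8
angles = [2 * math.pi * k / N for k in range(N)]

# The test wave from the table: fundamental plus a component at 2x.
wave = [0.81 * math.sin(a) + 0.21 * math.sin(2 * a) for a in angles]

ref_sin = [math.sin(a) for a in angles]
ref_cos = [math.cos(a) for a in angles]


def correlate(samples, reference):
    """Multiply each sample by the reference and add up the results."""
    return sum(s * r for s, r in zip(samples, reference))


sine_total = correlate(wave, ref_sin)      # the fundamental survives
cosine_total = correlate(wave, ref_cos)    # nothing at this phase

# The same fundamental shifted by +90 degrees shows up in the cosine sum.
shifted = [0.81 * math.sin(a + math.pi / 2) for a in angles]
shifted_sine_total = correlate(shifted, ref_sin)
shifted_cosine_total = correlate(shifted, ref_cos)

print(round(sine_total, 3), round(cosine_total, 3))
print(round(shifted_sine_total, 3), round(shifted_cosine_total, 3))
```

Dividing a total by N/2 (4 here) turns it back into the amplitude of that component.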

Finding The Phase

Therefore, a Fourier analysis of the fundamental can tell us not only its amplitude, but also its phase. And it turns out we can obtain an accurate phase from relatively few samples. The phase is simply the angle of the vector with ∑ waveform data * the Reference sine wave on the x axis and ∑ waveform data * the Reference cosine wave on the y axis.

This means that even though all we have are monthly values for the temperature data, we can calculate the actual minima, zero-crossings and maxima at a much higher resolution.

The phase calculated is always relative to the reference angles. For example, if we started the reference angle at 30º and the samples were a sine wave starting at 30º, then the phase would still be 0º. If the reference angle was 0º, and the samples were a sine wave starting at -90º, then the relative phase would be reported as 90º, because the zero-crossing for the sine wave would be at 90º.

The phase therefore tells us the day on which the temperature crosses its average, and the minimum temperature day will be 90º earlier (or 91.31 days earlier). For UK temperatures, the minimum temperature is therefore reported as Jan 25.5. Ironically, this means that Burns Night is the coldest.

There's one more aspect of the model that's worth mentioning, which is that the reference phases aren't equidistant, because the months don't all have the same number of days in them (though it's close). Therefore, in this calculation, the reference dates are taken from the mid-point of each month, on the basis that the average temperature for that month represents the temperature half-way through the month.
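As a rough sketch of the whole calculation, the Python below recovers the coldest day from 12 monthly values sampled at the month mid-points. The temperatures are synthetic (a pure cosine with its minimum placed at day 25), not the real UK figures, and weighting each month by its length is my own assumption about how to handle the uneven spacing. The names month_midpoints and coldest_day are made up for the example.

```python
import math

MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
YEAR = sum(MONTH_DAYS)  # 365, ignoring leap years


def month_midpoints():
    """Day-of-year (0 = start of 1 January) of the middle of each month."""
    mids, start = [], 0
    for length in MONTH_DAYS:
        mids.append(start + length / 2)
        start += length
    return mids


def coldest_day(monthly_means):
    """Estimate the day of minimum temperature from the fundamental's phase."""
    sine_sum = cosine_sum = 0.0
    for day, temp, length in zip(month_midpoints(), monthly_means, MONTH_DAYS):
        angle = 2 * math.pi * day / YEAR
        sine_sum += temp * math.sin(angle) * length      # x axis
        cosine_sum += temp * math.cos(angle) * length    # y axis
    phase = math.atan2(cosine_sum, sine_sum)
    # temp ~ mean + A*sin(angle + phase), so the minimum is where
    # angle + phase = -90 degrees.
    min_angle = (-math.pi / 2 - phase) % (2 * math.pi)
    return min_angle * YEAR / (2 * math.pi)


# Synthetic data only: mean 10, swing 6, minimum placed at day 25.
TRUE_MIN_DAY = 25.0
synthetic = [10 - 6 * math.cos(2 * math.pi * (d - TRUE_MIN_DAY) / YEAR)
             for d in month_midpoints()]
estimate = coldest_day(synthetic)
print(round(estimate, 2))
```

Even with only 12 samples the estimate lands very close to the day that was built into the synthetic data, which is the point: the phase gives much finer resolution than the sampling interval.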

Minimal Temperature

Temperatures:

Wednesday, 8 January 2025

Basic Blitz: A Surprisingly Addictive VIC-20 Remake

The game Blitz was written and self-published by Simon Taylor for the unexpanded VIC-20 in 1981, then later sold to Mastertronic.

https://www.eurogamer.net/lost-and-found-blitz

I always thought it looked like a game that must have been written in Basic, but I never got around to testing that until the beginning of 2025.

So, here's my version in all its glorious 64 lines of code!

Mine seems to be based on the later Mastertronic version, because my plane is just one graphic character instead of 2 or 3 and my buildings are multicoloured instead of just black. Multi-coloured buildings add to the fun, given most actual buildings are grey.

Also, mine doesn't speed up during each flight; it does get faster per level while the number of buildings it generates also increases by 2. My current high score is 533. Game control is pretty simple: you just press 'v' to drop a bomb as the ship flies across the screen. Only one bomb can be dropped at a time.

Design

Enough of the gameplay, let's discuss the software design. The outline of the game is pretty simple:

Line 5 reserves memory for the graphics characters then calls a subroutine at line 9000 to generate them.
Line 7 defines a function to simplify random number generation.
Line 8 is a bit of debug, see later.
Line 9 resets the high score. So, this only happens once.
Line 10 starts a game with a width of 5 (so 5x2=10 buildings are generated) and a delay of 100 between each frame.
Lines 30 to 60 are the main loop of the game. It really is that tiny. The loop terminates when the plane lands or hits a building. Within that the plane is drawn (by displaying it in its next position then erasing the previous position to avoid flicker).
Bomb handling is done in lines 45 to 50, but the explosion is handled in lines 200-300.
End of game is handled in lines 66 to 80 including displaying "Landed" or "Crashed", updating the high score and handling the user wanting to quit.
Line 99 resets the graphics characters back to the normal character set so that you can carry on editing it.
The subroutine at line 100 performs the equivalent of a PRINT AT.
The subroutine at lines 200 to 250 handles a bomb hitting a building (a random number of floors are destroyed).
The subroutine at lines 8000 to 8070 generates a new level based on W, the width of the cityscape.
The subroutine at lines 9000 to 9010 generates the graphics characters and sets the sound level to 5.
The data from lines 9012 to 9090 are the graphics characters themselves, in the sequence: 'blank', 'solid square', 3x building types, 2x roofs, plane, grass.
The subroutine from lines 9500 to 9520 waits for a key to be released, then pressed, returning the key in A$.

Graphics

Because VIC-20 graphics are weird, programmers end up with bespoke graphics routines, so it's always worth discussing them. Firstly, VIC-20 graphics are tile-based, somewhat like the Nintendo Entertainment System. Video memory contains character codes between 0 and 255, and each character code points to an 8x8 bit pattern at CharacterMemoryBaseAddress+(CharCode*8). Usefully, the base address for the character bit patterns (and the video base address too) can be set by poking 36869. That base address can be set to RAM (which gives the programmer 256 tiles to play with), ROM (which is the default and provides caps+graphics or a caps+lowercase+some graphics option) or can be made to straddle both (which gives the programmer up to 128 tiles to play with + an upper case character set). This is the case even though the user defined graphics (UDGs) have addresses below 8192 while the ROM tiles are above 32768, because of the way the VIC chip's 14-bit address space is mapped to the VIC-20's full 16-bit address space.

In practical, unexpanded VIC-20 applications, programmers will want to use as few UDGs as possible to maximise program space while retaining much of the conventional character patterns. In Basic Blitz we therefore set the graphics to straddle mode (value 0xf, giving a CharacterMemoryBaseAddress of 0x1c00) which means characters 0..127 are in RAM and 128..255 are in ROM.

Intuitively, you might imagine that you'd want to start using tile 0 first, but that would waste most of the tile space, so in fact we always count the UDGs we need backwards from tile 63, because tiles 64 to 127 overlap with video memory itself by default (and are therefore unusable!). Also, the VIC-20 ROM characters aren't in ASCII order and, amazingly enough, don't include the filled-in graphics character, so I have to provide that. When Basic Blitz is run, it first shows the entire usable character set.
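The address arithmetic for straddle mode can be sketched in a few lines of Python. The RAM base of 0x1c00 comes from the text above; the ROM base of 0x8000 and the default unexpanded screen address of 0x1e00 (7680) are assumptions I've plugged in, and the function names are purely illustrative.

```python
RAM_TILE_BASE = 0x1C00   # from the text: value 0xf selects this base
ROM_TILE_BASE = 0x8000   # assumption: start of the character ROM (32768)
SCREEN_BASE = 0x1E00     # assumption: default unexpanded screen (7680)


def tile_address(code):
    """CPU address of the 8-byte pattern for a character code in straddle mode."""
    if not 0 <= code <= 255:
        raise ValueError("character codes are 0..255")
    if code < 128:
        return RAM_TILE_BASE + code * 8          # tiles 0..127 live in RAM
    return ROM_TILE_BASE + (code - 128) * 8      # tiles 128..255 wrap into ROM


def overlaps_screen(code):
    """True if a RAM tile's pattern bytes would sit inside video memory."""
    return code < 128 and tile_address(code) >= SCREEN_BASE


for code in (0, 63, 64, 128):
    print(code, hex(tile_address(code)), overlaps_screen(code))
```

This is why the UDGs are counted backwards from tile 63: it's the last tile whose 8 bytes end just before the screen starts.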

I added this as a bit of debug, because I initially wasn't sure the ROM characters would print out OK. Also, I then made it print Hello in red to test both my PRINT AT subroutine and embedded colour control codes.

Graphics characters can easily be printed, because they're the normal characters '6', '7', '8', '9', ':', ';', '<', '=', '>', '?'. Normal text can be displayed, but you have to force 'inverse' characters, which is achieved by preceding each print statement with <ctrl>+9 and ending with a true character <ctrl>+0.

Colours

Colours on a VIC-20 are strangely limited. There's a block of colour attribute memory, one location for each video byte, but each one is only 4 bits, which means you can only select an INK colour for on pixels. The PAPER colour is global, defined by bits 4..7 of 36879. The VIC-20 partially gets around this by normally making characters 128 to 255 inverse characters, but also by defining bit 3 of 36879 as normal or inverse mode.

The upshot though is that with the ROM character sets you can choose a common PAPER colour with any INK, or the common PAPER colour as INK, with any INK colour as PAPER. But when you select the character set to straddle RAM and ROM, you can only choose any INK colour + the common PAPER colour.

Hence in Basic Blitz, the background is white (as that seems most useful) and I have to define a UDG just so that I can get a filled in green character for grass with a building on top.

Sound

In Basic Blitz, sound is pretty simple. The initialisation routine switches audio on to level 5 (POKE 36878, 5) and leaves it there. There are 3 voice channels, which are individually switched on if bit 7 is set. In practical terms, each voice has a range of about 2 octaves, the first one having values from 128 to 65; then the next octave from 64 to 33. Beyond 32, the frequency ratio between each note is 1.03 to 1.06, close to that of a semitone (1.059), making most note intervals unusably out of tune.
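The tuning problem is easy to see numerically. This Python sketch reads the values in the paragraph above as period counts and assumes the pitch is inversely proportional to them (which is what an octave from 128 to 64 implies); it isn't a model of the actual VIC chip registers, and step_ratio is a name I've made up.

```python
SEMITONE = 2 ** (1 / 12)   # equal-tempered semitone, about 1.0595


def step_ratio(period):
    """Pitch ratio between neighbouring period values, assuming pitch ~ 1/period."""
    return period / (period - 1)


# Large periods give fine pitch steps; small ones give steps close to a semitone.
for period in (128, 64, 32, 17):
    print(period, round(step_ratio(period), 4))
```

Once a single step is nearly a semitone wide, there's no neighbouring value left to nudge a note into tune.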

The plane makes a drone sound using the lowest pitch audio channel (address 36874) OR'd with the bottom 4 bits of the jiffy clock at PEEK(162).

The bomb uses the high octave channel (at 36876) just generating an ascending tone. If the bomb hits a building it's silenced and the noise channel is used instead, with a fixed low pitch of 129. Finally, the important thing is to turn off all the sounds when they're done, by poking the channels with 0.

Playing The Game

You can run this VIC-20 Javascript emulator and type in the code (if the keyboard mapping allows it):

https://nippur72.github.io/vic20-emu/

I've found this emulator is better than the Dawson one for .prg files. Here's how to load the .prg on a desktop/laptop. First download the BasicBlitz.prg from my Google Drive. Then drag the file from wherever you downloaded it from to the emulator in the browser. It will automatically load and run!

However, it's also useful to be able to type in code directly for editing, debugging and other stuff.

The keyboard on my MacBook M4 doesn't map correctly to VIC-20 keys, because the emulator does a straight translation from character codes to VIC-20 keys rather than from key codes. This means that pressing Shift+':' gives you ';' on this emulator rather than '[' as marked on a VIC-20 keyboard.

Mostly this makes typing easier, but the VIC-20 uses a number of embedded attribute key combinations. Basic Blitz doesn't use many, but typing the ones it does use isn't easy! Here's how.

In Chrome, you need to enable the console by pressing function key F12, then tapping on the Console tab. In Safari, you need to choose Safari:Settings... then select the 'Advanced' tab and click on "Show features for web developers" at the bottom. The "Develop" menu then appears on the menu bar and you can choose Develop:Show JavaScript Console.

So far so good. Now, you can type most of the text as normal, but whenever you need to type a special code, type pasteChar(theCode) in the console followed by Enter (e.g. pasteChar(147) for the clear screen code). Here are the codes you'll need:

Inverse 'R' => 18. This is for Reverse text, which ends with inverse nearly underline => 146.
Inverse '£' (Red) => 28.
Inverse '┓' (Black) => 144.
Inverse 'S' => 19 (this is the home code).
Inverse heart => 147 (this is the clear screen code).
Inverse up-arrow => 30 (this is green).
Inverse 'Q' and inverse '|' can be typed directly just using the down cursor and left cursor respectively.
The codes in line 8045 are more colour codes used for the buildings. They are 144 (Black), 28 (Red), 159 (Inverse filled diagonal=cyan), 156 (checkered-black character=purple), 30 (inverse up arrow = green), 31 (inverse left arrow=blue), 158 (inverse 'π' = yellow).

Conclusion

The original VIC-20 Blitz program, though derivative in its own way, is so simple it could have been written in BASIC, as this version proves. The arcane design of the VIC-20 hardware and its lousy BASIC implementation means there's a lot of subtle complexity even in a simple game. Finally, although there are many emulators for the VIC-20, both the Javascript implementations I know of have limitations and bugs which make distributing this game and/or modifying it non-trivial.

Tuesday, 31 December 2024

Wobbly-Blue: An Optical Illusion on a ZX Spectrum And VIC-20

Introduction

I recently came across an interesting optical illusion whereby a speckled-blue sphere on a random checkerboard pattern will appear to wobble relative to the checkerboard if you move your head. When viewed on a mobile device, you can move the device itself and the effect is even more pronounced.

I figured I could write a simple version of a program that generated it for the ZX Spectrum, and this is the result (you need to make the image occupy a fair amount of your field-of-view):

The program is fairly small:

The ZX Spectrum has character-level colour resolution, but because the blue sphere is surrounded by a black border, it doesn't cause any clashes. I originally wanted to produce a sphere where the blue coverage in the centre was obviously larger than at the edges, but it turned out that simply taking the sine of a random angle creates a distribution that looks spherical, because the rate of change is greatest near 0º so the dots are spread out more there and concentrated near the edges. If I let it run for about 2000 points it'd probably look more prominent.
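To show why the sine trick bunches the dots towards the rim, here's a small Python sketch. It is not the Spectrum listing: I'm assuming the radius is the sine of a random angle between 0º and 90º and that the direction around the centre is chosen separately, and sphere_point is a hypothetical name.

```python
import math
import random


def sphere_point(rng, radius=1.0):
    """One dot: radius from the sine of a random angle, random direction."""
    angle = rng.uniform(0, math.pi / 2)
    r = radius * math.sin(angle)       # sine changes fastest near 0
    direction = rng.uniform(0, 2 * math.pi)
    return r * math.cos(direction), r * math.sin(direction)


rng = random.Random(1)                 # fixed seed so runs are repeatable
points = [sphere_point(rng) for _ in range(2000)]

# Count how many dots land in the outer half of the radius.
outer = sum(1 for x, y in points if math.hypot(x, y) > 0.5)
print(outer / len(points))
```

Under these assumptions about two thirds of the dots should fall in the outer half of the radius, since sin(30º) = 0.5 and only a third of the angles are below 30º.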

Unexpanded VIC-20 Version

Quite frequently I like to create Unexpanded VIC-20 conversions of ZX Spectrum programs, because they're fairly contemporary machines with some similar characteristics, but the VIC-20 is more challenging to program due to a lack of support in its version of BASIC.

Here's the VIC-20 version:

The VIC-20 version is full of POKEs to do what the ZX Spectrum version can do with PLOT, INK and PAPER. Also, the character set is squished and there are only 22 characters per line. But it's the techniques needed to perform hi-res graphics on a VIC-20 (particularly an unexpanded VIC-20) that are the real challenge.

Firstly, the VIC-20 can only really do hi-res graphics by modifying a character set of up to 256 characters. So, if you fill the screen with unique characters and update the pixels in each character then a full bitmap display is possible. However, on a standard VIC-20 screen there are 22 x 23 characters = 506 character positions which is far more than the number of characters in the character set! The VIC-20 'fixes' this by supporting double-height characters of 16-rows each, which means you only need 253 characters to fill the screen.

The second problem is that an unexpanded VIC-20 only has 3581 bytes free when you turn it on, and 253 double-height characters + the 506 screen bytes would need 4554 bytes, which is clearly more than what's available.
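The full-screen memory budget is simple enough to check in Python, using only the figures quoted above:

```python
COLS, ROWS = 22, 23                       # standard VIC-20 screen
FREE_BYTES = 3581                         # free on an unexpanded VIC-20

positions = COLS * ROWS                   # character cells on screen
double_height_chars = positions // 2      # each 16-row character covers two cells
bitmap_bytes = double_height_chars * 16   # 16 bytes of pattern per character
needed = bitmap_bytes + positions         # patterns + one screen byte per cell

print(positions, double_height_chars, needed, needed - FREE_BYTES)
```

So a full-screen bitmap overshoots the free memory by nearly 1kB before a single line of BASIC has been typed in.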

However, in this case, we don't need to fill the whole screen with bitmapped graphics, only the sphere in the centre! And in fact I would only need 172 single-height characters if I also reduced the screen size to 20x20 characters! 172 characters needs just 1882 bytes including the screen bytes. This leaves just over 2kB for my program!

How is this done? Well, I could work out which characters in the centre will be filled with the sphere's pixels and print unique characters for them, but it's easier to use the kind of tile-allocation technique you might use for a video game. You use some characters as background (in this case we only need one: character 255, which is filled with 8x $ff's).

Then whenever you want to plot properly on the screen you find out which character is being used at the character location at (INT(x/8),INT(y/8)). If it's not 255, then you can then look up the character in the character set memory; select the right row (Y & 7) then set the right pixel (128>>(X & 7)). Otherwise you allocate the next character code (denoted by UG% in the listing) and then fill in the pixel as before. It doesn't matter if the characters on the screen aren't allocated in order, because one simply gets the correct bitmap address from the character code itself.
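The tile-allocation plot can be sketched in Python like this. It mirrors the description above rather than the BASIC listing itself: the class name TileCanvas is made up, next_tile plays the role of UG%, and two bytearrays stand in for screen memory and character set memory.

```python
COLS = ROWS = 20      # reduced screen size from the text
BACKGROUND = 255      # the one background tile, filled with $ff bytes


class TileCanvas:
    def __init__(self):
        self.screen = bytearray([BACKGROUND] * (COLS * ROWS))
        self.charset = bytearray(256 * 8)            # new tiles start as 0s
        self.charset[BACKGROUND * 8:] = b"\xff" * 8
        self.next_tile = 0                           # like UG% in the listing

    def plot(self, x, y):
        """Set pixel (x, y), allocating a fresh tile for the cell if needed."""
        if not (0 <= x < COLS * 8 and 0 <= y < ROWS * 8):
            raise ValueError("pixel is off screen")
        cell = (y // 8) * COLS + (x // 8)
        code = self.screen[cell]
        if code == BACKGROUND:
            if self.next_tile >= BACKGROUND:
                raise RuntimeError("out of tiles")
            code = self.next_tile                    # allocate the next code
            self.next_tile += 1
            self.screen[cell] = code
        # Pick the row within the tile, then set the right bit.
        self.charset[code * 8 + (y & 7)] |= 128 >> (x & 7)
        return code


canvas = TileCanvas()
print(canvas.plot(0, 0), canvas.plot(7, 0), canvas.plot(8, 0))
```

Because the bitmap address is always derived from the character code stored on screen, it doesn't matter in what order the cells get their tiles.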

The resultant program is similar to the ZX Spectrum version, except that a couple of subroutines are added. Line 1 allocates space for the new character set by setting the end of BASIC (and string stack) to 6143. Then 6144 onwards can be used. Lines 500 to 540 create the initial graphics setup. The screen size is set to 20x20 instead of 22x23. PAPER is set to black with the screen in non-inverse mode; the new character set is filled with 0s; the screen is filled with character code 255 and character 255 is filled with 255s too (all done with one loop). Finally, UG% is set to 0, as that's the first character code we'll allocate.

Lines 1000 to 1040 are very similar to the ZX Spectrum version except it works by setting the INK colour of each character to white or black for each checkerboard location.

Lines 200 to 220 are the plot routine discussed above. It also has to POKE colour memory (from 38400 onwards) to make the INK colour at that location Blue (colour 6). Finally we can run the program:

Pixels on a ZX Spectrum are square, but on a VIC-20 they're squished too, so the sphere is oblate. Total video memory is 2048 bytes, and including 400 bytes of screen memory that's room for 206 bitmap chars. So, I could have increased the checkerboard resolution by allocating 16 more characters, taking the total up to 188.

What Causes The Effect?

I did a bit of searching for how the illusion works, but all I found were articles on how colour aberration can cause red or green colours to stand out in a sort-of 3D effect. But here's a simple theory. Human eyes are 10x less sensitive to blue than other primary colours and much more sensitive to luminance than colour. So, it's likely that the brain needs to do more processing for colour than for monochrome images; more processing for blue and more processing for sparse images (like this sphere).

That would make sense: the least sensitive blue cones might take longer to fire than the more sensitive rods (because it takes longer for enough energy to make them fire). There might be more neurone layers for making sense of a colour image; more neurone layers for making sense of a sphere than a set of blocks and finally more neurone layers for making sense of a sparse image than a solid one.

All the extra processing causes a lag in processing; which means that when you move your head (or move the screen); the checkerboard pattern moves quickly in your field of view, but your other neurones take time to reconstruct the sparse, blue, sphere.

Tuesday, 19 November 2024

The Mythical Mac 256K

Introduction

This MacGUI blog post covers the boot process for the early System Software on a Mac 512K (or Mac 128K). In part of it, he talks about the Mythical Mac 256K.

At the beginning of the boot blocks are several stored parameters. The version number is two bytes. Another two bytes hold flags for the secondary sound and video pages. Then come a series of seven, 16-byte long file names. These are the names of the System resource file, Finder (or shell), debugger, disassembler, startup screen, initial application to run, and clipboard.
Following the names is a value for how many open files there may be at once, and the event queue size. Then come three, 4-byte values which are system heap size for a Mac with 128K of RAM, 256K, and 512K of RAM, respectively. If you could find me a Mac with 256K of RAM, I'd sure be impressed.
<snip>
If you landed from planet Krypton with a Mac 256K, your system heap size would be $8000 (decimal 32,768).

Disappointingly, as he also covers in this blog post, the 64K ROM only checks for the 128K or 512K Mac models and the 128kB model memory size is defined explicitly by the line:

4002E4: LEA $0001FFFC,A1 ;128K RAM

The tests work by exploiting incomplete memory decoding on early Macs. A Mac 128K's memory appears to be duplicated 4 times (actually it's duplicated 32 times all the way up to 4MB); whereas a Mac 256K is duplicated twice and a Mac 512K's memory doesn't get duplicated at all (up to 512K).

First (Failed) Attempt

I took a 64K Mac ROM and changed the address in that instruction to $0003FFFC, then added 2 to the 3rd byte of the ROM checksum to adjust it, which is in the first long word of the ROM. Then I added a miniVMac option for a Mac 256K, hacking about with a few scripts and some of the source code. I thought I'd correctly achieved a 256kB Mac, but I hadn't: with my 19kB MacWrite test document I had misread the MacWrite splash screen, which says how much memory is free and how much is used. I'd mistaken 'free' and 'used' for each other, because the values were fairly plausible for a Mac 256K.

PICO-Mac

I left the problem for quite a while after that, not knowing how to test it, but also because I lost the HDMI cable adapter for the Raspberry Pi 5 I was developing it on. Then Matt Evans referenced my post on his Axio.ms blog when creating a Raspberry Pi Pico based Mac emulator. I myself have another design for a PICO-based Mac, which would be able to fit the entire 68K emulator in its 16kB of SPI flash cache and still run at full-speed, allowing for a full 256kB of RAM on a standard PICO. This is for the future, if ever.

PCE-Mac Emulator

The key to being able to solve the problem is to use the PCE-Mac emulator, which has a step-time debugger, allowing me to figure out what really went wrong.

What it looks like is this: Apple originally wanted to be able to support a 128KB Mac, a 512KB Mac, or a 256KB Mac, at least for development purposes. Perhaps some of the engineers wanted to be able to bolt on another 128KB just for a bit more space. That made sense in 1983, because 256kBit RAM chips didn't yet exist and a 512kB Mac would need 64 RAM chips; whereas a 256kB Mac would only need 32.

However, they made a mistake in the ROM, which they tried, but failed, to correct in the disk Boot block, and that prevented them from doing so. The ROM itself is at fault, because a 256kB Mac will look like a 512kB Mac to the ROM. Here's the ROM code that works it out:

FindMemSize:		;
4002DE:	LEA $7FFFC,A0	; 512K RAM
4002E4:	LEA $1FFFC,A1	; 128K RAM
4002EA:	CLR.L (A0)+
4002EC:	TST.L (A1)+
4002EE:	BNE.S 4002F2	; branch if 512K
4002F0:	MOVE.L A1,A0	; this is a 128K machine
4002F2:	MOVE.L A0,$0108	; set MemTop

A0 is set to point to the last word of a Mac 512K and A1, the last word of a Mac 128K. Initially all the RAM is set to $ffffffff. At $4002EA, the last word of a 512K machine is cleared and then the last word of the Mac 128K is tested. On a Mac 512K it won't be clear, so it won't be 0, so it'll branch and A0 will be used as is to define MemTop. On a Mac 128K it will be clear so it will be 0. It won't branch and so A1 (=128K) will be copied to A0 and then used for MemTop.
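To make the mirroring argument concrete, here is a minimal Python sketch of the same test. It is a model under stated assumptions rather than Apple's code: the `MirroredRam` class and `find_mem_size` function are names I've made up, RAM is modelled as a power-of-two block that simply repeats through the address space, and a hypothetical 256K machine is included for comparison.

```python
# Sketch of the 64K ROM's FindMemSize test on RAM with incomplete address decoding.
# MirroredRam and find_mem_size are illustrative names, not anything from Apple.

class MirroredRam:
    """RAM of `size` bytes (a power of two) that repeats through the address space."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(b"\xff" * size)  # the text says RAM starts as $FFFFFFFF

    def read_long(self, addr):
        base = addr & (self.size - 1)  # undecoded high address lines are ignored
        return int.from_bytes(self.data[base:base + 4], "big")

    def write_long(self, addr, value):
        base = addr & (self.size - 1)
        self.data[base:base + 4] = value.to_bytes(4, "big")


def find_mem_size(ram):
    a0 = 0x7FFFC               # LEA $7FFFC,A0  (last long word of 512K)
    a1 = 0x1FFFC               # LEA $1FFFC,A1  (last long word of 128K)
    ram.write_long(a0, 0)      # CLR.L (A0)+
    a0 += 4
    value = ram.read_long(a1)  # TST.L (A1)+
    a1 += 4
    if value == 0:             # BNE.S not taken: the clear showed up at $1FFFC
        a0 = a1                # MOVE.L A1,A0
    return a0                  # the value stored in MemTop ($0108)


for kb in (128, 256, 512):
    mem_top = find_mem_size(MirroredRam(kb * 1024))
    print(f"{kb}K RAM -> MemTop ${mem_top:05X}")
```

In this model the 128K case gives $20000, while both the 256K and 512K cases give $80000, because with 256K of RAM the cleared long word lands at $3FFFC rather than $1FFFC.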

However, on a Mac 256K, the long word at $1FFFC also won't be cleared, so the ROM will think it's a Mac 512K. Nevertheless, it won't work to hack the ROM to make line 4002E4 LEA $3FFFC,A1 for a 256kB Mac, because:

It would mess up the ROM test for a 128KB Mac.
It turns out the Boot block on the Floppy Disk does its own test, explicitly for a Mac 256K and does the wrong thing if MemTop is set to 256K.

This is the critical part of the boot block:

0001008A:	2238 0108	MOVE.L $0108, D1
0001008E:	4841	SWAP D1
00010090:	0C41 0004	CMPI.W #$0004, D1
00010094:	6E2C	BGT.S $000100C2
00010096:	7200	MOVEQ #$00000000, D1
00010098:	50F9 0001 FFF0	ST $0001FFF0
0001009E:	42B9 0003 FFF0	CLR.L $0003FFF0
000100A4:	4AB9 0001 FFF0	TST.L $0001FFF0
000100AA:	6716	BEQ.S $000100C2
000100AC:	7002	MOVEQ #$00000002, D0
000100AE:	4840	SWAP D0
000100B0:	D1B8 0108	ADD.L D0, $0108
000100B4:	D1B8 0824	ADD.L D0, $0824
000100B8:	D1B8 0266	ADD.L D0, $0266
000100BC:	D1B8 010C	ADD.L D0, $010C
000100C0:	7204	MOVEQ #$00000004, D1

Initially it checks for a 512kB Mac at $10090 and if it isn't one, falls through to the clever, but weird section at $10098 to $100A4. This code sets the top of RAM for a 128KB Mac to -1, then explicitly clears the top of RAM for a 256KB Mac. If the address at the top of a 128kB Mac has become 0, then it knows that the address range from $20000 to $3FFFF mirrored $00000 to $1FFFF and therefore it's a 128KB Mac. It will then execute the BEQ, skipping the code from $100AC to $100C0 (underlined in the original listing), which adds $20000 to all the key system variables, starting with MemTop.
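The boot block's logic can be sketched the same way. This is a hedged Python model, not the real code: `boot_block_mem_top` is a hypothetical helper, RAM is again assumed to mirror on power-of-two boundaries, and only the MemTop adjustment is modelled (the real code also adjusts $0824, $0266 and $010C).

```python
# Sketch of the boot block's 256K test. boot_block_mem_top is an illustrative
# helper; it returns the MemTop value left behind for a given RAM size and a
# given MemTop handed over by the ROM.

def boot_block_mem_top(ram_size, rom_mem_top):
    ram = bytearray(b"\xff" * ram_size)
    mask = ram_size - 1                    # incomplete decoding: RAM repeats

    if (rom_mem_top >> 16) > 4:            # SWAP D1 / CMPI.W #4,D1 / BGT.S $100C2
        return rom_mem_top                 # treated as a 512K machine, nothing to do

    ram[0x1FFF0 & mask] = 0xFF             # ST $1FFF0 (sets one byte to $FF)
    base = 0x3FFF0 & mask
    ram[base:base + 4] = bytes(4)          # CLR.L $3FFF0
    base = 0x1FFF0 & mask
    if int.from_bytes(ram[base:base + 4], "big") == 0:   # TST.L $1FFF0 / BEQ.S
        return rom_mem_top                 # the clear was mirrored: a 128K machine
    return rom_mem_top + 0x20000           # ADD.L D0,$0108 with D0 = $20000


cases = [
    (128, 0x20000, "Mac 128K, stock ROM"),
    (256, 0x20000, "Mac 256K, ROM reports 128K"),
    (256, 0x40000, "Mac 256K, ROM hacked to report 256K"),
    (256, 0x80000, "Mac 256K, stock ROM reports 512K"),
    (512, 0x80000, "Mac 512K, stock ROM"),
]
for kb, rom_mem_top, label in cases:
    result = boot_block_mem_top(kb * 1024, rom_mem_top)
    print(f"{label}: MemTop ${result:05X}")
```

Only the second case ends up with the right answer of $40000 for a 256K machine. A ROM that reports 256K gets $20000 added on top, giving $60000, and a ROM that reports 512K skips the adjustment altogether.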

The upshot is that the boot block expects a Mac 256 to be set up as a Mac 128 in the ROM; and then the system variables are adjusted. And of course the ROM won't do that: a Mac 256 is set up as a Mac 512 in the ROM so the BGT instruction at $10094 skips the whole routine. They therefore could have fixed the boot block to check MemTop for a 512kB Mac and if it's really a Mac 256, subtract 256K from all the system variables.

Because they didn't do that, the ROM has to be changed.

A Correct ROM Fix

The answer is to save bytes on the LEA instructions. It's possible in some cases to generate large values in data registers by using the sequence MOVEQ, then SWAP. It takes just 2 words instead of 3 and because 128kB, 256kB and 512kB all have 0s in the bottom 16-bits, we can use this technique (the boot block uses this technique to test for 256kB too!). However, we need a data register to do this and I can't be sure it won't corrupt a useful data register value unless the execution path later overwrites a data register.

Fortunately, it does. At $322, there's a MOVEQ #2,d1. So, now we know we can use D1. The trick then is to set up both A1 and A0 to point to 256kB and 512kB respectively. My sequence is 1 word shorter than the original. Then, instead of clearing and testing the last words and post-incrementing to get the sizes; we pre-decrement from the sizes to get to the last words of RAM and test them. This will return the same result for both 128kB and 256kB Macs, but a different result for the 512kB Mac.

We don't need A0 and A1 now, because the actual size (for a 512kB Mac) is already in D1 so we can use our word of ROM space saved to move D1 to A0. And for the 256kB or 128kB cases, we can take the 512kB value; divide by 4 (by shifting right twice) to get 128kB, which is also a 1 word instruction substituting the MOVEA A1,A0 in the original code. This means that ROM routine now looks like this:

000002DE:	7204	MOVEQ #$00000004, D1
000002E0:	4841	SWAP D1
000002E2:	2241	MOVEA.L D1, A1
000002E4:	D281	ADD.L D1, D1
000002E6:	2041	MOVEA.L D1, A0
000002E8:	42A0	CLR.L -(A0)
000002EA:	4AA1	TST.L -(A1)
000002EC:	6602	BNE.S $000002F0
000002EE:	E489	LSR.L #$2, D1
000002F0:	2041	MOVEA.L D1, A0
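As a sanity check on the patched routine, here is a short Python walk-through of it under the same mirroring assumption. The function name `patched_find_mem_size` is mine and the model only tracks the registers the listing touches.

```python
# Sketch of the patched FindMemSize. patched_find_mem_size is an illustrative
# name; RAM is assumed to be a power-of-two block that mirrors upwards.

def patched_find_mem_size(ram_size):
    ram = bytearray(b"\xff" * ram_size)
    mask = ram_size - 1

    d1 = 4 << 16                   # MOVEQ #4,D1 / SWAP D1   -> $40000 (256K)
    a1 = d1                        # MOVEA.L D1,A1
    d1 += d1                       # ADD.L D1,D1             -> $80000 (512K)
    a0 = d1                        # MOVEA.L D1,A0

    a0 -= 4                        # CLR.L -(A0): clear the long word at $7FFFC
    base = a0 & mask
    ram[base:base + 4] = bytes(4)

    a1 -= 4                        # TST.L -(A1): test the long word at $3FFFC
    base = a1 & mask
    if int.from_bytes(ram[base:base + 4], "big") == 0:   # BNE.S not taken
        d1 >>= 2                   # LSR.L #2,D1             -> $20000 (128K)
    return d1                      # MOVEA.L D1,A0, then stored in MemTop


for kb in (128, 256, 512):
    print(f"{kb}K RAM -> MemTop ${patched_find_mem_size(kb * 1024):05X}")
```

Both the 128K and 256K cases come out as $20000, which is what the boot block expects to see before it makes its own 256K adjustment, and the 512K case still comes out as $80000.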

We also need to make a change to the checksum again, which is a sum of the 16-bit words in the ROM. It now needs to be $28BA8FB5 instead of $28BA61CE.
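The checksum change can be cross-checked from the opcodes alone, since only the ten words from $2DE to $2F0 differ. In the sketch below the patched words are copied from the listing above, whereas the original words are my own hand-assembly of the FindMemSize listing (an assumption, since the original hex isn't shown in this post).

```python
# Cross-check of the checksum change from the words that differ at $2DE-$2F0.
# ORIGINAL_WORDS is hand-assembled from the FindMemSize disassembly (assumption);
# PATCHED_WORDS is copied from the patched listing.

ORIGINAL_WORDS = [
    0x41F9, 0x0007, 0xFFFC,   # LEA $7FFFC,A0
    0x43F9, 0x0001, 0xFFFC,   # LEA $1FFFC,A1
    0x4298,                   # CLR.L (A0)+
    0x4A99,                   # TST.L (A1)+
    0x6602,                   # BNE.S +2
    0x2049,                   # MOVEA.L A1,A0
]
PATCHED_WORDS = [
    0x7204, 0x4841, 0x2241, 0xD281, 0x2041,
    0x42A0, 0x4AA1, 0x6602, 0xE489, 0x2041,
]

OLD_CHECKSUM = 0x28BA61CE

delta = sum(PATCHED_WORDS) - sum(ORIGINAL_WORDS)
new_checksum = (OLD_CHECKSUM + delta) & 0xFFFFFFFF

print(f"delta = ${delta:04X}")                # delta = $2DE7
print(f"new checksum = ${new_checksum:08X}")  # new checksum = $28BA8FB5
```

The difference between the two word sums accounts exactly for the change from $28BA61CE to $28BA8FB5.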

Results

My 19kB MacWrite document now reports a reasonable value for a 256KB Mac:

On a Mac 512K it'd be 95%:5%. For testing purposes I wrote another program called FreeMem, which allocates 1kB blocks until it runs out of memory. The app itself uses about 260 bytes. There is a proper API for this, but I found that it wasn't reporting the value I was expecting. On a Mac 128 FreeMem goes up to 75kB and on a Mac 512 it's about 430kB or more. On this Mac 256 it's 188kB:

This is only possible for a Mac 256K

In Matt Evans' blog post he said that it wasn't possible to run MacPaint on a Mac 256 disk, because it writes data to the boot blocks. I have tested this out and found that in fact it does run OK. Here's a screenshot, even though it's not actual proof since MacPaint can't tell you how much RAM is free AFAIK.

Conclusion

The Mythical Mac 256K has been talked about for a number of years and perhaps a few even existed during the early Macintosh development period. I tried, but failed to create one under the miniVMac emulator in December 2023, but after it was mentioned in Matt Evans' Pico-Mac blog and a couple more times on the 68KMLA when discussing a Pico-Mac based on the new Raspberry Pi PICO 2, I decided to explore it again. It turns out it really is possible and as far as I know, this is the first Mac 256 that actually runs, albeit in emulation!

But what's the point? Well, there's a real digital archeological purpose behind the Mac 256K, simply because the floppy disk Boot block code for System 1 up to (I believe) System 4.1 contains code for it. It seems unlikely that someone would make the effort to create such an arcane hack if no-one considered the possibility of a Mac 256K. Nevertheless, the ROM won't allow a Mac 256K to boot properly. It is possible that a Mac 256K could have been built, though. All an engineer would have to do is build a toggle-switch logically OR'd to address line A17; start up a Mac with the toggle switch connected to 5V (the Mac would then look like a Mac 128K); then flip the switch (which would enable A17, which would enable the second bank of 128K RAM) and insert a boot disk. The boot disk would then see 256K and adjust the system variables.

The second reason is that in the 1983 to 1985 period, microcomputers with 256K of RAM were at the upper end of the market and the Mac was intended to be released in 1983 (and even before if it had been possible). The IBM PC/AT, for example, was released in 1984 with 256kB as standard. A few early Atari STs were sold with 256kB of RAM in 1985.

The third reason is that Apple already knew that the Mac 128K's RAM was limited to the point of near impracticality. MacWrite could only store a document with a few pages in it. However, a Mac 256K is far more usable. The extra 128kB of RAM can store between 16,000 and 18,000 more words in a document; enough for about 32 to 36 pages, and easily enough to store an undergraduate dissertation or a whole chapter of a book. 256K is big enough for a proper, small Pascal compiler (note: MacPascal was an interpreter), making development much more feasible.

On the downside, a Mac 256K would have needed 32 of the 64kBit chips the Mac was supplied with and this would have made the motherboard bigger, more costly and perhaps affected the shape and design of the Mac itself. The Mac 512K on the other hand, still only needed 16 RAM chips - in fact the motherboard was designed to accommodate them by simply fitting the new chips and cutting a track.

Next Steps

The 64K ROM for the Mac 256K can be downloaded from the 68KMLA thread. The next two major steps really are to use it with PICO-Mac running on a Raspberry Pi PICO 2 (which gets it closer to the hardware); and perhaps at the same time, get someone to burn the ROM and install it on a genuine Mac 128K, fitted with 256K of RAM (this is a big ask).

In the meantime I'll see if I can get it to work with miniVMac and continue to work on M0̸Bius, my super-fast 68K emulator in M0+ assembler.

Sunday, 17 November 2024

Saving Europe-ASAP (Ally State Access Privileges)

In 1939, 85 years ago at the time of writing, the UK government declared war on the Nazi régime in order to defend the sovereignty of Poland, but ultimately to rescue Europe from fascism.

It was unsuccessful in the first aim, but successful in the second with the help of allies across the globe, in particular, the United States under President Roosevelt, and the set of countries that became the Commonwealth (which at the time were part of the British Empire).

Today Europe is at a similar crisis point following the insurgence of far-right political groups in Europe and the recent re-election of Donald Trump as the president of the United States.

It is already the case that his administration is demanding vassalage from European countries. Vassalage is an appropriate term, since Trump is acting like a medieval king. In this context, a number of far-right European leaders are already pledging support at a time when the most dominant EU countries, Germany and France are facing a democratic crisis.

Meanwhile, Britain, as in 1939, has never been more alone: cut off from the EU since the 2016 Brexit referendum and now deeply at odds with a United States that considers the new Labour government to be a socialist enemy. Falling into line with Trump's demands would certainly destabilise the UK, given that high-profile Trump supporters such as Elon Musk regularly portray the country as a Police State or close to civil war.

What we need to do at this time is demonstrate solidarity with the EU, but this too is not easily possible, because the Labour government is determined to honour Brexit by not rejoining the EU. Perhaps though, given the urgency, there is another way: Ally State Access Privileges (ASAP), a temporary mechanism for supporting Europe in a new time of need.

This blog post is a rough (and very ignorant) draft proposal of the ASAP, how it operates, who is involved and its underlying purpose.

Purpose

ASAP exists specifically to prevent the EU and Europe in general from falling into the hands of the far-right. European countries have been making increasingly desperate attempts to exclude far-right parties from government as they grow in power, following the 2008 crash and subsequent economic austerity policies. These attempts could fail within the early part of 2025, a few months from the time of writing.

ASAP dissolves immediately when the threat from the far-right is over.

The ASAP serves to stabilise the economy and political composition of the EU; accelerate its environmental programme; maintain rights while radically reforming economic competitiveness.

Operation

ASAP is a temporary political alliance between member states of the EU and nations willing to support European stability. It applies to an ASAP alliance member for a 1 year duration and must be renewed every 12 months.
It confers the temporary elimination of trade barriers on conditions of EU regulation compliance, as per EU membership and includes temporary access to the single market and customs union.
It confers Freedom of movement to individuals who are part of ASAP member states.
It provides a subset of access to European Institutions:

Voting privileges to ASAP alliance members in the European Parliament. The representative block is exactly the same size as would be the case if the ASAP member was an actual EU member, but the composition of an ASAP contingent is in proportion to the composition of the legislature in the respective member state. ASAP MEPs are chosen by decree by their member state, using any mechanism it sees fit.
The head of state from an ASAP member is allowed on the European Council, as per EU membership.
One ASAP member per ASAP state is allowed on the Council of the European Union and European Commission. However, no ASAP member can be a European President.
One ASAP member per ASAP state is allowed as part of the Eurogroup specifically in order to temporarily influence and help align economic policy between ASAP members and the EU.
Others?

In order to combat the threat of the far-right, which is the primary purpose of EU-ASAP, media regulation is placed on a war-footing:

Access to social media dominated by the far-right is prohibited by law, where dominated means that more than 3/5 of the content consists of right-wing posts or reposts, or that more than 2% consists of far-right posts. (Is this reasonable? What are the criteria on Bluesky?)
Editorial guidelines for Printed media (and their online counterparts) are to be placed in the hands of a trust, analogous to the Guardian Scott Trust, in order to eliminate editorial influence from their owners; while retaining the same rough remit for their political flavour. The intent is to ensure freedom of expression while restricting extremism.
Extended powers for media regulators (such as OfCom in the UK) will ensure that factual errors in articles must be corrected with the same prominence given to the original article. Multiple regulatory breaches will lead to a process of mentoring by journalists from randomly chosen media outlets meeting higher regulatory standards. The purpose of this is to slow down the publication of objectively misleading articles while avoiding political bias.

Finally, the EU-ASAP commits to stripping the purpose of World War 2 from Nationalistic interpretations. Specifically, as described in the first paragraph, the Allies' purpose was not to wage war on Europe and Indo-China (since they were not the enemy), but to free Europe from fascist control.

Members

In the first instance, ASAP membership should be open to non-EU European states. This includes the UK; EEA members and EFTA members. Because of the already close alignment with EU regulations this should be a rapid process. If these states fall to the far-right, their ASAP membership is automatically cancelled.

In the second instance, ASAP membership should be extended to stable states that share democratic values close to those of the EU, namely Australia, New Zealand, Japan and Canada. If these states fall to the far-right, their ASAP membership is automatically cancelled.

In the third instance a more limited ASAP membership should be extended to states that share common environmental goals, if their membership strategically serves the purpose of the EU and the country itself. This may include India, South Africa (and any other stable African country) and controversially, China. In the case of China, it shall include delegates from both the Communist Party of China (CPC) and Hong Kong Legislative council to provide a wider range of representation. Why China? Because the US under Trump is also determined to wage an economic war with China, so a closer association between the EU and China; with a corresponding weakening of links between China and Russia would benefit the goal of weakening the far-right.

Conclusion

The re-election of Donald Trump and encroachment of the far-right across Europe has brought us to the point of an emergency comparable with the lead up to World War 2. Therefore, radical, emergency actions must be taken in order to stabilise the continent while there is still time. EU-ASAP provides a draft of a suitable programme for a set of suitable alliance members who are not able to be full EU members for a number of practical and political reasons.

Note: Further edits will include suitable links.

Wednesday, 24 July 2024

Why does Sin(a)≈a for small a in radians?

Introduction

When we were introduced to calculus during 'A' levels (actually 'AO' levels), an investigation into differentiation was used to show that sin(a)≈a for small angles, if the angles were in radians.

At the time, the observation seemed fairly magical: that somehow picking radians as the units for angles made it work out. To me, radians always seemed an arbitrary unit, because a complete circle wasn't a whole number of them, but 2π. And the method didn't help to enlighten us, because it involved repeated calculations for sin(a+d)-sin(a) for a⟶0.

Actually, though the reason is very simple. Let's consider a full circle with a unit radius:

The distance around the circle (the circumference) is 2π and any part of that distance a around the circle is its angle in radians (since 2π = 360º). Now let's look at a right-angled triangle embedded in a small part of the arc:

The height of the triangle, h, is nearly the same as the length a of the arc around the circle for angle x. And the height of the triangle is the actual meaning of sin(a), since the length a of the arc is the same as the angle in radians, because the radius=1. If we just enlarge that part, we can see that it's very close, but not quite the same.

Therefore, understanding the equivalence doesn't require any advanced notion of calculus; it can be seen directly. Of course, understanding that ∂h=∂a, i.e. when a is infinitesimal, is calculus.
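A quick numerical check makes the same point. This short Python sketch (my own illustration, using only the standard `math` module) prints the triangle height h = sin(a) next to the arc length a for progressively smaller angles in radians, along with the relative gap between them.

```python
import math

# Compare the arc length a (the angle in radians on a unit circle) with the
# triangle height h = sin(a) as the angle shrinks.
for a in (0.5, 0.1, 0.01, 0.001):
    h = math.sin(a)
    relative_gap = (a - h) / a
    print(f"a = {a:<6} sin(a) = {h:.9f}  relative gap = {relative_gap:.2e}")
```

The relative gap shrinks roughly a hundredfold each time a shrinks tenfold, which is the "very close, but not quite the same" behaviour described above.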

One Week Wonder
