Saturday, 17 May 2025

SIBlings! 80386 Addressing Modes Simplified

I can't believe it's taken me over 30 years to get around to actually learning the 80386 addressing modes. So long, in fact, that the 32-bit x86 architecture has now been obsolete for about 20 years (slightly earlier for the original AMD64).

This is a blog post about how to make sense of the x86 addressing modes, because it's really not that complex if it's presented in a clear way. You can skip ahead to the end, but first a bit of background on my 8086 coding experiences!

I used to do a significant amount of 8086 programming in the early-mid 1990s, when I was working at Micro Control Systems in Sandiacre, Nottingham. I was assigned to the solid-state storage products (called Silicon Disk) which implemented battery-backed SRAM, EPROM and Flash expansion cards for PC compatibles that emulated bootable hard disks via an Int13h interface and a boot rom. They were versatile in one sense, because you could combine different storage media on a single card or combine multiple cards to make a larger disk - even so, the disks were small by modern standards: a maximum of 3MB per full-length ISA card.

The EPROM and Flash disks were fairly rudimentary though: the user would have to first erase all the storage (using UV in the case of EPROM and using a software erasing tool for Flash); and then copy the data they needed to the Silicon Disk. The firmware hijacked the MS-DOS INT23h(?) interface so that writes to the Silicon Disk used our code. Files were written directly to the non-volatile storage, but the firmware kept the FAT in the PC's RAM so that it could be updated multiple times. Finally, the user was expected to close the disk, which caused the FAT to be written to the disk properly, along with a boot ROM image so the PC could boot up from the device.

This meant that people were able to develop a full, solid-state PC. Of course, a PC with a read-only disk isn't terribly useful, so most EPROM or Flash disks would also have at least one bank allocated to battery-backed RAM. And in this sense the PC became a large microcontroller with up to 2MB of EPROM/Flash and 1MB of a RAM disk.

All the firmware was written in 8086 code which meant I ended up being pretty familiar with 8086 coding. Coming from a 68000 background that was fairly disappointing, but I learned to make decent use of the CPU. Eventually I persuaded the company that we could improve development time by only having to write the core routines in assembly, while the rest of it could be written in 'C'.

I rarely had to write in 80286 assembly - it was basically the same as 8086 programming with a few more instructions, pusha and popa being the most useful. We never had to write 80386 code, so I never had to learn that and my embedded programming jobs after Micro Control Systems never required it either.

But occasionally I'd come across 80386 code and realise that some of the basic stuff had changed: in i386 mode, addressing modes are more flexible and you can scale index registers so that the CPU can directly address 16-bit or 32-bit arrays. It's possible to read the code OK, but not write it unless you know what the constraints on the addressing modes really are.

So, finally I've had a go at trying to understand them and it turns out, it's not very complex.

Recap: 8086 Addressing Modes

The x86 series has a byte-oriented instruction set, which means an instruction is just a series of 1 to however many bytes, regardless of whether you're dealing with the original 16-bit 8086 or a 64-bit Core i7. Many instructions consist of a specific initial byte followed by an effective address byte which tells the CPU where to find the memory location (or register) that holds the source or destination data. This is called the MOD:REG:R/M byte. In some cases this byte is sufficient, but in other cases the MOD bits indicate that an 8-bit or 16-bit literal offset follows, and this value is added to whatever memory location is indicated by the R/M bits. The meaning of the R/M bits themselves depends on the MOD value too. They can select one of the eight 8-bit registers (or 16-bit registers if the instruction operates on 16-bit data); one of the four 16-bit registers that can be used for indexing (BX, BP, SI and DI); or one of a restrictive set of register pairs: BX+SI, BX+DI, BP+SI or BP+DI.

In addition, although SP addressed memory (because it was the stack pointer) it wasn't possible to index via SP; instead the convention was to use BP to point to a frame of data in the stack and it was possible to use that with an offset. Intel did that, because frame pointers were a fairly common convention for Pascal in the 1970s when the 8086 was designed.

So, it was all pretty restrictive: only 3 address registers were generally available and indexing data with a computed offset was even more limited: only BX+SI (or DI) being the useful mode (because normally you wouldn't stick an entire array on the stack). Still, experienced 8086 programmers were used to juggling registers in functions so that the right address registers just so happened to be available when the programmer needed them. It was slow and tedious to write, but surprisingly efficient.

By comparison, the 68000 was a delight, because you could use any one of 8 address registers (A0..A7) and index them directly; or with a post-increment / pre-decrement (directly implementing 'C's ++ and --); or with a 16-bit displacement; or with an 8-bit displacement and a second register (which could be any of the 16 registers D0..D7, A0..A7, treated as a 16-bit or 32-bit offset). Very flexible.

The whole set of addressing modes can be summarised below:

MOD          R/M: 000      001      010      011      100    101    110     111
00                [BX+SI]  [BX+DI]  [BP+SI]  [BP+DI]  [SI]   [DI]   disp16  [BX]
01  disp8         [BX+SI]  [BX+DI]  [BP+SI]  [BP+DI]  [SI]   [DI]   [BP]    [BX]
10  disp16        [BX+SI]  [BX+DI]  [BP+SI]  [BP+DI]  [SI]   [DI]   [BP]    [BX]
11  (reg:)        AL/AX    CL/CX    DL/DX    BL/BX    AH/SP  CH/BP  DH/SI   BH/DI

This means there are 24 unique addressing modes on the 8086 (8 for each of the three memory MOD values), ignoring the register mode, as it's not addressing memory.
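To make the table concrete, here's a minimal Python sketch that decodes the MOD and R/M fields of a 16-bit MOD:REG:R/M byte into the forms listed above. The function and table names are my own inventions for illustration, and the REG field (bits 3..5) is deliberately ignored.

```python
# R/M meanings in 16-bit mode, indexed by the 3-bit R/M field.
RM16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_modrm16(modrm: int, wide: bool = True) -> str:
    """Describe the operand selected by the MOD and R/M fields of one byte."""
    mod = (modrm >> 6) & 0b11      # top two bits
    rm = modrm & 0b111             # bottom three bits
    if mod == 0b11:                # register mode, not a memory access
        return (REG16 if wide else REG8)[rm]
    if mod == 0b00:                # no displacement, except the disp16 slot
        return "[disp16]" if rm == 0b110 else f"[{RM16[rm]}]"
    disp = "disp8" if mod == 0b01 else "disp16"
    return f"[{RM16[rm]}+{disp}]"


def count_memory_modes16() -> int:
    """Count the distinct memory forms across MOD=00, 01 and 10."""
    return len({decode_modrm16((mod << 6) | rm)
                for mod in range(3) for rm in range(8)})


if __name__ == "__main__":
    print(decode_modrm16(0x46))           # MOD=01, R/M=110
    print(decode_modrm16(0xC4, wide=False))
    print(count_memory_modes16())
```

Running count_memory_modes16() is a quick way to check the tally of memory modes against the table.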

80386 Addressing Modes

Intel could have kept the same set of addressing modes for the 32-bit 80386, but computer architecture design had advanced between 1978 and 1985, when the 80386 was released. Firstly, the 68000 CPU, with its more flexible addressing modes, represented a significant amount of competition, primarily because it was already 32-bit and used for high-end workstations. Secondly, RISC processor designs were starting to emerge, and these showed that simple addressing modes were used most of the time.

Therefore, the i386 took the drastic step of changing the addressing modes in its 32-bit mode. Instead of the double-index register modes and single-index register modes being allocated to the R/M field, only a wider set of single-index register modes was allocated; and in the slot where ESP would have been used as an index register (R/M=100), a second address extension byte was added: the SIB byte. All the SIB byte does is provide a pair of 3-bit register fields (an index and a base) and a 2-bit scaling field for the index register. These fields can basically be mixed and matched.

MOD          R/M: 000          001          010          011          100          101          110          111
00                [EAX]        [ECX]        [EDX]        [EBX]        [Ix*n+Base]  disp32       [ESI]        [EDI]
01  disp8         [EAX]        [ECX]        [EDX]        [EBX]        [Ix*n+Base]  [EBP]        [ESI]        [EDI]
10  disp32        [EAX]        [ECX]        [EDX]        [EBX]        [Ix*n+Base]  [EBP]        [ESI]        [EDI]
11  (reg:)        AL/AX/EAX    CL/CX/ECX    DL/DX/EDX    BL/BX/EBX    AH/SP/ESP    CH/BP/EBP    DH/SI/ESI    BH/DI/EDI

SIB

Ix:3      Scale:2   Base:3
EAX       *1        EAX
ECX       *2        ECX
EDX       *4        EDX
EBX       *8        EBX
(none)              ESP
EBP                 EBP *
ESI                 ESI
EDI                 EDI

(* Base=101 normally selects EBP, but if Mod=00 then the addressing mode is disp32[Ix*n], i.e. a 32-bit displacement and a scaled index register without a base register).

Some of the SIB encodings overlap with existing addressing modes. E.g. Ix=No Index and Base=EAX..EBX (or ESI or EDI) will overlap with R/M= the same Base register. Also, Mod=00, Ix=Index, n=1, Base=* overlaps with disp32[R/M=Ix].

This means that if we count addressing modes on the 80386 the same way we count them on the 8086, we have 7*3+1 = 22 main addressing modes + the SIB addressing modes * 3. Counting every combination of the three fields, there are 8*4*8=256 SIB encodings for each Mod value, giving another 768 (the special '*' mode is one of the Base values, so it's already included), for a grand total of 790 encodings (most of which are 2 bytes, yet these are also the least frequently used ones). The number of genuinely distinct modes is lower, because the scale is ignored when there's no index register and because of the overlaps above.
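The 32-bit rules can be sketched the same way. This is a minimal Python decoder for the MOD and R/M fields plus an optional SIB byte; the function name is hypothetical, and it assumes the standard SIB bit layout of scale in bits 6..7, index in bits 3..5 and base in bits 0..2.

```python
from typing import Optional

REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_ea32(modrm: int, sib: Optional[int] = None) -> str:
    """Describe the 32-bit effective address selected by ModR/M (+ SIB)."""
    mod = (modrm >> 6) & 0b11
    rm = modrm & 0b111
    if mod == 0b11:                       # register mode, not memory
        return REG32[rm]
    disp = {0b00: "", 0b01: "disp8", 0b10: "disp32"}[mod]
    if rm != 0b100:                       # plain single-register forms
        if mod == 0b00 and rm == 0b101:   # the disp32-only slot
            return "[disp32]"
        terms = [REG32[rm]]
    else:                                 # R/M=100: a SIB byte follows
        if sib is None:
            raise ValueError("R/M=100 needs a SIB byte")
        scale = 1 << ((sib >> 6) & 0b11)  # *1, *2, *4 or *8
        index = (sib >> 3) & 0b111
        base = sib & 0b111
        terms = []
        if mod == 0b00 and base == 0b101:
            disp = "disp32"               # the '*' case: no base register
        else:
            terms.append(REG32[base])
        if index != 0b100:                # index 100 means "no index"
            terms.append(f"{REG32[index]}*{scale}")
    if disp:
        terms.append(disp)
    return "[" + "+".join(terms) + "]"


if __name__ == "__main__":
    print(decode_ea32(0x04, 0x88))   # scaled index plus base
    print(decode_ea32(0x04, 0x8D))   # the '*' case
    print(decode_ea32(0x04, 0x24))   # the only way to address via ESP
```

The last example shows why the SIB byte replaced the ESP slot: addressing through ESP always costs the extra byte.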

A programmer, of course, could restrict themselves to only using the same subset of addressing modes available on the 16-bit x86 CPUs, by merely substituting 32-bit index and base registers. This would have the same syntax, but a longer encoding for all the Index + Base addressing modes.

Conclusion

I did a lot of 8086 assembly programming in the 1990s: developers had to do more assembly then, because CPUs were slower and compilers were poor. However, the 16-bit x86 CPUs were also simpler devices.

I have never had to do i386+ programming, as embedded programming moved away from x86 and shifted away from assembly, but I was always intrigued by it. The hardest bit would always be the new addressing modes, but every time I read up on them, it just seemed like more effort than it was worth. Finally, after about 30 years, I took it seriously and discovered they're simpler than the descriptions I've seen. So, this blog post covers my new understanding.

Of course, the 32-bit Intel era has been over for more than 10 years, even though Intel is unable to delete it from its CPUs! Soon the Intel era itself might be over as ARM CPUs (and perhaps RISC-V CPUs) overtake its performance at all scales.

Tuesday, 21 January 2025

Burns Night Is The Coldest

In winter I usually mark off 4 dates as the yearly cycle starts transforming into a more positive outlook. I call these 'Milestones'.

Milestone 1

This is the earliest evening. Most people are unaware that evenings start getting later, before the shortest day. In 2024, evenings in the UK started getting later from December 12.

Milestone 2

This is the shortest day, the winter solstice. Everyone knows about this. However, even after this day, the mornings are still getting later; it's just that the evenings are getting later quicker than the mornings are, so the days start getting longer.

Milestone 3

This is the latest morning. Again most people are unaware of this, but it happens right at the end of the year. In 2024 it happened around December 30 or 31.

Milestone 4

This is the coldest day of the year on average and the topic for this post. It's difficult to calculate this date, because daily temperatures vary wildly from day to day and also across years for equivalent days. Nevertheless, it's fairly easy to see that after the days start getting longer, they continue to also get colder for a while. This is because other environmental factors such as cloud cover, heat loss from the ground, air temperatures or the Jet Stream can continue to drive temperatures on average down faster than the sun adds energy to the atmosphere and land.

Anecdotally, I used to figure the coldest time of the year was at the end of January / beginning of February, so I set Milestone 4 on January 31. Later, however, I thought to myself that perhaps it's mid-way between the winter solstice and spring equinox, because all of these diurnal patterns tend to follow year-long sine waves.

Winter solstice is on December 21, and Spring equinox is on March 21. So, that's 10 days in December + January (31 days) + February (28.24 days on average) + 21 days of March. This is 10+31+28.24+21=90.24 days. Milestone 4 would then fall after 90.24/2=45.12 days, which, after subtracting the 10 days at the end of December and the 31 days in January, leaves 4.12 days. So for the past several years I've been setting it on February 4.
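The same mid-point can be checked with calendar arithmetic. This is a small Python sketch; the function name is mine and the years are just an example (a winter with a 28-day February, so it counts 90 whole days rather than the averaged 90.24).

```python
from datetime import date, timedelta


def midpoint(start: date, end: date) -> date:
    """Return the calendar date half-way (in whole days) between two dates."""
    return start + timedelta(days=(end - start).days // 2)


if __name__ == "__main__":
    solstice = date(2024, 12, 21)   # example year, an assumption
    equinox = date(2025, 3, 21)
    print((equinox - solstice).days)
    print(midpoint(solstice, equinox))
```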

But neither of these techniques is based on actual evidence. What if it's not symmetrical as I've assumed? What if temperatures simply aren't shifted mid-way? To figure that out I need real data.

Real Data

I was involved in an on-line, climate discussion trying to work out how temperatures had changed in the UK over the past decade or so and found an open Statistica page on it:


You can hover over the months to get the actual figures, but downloading the raw data requires a subscription. It turns out that for nearly all the months in the year there's an upward trend, but for January there's no observable trend.

But as I was looking at it, I realised that I could use my new understanding of Fourier transforms to obtain a better approximation for the coldest day.

The Winding Principle

At University we covered quite a lot of math in the first year including Fourier Transforms (and Laplace Transforms). I was able to do the math, but I didn't remotely understand how one can isolate the set of harmonic frequencies from waveform data. It took a Hackaday article to help me. I can't do justice to the article, nor the associated animated video explainer, but I can précis the idea as far as the fundamental harmonic goes, which is all we care about here.

Every complex, repeating, sampled waveform can be constructed from a set of sine waves at 1x, 2x, 3x... the fundamental frequency, up to half the sample rate, just added together. However, if I want to isolate the fundamental frequency, that turns out to be pretty easy. All you do is multiply each sample by the sine of the corresponding angle within the waveform and add the results together. If the fundamental is present, then its amplitude at any point will cohere with the sine wave itself, but higher frequencies will 'disappear', because their positive phases will end up getting multiplied by both the positive and negative phases of the reference sine wave. For example, consider an 8-sample wave containing a fundamental and 1st harmonic:

Sample#                0       1       2       3       4       5       6       7   Total
Ref Sine           0.000   0.707   1.000   0.707   0.000  -0.707  -1.000  -0.707   0.000
Fundamental        0.000   0.573   0.810   0.573   0.000  -0.573  -0.810  -0.573   0.000
^^^ x Ref Sine     0.000   0.405   0.810   0.405   0.000   0.405   0.810   0.405   3.240
1st Harmonic       0.000   0.210   0.000  -0.210   0.000   0.210   0.000  -0.210   0.000
^^^ x Ref Sine     0.000   0.148   0.000  -0.148   0.000  -0.148   0.000   0.148   0.000

To fully calculate each harmonic you need to consider its phase. That's because a sine wave at any given phase can be generated by a pair of sine + cosine waves with two respective amplitudes; thus the above technique will only recover the sine wave component. For example, if the Fundamental was shifted by +90º, then the Fundamental * the Ref Sine would end up with a total of 0, but the wave * a Reference cosine wave would have a total of 3.240.
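Here's a minimal Python sketch of that multiply-and-add step, using the same 8-sample waves as the table (amplitudes 0.81 and 0.21). The names are mine. It only illustrates the principle and isn't a full Fourier transform.

```python
import math

N = 8  # samples per cycle, as in the table


def correlate(wave, ref=math.sin):
    """Sum of wave[k] * ref(angle_k) over one cycle of the fundamental."""
    n = len(wave)
    return sum(w * ref(2 * math.pi * k / n) for k, w in enumerate(wave))


# The two components from the table, plus their sum.
fundamental = [0.81 * math.sin(2 * math.pi * k / N) for k in range(N)]
harmonic = [0.21 * math.sin(2 * math.pi * 2 * k / N) for k in range(N)]
mixed = [f + h for f, h in zip(fundamental, harmonic)]

# The fundamental shifted by +90 degrees (i.e. a cosine wave).
shifted = [0.81 * math.sin(2 * math.pi * k / N + math.pi / 2) for k in range(N)]

if __name__ == "__main__":
    print(round(correlate(mixed), 3))              # fundamental survives
    print(round(correlate(harmonic), 3))           # harmonic cancels
    print(round(correlate(shifted), 3))            # sine reference misses it
    print(round(correlate(shifted, math.cos), 3))  # cosine reference finds it
```

The total for a pure sine of amplitude A is A * N/2, which is why 0.81 turns into 3.240 with 8 samples.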

Finding The Phase

Therefore, a Fourier analysis of the fundamental can tell us not only its amplitude, but also its phase. And it turns out we can obtain an accurate phase from relatively few samples. The phase is simply the angle of the vector that has ∑ waveform data * the Reference sine wave on the x axis and ∑ waveform data * the Reference cosine wave on the y axis.

This means that even though all we have are monthly values for the temperature data, we can calculate the actual minima, zero-crossings and maxima at a much higher resolution.

The phase calculated is always relative to the reference angles. For example, if we started the reference angle at 30º and the samples were a sine wave starting at 30º, then the phase would still be 0º. If the reference angle was 0º, and the samples were a sine wave starting at -90º, then the relative phase would be reported as 90º, because the zero-crossing for the sine wave would be at 90º.
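As a sketch of the phase calculation (function names are hypothetical), the following Python takes equally spaced samples of one cycle and returns the amplitude and phase of the fundamental. It uses the convention that the fundamental is A*sin(angle + phase), so the upward zero-crossing sits at -phase. Under that convention the second example above (a sine wave starting at -90º) comes out as -90º, which is the same thing as a zero-crossing at +90º.

```python
import math


def fundamental_vector(wave):
    """x = sum(wave * sin), y = sum(wave * cos) over one cycle."""
    n = len(wave)
    x = sum(w * math.sin(2 * math.pi * k / n) for k, w in enumerate(wave))
    y = sum(w * math.cos(2 * math.pi * k / n) for k, w in enumerate(wave))
    return x, y


def amplitude_and_phase(wave):
    """Amplitude and phase (degrees) of the fundamental A*sin(angle + phase)."""
    x, y = fundamental_vector(wave)
    amplitude = 2 * math.hypot(x, y) / len(wave)
    phase = math.degrees(math.atan2(y, x))
    return amplitude, phase


if __name__ == "__main__":
    # 12 samples of a wave with an offset, a fundamental at +30 degrees
    # and some second-harmonic content (made-up test values).
    wave = [5.0
            + 0.81 * math.sin(2 * math.pi * k / 12 + math.radians(30))
            + 0.20 * math.sin(2 * math.pi * 2 * k / 12 + 1.0)
            for k in range(12)]
    print(amplitude_and_phase(wave))
```

Note that the constant offset and the higher harmonic don't disturb the result, which is why monthly averages are enough to work with.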

The phase therefore tells us the average temperature day, and the minimum temperature day will be 90º earlier (or 91.31 days earlier). For UK temperatures, the minimum temperature is therefore reported as Jan 25.5. Ironically, this means that Burns Night is the coldest.

There's one more aspect of the model that's worth mentioning, which is that the reference phases aren't equidistant, because the months don't all have the same number of days in them (though it's close). Therefore, in this calculation, the reference dates are taken from the mid-point of each month, on the basis that the average temperature for that month represents the temperature half-way through the month.
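A sketch of how the mid-month reference angles might be used is below. Several things here are my assumptions rather than the calculation actually used for the result above: each month is weighted by its length (so the sums approximate an integral over the year), February is taken as 28.2425 days, and the temperatures in the example are synthetic values generated from a known cosine, not real UK data.

```python
import math

# Month lengths, with an averaged February (an assumption).
DAYS = [31, 28.2425, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
YEAR = sum(DAYS)


def month_midpoints():
    """Day-of-year (0 = start of 1 January) at the middle of each month."""
    mids, start = [], 0.0
    for d in DAYS:
        mids.append(start + d / 2)
        start += d
    return mids


def coldest_day(monthly_temps):
    """Day-of-year at which the fitted fundamental is at its minimum."""
    x = y = 0.0
    for temp, mid, days in zip(monthly_temps, month_midpoints(), DAYS):
        angle = 2 * math.pi * mid / YEAR
        x += days * temp * math.sin(angle)   # weighted by month length
        y += days * temp * math.cos(angle)
    phase = math.atan2(y, x)                 # fundamental is A*sin(angle + phase)
    # A*sin(...) is lowest where angle + phase = -90 degrees (mod 360).
    angle_min = (-math.pi / 2 - phase) % (2 * math.pi)
    return angle_min * YEAR / (2 * math.pi)


if __name__ == "__main__":
    # Synthetic monthly values with a known minimum at day 25.
    temps = [10 - 6 * math.cos(2 * math.pi * (m - 25) / YEAR)
             for m in month_midpoints()]
    print(round(coldest_day(temps), 1))
```

With the synthetic input, the recovered day should land close to the day that was baked into the data, which is a useful sanity check before feeding in real monthly averages.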

Minimal Temperature

Temperatures:


Wednesday, 8 January 2025

Basic Blitz: A Surprisingly Addictive VIC-20 Remake

The game Blitz was written and self-published by Simon Taylor for the unexpanded VIC-20 in 1981, then later sold to Mastertronic.

https://www.eurogamer.net/lost-and-found-blitz

I always thought it looked like a game that must have been written in Basic, but I never got around to testing that until the beginning of 2025.

So, here's my version in all its glorious 64 lines of code!


Mine seems to be based on the later Mastertronic version, because my plane is just one graphic character instead of 2 or 3 and my buildings are multicoloured instead of just black. Multicoloured buildings add to the fun, given most actual buildings are grey.

Also, mine doesn't speed up during each flight; it does get faster per level while the number of buildings it generates also increases by 2. My current high score is 533. Game control is pretty simple: you just press 'v' to drop a bomb as the plane flies across the screen. Only one bomb can be dropped at a time.

Design

Enough of the gameplay, let's discuss the software design. The outline of the game is pretty simple:
  • Line 5 reserves memory for the graphics characters then calls a subroutine at line 9000 to generate them.
  • Line 7 defines a function to simplify random number generation.
  • Line 8 is a bit of debug, see later.
  • Line 9 resets the high score. So, this only happens once.
  • Line 10 starts a game with a width of 5 (so 5x2=10 buildings are generated) and a delay of 100 between each frame.
  • Lines 30 to 60 are the main loop of the game. It really is that tiny. The loop terminates when the plane lands or hits a building. Within that the plane is drawn (by displaying it in its next position then erasing the previous position to avoid flicker).
  • Bomb handling is done in lines 45 to 50, but the explosion is handled in lines 200-300.
  • End of game is handled in lines 66 to 80 including displaying "Landed" or "Crashed", updating the high score and handling the user wanting to quit.
  • Line 99 resets the graphics characters back to the normal character set so that you can carry on editing it.
  • The subroutine at line 100 performs the equivalent of a PRINT AT.
  • The subroutine at lines 200 to 250 handles a bomb hitting a building (a random number of floors are destroyed).
  • The subroutine at lines 8000 to 8070 generates a new level based on W, the width of the cityscape.
  • The subroutine at lines 9000 to 9010 generates the graphics characters and sets the sound level to 5.
  • The data from lines 9012 to 9090 are the graphics characters themselves, in the sequence: 'blank', 'solid square', 3x building types, 2x roofs, plane, grass.
  • The subroutine from lines 9500 to 9520 waits for a key to be released, then pressed, returning the key in A$.

Graphics

Because VIC-20 graphics are weird, programmers end up with bespoke graphics routines, so it's always worth discussing them. Firstly, VIC-20 graphics are tile-based, somewhat like the Nintendo Entertainment System. Video memory contains character codes between 0 and 255, and each character code points to an 8x8 bit pattern at CharacterMemoryBaseAddress+(CharCode*8). Usefully, the base address for the character bit patterns (and the video base address too) can be set by poking 36869. That base address can be set to RAM (which gives the programmer 256 tiles to play with), ROM (which is the default and provides caps+graphics or a caps+lowercase+some graphics option) or can be made to straddle both (which gives the programmer up to 128 tiles to play with + an upper case character set). This is the case even though the user defined graphics (UDGs) have addresses below 8192 while the ROM tiles are above 32768, because of the way the 14-bit VIC-chip's address space is mapped to the VIC-20's, full 16-bit address space.


In practical, unexpanded VIC-20 applications, programmers will want to use as few UDGs as possible to maximise program space while retaining much of the conventional character patterns. In Basic Blitz we therefore set the graphics to straddle mode (value 0xf, giving a CharacterMemoryBaseAddress of 0x1c00) which means characters 0..127 are in RAM and 128..255 are in ROM.
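The address arithmetic can be sketched in Python. This assumes the commonly documented VIC-20 memory map as I understand it: the VIC chip's upper 8KB appears at CPU addresses 0x0000-0x1FFF, its lower 8KB at 0x8000-0x9FFF, and the low 4 bits of 36869 select the character base in 1KB steps. The function names are just for illustration.

```python
def vic_to_cpu(vic_addr: int) -> int:
    """Map a 14-bit VIC-chip address to a VIC-20 CPU address (assumed map)."""
    vic_addr &= 0x3FFF               # the VIC chip only has 14 address bits
    if vic_addr & 0x2000:            # upper half: low RAM at 0x0000-0x1FFF
        return vic_addr & 0x1FFF
    return 0x8000 | vic_addr         # lower half: 0x8000-0x9FFF (char ROM)


def char_pattern_address(char_code: int, base_nibble: int = 0xF) -> int:
    """CPU address of the 8-byte bit pattern for one character code."""
    vic_base = (base_nibble & 0xF) << 10   # 1KB steps
    return vic_to_cpu(vic_base + char_code * 8)


if __name__ == "__main__":
    for code in (0, 63, 64, 128):
        print(code, hex(char_pattern_address(code)))
```

In straddle mode this shows character 0 at 0x1c00 and character 128 wrapping around into the ROM at 0x8000.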

Intuitively, you might imagine that you'd want to start using tile 0 first, but that would waste most of the tile space, so in fact we always count the UDGs we need backwards from tile 63, because tiles 64 to 127 overlap with video memory itself by default (and are therefore unusable!). Also, because the VIC-20 ROM characters aren't in ASCII order and, amazingly enough, don't include the filled-in graphics character, I have to provide that. When Basic Blitz is run, it first shows the entire usable character set.


I added this as a bit of debug, because I initially wasn't sure the ROM characters would print out OK. Also, I then made it print Hello in red to test both my PRINT AT subroutine and embedded colour control codes.

Graphics characters can easily be printed, because they're the normal characters '6', '7', '8', '9', ':', ';', '<', '=', '>', '?'. Normal text can be displayed, but you have to force 'inverse' characters, which is achieved by preceding each print statement with <ctrl>+9 and ending with a true character <ctrl>+0.

Colours

Colours on a VIC-20 are strangely limited. There's a block of colour attribute memory, one location for each video byte, but each one is only 4 bits, which means you can only select an INK colour for on pixels. The PAPER colour is global, defined by bits 4..7 of 36879. The VIC-20 partially gets around this by normally making characters 128 to 255 inverse characters, but also by defining bit 3 of 36879 as normal or inverse mode.

The upshot though is that with the ROM character sets you can choose a common PAPER colour with any INK, or the common PAPER colour as INK, with any INK colour as PAPER. But when you select the character set to straddle RAM and ROM, you can only choose any INK colour + the common PAPER colour.

Hence in Basic Blitz, the background is white (as that seems most useful) and I have to define a UDG just so that I can get a filled in green character for grass with a building on top.
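As a sketch of how the value poked into 36879 is put together: bits 4..7 hold the PAPER colour and bit 3 selects normal or inverse mode, as described above. The border colour in bits 0..2 is an extra detail I've added from the usual VIC-20 register description, and the function name is hypothetical.

```python
def screen_colour_register(paper: int, border: int, normal: bool = True) -> int:
    """Build a value for location 36879 from its three fields."""
    if not 0 <= paper <= 15:
        raise ValueError("paper colour must be 0..15")
    if not 0 <= border <= 7:
        raise ValueError("border colour must be 0..7")
    # bits 4..7 = PAPER, bit 3 = normal (1) / inverse (0), bits 0..2 = border
    return (paper << 4) | (0b1000 if normal else 0) | border


if __name__ == "__main__":
    # White paper (1) with a cyan border (3), the usual power-on look.
    print(screen_colour_register(1, 3))
```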

Sound

Basic Blitz's sound is pretty simple. The initialisation routine switches audio on to level 5 (POKE 36878, 5) and leaves it there. There are 3 voice channels, which are individually switched on if bit 7 is set. In practical terms, each voice has a range of about 2 octaves, the first one having values from 128 to 65; then the next octave from 64 to 33. Beyond 32, the frequency ratio between each note is 1.03 to 1.06, close to that of a semitone (1.059), making most note intervals unusably out of tune.
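Those ratios can be reproduced with a couple of lines of Python, on the assumption that the pitch is inversely proportional to the values quoted above (so going from 128 to 64 doubles the frequency). The function names are mine, and this models only the ratios, not the actual frequencies the VIC chip produces.

```python
import math

SEMITONE = 2 ** (1 / 12)   # equal-tempered semitone ratio


def step_ratio(n: int) -> float:
    """Frequency ratio between adjacent values n and n-1 (pitch ~ 1/n)."""
    return n / (n - 1)


def step_in_cents(n: int) -> float:
    """The same step expressed in cents (100 cents = one semitone)."""
    return 1200 * math.log2(step_ratio(n))


if __name__ == "__main__":
    print(round(SEMITONE, 3))
    for n in (128, 64, 32, 17):
        print(n, round(step_ratio(n), 4), round(step_in_cents(n), 1))
```

At large values the steps are much finer than a semitone, so notes can be tuned quite closely; as the values shrink, each step approaches a whole semitone and there's no room left to correct the tuning.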

The plane makes a drone sound using the lowest pitch audio channel (address 36874) OR'd with the bottom 4 bits of the jiffy clock at PEEK(162).

The bomb uses the high octave channel (at 36876), just generating an ascending tone. If the bomb hits a building it's silenced and the noise channel is used instead, with a fixed low pitch of 129. Finally, the important thing is to turn off all the sounds when they're done, by poking the channels with 0.

Playing The Game

You can run this VIC-20 Javascript emulator and type in the code (if the keyboard mapping allows it):


I've found this emulator is better than the Dawson one for .prg files. Here's how to load the .prg on a desktop/laptop. First download the BasicBlitz.prg from my Google Drive. Then drag the file from wherever you downloaded it from to the emulator in the browser. It will automatically load and run!

However, it's also useful to be able to type in code directly for editing, debugging and other stuff.

The keyboard on my MacBook M4 doesn't map correctly to VIC-20 keys, because the emulator does a straight translation from character codes to VIC-20 keys rather than from key codes. This means that pressing Shift+':' gives you ';' on this emulator rather than '[' as marked on a VIC-20 keyboard.

Mostly this makes typing easier, but the VIC-20 uses a number of embedded attribute key combinations. Basic Blitz doesn't use many, but typing the ones it does use isn't easy. Here's how.

In Chrome, you need to enable the console by pressing function key F12, then clicking on the Console tab. In Safari, you need to choose Safari:Settings..., then select the 'Advanced' tab and click on "Show features for web developers" at the bottom. Then the "Develop" menu appears on the menu bar and you can choose Develop:Show JavaScript Console.

So far so good. Now, you can type most of the text as normal, but whenever you need to type a special code, type pasteChar(theCode) in the console followed by Enter (e.g. pasteChar(147) for the clear screen code). Here are the codes you'll need:
  • Inverse 'R' => 18. This is for Reverse text, which ends with inverse nearly underline => 146.
  • Inverse '£' (Red) => 28.
  • Inverse '┓' (Black) => 144.
  • Inverse 'S' => 19 (this is the home code).
  • Inverse heart => 147 (this is the clear screen code).
  • Inverse up-arrow => 30 (this is green).
  • Inverse 'Q' and inverse '|' can be typed directly just using the down cursor and left cursor respectively.
  • The codes in line 8045 are more colour codes used for the buildings. They are 144 (Black), 28 (Red), 159 (Inverse filled diagonal=cyan), 156 (checkered-black character=purple), 30 (inverse up arrow = green), 31 (inverse left arrow=blue), 158 (inverse 'π' = yellow).

Conclusion

The original VIC-20 Blitz program, though derivative in its own way, is so simple it could have been written in BASIC, as this version proves. The arcane design of the VIC-20 hardware and its lousy BASIC implementation mean there's a lot of subtle complexity even in a simple game. Finally, although there are many emulators for the VIC-20, both the Javascript implementations I know of have limitations and bugs which make distributing this game and/or modifying it non-trivial.