I can't believe it's taken me over 30 years to get around to actually learning the 80386 addressing modes. So long, in fact, that the 32-bit x86 architecture has now been obsolete for about 20 years (slightly earlier for the original AMD64).
This is a blog post about how to make sense of the x86 addressing modes, because it's really not that complex if it's presented in a clear way. You can skip ahead to the end, but first a bit of background on my 8086 coding experiences!
I used to do a significant amount of 8086 programming in the early to mid 1990s, when I was working at Micro Control Systems in Sandiacre, Nottingham. I was assigned to the solid-state storage products (called Silicon Disk), which implemented battery-backed SRAM, EPROM and Flash expansion cards for PC compatibles that emulated bootable hard disks via an Int13h interface and a boot ROM. They were versatile in one sense, because you could combine different storage media on a single card or combine multiple cards to make a larger disk - even so, the disks were small by modern standards: a maximum of 3MB per full-length ISA card.
The EPROM and Flash disks were fairly rudimentary though: the user would first have to erase all the storage (using UV in the case of EPROM and a software erasing tool for Flash), and then copy the data they needed to the Silicon Disk. The firmware hijacked the MSDOS INT23h(?) interface so that writes to the Silicon Disk used our code. Files were written directly to the non-volatile storage, but the firmware kept the FAT in the PC's RAM so that it could be updated multiple times. Finally, the user was expected to close the disk, which caused the FAT to be written to the disk properly, along with a boot ROM image so the PC could boot up from the device.
This meant that people were able to develop a full, solid-state PC. Of course, a PC with a read-only disk isn't terribly useful, so most EPROM or Flash disks would also have at least one bank allocated to battery-backed RAM. And in this sense the PC became a large microcontroller with up to 2MB of EPROM/Flash and 1MB of a RAM disk.
All the firmware was written in 8086 code which meant I ended up being pretty familiar with 8086 coding. Coming from a 68000 background that was fairly disappointing, but I learned to make decent use of the CPU. Eventually I persuaded the company that we could improve development time by only having to write the core routines in assembly, while the rest of it could be written in 'C'.
I rarely had to write in 80286 assembly - it was basically the same as 8086 programming with a few more instructions, pusha and popa being the most useful. We never had to write 80386 code, so I never had to learn that and my embedded programming jobs after Micro Control Systems never required it either.
But occasionally I'd come across 80386 code and realise that some of the basic stuff had changed: in i386 mode, addressing modes are more flexible and you can scale index registers so that the CPU can directly address 16-bit or 32-bit arrays. It's possible to read the code OK, but not write it unless you know what the constraints on the address modes really are.
So, finally I've had a go at trying to understand them and it turns out it's not very complex.
Recap: 8086 Addressing Modes
The x86 series has a byte-oriented instruction set, which means each instruction is just a series of 1 to however many bytes, regardless of whether you're dealing with the original 16-bit 8086 or a 64-bit Core i7. Many instructions consist of a specific initial byte followed by an effective address byte which tells the CPU where to find the memory location (or register) to obtain the source or destination data. This is called the MOD:REG:R/M byte. In some cases this byte is sufficient, but in other cases the MOD bits indicate that an 8 or 16-bit literal offset follows, and this value is added to whatever memory location is indicated by the R/M bits. The meaning of the R/M bits themselves depends on the MOD bits too, and can be one of the eight 8-bit registers (or 16-bit registers if the instruction is operating on 16-bit data); one of the four 16-bit registers that can be used for indexing: BX, BP, SI and DI; or a restrictive combination of a pair of those registers: BX+SI (or DI) / BP+SI (or DI).
In addition, although SP addressed memory (because it was the stack pointer) it wasn't possible to index via SP; instead the convention was to use BP to point to a frame of data in the stack and it was possible to use that with an offset. Intel did that because frame pointers were a fairly common convention for Pascal in the 1970s when the 8086 was designed.
So, it was all pretty restrictive: only 3 address registers were generally available and indexing data with a computed offset was even more limited: only BX+SI (or DI) being the useful mode (because normally you wouldn't stick an entire array on the stack). Still, experienced 8086 programmers were used to juggling registers in functions so that the right address registers just so happened to be available when the programmer needed them. It was slow and tedious, but surprisingly efficient.
By comparison, the 68000 was a delight, because you could use any one of 8 address registers (A0..A7) and index them directly; or with a post-increment / pre-decrement (directly implementing 'C's ++ and --); or with a 16-bit displacement; or with an 8-bit displacement and a second register (which could be any of the 16 registers D0..D7, A0..A7, treated as a 16-bit or 32-bit offset). Very flexible.
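To make the MOD:REG:R/M layout concrete, here is a minimal Python sketch that splits one such byte into its three fields. The function name `split_modrm` is my own invention for illustration; the only thing assumed from the architecture is the standard bit layout (2 bits of MOD at the top, 3 bits of REG in the middle, 3 bits of R/M at the bottom).

```python
def split_modrm(byte):
    """Split a MOD:REG:R/M byte into its (mod, reg, rm) bit fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value")
    mod = (byte >> 6) & 0b11    # top 2 bits: memory form and displacement size
    reg = (byte >> 3) & 0b111   # middle 3 bits: the register operand
    rm = byte & 0b111           # low 3 bits: register or memory form
    return mod, reg, rm


# 0x47 is 01 000 111 in binary: MOD=01, REG=000, R/M=111
print(split_modrm(0x47))  # (1, 0, 7)
```

On an 8086, MOD=01 with R/M=111 would mean BX plus an 8-bit displacement, which is why the byte alone isn't always the whole story: the MOD field tells the CPU how many displacement bytes to fetch next.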
The whole set of addressing modes can be summarised below:
MOD | Disp   | R/M=000 | 001     | 010     | 011     | 100   | 101   | 110    | 111
----|--------|---------|---------|---------|---------|-------|-------|--------|------
00  |        | [BX+SI] | [BX+DI] | [BP+SI] | [BP+DI] | [SI]  | [DI]  | disp16 | [BX]
01  | disp8  | [BX+SI] | [BX+DI] | [BP+SI] | [BP+DI] | [SI]  | [DI]  | [BP]   | [BX]
10  | disp16 | [BX+SI] | [BX+DI] | [BP+SI] | [BP+DI] | [SI]  | [DI]  | [BP]   | [BX]
11  | (reg:) | AL/AX   | CL/CX   | DL/DX   | BL/BX   | AH/SP | CH/BP | DH/SI  | BH/DI
This means there are 24 unique addressing modes on the 8086 (8 R/M values for each of the 3 memory MOD values), ignoring the register mode, as it's not addressing memory.
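The 8086 table can also be written as a small lookup, which I find easier to check than a grid. This is only a sketch: `describe_rm16` and the list names are hypothetical, and the register lists assume the standard 8086 numbering (byte registers AL, CL, DL, BL, AH, CH, DH, BH; word registers AX, CX, DX, BX, SP, BP, SI, DI).

```python
# Memory forms selected by R/M when MOD is 00, 01 or 10
RM16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]
# Register forms selected by R/M when MOD is 11
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
# Displacement that follows the byte for each memory MOD value
DISP16 = {0: "", 1: "+disp8", 2: "+disp16"}


def describe_rm16(mod, rm, wide=True):
    """Return a readable operand for an 8086 MOD and R/M pair."""
    if mod == 3:                      # register operand, not memory
        return (REG16 if wide else REG8)[rm]
    if mod == 0 and rm == 6:          # the one special case: a bare address
        return "[disp16]"
    return "[" + RM16[rm] + DISP16[mod] + "]"


# Every memory form: 3 MOD values times 8 R/M values
memory_modes = [describe_rm16(mod, rm) for mod in range(3) for rm in range(8)]

print(describe_rm16(1, 6))               # [BP+disp8]
print(describe_rm16(0, 6))               # [disp16]
print(describe_rm16(3, 5, wide=False))   # CH
```

The special case is the thing to remember: there is no plain [BP] mode, because its slot is used for a direct 16-bit address, so [BP] has to be encoded as [BP+disp8] with a zero displacement.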
80386 Addressing Modes
Intel could have kept the same set of addressing modes for the 32-bit 80386, but computer architecture design had advanced between 1978 and 1985, when the 80386 was released. Firstly, the 68000 CPU, with its more flexible addressing modes, represented a significant amount of competition, primarily because it was already 32-bit and used for high-end workstations; and secondly, RISC processor designs were starting to emerge and these showed that simple addressing modes were used most of the time.
Therefore, the i386 took the drastic step of changing the addressing modes in its 32-bit mode. Instead of the double-index register modes and single-index register modes being allocated to the R/M field, only a wider set of single-register modes was allocated; and in the slot where ESP would have been used as an index register, a second address extension byte was added: the SIB byte. All the SIB byte does is provide a 3-bit index register field, a 3-bit base register field and a 2-bit scaling field for the index register. These fields can basically be mixed and matched.
MOD | Disp   | R/M=000   | 001       | 010       | 011       | 100         | 101       | 110       | 111
----|--------|-----------|-----------|-----------|-----------|-------------|-----------|-----------|----------
00  |        | [EAX]     | [ECX]     | [EDX]     | [EBX]     | [Ix*n+Base] | disp32    | [ESI]     | [EDI]
01  | disp8  | [EAX]     | [ECX]     | [EDX]     | [EBX]     | [Ix*n+Base] | [EBP]     | [ESI]     | [EDI]
10  | disp32 | [EAX]     | [ECX]     | [EDX]     | [EBX]     | [Ix*n+Base] | [EBP]     | [ESI]     | [EDI]
11  | (reg:) | AL/AX/EAX | CL/CX/ECX | DL/DX/EDX | BL/BX/EBX | AH/SP/ESP   | CH/BP/EBP | DH/SI/ESI | BH/DI/EDI
SIB
Ix:3   | Scale:2 | Base:3
-------|---------|-------
EAX    | *1      | EAX
ECX    | *2      | ECX
EDX    | *4      | EDX
EBX    | *8      | EBX
(none) |         | ESP
EBP    |         | EBP *
ESI    |         | ESI
EDI    |         | EDI
(* if Mod=00 and Base=* then the addressing mode is disp32[Ix*n], i.e. a 32-bit displacement and a scaled index register without a base register. With Mod=01 or Mod=10 the same encoding selects EBP as the base. In the byte itself the fields are stored as Scale, then Ix, then Base, from the top bit down.)
Some of the SIB encodings overlap with existing addressing modes. E.g. Ix=(none) with any Base other than ESP will overlap with R/M= the same Base register. Also, Mod=00, Ix=Index, n=1, Base=* overlaps with disp32[R/M=Ix].
This means that if we count addressing modes on the 80386 the same way we count them on the 8086, we have 7*3 = 21 main addressing modes + the SIB addressing modes * 3. There are 8*4*8=256 SIB modes (the special '*' mode takes the place of the EBP base when Mod=00), giving another 768 modes and a grand total of 789 addressing modes (most of which are 2 bytes, yet these are also the least frequently used ones). That total counts encodings, so it includes the overlaps above and the four scale values that all mean the same thing when there is no index.
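Here is how I'd sketch the 32-bit decoding in Python, to check the rules rather than as any kind of real disassembler. The names `describe_rm32` and `count_encodings` are made up for this post. I'm assuming the standard SIB layout (scale in the top 2 bits, index in the middle 3, base in the low 3), that index encoding 100 means "no index", and that all 8 base encodings are usable, with base 101 becoming a bare disp32 when Mod=00.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]
DISP32 = {0: None, 1: "disp8", 2: "disp32"}   # displacement chosen by Mod


def describe_rm32(mod, rm, sib=None):
    """Return a readable 32-bit memory operand; sib is needed when rm == 4."""
    if mod not in DISP32:
        raise ValueError("Mod=11 selects a register, not memory")
    disp = DISP32[mod]
    if rm != 4:                        # plain single-register forms
        if mod == 0 and rm == 5:       # EBP slot reused for a bare address
            return "[disp32]"
        terms = [REG32[rm]]
    else:                              # R/M=100: a SIB byte follows
        if sib is None:
            raise ValueError("R/M=100 needs a SIB byte")
        scale = 1 << (sib >> 6)        # 00, 01, 10, 11 -> *1, *2, *4, *8
        index = (sib >> 3) & 7
        base = sib & 7
        terms = []
        if mod == 0 and base == 5:     # the '*' case: no base, disp32 instead
            disp = "disp32"
        else:
            terms.append(REG32[base])
        if index != 4:                 # 100 means no index register
            terms.append(f"{REG32[index]}*{scale}")
    if disp:
        terms.append(disp)
    return "[" + "+".join(terms) + "]"


def count_encodings():
    """Count memory encodings as (without SIB, with SIB)."""
    plain = sum(1 for mod in range(3) for rm in range(8) if rm != 4)
    with_sib = sum(1 for mod in range(3) for sib in range(256))
    return plain, with_sib


print(describe_rm32(0, 4, 0x33))   # [EBX+ESI*1]
print(describe_rm32(0, 4, 0x8D))   # [ECX*4+disp32]
print(describe_rm32(1, 4, 0x24))   # [ESP+disp8]
print(count_encodings())
```

The third example is the reason the SIB byte has to exist at all: ESP gave up its R/M slot, so the only way to address through ESP is a SIB byte with no index and ESP as the base.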
A programmer, of course, could restrict themselves to only using the same subset of addressing modes available on the 16-bit x86 CPUs, by merely substituting 32-bit index and base registers. This would have the same syntax, but a longer encoding for all the Index + Base addressing modes.
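As a rough illustration of that longer encoding, this Python sketch builds the bytes for the same load on both CPUs. The helper names `modrm` and `sib` are hypothetical, and I'm assuming the plain MOV register-from-memory opcode (8B) with no prefixes, i.e. 16-bit code on the 8086 and a 32-bit code segment on the 386.

```python
def modrm(mod, reg, rm):
    """Pack the MOD:REG:R/M fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def sib(scale_bits, index, base):
    """Pack the scale, index and base fields into one SIB byte."""
    return (scale_bits << 6) | (index << 3) | base


# 8086: MOV AX,[BX+SI] -> R/M=000 is the BX+SI pair, REG=000 is AX
mov16 = bytes([0x8B, modrm(0, 0, 0)])

# 386: MOV EAX,[EBX+ESI] -> R/M=100 asks for a SIB byte,
# then scale *1 (00), index ESI (110), base EBX (011)
mov32 = bytes([0x8B, modrm(0, 0, 4), sib(0, 6, 3)])

print(mov16.hex())   # 8b00
print(mov32.hex())   # 8b0433
```

So the familiar base plus index form costs one extra byte in 32-bit mode, while the single-register forms stay the same length.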
Conclusion
I did a lot of 8086 assembly programming in the 1990s, because CPUs were slower and compilers were poor, so developers had to do more assembly. However, the 16-bit x86 CPUs were also simpler devices.
I have never had to do i386+ programming, as embedded programming moved away from x86 and shifted away from assembly, but I was always intrigued by it. The hardest bit would always be the new addressing modes, but every time I read up on them, it just seemed like more effort than it was worth. Finally, after about 30 years I took it seriously and discovered they're simpler than the descriptions I've seen. So, this blog post covers my new understanding.
Of course, the 32-bit Intel era has been over for more than 10 years, even though Intel is unable to delete it from its CPUs! Soon the Intel era itself might be over as ARM CPUs (and perhaps RISC-V CPUs) overtake its performance at all scales.