Saturday, 25 April 2026

Recovering Sinclair QL App DataSpaces

Recently I've been playing with a QL emulator again, because I found a printed copy of my third year Computer Science dissertation, and wanted to retype it in the default word processor, QUILL.

However, I couldn't run QUILL, because when the program was copied to my Mac's file system it didn't copy the header, which contains the data and stack space allocated to the program. And that's part of how the QL works, executable files contain meta-data providing this information and it gets lost on modern operating systems including Linux and Windows.

In theory, fixing it is as easy as reserving memory for the program (progCode=RESPR(sizeOfFile)), then loading the code (lbytes fileName,progCode), then re-saving it under a different name: (sexec_w newFileName,progCode,dataSpace). But this means finding out how much data space has been allocated for static data and the stack. And... this information isn't generally available, even though many QL owners have had to face the problem.

The nearest I got was a web site which contained a BASIC program which could tweak QUILL for a few features I didn't care about (see the section called QUILL Mod). But at the end it did sexec_w QUILL with an actual data space which worked. I was then able to use QUILL to type in the first couple of pages of my dissertation, which was fun.

This wasn't a solution for all the other programs I could run on my QL emulator, e.g. Forth79! Amazingly though I found an intriguing program on one of my QL directories on my Mac called: HeadRead_bas. This turned out to be a machine code program and hexloader for it, which would then save the machine code in a file. Could this be it? Here's the program:

100 CLS:RESTORE:READ space:start=RESPR(space)
140 PRINT 'loading hex..':endAddr=hex_load(start)
150 INPUT 'save to file';f$
160 SBYTES f$,start,endAddr-start
170 STOP
180 DEFine FuNction dec(h$):RETurn h$(1) INSTR "0123456789ABCDEF"-1:END DEFine
190 DEFine FuNction hex_load(start)
195 LOCal sum,addr
200 PRINT 'Data entered at:',start
220 sum=0:addr=start
230 REPeat load_hex_digits
240 READ h$:IF LEN(h$)<>2*INT(LEN(h$)/2) THEN PRINT "Odd Hex digit Count";h$:STOP
300 FOR b=0 TO LEN(h$) STEP 2
360 byte=16*dec(h$(b+1))+dec(h$(b+2)):POKE addr,byte
370 sum=sum+byte
380 addr=addr+1
390 END FOR b
400 END REPeat load_hex_digits
410 READ check
420 IF check=sum then print "Sum OK":else print "Bad Sum"
430 RETurn addr
490 END DEFine
500 DATA 144
510 DATA '43FA000A34790000','01104ED20002001E'
520 DATA '0747657448454144','0010075365744845'
530 DATA '4144000000000000','784660027847BBCB'
540 DATA '675A2A0D4BEB0008','3479000001124E92'
550 DATA '664C3031E80054AE','0058264D2A45C0FC'
560 DATA '0028D0AE0030B0AE','00346C2C2A360800'
570 DATA '6B26347900000118','4E9266225343661C'
580 DATA '2031E80008000000','6612204522407440'
590 DATA '766420044E434E75','70FA4E7570F14E75'
600 DATA '*',10007

The program doesn't read QL program headers, it just creates the machine code file you can then use in another program to read headers. And that program was elsewhere in the same directory too:

10 hdrMod=RESPR(144):LBYTES mdv1_GetSetHead_bin,hdrMod:CALL hdrMod
100 BUFFER=RESPR(64)
120 INPUT 'ENTER DEVICE:';F$
130 OPEN #3,F$
140 GetHEAD #3,BUFFER
150 PRINT F$;', ';PEEK_L(BUFFER);' BYTES'
160 PRINT 'LAST ALTERED ';DATE$(PEEK_L(BUFFER+52))
170 PRINT 'CURRENT DATA SPACE ';PEEK_L(BUFFER+6)
210 CLOSE #3

It loads in the machine code first. It turns out the machine code adds the command GetHEAD to BASIC. GetHEAD reads the header from a file at the given channel and stores it in an allocated buffer. Then we can look at offsets in the file for the actual size of the executable and the data space.

This solves part of the problem: I now had a program which could read the headers. However, all the reported dataspace values were reported as 0. Fortunately, I still have my real Sinclair QL and a floppy disk system which is still largely reliable! I could either look for the same BASIC programs on a floppy disk, or type it out by hand again. Indeed, the programs were on floppy disk too!

Now I was able to list all the data spaces for the executable files I had. It turns out that all the PSION programs for version 2.3 (though Easel is version 2.0) had a data space of 1280 bytes. So, then I could get all of them to work! Mostly I used QUILL and the Spreadsheet, ABACUS. I used the ARCHIVE database a bit and EASEL very little.

The rest can be summarised in this scrappy table:

Program                Size DataSpace
Computer One Assembler: ASSEMB 18094 256
Computer One Editor: EDITOR 12714 256
Computer One Linker:LINKER 4278 256
And Linker_A (??): LINKER_A 8616 4800
Debugger: debug_exc 2272 500
eda         13653 256
eye_q_dp         31476 43008
forth79         12616 57528

You might like to know what the assembly code for Header read is? I disassembled it using the Alan Giles disassembler written in BASIC. It's slow, about 1 or 2 lines per second but good enough for this.

3FF00 43FA000A               LEA     $000A(PC)=$3FF0C,A1
3FF04 347900000110           MOVE.W  $00000110,A2
3FF0A 4ED2                   JMP     (A2)
3FF0C 0002001E               OR.B    #$1E,D2
3FF10 0747                   BCHG    D3,D7
3FF12 6574                   BCS.S   $74(PC)=$3FF88
3FF14 4845                   SWAP    D5
3FF16 4144                   DC.B    'A','D'
3FF18 0010                   DC.B    0,16
3FF1A 0753                   BCHG    D3,(A3)
3FF1C 6574                   BCS.S   $74(PC)=$3FF92
3FF1E 4845                   SWAP    D5
3FF20 4144                   DC.B    'A','D'
3FF22 00000000               OR.B    #$00,D0
3FF26 0000                   DC.B    0,0
3FF28 7846                   MOVEQ   #$46,D4
3FF2A 6002                   BRA.S   $02(PC)=$3FF2E
3FF2C 7847                   MOVEQ   #$47,D4
3FF2E BBCB                   CMP.L   A3,A5
3FF30 675A                   BEQ.S   $5A(PC)=$3FF8C
3FF32 2A0D                   MOVE.L  A5,D5
3FF34 4BEB0008               LEA     $0008(A3),A5
3FF38 347900000112           MOVE.W  $00000112,A2
3FF3E 4E92                   JSR     (A2)
3FF40 664C                   BNE.S   $4C(PC)=$3FF8E
3FF42 3031E800               MOVE.W  $00(A1,A6.L),D0
3FF46 54AE0058               ADDQ.L  #2,$0058(A6)
3FF4A 264D                   MOVE.L  A5,A3
3FF4C 2A45                   MOVE.L  D5,A5
3FF4E C0FC0028               MULU    #$0028,D0
3FF52 D0AE0030               ADD.L   $0030(A6),D0
3FF56 B0AE0034               CMP.L   $0034(A6),D0
3FF5A 6C2C                   BGE.S   $2C(PC)=$3FF88
3FF5C 2A360800               MOVE.L  $00(A6,D0.L),D5
3FF60 6B26                   BMI.S   $26(PC)=$3FF88
3FF62 347900000118           MOVE.W  $00000118,A2
3FF68 4E92                   JSR     (A2)
3FF6A 6622                   BNE.S   $22(PC)=$3FF8E
3FF6C 5343                   SUBQ.W  #1,D3
3FF6E 661C                   BNE.S   $1C(PC)=$3FF8C
3FF70 2031E800               MOVE.L  $00(A1,A6.L),D0
3FF74 08000000               BTST    #$00,D0
3FF78 6612                   BNE.S   $12(PC)=$3FF8C
3FF7A 2045                   MOVE.L  D5,A0
3FF7C 2240                   MOVE.L  D0,A1
3FF7E 7440                   MOVEQ   #$40,D2
3FF80 7664                   MOVEQ   #$64,D3
3FF82 2004                   MOVE.L  D4,D0
3FF84 4E43                   TRAP    #$3
3FF86 4E75                   RTS
3FF88 70FA                   MOVEQ   #$FA,D0
3FF8A 4E75                   RTS
3FF8C 70F1                   MOVEQ   #$F1,D0
3FF8E 4E75                   RTS
3FF90 0000                   DC.B    0,0

The section hilighted in yellow is actually the information passed to SuperBASIC for the new command names and their syntax. In a future edit I hope to annotate it better.

Anyway, armed with this information you too, can go back to your old, actual QL and work out the data spaces for all the executables you couldn't otherwise run on your QL. Feel free to add them in comments!


Tuesday, 24 February 2026

Bike vs BEV

Introduction

We're frequently told that non-motorised bicycles are the most energy efficient transport in the world. So, there's no way a full EV (BEV) can compete with a bike for energy efficiency.

Or can it?

For example, it's easy to show that a bicycle at 20km/h requires about 75W of power; whereas a typical BEV might use 8.2kW at 50km/h. So, it's no contest.

Or is it?

The problem is that when people make these kinds of comparisons, they're comparing the energy required for the final product rather than the energy required at the source. For a non-motorised bike, that energy comes from the sunlight used to grow crops or farm animals + the energy used to process the food, but for a BEV that energy can come from a renewable energy source (such as Solar PV). Thus the right question to ask is whether replacing the crops making the additional food the cyclist needs, with Solar PV is more than the Solar PV needed for the BEV for the same distance.

With a bit of maths and some publicly available data, we can work it out!

Method

The basic concept is to make some simple (but not unreasonable assumptions) about how we power the cyclist, alongside some corresponding calculations for the BEV. The bike conversions we need are:

We'll see that when we do the calculations, available data is often in different units, so we'll have to do some unit conversions too. Also, we'll end up calculating backwards. For the EV, it's:


We can see already there are fewer obvious areas of loss, but that's because I've combined the motor for the BEV and the BEV into the same box (whereas for the Bike, the human is the motor).

Bike Calculations

This website provides us with a handy table for calculating power consumption for a given bicycle speed:


At 20km/h it gives 75W, which means that travelling 20km requires 75Wh of energy (3.75Wh/km). A Wh is 860 calories, so 75Wh is 860*75=64,500 calories, or 64.5kcal.

Humans, being biological systems burn fat (and other chemical energy), and emit CO₂, much like a combustion car burns petrol, except that it outputs CO₂ from fossil fuels, which adds to the atmosphere, increasing global temperatures. But the key thing is that the mechanism is similar, because chemicals are burned, so the efficiency is similar. In fact it's about 25%[1].

So, the amount of energy the human needs is 64.5/0.25=258 kcal to travel 20km. That's 258/20=12.9 kcal per km.

My big assumption is where the human gets that energy from. I assume they're getting it from a sandwich with no filling, i.e. from bread. And I approximate that with flour, because most of it will be flour and the other stuff, e.g. butter will be less energy efficient, because the energy conversion factor going from the Sun to butter also goes through crops and cows, and therefore can't be better than crops alone. The same reasoning applies to the filling from a sandwich, e.g. egg mayonnaise. The eggs have to go through a conversion factor involving crops and chickens, and therefore can't be better than crops.

It turns out that there are 3.58 calories per gramme of flour. So, 12.9 kcal requires 3.6g of flour and that's tiny. For white flour, we are interested in the energy content of the flour, which is 353kcal per 11.3g of protein[5]. From [4] we can see there's about 32kg of protein per Hectare, which means there's 32000/11.3*353kcal=999646 kcal per hectare. A hectare is 100m*100m so there's 999646/(100*100)=99.965 kcal per m². So, we need 12.9/99.965=0.129 m² of cropland per km.

BEV Calculations

This is a bit simpler, I'll take our Renault Zoe and our SolarPV as an example. Our Zoe achieves about 3.8miles per kWh, which is 3.8*1.609=6.114 km/kWh. So, 1 km uses 1/6.114=0.164kWh which is 164Wh.

We have 3.96kW of SolarPV on our roof, which usually provides 3200kWh per year. Current solar panels are rated at 450W and on average are 1.6m² which means that 0.45/3.96*3200/1.6=227.273kWh is generated per m².

Thus we need 0.164/227.273=0.000722m² of SolarPV per km with a Battery Electric Vehicle.

Results

It takes 0.129m² of cropland per km of cycling, but 0.000722m² of SolarPV per km in our BEV. Thus a BEV is 0.129/0.000722=178.67 x more efficient than a cyclist in terms of land area, a truly astounding result!

Conclusion

Cycling takes up land area to feed cyclists. We can estimate the land area based on the energy used by the cyclist; and the most optimal amount is if the cyclist eats only plant-based food. Thus every km of cycling (per year) corresponds to an area of crop land on a yearly basis. With some simplified, but reasonable assumptions, the value is 0.129m² of cropland per km, for a cyclist travelling at 20km/h.

A BEV only takes 0.000722m² of SolarPV per km per year, about 180x more efficient than a cyclist even though the BEV needs 164/3.75=43.733 more final energy per km.

There are two main reasons for this. Firstly people, being biological systems are inefficient, roughly as inefficient as a combustion car because they both burn chemicals for energy (though as said before, combustion cars add to atmospheric CO₂ and global temperatures). Secondly, and most importantly, crops are astoundingly inefficient compared to solar panels when it comes to energy conversion efficiency.

There are flaws with my method and conclusion, but not flaws amounting to 2.3 orders of magnitude. I only picked a single crop (wheat), but perhaps other crops have better energy yields per m² per year. I picked our Renault Zoe, which has a pretty good range/km: other BEVs, especially bigger ones can be worse (but some are better). On the other hand, practical food intake is far more than just flour, sandwich fillings will be more energy inefficient and I ignored food processing (though it would be a minor contribution). Also, I could have picked an e-bike: that same 75W would translate to 43.7x better efficiency!

But the upshot is clear: when we estimate land-use for transport, cycling requires over 200x the area of a decent Battery Electric Vehicle, yet no-one would consider the land-use requirements for cycling to be excessive.

Refs:

[1] “Thermal energy generated during the chemical reactions that power muscle contractions along with friction in joints and other tissues reduce the efficiency of humans to about 25 %.” 
https://phys.libretexts.org/Bookshelves/Conceptual_Physics/Body_Physics_-_Motion_to_Metabolism_(Davis)/10%3A_Powering_the_Body/10.09%3A_Efficiency_of_the_Human_Body




Wednesday, 31 December 2025

Personal Computer World Sun SparcStation-1 Review

While doing some research on the Sun Sparcstation-1 I was finding it difficult to search for pages that actually gave the prices of anything other than the basic diskless configuration. However, the review of the SparcStation in Personal Computer World (June 1989) in fact does. It's a comprehensive review and word reading! 

This post is almost entirely just pictures!








The critical information I was looking for is near the end.

The entry-level price for a SPARCstation 1 with 8Mbytes of RAM, a single 1.44Mbyte floppy drive, a standard 1152x900 video board and a 17in grey-scale screen is £7400, while a system with two
104Mbyte hard disks and the same video board driving a 16in 256-colour screen costs £12,700. With two hard disks, a 19in colour screen and the GX graphics accelerator, the price goes up to £16,400.

Conclusion

The Sun SparcStation 1 was a ground-breaking RISC workstation with a performance and price that matched x86 (i.e. the recently released i486) PCs for at least the next 12 months. I'm writing a series of blog posts on the machine. However, because data relating to the computer is getting lost to the internet over time it shows that having actual hard-copy magazines of the era can provide better and faster answers than Google (or another search engine).












TI-30LCD Says Hello & Does Stats!

There were a wide variety of scientific calculators in our Maths class in 1981. In 2025 I bought a Casio FX-180P to compensate for my original Casio FX-180P that got broken! It's great calculator with lots of features.

Although Casio calculators were the most popular at school at the time, other schoolmates had different makes. In my mind, I recall the TI-30LCD was one of the worst, but was it really that bad?

So, I bought one off eBay too! It turns out it's quite easy to make it says Hello!



Introduction

This was Texas Instruments' first attempt to update their much earlier TI-30 LED scientific calculator, which had been quite popular. They copied the late 70's brushed-aluminium look to make it seem more trendy and Japanese. The sideways view was pretty ugly though, because they tried to make it really thin, but then stuffed it up by adding a bulky 'AA' battery compartment instead of lithium coin cells!



 
Why? Cheap batteries? To make it incline to compensate for the poor LCD contrast?

It does have one nice touch, the [ON/C] button is slightly recessed: 1mm lower than the others and has a prominent surround to stop users accidentally turning it on when it's bouncing around in a haversack.

The buttons are pretty stiff though, requiring some force before clunking down, which although I think these are a bit worn after 44 years, it's much like I remember when borrowing a school-mate's TI-30 briefly. The layout is quite clever though; adding keys on both sides of the numbers makes for a pleasing symmetry.

On the positive side it can do the basic scientific stuff: π, Trig, logs, factorial and BOOBS! It's only got 8-digits + a dedicated '-', so 99-BOOBS- is its limit! Also interestingly, it displays "Error" in text using the LCD segments whereas Casio calculators just said: "E"

It's slow and inaccurate. 69! takes 1.34s on my FX-180P, but 7s on the TI-30 😮 ! The sin⁻ⁱ(cos⁻ⁱ(tan⁻ⁱ(tan(cos(sin(1)) test says 1.4756033, whereas my FX-180P says: 1.00020289 (it should be 1).

Finally, it lacks a specific stats mode which makes computing standard deviations a real pain and only has 4 levels of brackets (the Casios had 6). OTOH it can convert from degrees to radians if you press a number then [INV] [DRG>].

This calculator travelled 122 miles to get to me, but originally it was bought for a company, just 16 miles from where I live!

Stats

I played around with the calculator for a while and cleaned it up. Then I did a bit of thinking and realised that it is in fact possible to compute statistics fairly easily!

The standard Casio Calculator stats mode could compute all the functions you needed (like population and standard deviation, or averages) from three variables:

  • n: the number of items entered.
  • ∑x: the sum of the items.
  • ∑x²: the sum of the squares of all the items.
In theory this means you need 3 memories. We can reduce it to 2 by simply counting n, the number of items. But the TI-30 only has one memory. Or does it?

The trick is to use the internal memory to compute the sums, and the calculator stack to compute the ∑x² terms. And this is possible, because as well as a [STO] and [RCL] button on the calculator, it also has a [SUM] button.

So, we can use [SUM] to store the ∑x terms as you enter each one and [x²] [+] to compute the running ∑x² total.

For example, let's say we have the data: {5, 4, 8, 3, 4, 5, 7, 4, 3}.

First, you'd press: [0] [STO] to clear memory and the display.

Then

Number ∑x Term Generate x² (Display) ∑ (Display)
5 [SUM] [x²] 25 [+] 25
4 [SUM] [x²] 16 [+] 41
8 [SUM] [x²] 64 [+] 105
3 [SUM] [x²] 9 [+] 114
4 [SUM] [x²] 16 [+] 130
5 [SUM] [x²] 25 [+] 155
7 [SUM] [x²] 49 [+] 204
4 [SUM] [x²] 16 [+] 220
3 [SUM] [x²] 9 [+] 229

At this point the calculator shows 229 (∑x²). Pressing [RCL] gives you 43 (∑x) and manually you count n=9.

So, then:
  • Average=[RCL]/9=4.78;
  • Standard Deviation=√(∑x²/n -(∑x/n)²= √(229/9-(43/9)²)= 1.62.
It's better to clear the memory first and then use [SUM] all the time even though you could do 5 [STO] for the first one, because it allows you to repeat the pattern and avoid thinking.

I don't think this was actually taught as a technique at school. Instead they expected you to compute ∑x terms, then go back and compute the ∑x² in a new column, before summing each column and then calculating the variance (rather than the Standard Deviation). This technique avoids entering the numbers twice. It's still somewhat slower than using a Casio with its Sd mode, but at least it's quicker than a purely manual mode.

Correcting Errors

I've also worked out how to correct data entries too. Let's say you mis-entered 5307 as 5607 by typing:
5607 [SUM] [x²] [+]

On an FX-82 you'd just do 5607 [DEL] and it would correct it. But on a TI-30 there's no such button (just as there isn't an [x] button). But correcting data is almost as easy. You type:

[-] 5607 [+/-] [SUM] [X²] [+]

On a real TI-30 LCD you're much more likely to miss or double-type a digit due to the dodgy debounce (it shares LCD and keypad pins so it can't display and read keys at the same time), so correcting errors is important.

The TI-30 isn't RPN, so the [+] at the end of each line signifies that the next calculation will be another addition, but also gives you the running total. So, the initial [-] overrides the previous [+] so that when the [+] is hit at the end, it subtracts 5607². But when you enter the number, you can't type [SUM] to delete the ∑x term, because the number on the display will be positive. You need to hit [+/-] to make it properly negative and then hit [SUM]. When you hit [X²] it makes it positive again so the earlier [-] will subtract, as you intend. Also, it wouldn't work to type in 5607 [+/-], because when you later hit [X²] it'll become positive again.

Conclusion

The TI-LCD was already a poor calculator by the standards of the early 1980s. It lacked functions other calculators had as standard and accuracy was poor. However, a critical statistical mode can be implemented fairly easily, and can roughly halve the number of keypresses to compute.




Monday, 22 December 2025

D'oh, D'oh! Host <-> Target Transfer On MAME's SparcStation 1 Emulator


I'm interested in simulating the real-time experience of a Sun SparcStation 1, because I'm currently interested in RISC Workstation era at the turn of the 1990s and I've never used one. Without an actual SS1, an emulator is the easiest choice.

There are two primary emulators available. QEMU supports a SparcStation 5 emulator, but it runs as fast as possible. MAME supports SparcStation 1 emulator. It can be used to try out SunOS (e.g. version 4.1.2), but lots of things don't work, at least on the MAME Version 0.255 I'm exploring, e.g. ethernet, serial, colour framebuffers (and accelerated graphics), second drives, floppy disks, tape (I think, but I haven't tried that).

There's a fairly decent introduction to getting MAME's SS1 emulator running. The only major difference I'd make is to create an uncompressed hard disk image just using dd (which is the subject of this blog post). In my case, it's called SunHd320.hd not sunos412.chd.

dd if=/dev/zero of=SunHd320.hd  bs=16k count=19530

I can then run the emulator with this command line:

./mame sun4_60 -window -slot1 "bwtwo" "-scsibus:0" harddisk "-scsibus:1" "" "-scsibus:6" cdrom -hard SunHd320.hd -cdrom SunOS_412.iso

To make it somewhat usable I needed to get data in and out, and I faffed a lot with trying to create floppy disk images or second hard disks or serial transfer, but it was basically a waste of time. I could describe my multiple attempts to use these, but suffice to say, don't go down that rabbit hole until there's good evidence they work! Then, today it occurred to me I could just use dd, which of course that Unix Workstation had in 1989 and I also have on the Mac mini 2012 I'm emulating it on.

Background: Why The SparcStation 1?

The SparcStation 1 is a really interesting Sun Workstation, because it appeared at the peak of RISC's ascendency (April 12, 1989), just 2 days after the Intel i486 was launched, proving that CISC CPUs could achieve RISC levels of performance. The first Intel 486 computer didn't actually appear until September. I'm a fan of RISC. I loved PowerPC and I'm super-glad Apple have switched to ARM-based Macs. But the 1990s were an awkward reckoning for the methodology as vendors strove to stay ahead of Intel despite having far fewer engineers, only to find Intel copying their techniques and often better than comparable RISC computers.

So, skipping back to 1989. The SparcStation 1 was Sun's attempt to compete with PCs by designing a highly-integrated, low-end Sparc workstation. It ran at a blazing 20MHz, delivering about 13 Dhrystone MIPS (and a SpecInt89 of about 13-14 too). This was over twice as fast as the fastest Intel i386DX at the time and about 30% faster than the just announced Intel i486. So, it was impressive. It had 64kB of external cache (no internal cache) and used the bizarre Sun MMU ported from their earlier 68020-based workstations. This MMU doesn't use a Translate Look-aside Buffer, but dedicated SRAM chips to maintain a large, 2-level cache of page translations.

Finally, it has a Sparc V7 CPU, the epitome of a classic RISC design. For example, it has no multiply or divide instructions (just a multiply step instruction: 32 cycles for a 32x32=>64 bit result, but other multiply sizes can be done quicker).

Setting Up Data Transfer

The basic principle is that the disk image (.hd) is divided 8 partitions in SunOS and only 5 of them are used, so assigning some space to a spare partition and then accessing it using dd from either side is relatively easy.

There are several stages to making it work though! I'm using a 320MB HD image (a Sun type 5 drive). This would have been very big for 1989. Standard drives from Sun were 80MB or 104MB. You can install the OS from a CD image as described in the earlier link. It's a horrible textual, menu-driven thing which is easy to get wrong. The critical thing though is that you can install the OS and then later modify the partitions. By convention, disk3 is used for the main HD; partition a is root '/'; partition b is swap (not mounted as such); c is a non-mounted partition that covers the whole drive; g is /usr and h is /home.

For this experiment I didn't quite do an 'easy' setup. Initially one needs to >b cdrom at the bootloader prompt. Then when it says:

Enter 2. It puts you in a # root type prompt, but there isn't a user as such and you can't logout. Enter format and it will take you through the standard setup. You can type ?<cr> at any time to see all the menu options. I typed disk<cr> 0<cr> type<cr> 5<cr> partition<cr> . I knew from an earlier, 207MB HD setup what sizes I needed, and wanted the rest of the space for the /home directory, so I basically duplicated that:

Partition
(type letter<cr>)
Cylinder (prompted)
(672 x 512b blocks/ cylinder)
Blocks
(prompted)
a 0 16800
b 25 102144
c 0 623616
g 177 201600
h 477 303072

You should check they all add up and no partitions overlap (apart from partition c which covers the whole disk). Then You need to type label<cr> y<cr> quit<cr> then I typed label<cr> y<cr> quit<cr> just to make sure. Back at the # prompt, type reboot cdrom<cr> and on this run through, type 1<cr> then it'll install miniroot, then type 2<cr> because you've already 'formatted' the disk, 1<cr> to reboot, note the vmunix is now a bit bigger at 802kB instead of about 737kB. If the SS1 doesn't automatically reboot, type >b disk3 at the prompt and it should boot. At the # prompt type suninstall<cr> and you'll be into the installer proper ("Welcome to SunInstall"). Type 1<cr> then n<cr>. Although this means the partitions will be overwritten, the sizes of partitions a, b, c, g and h will be preserved. I then chose the Programmer option by following the on-screen instructions, then finally y<cr> to start. Note also, the installer runs newfs for each of these partitions, which means it isn't changing the partition sizes.

After the basic install, it will complete the installation by asking you to provide extra information. I pressed 2<cr> <RETURN> I set the hostname to SS1mini , chose GMT for the time zone. 2<cr> then 22/12/2025<cr> for the date (UK format) and 11:34:30<cr> for the time. 1<cr> to accept that. 2<cr> (not on a network), y<cr> to accept. Finally I chose a password; I set up a user account with a full name of Julian Skidmore; user name as js; 100 as the user ID (because 0..99 are supposedly reserved) and the same password as root. It then finally completed the installation and dropped me into a login prompt.

I logged in as root ; then typed reboot cdrom to reboot into the cd and typed 2<cr> to use the single user shell; ran format again.

The trick is to steal space from another partition and because a, b, g and h are all mounted as proper filesystems, the obvious place is to steal from the swap partition. We have about 49MB allocated to that, for a 16MB SS1, so stealing around 1.44MB will be OK. Type 0<cr> then partition<cr> then make these changes:


Partition
(type letter<cr>)
Cylinder (prompted)
(672 x 512b blocks/ cylinder)
Blocks
(prompted)
b 25 98784
d 172 3360

Then you need to check all the partitions add up; write the label (label<cr> y<cr>), then quit<cr> label<cr> quit<cr>. Finally you can reboot back into the emulator: reboot disk3<cr>.

You only have to do this once and you're now ready to do data transfer!

Performing A Data Transfer

As a test, I modified a version of dhrystone to provide more accurate results when using time() instead of the µsecond accurate system functions, then tar'd them up into a 40kB tar file. You could use any file as a demonstration. You need to measure the length of the files you want to transfer in this technique of course.

From the SS1 emulator I could then copy dhry.tar to d using:

dd if=dhry.tar of=/dev/sd3d bs=1024 count=40

Then I needed to logout, login as root; then /etc/halt the SS1 emulator and quit MAME. Now I could copy from the hard disk image to the host on my Mac mini using:

dd if=SunHd320.hd of=dhry.tar bs=1024 oseek=57792 count=40

(Since partition d was at cylinder 172 according to format: Partition: Print). If you had created a compressed, chd hard disk, you would need the MAME tool chdman and calculate absolute byte offsets and lengths ( e.g. ./chdman extracthd -i sunos412.chd -isb 84951040 -ib 40960 -o dhry.tar ).

I could then copy back to the SunHd320.hd, but this is a bit non-trivial, because I found dd truncates the file. I needed to do:

cp SunHd320.hd SunHd320B.hd

dd of=SunHd320.hd if=dhry.tar bs=1024 oseek=56112 count=40

Then add the rest of the hard disk image using:

dd of=SunHd320.hd if=SunHd320B.hd bs=1024 iseek=56152 oseek=56152

Finally I could boot back into the SS1 emulator and copy the data back (when logged in as root) using:

dd of=dhry.tar if=/dev/sd3d bs=1024 count=40

I could then untar it using tar -xvf dhry.tar.

Conclusion

MAME provides a real-time emulation of a SparcStation 1 on hardware at least a few hundred times faster (in my case, the 2.5GHz Dual-Core i5 on my Mac mini 2012). To make any emulator truly useful you have to get data in and out of it. However, this emulator is only barely working.

After quite a few dead ends, I realised I could use dd on the SparcStation emulator and the hard disk image on the host side to transfer data in a fairly clumsy way. It's very crude. Still, I have bi-directional data transfer and I can start playing around with some other SparcStation software installs.


Saturday, 17 May 2025

SIBlings! 80386 Addressing Modes Simplified

I can't believe it's taken me over 30 years to get around to actually learning the 80386 addressing modes. So, long in fact that the x86 architecture has now been obsolete for about 20 years (slightly earlier for the original AMD64).

This is a blog post about how to make sense of the x86 addressing modes, because it's really not that complex if it's presented in a clear way. You can skip ahead to the end, but first a bit of background on my 8086 coding experiences!

I used to do a significant amount of 8086 programming in the early-mid 1990s, when I was working at Micro Control Systems in Sandiacre, Nottingham. I was assigned to the solid-state storage products (called Silicon Disk) which implemented battery-backed SRAM, EPROM and Flash expansion cards for PC compatibles that emulated bootable hard disks via an Int13h interface and a boot rom. They were versatile in one sense, because you could combine different storage media on a single card or combine multiple cards to make a larger disk - even so, the disks were small by modern standards: a maximum of 3MB per full-length ISA card.

The EPROM and Flash disks were fairly rudimentary though: the user would have to first erase all the storage (using UV in the case of EPROM and using a software erasing tool for Flash); and then copy the data they needed to the Silicon Disk. The firmware hijacked the MSDOS INT23h(?) interface so that writes to the Silicon Disk used our code. Files were written directly to the non-volatile storage, but the firmware kept the FAT table in the PC's RAM so that they could be updated multiple times. Finally, the user was expected to close the disk, which caused the FAT table to be written to the disk properly, along with a boot ROM image so the PC could boot up from the device.

This meant that people were able to develop a full, solid-state PC. Of course, a PC with a read-only disk isn't terribly useful, so most EPROM or Flash disks would also have at least one bank allocated to battery-backed RAM. And in this sense the PC became a large microcontroller with up to 2MB of EPROM/Flash and 1MB of a RAM disk.

All the firmware was written in 8086 code which meant I ended up being pretty familiar with 8086 coding. Coming from a 68000 background that was fairly disappointing, but I learned to make decent use of the CPU. Eventually I persuaded the company that we could improve development time by only having to write the core routines in assembly, while the rest of it could be written in 'C'.

I rarely had to write in 80286 assembly - it was basically the same as 8086 programming with a few more instructions, pusha and popa being the most useful. We never had to write 80386 code, so I never had to learn that and my embedded programming jobs after Micro Control Systems never required it either.

But occasionally I'd come across 80386 code and realise that some of the basic stuff had changed: in i386 mode, addressing modes are more flexible and you can scale index registers so that the CPU can direct address 16-bit or 32-bit arrays. It's possible to read the code OK, but not write it unless you know what the constraints on the address modes really are.

So, finally I've had a go at trying to understand them and it turns out, it's not very complex.

Recap: 8086 Addressing Modes

The x86 series has a byte-oriented instruction set, which means it's just a series of 1 to however many bytes regardless of whether you're dealing with the original 16-bit 8086 or a 64-bit Core i7. Many instructions consisted of a specific initial byte followed by an effective address byte which told the CPU where to find the memory location (or register) to obtain the source or destination data. This was called the MOD:REG:R/M byte. In some cases this byte is sufficient, but in other cases, the MOD bits would indicate 8 or 16-bit literal offsets would then follow and these values were added to whatever memory location was indicated by the R/M bits. The meaning of the R/M bits themselves depended on the MOD bit value too, and could either be one of the 8, 8-bit registers (or 16-bit registers if it was operating on 16-bit data); one of the 4, 16-bit registers that could be used for indexing: BX, BP, SI and DI or a restrictive combination of a pair of those registers: BX+SI (or DI) / BP+SI (or DI).

In addition, although SP addressed memory (because it was the stack pointer) it wasn't possible to index via SP; instead the convention was to use BP to point to a frame of data in the stack and it was possible to use that with an offset. Intel did that, because frame pointers were a fairly common convention for Pascal in the 1970s when the 8086 was designed.

So, it was all pretty restrictive: only 3 address registers were generally available and indexing data with a computed offset was even more limited: only BX+SI (or DI) being the useful mode (because normally you wouldn't stick an entire array on the stack). Still experienced 8086 programmers were used to juggling registers in functions so that the right address registers just so happened to be available when the programmer needed them. It was slow, tedious, but surprisingly efficient.

By comparison, the 68000 was a delight, because you could use any one of 8 address registers (A0..A7) and index them directly; or with a post-increment / pre-decrement (directly implementing 'C's ++ and --); or with a 16-bit displacement or 8-bit displacement and a second register which could be any of the 16 registers D0..D7, A0..A7 treated as 16-bit or 32-bit offsets). Very flexible.

The whole set of addressing modes can be summarised below:

MOD R/M: 000 001 010 011 100 101 110 111
00 [BX+SI] [BX+DI] [BP+SI] [BP+DI] [SI] [DI] disp16 [BX]
01 disp8 [BX+SI] [BX+DI] [BP+SI] [BP+DI] [SI] [DI] [BP] [BX]
10 disp16 [BX+SI] [BX+DI] [BP+SI] [BP+DI] [SI] [DI] [BP] [BX]
11 (reg:) AL/AX CL/CX DL/DX BL/BX AH/SP AH/BP AH/SI AH/DI

This means there are 25 unique addressing modes on the 8086, ignoring the register mode, as it's not addressing memory.

80386 Addressing Modes

Intel could have kept the same set of addressing modes for the 32-bit 80386, but computer architecture design had advanced between 1978 and 1985 when the 80386 was released. Firstly, the 68000 CPU with its more flexible addressing modes, represented a significant amount of competition, primarily because it was already 32-bit and used for high-end workstations and secondly, RISC processor designs were starting to emerge and these showed that simple addressing modes were used most of the time.

Therefore, the i386 took the drastic step of changing the addressing modes in its 32-bit mode. Instead of the double-index register modes and single-index register modes being allocated to the R/M field, only a wider set of single-index register modes were allocated; and where SP would be being used as an index register a second address extension byte was added, the SIB byte. All the SIB byte does is provide a pair of 3-bit index register fields and a 2-bit scaling field for the first index register. These fields can be basically mixed and matched.

MOD R/M: 000 001 010 011 100 101 110 111
00 [EAX] [ECX] [EDX] [EBX] [Ix*n+ Base] disp32 [ESI] [EDI]
01 disp8 [EAX] [ECX] [EDX] [EBX] [Ix*n+ Base] [EBP] [ESI] [EDI]
10 disp32 [EAX] [ECX] [EDX] [EBX] [Ix*n+ Base] [EBP] [ESI] [EDI]
11 (reg:) AL/AX/ EAX CL/CX/ ECX DL/DX/ EDX BL/BX/ EBX AH/SP/ ESP AH/BP/ EBP AH/SI/ ESI AH/DI/ EDI

SIB

Ix:3 Scale:2 Base:3
EAX *1 EAX
ECX *2 ECX
EDX *4 EDX
EBX *8 EBX
(none) ESP
EBP *
ESI
EDI

(* if Mod=00 and Base=* then the addressing mode is disp32[Ix*n], i.e. a 32-bit displacement and a scaled index register without a base register).

Some of the SIB encodings overlap with existing addressing modes. E.g. Ix=No Index and Base=EAX..EBX will overlap with R/M= the same Base register. Also, Mod=00, Ix=Index, n=1, Base=* overlaps with disp32[R/M=Ix].

This means that if we count addressing modes on the 80386 the same way we count them on the 8086, we have 7*3+1 = 22 main addressing modes + the SIB addressing modes * 3. There are 8*4*5=160 SIB modes, giving another 480 modes + the special '*' mode, giving a grand total of 503 addressing modes (most of which are 2 bytes, yet these are also the least frequently used ones).

A programmer, of course, could restrict themselves to only using the same subset of addressing modes available on the 16-bit x86 CPUs, by merely substituting 32-bit index and base registers. This would have the same syntax, but a longer encoding for all the Index + Base addressing modes.

Conclusion

I did a lot of 8086 assembly programming in the 1990s, because developers had to do more assembly, because CPUs were slower and compilers were poor. However, the 16-bit x86 CPUs were also simpler devices.

I have never had to do i386+ programming, as embedded programming moved away from x86 and shifted away from assembly, but I was always intrigued by it. The hardest bit would always be the new addressing modes, but every time I read up on them, it just seemed like more effort than it was worth. Finally, after about 30 years I took it seriously and discovered they're more simple than the descriptions I've seen. So, this blog post covers my new understanding.

Of course, the 32-bit Intel era has been over, for more than 10 years, even though Intel is unable to delete it from its CPUs! Soon the Intel era itself might be over as ARM CPUs (and perhaps RISC-V CPUs) overtake its performance at all scales.

Tuesday, 21 January 2025

Burn's Night Is The Coldest

In winter I usually mark off 4 dates as the yearly cycle starts transforming into a more positive outlook. I call these 'Milestones'.

Milestone 1

This is the earliest evening. Most people are unaware that evenings start getting later, before the shortest day. In 2024, evenings in the UK started getting later from December 12.

Milestone 2

This is the shortest day, the winter solstice. Everyone knows about this, however, even after this day, the mornings are still getting later; it's just that the evenings are getting later quicker than the mornings are, so the days start getting longer.

Milestone 3

This is the latest morning. Again most people are unaware of this, but it happens right at the end of the year. In 2024 it happened around December 30 or 31.

Milestone 4

This is the coldest day of the year on average and the topic for this post. It's difficult to calculate this date, because daily temperatures vary wildly from day to day and also across years for equivalent days. Nevertheless, it's fairly easy to see that after the days start getting longer, they continue to also get colder for a while. This is because other environmental factors such as cloud cover, heat loss from the ground, air temperatures or the Jet Stream can continue to drive temperatures on average down faster than the sun adds energy to the atmosphere and land.

Anecdotally, I used to figure the coldest time of the year was at the end of January / beginning of February, so I set Milestone 4 on January 31. Later, however, I thought to myself that perhaps it's mid-way between the winter solstice and spring equinox, because all of these diurnal patterns tend to follow year-long sine waves.

Winter solstice is on December 21, and Spring equinox is on March 21. So, that's 10 days in December + January (31 days) + February (28.24 days) + 21 days of March. This is 10+31+28.24+21=90.24 days. Calculating Milestone 4 after 90.24/2=45.12 days, which, given 10 days at the end of December + 31 days in January leaves 4.12 days. So for the past several years I've been setting it on February 4.

But neither of these techniques are based on actual evidence. What if it's not symmetrical as I've assumed? What if temperatures simply aren't shifted mid-way? To figure that out I need real data.

Real Data

I was involved in an on-line, climate discussion trying to work out how temperatures had changed in the UK over the past decade or so and found an open Statistica page on it:


You can hover over the months to get the actual figures, downloading the raw data requires a subscription. It turns out that for nearly all the months in the year there's an upward trend, but for January there's no observable trend.

But as I was looking at it, I realised that I could use my new understanding of Fourier transforms to obtain a better approximation for the coldest day.

The Winding Principle

At University we covered quite a lot of math in the first year including Fourier Transforms (and Laplace Transforms). I was able to do the math, but I didn't remotely understand how one can isolate the set of harmonic frequencies from waveform data. It took a Hackaday article to help me. I can't do justice to the article, nor the associated animated video explainer, but I can précis the idea as far as the fundamental harmonic goes, which is all we care about here.

Every complex, repeating, sampled waveform can be constructed from a set of sine waves at 1x, 2x, 3x.. the fundamental frequency up to half the sample period just added together. However, if I want to isolate the fundamental frequency that turns out to be pretty easy. All you do is multiply each sample by the sine of the corresponding angle within the waveform and add the results together. If the fundamental is present, then its amplitude at any point will cohere with the sine wave itself, but higher frequencies will 'disappear', because their positive phases will end up getting multiplied by both the positive and negative phases of the reference sine wave. For example, consider an 8-sample wave containing a fundamental and 1st harmonic:

Sample# 0 1 2 3 4 5 6 7 Total
Ref Sine 0.000 0.707 1.000 0.707 0.000 -0.707 -1.000 -0.707 0.000
Fundamental 0.000 0.573 0.810 0.573 0.000 -0.573 -0.810 -0.573 0.000
^^^ x Ref Sine 0.000 0.405 0.810 0.405 0.000 -0.405 -0.810 -0.405 3.240
1st Harmonic 0.000 0.210 0.000 -0.210 0.000 0.210 0.000 -0.210 0.000
^^^ x Ref Sine 0.000 0.148 0.000 -0.148 0.000 0.148 0.000 -0.148 0.000

To fully calculate each harmonic you need to consider the phase of each harmonic. That's because a sine wave at any given phase can be generated by a pair of sine + cosine waves with two respective amplitudes; thus the above technique will only recover the sine wave component. For example, if the Fundamental was shifted by +90º, then the Fundamental * the Ref Sine would still end up with a total of 0, but here the wave * a Reference cosine wave would have an amplitude of 3.240.

Finding The Phase

Therefore, a Fourier analysis of the fundamental can tell us not only its amplitude, but also its phase. And it turns out we can obtain an accurate phase from relatively few samples. The phase is simply obtained from the vector obtained from ∑ waveform data * the Reference sine wave on the x axis ∑  waveform data * the Reference cosine wave on the y axis.

This means that even though all we have are monthly values for the temperature data, we can calculate the actual minima, zero-crossings and maxima at a much higher resolution.

The phase calculated is always relative to the reference angles. For example, if we started the reference angle at 30º and the samples were a sine wave starting at 30º, then the phase would still be 0º. If the reference angle was 0º, and the samples were a sine wave starting at -90º, then the relative phase would be reported as 90º, because the zero-crossing for the sine wave would be at 90º.

The phase therefore tells us the average temperature day and the minimum temperature day will be 90º earlier (or 91.31 days earlier). For UK temperatures, the minimum temperature is therefore reported as Jan 25.5. Ironically, this means that Burn's Night is the coldest.

There's one more aspect of the model that's worth mentioning, which is that the reference phases aren't equidistant, because the months don't all have the same number of days in them (though it's close). Therefore, in this calculation, the reference dates are taken from the mid-point of each month, on the basis that the average temperature for that month represents the temperature half-way through the month.

Minimal Temperature

Temperatures:

Your browser does not support the canvas element.