Sunday, 8 May 2022

A Tale Of Two Banners: VIC-20

 I previously posted about a Banner program I wrote for the 40th anniversary of the ZX Spectrum. One of my friend's followers tweeted that he always thought my friend was a Commodore C64 owner, who could PEEK and POKE with the best of them.

This set me thinking - what would a Commodore C64 version be like? And then, because a C64 version would be too easy, what would an unexpanded VIC-20 version be like? For sure, it's more challenging than a ZX Spectrum version.

Here's a bunch of reasons why:

  • The VIC-20 has a smaller screen area, just 22 x 23 characters; so an 8x5 character banner can't be done with 8x8 pixel characters.
  • The VIC-20 has no graphics commands. It can redefine the character set and that's about it. It can't easily PRINT AT any location on the screen.
  • The VIC-20 supports an ink colour per character, but only a global paper colour. That's because it only has 4-bits per colour byte instead of 8-bits (which gives room for an ink + paper per character). Therefore, I can't use the same colour trick as the ZX Spectrum.
  • The VIC-20's INKEY$ function (GET stringVar$) doesn't return proper ASCII upper and lower case characters, but PETSCII codes (weird).
  • The VIC-20 fouls up the character set pointer when you press Shift+[C=].
Nevertheless, I was able to do it, and here I'll describe how:














Smaller Screen Area

The VIC-20 has a smaller screen area, and if I understand it correctly, the screen can't be more than 512 characters (though they can be double-height!). Normally the screen is 22x23 characters, which isn't enough to fit 8 banner characters across if each is an 8x8 pixel character built from Battenberg graphics (2x2 pixels per graphic, so 4x4 graphics each). You'd need 32 characters across for that. However, it's almost enough for 8 banner characters across if each comes from a 6x6 pixel font, since each then needs only 3x3 Battenberg graphics, or 24 characters across.

And... the VIC-20 screen size can be redefined. By making it 24x21 there's room for 8 characters across x 7 characters down, even more than the ZX Spectrum!

Of course, on a VIC-20 it has to be done using POKEs:

1000 A=7504:POKE 36864,10: POKE 36867,42: POKE 36866,152


So, address 36867 holds the number of rows, multiplied by 2, in bits 1..6, and address 36866 holds the number of columns in bits 0..6. The default values were 46 and 150 respectively, so I changed them to 42 (21*2) for 21 rows and 152 (128+24) for 24 columns.
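As a quick check of those numbers, here's a small Python sketch (not part of the VIC-20 program) that works out the two register values for a given screen size. The function name is made up for illustration, and it assumes bit 7 of 36866 simply stays set, as it is in the default value of 150.

```python
# Sketch: work out the VIC register values for a given text screen size.
# Assumption: bit 7 of 36866 stays set, as it is in the default value of 150.
# The 512-cell limit is the one mentioned in the post.

def vic_screen_registers(columns, rows):
    if columns * rows > 512:
        raise ValueError("screen would need more than 512 character cells")
    columns_reg = 128 | columns   # value for POKE 36866
    rows_reg = rows * 2           # value for POKE 36867 (rows sit in bits 1..6)
    return columns_reg, rows_reg

default_regs = vic_screen_registers(22, 23)
banner_regs = vic_screen_registers(24, 21)
print(default_regs, banner_regs)
```

The second pair is what line 1000 pokes into 36866 and 36867.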

Where does all the information about the POKEs come from? Well the most concise information I've found is from here, an extensive resource on the VIC-20 memory map.

The values can be directly poked in, though I'd start with changing the dimension that gets smaller, so that the screen area always stays below 512 characters.

Finally, we need to adjust the left-hand side of the screen so that it's better centred. Address 36864 does that and changing it to 10 was found by experimentation.

There Are No Graphics Commands

However, the VIC-20 can display graphics characters, and there are Battenberg block graphics characters inherited from the Commodore PET. Strangely, and unlike the ZX81 or ZX Spectrum, they don't have a very logical order. Instead, in the sequence I'd use, the codes are:

9000 DATA 32, 124, 126, 226, 108, 225, 127, 251

9010 DATA 123, 255, 97, 236, 98, 254, 252, 160


Given that we have all the Block character graphics selected, all we need to do now is define the character set in terms of them. Unfortunately, that's not trivial either.  The first thing I did was to take a 6x6 bitmapped character set I'd used for a FIGnition example program:


 

















I have a Java program which opens the image as a .png and then copies the pixels to an array, where they can subsequently be transformed into a different image format.

I needed to be able to transform the character bitmaps so they could be represented in VIC-20 BASIC. I couldn't encode them as proper full bytes, because not all 256 symbols can be typed. I could have encoded them as 6-bit text, but again, the odd non-ASCII use of VIC-20 characters made that more complex. So, I simply encoded them as 4-bit text using the characters A..P and then indexing each character (-65) in an array of Battenberg graphic characters. This meant the 96 printable characters would take up 864 bytes in themselves, plus some overhead for the individual lines and BASIC commands, a good chunk of the unexpanded VIC-20's 3.5kB memory space! Encoding as 6-bit text would have saved 33%, about 288 bytes.

Unfortunately, it wasn't likely to be feasible to just store the whole font in strings, so I figured that I could store them in DATA statements and then do RESTORE line to point to the right data statement where the character I wanted was defined.

Unfortunately, the VIC-20 only supports RESTORE to the beginning of the program. So, instead - yet again (and this is a common theme) I had to use memory PEEKing. I placed the data statements at the end, and when I'd read all the other data in the setup, I stored the system variable for where the DATA statement pointer was, and then literally PEEKed the right memory location to get the bytes.
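To convince myself the address maths in line 120 of the listing hangs together, here's a Python model of how the glyph text might sit in memory. It rests on assumptions: that each stored line is a 2-byte link, a 2-byte line number, the DATA token (taken to be 0x83), a quote, 36 letters, a closing quote and a zero terminator, and that the saved DATA pointer rests on the previous line's terminator. The helper names are hypothetical.

```python
# Model of the stored DATA lines, to check the "+7", "*9" and "*44" constants.
# Assumed layout per line: link(2) + line number(2) + DATA token + quote
# + 36 letters + quote + zero terminator = 44 bytes.

DATA_TOKEN = 0x83   # assumption: the token value for DATA

def stored_line(number, letters):
    body = bytes([DATA_TOKEN]) + b'"' + letters.encode("ascii") + b'"' + b"\x00"
    return bytes([0, 0]) + number.to_bytes(2, "little") + body  # link left as 0

def glyph_address(first_letter_address, char):
    c = ord(char) - 32
    return first_letter_address + (c & 3) * 9 + (c // 4) * 44

lines = [
    (9100, "AAAAAAAAAAKAACAACAFFAAAAAAANNINNIBBA"),
    (9110, "AOIBOADKAPECEGICBCJIAJMCBCCAKAAAAAAA"),
]
# Index 0 plays the role of the previous line's terminator.
memory = b"\x00" + b"".join(stored_line(n, text) for n, text in lines)
data_pointer = 0
first_letter = data_pointer + 7   # skip terminator, link, number, token, quote

def glyph_text(char):
    start = glyph_address(first_letter, char)
    return memory[start:start + 9].decode("ascii")

line_length = len(stored_line(9100, lines[0][1]))
print(line_length, glyph_text("!"), glyph_text("$"))
```

Under those assumptions each line is 44 bytes long, four glyphs fit per line, and character codes 32 to 35 map onto the first line, 36 to 39 onto the second, and so on.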

It's possible to do a PRINT AT on a VIC-20 by printing the home and cursor control characters. Home is an inverse S, which you can display by literally typing PRINT " and then the home key, because the VIC-20 re-interprets keystrokes within quotes. Similarly, you can move the print position to different locations by typing PRINT " and then the cursor keys themselves, for the same reason. This means that the VIC-20's screen editor, which is usually easy to use, turns into a pain within quotes, because moving the cursor starts overwriting the rest of the text, so you have to wrestle with it to get it back into proper cursor mode (typing " usually works).

And colours work the same way, you type PRINT " and then a colour key and it will change the INK colour.

So, you can assign these to strings and then print "[HomeKey]";LEFT$(CD$,Y);LEFT$(CR$,X); to get to the right location, but it's fairly slow compared with poking directly into screen memory at 7680+22*row+column. And of course, the cursor key technique doesn't work when the screen dimensions have been changed!

So, POKEing the screen is the best solution and you have to poke the colour attribute byte too, because the VIC-20 for some reason doesn't fill it in when it displays spaces. Clear screen, for example (PRINT "[Shift+Home]"; ) doesn't fill the attribute bytes with the current ink colour; it just clears the text bytes.
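The two addresses to POKE for any cell can be sketched in Python as follows. The 30720 offset between a screen byte and its colour byte is taken from the listing (POKE P+30720); the function name is just for illustration.

```python
# Sketch: addresses to POKE for a given cell on the unexpanded VIC-20.
SCREEN_BASE = 7680
COLOUR_OFFSET = 30720   # colour byte for a cell sits at screen address + 30720

def cell_addresses(row, column, columns=22):
    screen = SCREEN_BASE + columns * row + column
    return screen, screen + COLOUR_OFFSET

top_left = cell_addresses(0, 0)
bottom_right = cell_addresses(20, 23, columns=24)   # last cell of the 24x21 screen
print(top_left, bottom_right)
```

With the 24x21 layout the last cell comes out at 8183, which is the upper limit used by the clearing loop.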

This is why in the real code I have to clear them explicitly:

FOR F=7680 TO 8183:POKE F,42:NEXT F

And the reason why it's code 42 and not 32 will be explained next:

Producing The Diagonal Stripes

I was pleased with how I generated the diagonal stripes on the ZX Spectrum, as it's a challenge when only 2 colours are allowed per character, and, helpfully enough, the VIC-20 does have a diagonal character!

Yet, doing the same thing on a VIC-20 is several times harder, because only 1 unique foreground colour can be defined per character and clearly we need two. Yet, it is just about possible, but only just!

The solution is that the VIC-20 supports 2-bits per pixel colours on a character-by-character basis, by setting bit 3 of every colour attribute byte. Each bit pair then selects one of four possible colours:

00: Which is the paper colour, bits 7-4 of location 36879.
01: Which is the border colour, bits 2-0 of location 36879.
10: Which is the ink colour of the character.
11: Which is the auxiliary colour, bits 7-4 of location 36878 (the bottom 4 bits are the sound volume level).

This means that one diagonal half can have a choice of 3 possible colours, while the other diagonal half (ink) can have a choice of 7 possible colours. We need to handle 5 colours: the black background (paper), Red, Yellow, Green and Cyan.





Using pairs of pixels also forces us to pair up the rows in the UDGs giving us an effective resolution of 4x4 for each character. You can see that the stripes are more blocky than an ideal 8x8 diagonal would be.

It also means we can't use the standard VIC-20 diagonal graphics character, because we actually need 5 different types of diagonal characters, with bit pair combinations of xx/ink and ink/xx. This means we have to allocate space for a character set, and in turn that means we can't use the built-in block graphics characters; we have to define copies of those too. In total we need 16+5 characters (though in fact I used 16+6). In essence, then, we need to first allocate space for the graphics characters:

5 POKE 52,29:POKE 51,80:POKE 56,29:POKE 55,80:PRINT CHR$(8);:CLR

Allocate the character set pointer to give us 64 graphics (thus the first character will be at code 64-6-16 = 42) and assign the Auxiliary and background colours.

1100 POKE 36878,112:POKE 36869,255:POKE 646,1:POKE 36879,11:P=7680
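Here's a Python sketch that decodes the values POKEd in lines 5 and 1100. Two things in it are my reading of the VIC registers rather than anything stated above, so treat them as assumptions: that POKE 36869,255 places the character set at address 7168, and that the border colour occupies only the low three bits of 36879.

```python
# Decode the values POKEd in lines 5 and 1100.
# Assumptions: 36869=255 puts the character set at 7168, and the border
# colour is held in the low three bits of 36879.
COLOURS = ["black", "white", "red", "cyan", "purple", "green", "blue", "yellow"]

def colour_name(index):
    return COLOURS[index] if index < 8 else "colour %d" % index

top_of_basic = 29 * 256 + 80               # POKE 52,29 / POKE 51,80
first_code = (top_of_basic - 7168) // 8    # first redefined character code
defined = (7680 - top_of_basic) // 8       # characters that fit below the screen

reg_36879 = 11
reg_36878 = 112
paper = colour_name((reg_36879 >> 4) & 15)
border = colour_name(reg_36879 & 7)
auxiliary = colour_name((reg_36878 >> 4) & 15)

print(top_of_basic, first_code, defined)
print(paper, border, auxiliary)
```

So the 22 redefined characters start at code 42, and the three global colours come out as black, cyan and yellow, leaving red and green to be supplied per character.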

Copy over the block graphics from ROM (we could calculate them, but this is easier).

1010 READ P:P=P*8+32768

1015 FOR F=0 TO 7:POKE F+A,PEEK(P+F):NEXT F

1020 A=A+8:IF A<7632 THEN 1010

...

9000 DATA 32, 124, 126, 226, 108, 225, 127, 251

9010 DATA 123, 255, 97, 236, 98, 254, 252, 160


Generate the stripes characters:

1030 READ N,M:FOR F=0 TO 6 STEP 2:POKE A+F,N:POKE A+F+1,N:N=(N*4+M)AND 255:NEXT F

1040 A=A+8:IF A<7680 THEN 1030

...

9020 DATA 2,2,168,0,86,2,169,1,254,2,171,3
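To see what line 1030 actually builds, here's the same loop replayed in Python over the DATA from line 9020. The helper names are invented; the output shows each stripe character as four bit pairs per row (every row is stored twice, so only alternate rows are printed).

```python
# Replay of line 1030: each (N, M) pair generates one 8-row stripe character.

def stripe_rows(n, m):
    rows = []
    for _ in range(4):
        rows += [n, n]              # POKE A+F,N : POKE A+F+1,N
        n = (n * 4 + m) & 255       # shift in the next bit pair
    return rows

def as_pairs(byte):
    bits = format(byte, "08b")
    return " ".join(bits[i:i + 2] for i in range(0, 8, 2))

data = [2, 2, 168, 0, 86, 2, 169, 1, 254, 2, 171, 3]
stripes = [stripe_rows(data[i], data[i + 1]) for i in range(0, len(data), 2)]

for rows in stripes:
    print([as_pairs(b) for b in rows[::2]])
```

Each character is a staircase where one bit pair gradually replaces another, which is what gives the blocky diagonal.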


Clear the screen the hard way:

1105 FOR F=7680 TO 8183:POKE F,42:NEXT F:PRINT "[Home]";


Then read the character codes for the stripes and place them at the right locations.

1120 FOR X=0 TO 4:READ N,M:P=8176+X

1130 FOR F=0 TO 7-X:POKE P+30720,M:POKE P,N:P=P-23:NEXT F

1140 NEXT X

...

9030 DATA 58,10,63,10,62,13,61,13,60,8


Ascii Code Conversions & Stopping Case Swapping

You can swap between Capitals + Graphics and Capitals and Lower Case (+ a few graphics) on the VIC-20 using Shift+[C=]. However, this doesn't affect what character codes are read by GET x$. Normal lower-case characters return upper-case ASCII characters and holding down shift gives the same codes + 128.

Fortunately, that's just a simple case of mapping the characters:

111 K$=CHR$((ASC(K$)+(32 AND (K$>="A" AND K$<="Z")))AND 127)

Also, fixing the case swapping issue is fairly easy, it's done by printing a control character: PRINT CHR$(8) in line 5.
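The key mapping in line 111 can be mirrored in Python like this. It's a sketch with a made-up function name; it works on the numeric code that GET returns rather than on a string.

```python
# Mirror of line 111: unshifted letters arrive as 65..90 and become lower case,
# shifted letters arrive as 193..218 and become upper case once bit 7 is dropped.

def petscii_key_to_ascii(code):
    is_unshifted_letter = ord("A") <= code <= ord("Z")
    return (code + (32 if is_unshifted_letter else 0)) & 127

unshifted_a = chr(petscii_key_to_ascii(65))
shifted_a = chr(petscii_key_to_ascii(193))
digit_one = chr(petscii_key_to_ascii(49))
print(unshifted_a, shifted_a, digit_one)
```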

Conclusion

Early 80s computers had to be creative with graphics hardware, because the relatively high memory costs limited graphics detail, and lower memory bandwidth limited the range of colours. The ZX Spectrum and VIC-20, at first sight provided a very similar style of graphics, using 1 bit per pixel + an attribute byte for colour per character, but short-cuts in the colour memory (only 4-bits per character instead of 8) added even more limitations.

Consequently, porting a program from one architecture to another often involved a lot of additional work to map or work around the respective limitations. In the case of the VIC-20, a critical aspect of the Banner program (the diagonal red, yellow, green and cyan stripes against a black background) was only made possible by the VIC chip's ability to support 2 bit per pixel multi-colour graphics, plus the ability of one of those colours to be the ink colour of the character. An ordinary 2 bit per pixel graphics mode, such as that offered by the 6847 graphics chip, could not have reproduced the stripes, even though, at a minimum, 96x84 pixel graphics would need 2kB of RAM vs the 932 bytes of RAM actually used.

Finally, even the differences in the implementation of what was accepted as the standard microcomputer language, BASIC, could have serious ramifications, and often hacking directly into the OS or memory map was the only solution.

The Banner program is a great and simple way of exploring the architectural differences, and at the end of it, it's fun to type out colourful chunky characters across the whole screen!

The Listing

Finally, here's the listing! There's about 1kB free on the unexpanded VIC-20 once it's been typed in. In VICE it's possible to copy and paste a line at a time, but you need to convert the characters to lower-case first!

5 POKE 52,29:POKE 51,80:POKE 56,29:POKE 55,80:PRINT CHR$(8);:CLR

10 POKE 36869,240:GOSUB 1000

100 FOR X=0 TO 2:BG(X)=PEEK(P+48+X):POKE P+48+X,54:NEXT X

110 GET K$:IF K$="" THEN 110

111 K$=CHR$((ASC(K$)+(32 AND (K$>="A" AND K$<="Z")))AND 127)

112 F=ASC(K$):IF F<32 AND F<>13 THEN 110

113 FOR X=0 TO 2:POKE P+48+X,BG(X):NEXT X

116 IF ASC(K$)=13 THEN P=INT((P-7680)/24)*24+7752:GOTO 160

120 C=ASC(K$)-32:C=C0+(C AND 3)*9+INT(C/4)*44

130 I=INT(RND(0)*7)+1

140 FOR Y=0 TO 2: FOR X=0 TO 2:POKE P+X,PEEK(C+X)-23:POKE P+X+30720,I:NEXT X

150 P=P+24:C=C+3:NEXT Y:P=P-69

155 P=P-7680:P=(P-INT(P/24)*24)+INT((P+48)/72)*72+7680

160 IF P>=8184 THEN P=7680

170 GOTO 100

999 POKE 36869,240:POKE 36864,12:POKE 36866,150:POKE 36867,174:STOP

1000 A=7504:POKE 36864,10: POKE 36867,42: POKE 36866,152

1005 DIM BG(3)

1010 READ P:P=P*8+32768

1015 FOR F=0 TO 7:POKE F+A,PEEK(P+F):NEXT F

1020 A=A+8:IF A<7632 THEN 1010

1030 READ N,M:FOR F=0 TO 6 STEP 2:POKE A+F,N:POKE A+F+1,N:N=(N*4+M)AND 255:NEXT F

1040 A=A+8:IF A<7680 THEN 1030

1100 POKE 36878,112:POKE 36869,255:POKE 646,1:POKE 36879,11:P=7680

1105 FOR F=7680 TO 8183:POKE F,42:NEXT F:PRINT "[Home]";

1120 FOR X=0 TO 4:READ N,M:P=8176+X

1130 FOR F=0 TO 7-X:POKE P+30720,M:POKE P,N:P=P-23:NEXT F

1140 NEXT X

1150 C0=PEEK(65)+256*PEEK(66)+7:P=7680

1999 RETURN

9000 DATA 32, 124, 126, 226, 108, 225, 127, 251

9010 DATA 123, 255, 97, 236, 98, 254, 252, 160

9020 DATA 2,2,168,0,86,2,169,1,254,2,171,3

9030 DATA 58,10,63,10,62,13,61,13,60,8

9100 DATA"AAAAAAAAAAKAACAACAFFAAAAAAANNINNIBBA"

9110 DATA"AOIBOADKAPECEGICBCJIAJMCBCCAKAAAAAAA"

9120 DATA"AJAAKAABAAGAAFAACAIIIFPACCCAKADLCACA"

9130 DATA"AAAAEAACAAAADDCAAAAAAAAAACAAECECACAA"

9140 DATA"JHIOCKBDAEKAAKABDADDIJDADDCDDIBDIDDA"

9150 DATA"EHAONIABALDCDDIDDAEDCLDIBDADDKAJAACA"

9160 DATA"JDIJDIBDAJDIBHCBCAAIAAIAAAAAAAACAECA"

9170 DATA"AJABIAABAMMIMMIAAABIAAJABAAJDIADAACA"

9180 DATA"JLIKDCBDCJDILDKCACLDILDIDDAJDCKAABDC"

9190 DATA"LGAKECDCALDCLDADDCLDCLDACAAJDCKDKBDC"

9200 DATA"KAKLDKCACBLAAKABDAAHCAFABCAFECFGABAC"

9210 DATA"FAAFAABDCOEKKCKCACOAKKGKCACJDIKAKBDA"

9220 DATA"LDILDACAAJDIKGKBDCJDILLACBCJDABDIBDA"

9230 DATA"DLCAKAACAKAKKAKBDAKAKGECACAKAKKKKBBA"

9240 DATA"GECEGACACGECAKAACADHCECADDCALAAKAADA"

9250 DATA"GAAAGAAACAHAAFAADAEGAAAAAAAAAAAAAMMM"

9260 DATA"EDAHCADDCEMAKFABDAOIAKFADCAEIAKAABCA"

9270 DATA"ENAKFABDAEIALDABDAEDAFDABAAEMAGNAEJA"

9280 DATA"OIAKFACBAAIAAIAACAACAAKAECAKAALLACBA"

9290 DATA"KAAKAABCAEIAPFACBAMIAKFACBAEIAKFABCA"

9300 DATA"JGAOJACAAJGAGNAABCAMAFAABAAAMABIADAA"

9310 DATA"FIAFAAADAIEAKFABCAIEAOCACAAIEAPPABCA"

9320 DATA"IEAFKACBAIEAGNAEJAMMAECADDAAJABKAABA"

9330 DATA"AKAAKAACABIAALABAAJJAAAAAAAJHIKHKBDA"



Friday, 6 May 2022

A Tale Of Two Banners: ZX

For the 40th Anniversary of the ZX Spectrum's announcement on April 23rd, 1982, I decided to write a little banner program.

Here's an example of what it can do:

[Image: an example banner produced by the program]

ZX Spectrum BASIC is blessed with a large number of useful commands to access graphics, so the program was quite easy to write and relatively short:

[Images: the program listing]

As you can see, there's very little to this program. How does it work?

The main part of the program, obviously is the mechanism for enlarging the characters. This is surprisingly easy. The ZX Spectrum has 16 graphics characters arranged as Battenberg (as in the cake) graphics which occupy character codes 128 to 143.


The ZX Spectrum also has a POINT(x,y) function, which returns the pixel value at coordinates (x,y). By POINTing at (x,y), (x+1,y), (x, y+1), (x+1, y+1) we can simply multiply each point by the appropriate Battenberg pixel value, sum the results and add 128 to generate the correct graphics character. This is why you can see a tiny version of the '!' character in the bottom right-hand corner: the standard character is displayed and then it is scanned and the appropriate Battenberg characters are generated.
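The scan-and-enlarge step can be sketched in Python as below. It's an illustration rather than the Spectrum program: the bitmap is a plain list of rows read top-down (POINT's y axis runs the other way), the function names are invented, and the bit weights (top-right 1, top-left 2, bottom-right 4, bottom-left 8) are my recollection of the Spectrum's block graphics, so treat them as an assumption.

```python
# Sketch: turn each 2x2 group of pixels into one Battenberg character code.
# Assumed weights: top-right=1, top-left=2, bottom-right=4, bottom-left=8.

def battenberg_code(top_left, top_right, bottom_left, bottom_right):
    return 128 + 2 * top_left + top_right + 8 * bottom_left + 4 * bottom_right

def enlarge(bitmap):
    """bitmap: rows of 0/1 pixels with even width and height."""
    codes = []
    for y in range(0, len(bitmap), 2):
        row = []
        for x in range(0, len(bitmap[y]), 2):
            row.append(battenberg_code(bitmap[y][x], bitmap[y][x + 1],
                                       bitmap[y + 1][x], bitmap[y + 1][x + 1]))
        codes.append(row)
    return codes

sample = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
codes = enlarge(sample)
print(codes)
```

An 8x8 character scanned this way gives a 4x4 grid of codes, each in the range 128 to 143.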

The program allows for the banner to be edited by simply supporting a carriage return feature and by wrapping the bottom line back to the top line.

The harder part is to generate the stripy coloured lines at the bottom right-hand corner. The difficulty with the ZX Spectrum is that it's not possible for more than 2 different colours (an INK colour and a PAPER colour) to occupy the same character square. So, to make the stripes we have to carefully make sure that each stripe is exactly 1 character wide and map the colours so that the stripe never clashes.

To do this we define a graphic character, UDG "a" (whose address in RAM is returned by USR "a"), as a diagonal triangle filling the bottom half of the character. This gives us the diagonal stripes by using the previous ink colour as the next paper colour on the next stripe - a classic ZX Spectrum colour trick! (You can see how it works from the image below, where alternate diagonal rows are brighter.)
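The colour chaining can be sketched in a few lines of Python. The colour order here is an assumption for illustration (the stripes are described elsewhere as red, yellow, green and cyan on black), and the function name is made up.

```python
# Sketch: each diagonal of triangle characters takes its paper colour from the
# ink colour of the diagonal before it, so no cell ever needs a third colour.

def stripe_attributes(colours):
    """Return one (paper, ink) pair per diagonal."""
    return [(colours[i], colours[i + 1]) for i in range(len(colours) - 1)]

# Assumed order: black background, four stripes, then back to black.
attributes = stripe_attributes(["black", "red", "yellow", "green", "cyan", "black"])
for paper, ink in attributes:
    print("paper", paper, "ink", ink)
```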

[Image: the stripes, with alternate diagonal rows shown brighter]

To make the program a little bit more fun, each character is assigned a random colour and the cursor flashes.  Because the ZX Spectrum came out in a 16kB form, and one of my friends had one at the time, I thought it would be considerate to make sure the program would run on either model (it's so short, I could hardly fail 😉 !).

You can type in the program by going to the ZX Spectrum Javascript website! Here's the keyboard to help you! https://jsspeccy.zxdemo.org .

Following this I thought I'd do the same, for the Commodore VIC-20. It turns out that, despite arguably superior graphics capabilities (well, it can do 2 bits per pixel graphics!), it's far harder (see Part 2).


Sunday, 1 May 2022

Gini Sim: Interactive

In May 2014 I wrote a post on modelling the Lorenz Curve, which is an income or wealth curve whose curvature is expressed as the Gini Coefficient. In this model an ideal society has a straight-line curve and the more unequal a society is, the greater the curvature.

The post shows how a pure, free-market economy naturally gives rise to the highest possible gini coefficient, approaching 1 over time. This is the case even if the population involved has no ulterior profit motive and all participants play by the same, equally applied rules. The post provides a program, written in JavaScript which simulates the process, but the program doesn't run, it's just a listing.

Informally, the algorithm works as follows. There are 100 players. At first each player is given the same amount of cash: $10. $1 is randomly taken from the pool of money, and so the player who owned it now has $1 less; then another $1 is picked randomly from the remaining pool and the player who owns that one is now given the previously taken $1. So, usually, one player loses $1 and another player gains $1 (unless the same player gets picked for both steps).
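The algorithm as described can be sketched in Python like this. It isn't the JavaScript from the post, just a minimal re-statement of the same steps; the function names, step count and seed are arbitrary choices, and the Gini calculation uses the usual sorted-rank formula.

```python
import random

def gini(wealth):
    """Gini coefficient of a list of non-negative holdings."""
    values = sorted(wealth)
    n = len(values)
    total = sum(values)
    weighted = sum((i + 1) * v for i, v in enumerate(values))
    return 2 * weighted / (n * total) - (n + 1) / n

def transact(wealth, rng):
    players = range(len(wealth))
    # Pick a random $1 from the pool: its owner is chosen in proportion to wealth.
    loser = rng.choices(players, weights=wealth)[0]
    wealth[loser] -= 1
    # Pick another $1 from the remaining pool: its owner receives the first $1.
    winner = rng.choices(players, weights=wealth)[0]
    wealth[winner] += 1

def simulate(players=100, cash=10, steps=5000, seed=1):
    rng = random.Random(seed)
    wealth = [cash] * players
    for _ in range(steps):
        transact(wealth, rng)
    return wealth

final = simulate()
print(sum(final), round(gini(final), 3))
```

The total always stays at $1000; it's the Gini coefficient that drifts upwards as more transactions are run.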

Intuitively you would think that the probabilities would even out. As a player loses money, they are less likely to have money taken from them (and given to them), but likewise, as a player gains money, they are more likely to have money taken from them (and also given to them).

This is not what happens. Instead as players gain money, they are more likely to gain in subsequent transactions. This is because the probabilities change between transactions, in favour of previous winners. For example, consider a situation near the end game where one player has $1 and the remaining player has $99,999. Although 99.999% of the time the dominant player will have $1 removed, 99.999% of those times it will be returned to them, with only a 0.001% chance that it goes to the lesser player. However, in the 0.001% of the time that the lesser player's $1 is removed, it becomes impossible for them to receive that $1 back, and in subsequent plays they have a 0% chance of winning.

In the real world, this corresponds to the way in which larger players, who occupy more of the market, are more likely to be chosen to trade with: thus increasing their market share. In this version, the javascript is embedded in the article itself and thus it can be played live. You can see a Lorenz Curve being mapped out in realtime as it becomes more extreme. A variant allows you to generate interest with a given probability (interest works by leaving the 'loser' with the original $1 they had), but it has no effect on the overall outcome: the richest get richer while the poorest lose everything.




Simulation


Saturday, 30 April 2022

Cubase Re-Lited Part 1

In the mid-1980s my parents kindly bought me a Casio CZ-101. It was a great introduction to synthesisers and my way back into keyboard instruments. I had been introduced to synthesiser music thanks to some school friends lending me albums by Jean Michel Jarre (Oxygene and Equinoxe), Vangelis (L'Opera Sauvage, Apocalypse Des Animaux, Heaven and Hell) and Tomita (Pictures at An Exhibition, The Planet Suite).

At the time I'd been learning the trumpet. My mum was well-known in the locality for playing the piano and organ in primary schools and working-men's clubs, alongside teaching piano and music theory, but I was never able to pick it up; my sister did much better than me, getting up to grade 3 or 4.

The CZ series has been largely dismissed since the 1990s, primarily because Casio wasn't associated with professional instruments and the Yamaha DX series was more capable, but also because new synthesis techniques superseded these early digital techniques. However, in recent years, as artists have increasingly mined and revisited older technology, Phase Distortion and the CZ series have been re-evaluated, alongside the emergence of DAWless production.

Over time I acquired a few more instruments: a Yamaha SY-22, an early Sample and (FM) Synthesis instrument, but capable of creating interesting dynamic soundscapes thanks to its vector synthesis feature. For a while I also had a Roland MT-32, which could handle up to 8 parts + a drum kit channel.

Around the same time I bought a copy of Cubase lite (bundled, I believe with a MIDI interface) for my Macintosh Performa 400 and with it sequenced quite a number of pieces and songs. Cubase Lite was well within my needs and even when running on a Mac Plus, it could handle everything I threw at it.

Stupidly I sold the MT-32 because I didn't think that its sounds were particularly interesting and I never took the opportunity to learn how to program it properly. Finally, I bought a super-ugly Yamaha TQ-5 (for £35 on eBay), which is a superb 4-operator FM synth / sound module with a built-in sequencer (horrible UI) and an effects unit. It's badly underrated as it's better than a DX-11.

Over the past two decades though I've used Garage Band for my music creation - again, because although it's an entry level program, it's still within my needs. What I did miss was being able to connect Garage Band to my earlier synths.

However, now that DAWless music creation has become more fashionable, I thought I'd have a go at trying to re-unite my current keyboards and sound modules with a cheap or free, but simple to use DAWless MIDI application.

And it turns out that's quite a rabbit-hole. It's not that nothing is available. For example, there's a Linux program called RoseGarden. Now, typically, the first question I ask about any application, whether it's for Linux, macOS or whatever, is what are the requirements in terms of RAM and CPU? RoseGarden says it's "demanding". What does that mean? A 1GHz Athlon? A 2.5GHz Core i7? A Raspberry PI 3? Muse looks good, but there's no description at all about the hardware requirements. And at this point I'm already turned off. Why make the effort if my hardware might not run it? Also, as software developers have shifted from producing sequencers to full Digital Audio Workstations, they've blurred the descriptions of what their software does. For example, obviously Garageband does handle MIDI input, but it doesn't handle MIDI output, so I've found it progressively harder to figure out whether a given application would support what I want, or even how to ask the right kind of search question that would provide an answer.

Also, it's such a pain even just finding out about trivial things. Consider part of the Support FAQ (which of course, doesn't even tell you what CPU performance you need):

“MusE requires a high performance timer to be able to keep a stable beat. It is recommended that your operating system be setup to allow at least a 500hz timer, MusE will warn you if this is not available.”

An 8MHz Mac Plus could do this! An Atari ST could do this! An Amiga could do this! A 1980s PC running MS-DOS Could. Do. This! 

So, in the end, I figured that if a mid-80s 16-bit era computer (yes, the 68000 is 32-bit) could do this, and emulators for these computers are available that run much faster than the original hardware, then surely it should be possible to get an emulation of a Mac to run my original Cubase Lite and communicate with a MIDI interface.

Picking An Emulator?

Mini vMac

My normal go-to Macintosh emulator is Mini vMac. It is great and easy to install on a number of platforms. It's also not too heavy on hardware requirements as it makes efforts to maximise emulator performance.

Also, someone has tried to interface Mini vMac to real MIDI hardware.

“Hi, i've implemented a midi bridge for Mini vMac which exposes emulator modem and printer midi ports to the host OS. so far no stuck notes, and sysex seems to work both ways. there's systematic jitter though which needs to be resolved.”
The cause of the systematic jitter is fairly easy to understand. Mini vMac has a fairly simple emulation core, which aims to emulate execution in 1/60th of a second chunks. That is, it works out how many instructions can be executed in 1/60th of a real second, according to the emulator's requested speed, and executes them all in one go. It's not quite that simple: within the basic emulation loop it actually executes instructions until the next interrupt, but the same principle applies. It means that Mini vMac only synchronises every ≈16.67ms. The upshot is that Mini vMac receives the MIDI events that arrived prior to its time slice all at once; sends multiple events too quickly during its time slice; and, worse still, is unable to send events prior to its time slice, as that would mean generating events backwards in time.





This is why messages end up with jitter. Now, the jitter is often not very noticeable, because there often aren't that many events every 17ms, but for an application which requires sub-millisecond latency, it will be enough to mess up recording and playback.
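A toy model of the receive side shows the size of the effect. This is a Python sketch under a simplifying assumption, namely that an incoming event is only seen at the start of the next time slice; the function name and the sample event times are made up.

```python
import math

# Toy model: an incoming event is only noticed at the next slice boundary.
def delivery_time(event_ms, syncs_per_second=60):
    slice_ms = 1000 / syncs_per_second
    return math.ceil(event_ms / slice_ms) * slice_ms

events = [0.5, 3.0, 10.0, 16.0, 20.0, 33.0]        # arrival times in ms
delays = [delivery_time(t) - t for t in events]    # added latency per event

for t, d in zip(events, delays):
    print("event at %5.1f ms delayed by %5.2f ms" % (t, d))
```

The added delay varies from event to event, anywhere between 0 and one full slice, and that variation is the jitter.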

The developer of Mini vMac is slowly making progress on this (see newsletters 9 and 13), but for me, it's a non-starter.

Basilisk II

Basilisk II isn't capable of emulating a Mac Plus, instead it aims to emulate 68020 to 68040 Macintosh computers. Well, at least the most significant ones. Unfortunately, one of the ones it doesn't emulate is my Performa 400 (a.k.a LC II). In addition, Basilisk II substitutes the emulation of real Mac hardware with drivers to the equivalent peripherals. This will work in most circumstances, but if I understand things correctly, applications like Cubase Lite actually addressed the real hardware, in this case the SCC serial communications controller; rather than go through the driver.

PCE/MacPlus Emulator

I'd come across the PCE MacPlus emulator because of the Javascript version:

https://jamesfriend.com.au/pce-js/

I had originally thought it would be rather sluggish, because it was in Javascript, but in reality it performs pretty well. It was only very recently that I actually followed the links and found out it was a conventional emulator written in C++ and ported to Javascript. Importantly, it's possible to browse through the source code, at which point I found this crucial line:

#define MAC_CPU_SYNC 250
With a comment to say that the emulator is sync'd to the emulating computer that many times per second. So, I reasoned that, if I increased it to 1000, then the emulator's serial I/O's latency would be

Then I came across a ToughDev article about using the PCE emulator to develop classic Macintosh applications, which gave a decent walkthrough:


Building it shouldn't be that hard: you just need the X11-dev and SDL (Simple DirectMedia Layer) libraries. I first tried to install them with brew on my Mac mini, and later used apt-get on a Linux PC, but unfortunately in both cases the configuration stage said SDL wasn't there.

So, then I tried it on a Raspberry PI 3 I had to hand, but the Raspberry PI 3 had its own problems with the latest version of apt-get. It kept complaining with messages a bit like:

W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://archive.raspberrypi.org/debian buster InRelease: Splitting up /var/lib/apt/lists/archive.raspberrypi.org_debian_dists_buster_InRelease into data and signature failed
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://raspbian.raspberrypi.org/raspbian buster InRelease: Splitting up /var/lib/apt/lists/raspbian.raspberrypi.org_raspbian_dists_buster_InRelease into data and signature failed
W: Failed to fetch http://raspbian.raspberrypi.org/raspbian/dists/buster/InRelease  Splitting up /var/lib/apt/lists/raspbian.raspberrypi.org_raspbian_dists_buster_InRelease into data and signature failed
W: Failed to fetch http://archive.raspberrypi.org/debian/dists/buster/InRelease  Splitting up /var/lib/apt/lists/archive.raspberrypi.org_debian_dists_buster_InRelease into data and signature failed
W: Some index files failed to download. They have been ignored, or old ones used instead.
I found that this was a common problem with libraries being updated to use the new Bullseye OS version, replacing Buster. Eventually I found a couple of Raspberry Pi forum threads [1], [2] that told me how to fix it. This meant typing in the following line:
sudo apt-get update --allow-releaseinfo-change
I was then able to follow the rest of the development installation as per the ToughDev web page on compiling PCE.
sudo apt update
sudo apt-get update
sudo apt-get install libx11-dev libx11-doc libxau-dev libxcb1-dev libxdmcp-dev x11proto-core-dev x11proto-input-dev x11proto-kb-dev xorg-sgml-doctools xtrans-dev
sudo apt-get install libsdl1.2-dev libsdl1.2debian
One problem I had was at the configure stage: I couldn't get it to recognise SDL from SDL's own configuration output. It would always say things like:
Terminals built:         null x11
Terminals not built:     sdl

 To make it compile with SDL I had to type:
./configure -with-x --with-sdl=1 --enable-char-ppp --enable-char-tcp --enable-sound-oss --enable-char-slip --enable-char-pty --enable-char-posix --enable-char-termios
cp /usr/local/etc/pce/pce-mac-plus.cfg .

At this point I had a version of PCE that could actually run. In part 2 I'll describe the steps needed to get PCE to work as I'd intended.




Saturday, 23 April 2022

A Toast to the PowerMac 4400

The PowerMac 4400 is probably one of the most hated Macs of all time. It wasn't really all that bad, merely compromised, but it was a very non-Mac-style Mac, which used a cheap PC case and had the floppy disk and CD drive the wrong way round. It's the Register's third most awful Mac:

“The Power Macintosh 4400 of November 1996 is widely regarded as one of - if not the - least distinguished Macs of all time. The best thing you could say about it was that it worked.”

And I had one. And I liked it. So, a couple of weeks ago I was wondering how much I'd actually paid for that 'lemon de jour' (thanks ArsTechnica) and I couldn't find out. It's like, surely everything is there on the internet. Well, that's not quite true, I could find out the price in USD, but I needed to find the price of the PowerMac 4400 in GBP, because that's what I paid for it. And I tried, for at least an hour or two: looking at Everymac and LowEndMac, then just searching for reviews, and then checking out every online back issue of MacUser (UK) and Macworld (UK) I could find, all to no avail.

It's just not there. Until now! Ready?

The PowerMac 4400 cost £799 for a 160MHz machine with a 1.2GB HD; 16MB of RAM; 8x CD ROM (not CDRW or DVD, this was November 1996!) and no display.

Now you know. But to tell you the truth, I really liked mine. Let me explain the back story.

Backstory

In late 1996, I'd started an MPhil in Computer Architecture at Manchester University. At the time, I'd had a Performa 400 (the first Macintosh I'd bought); and with it I'd added a Zip drive; extra RAM and HD and put it on the internet, which I'd then used to get in touch with Steve Furber (who invented the ARM processor) to be able to get an interview for the course I wanted to do.

But the Performa 400 was never going to be capable of handling an MPhil (in retrospect I think it might have been); so I was quite keen on replacing it with a funky, new PowerPC Mac.

I was actually quite keen on an all-in-one Performa 5200/5300 as they were the cheapest and I was a student. In fact I'd used one to do my Manchester University application on, while house-sitting for a friend and was impressed, but 18 months later better ones were available.

I'd been most impressed by the Apus 2000 Mac clone, but no-one seemed to have any, but then I came across the 5260, which seemed to have a similar spec. I'd gone as far as placing an order for one, and then discovered that it could only handle a 640x480 display! I was appalled, and cancelled the order.

I was getting fairly desperate for a new Mac (my Performa 400 was a shocking 3.5 years old by then 😮  ), but when I opened the latest issue of MacUser, I saw that there was a review of the PowerMac 4400, which even when adding a monitor was cheaper, and better than the 5260.

And then in the same magazine I saw that they were on offer at a Mac dealer called Gordon Harwood, based in the backwoods of Alfreton, Derbyshire.


Weird, a decent, successful Mac dealer, not in a big city! And more importantly, on the way from Manchester to my parents' house.

So, I promptly rang them up; reserved one (they had them in stock); then went to my parents for the weekend and bought it, either on the way down, or on the way back. I added another 16MB of RAM (I believe, taking it up to 32MB); and a Sony 15" Trinitron SX; along with a student discount.

In Use

And frankly, it was a great computer! It could do everything I could throw at it, including full-screen QuickTime videos. The 8x CD ROM felt really nippy compared with my PowerCD. The hard disk did seem a little small, even then, and I had to shove quite a lot of stuff onto my Zip drive. I bought Nisus Writer 6 to write my thesis on it, and Nisus Writer 6 was an absolute joy of a word-processor (I also had ClarisWorks 3 for day-to-day office documents). I even had a version of SoftWindows so that I could run Turbo C++ 4.5 and the IAR H8 compiler I was using to continue my Heathrow Express and Midlands Metro firmware.

I did most of my development on Metrowerks CodeWarrior 10 Gold, a multi-target 68K / PowerPC 'C' compiler with a great Object-oriented application framework called PowerPlant.

Later I bought a PAL based PCI TV card that worked with that Mac. 

Stupidly I Sold It

Things all went wrong when my PowerBook 100 got stolen from the University. It had been quite handy to have a laptop as a sort of satellite computer, with my main computer back at home. I'd bought the PowerBook 100 with a 30MB hard drive just a few months before I'd bought the PowerMac 4400, and already I couldn't imagine computing without a laptop.

But when it got stolen I had a dilemma, which basically revolved around moving to only a laptop or having both. For a while I used a PowerBook Duo 230 from Steve Furber himself (he'd just upgraded), but it came with the full dock and it was impossible to sell just the dock for a mini-dock as people wanted both. But the full dock took up quite a bit of space. So, eventually I figured I should sell both the PowerMac 4400 and get a PowerBook, and because the 4400 was quite cheap, I ended up with a low-end PowerBook 5300 (black and white), but with a second monitor card.

It wasn't terribly reliable, the HD was small and then it started to fail in the spring of 2000.

So, in the end, I think it was a bit of a mistake going down that route. The PM4400 would have done a good job of running up to Mac OS 8.x and completing my thesis, while I could have bided my time and then chosen a more suitable laptop (e.g. a PowerBook 150, because although bulky, it supported an IDE drive) or perhaps even waited until after my thesis.

Conclusion

I originally wanted to know how much the PowerMac 4400 cost, to help work out if I've been spending more or less on Macs over time, in particular my MacBook Pro. In reality it's been quite variable with the PM 4400 in the middle of the pack. It was one of those rare occasions where the internet didn't have the answers, so I had to hunt down the facts myself.





Thursday, 26 August 2021

Virtually Lost: An Alternative Intel 80286 Protected Mode

The Intel 80286 was the true successor to the unexpectedly, overwhelmingly dominant 8086. This post is intended to be part of a larger series on its protected-mode architecture, and describes an alternative, page-based form of memory management. We show that a far simpler design could have achieved far more for Intel operating systems.

The development of the 80286 is covered pretty well in the 80286 Oral History and some anecdotal information can be found in Wikipedia.

The original 80286 protected memory mode is a highly sophisticated, purely segmented design modelled on Multics segmentation, which saw almost no practical use beyond providing access to the full 16MB physical address space in real Intel 80286 operating systems, including MS-DOS, OS/2, Xenix (probably) and Windows. We won't discuss that further here (it's for future posts); instead we'll discuss a paged alternative called the 80286p.

80286p Overview

If the designer(s) of the 286 had had enough foresight and a willingness to break with the ideology of Intel's i432 albatross, they could have implemented a paged memory version of it with simple 4kB pages in a 16MB physical and virtual address space. This could have made a lot of sense given that the 80286 was only supposed to be a stop-gap CPU design until the i432 was released.


Address Translation


A simple 16-bit VM architecture for the 286 could have redefined a segment register to point to one of 64K x 256-byte blocks (that is, the segment is shifted by 8 bits rather than 4). This would have extended the virtual address space to 16MB with the same kind of incompatibility as an actual 286 whilst being conceptually similar to the 8086.

In fact the 8086 designers did consider an 8-bit shift for segments; however, they rejected this in favour of 4-bit shifts on the grounds that its 16MB address space couldn't be accommodated in a 40-pin package without sacrificing other hardware design goals.
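To make the arithmetic concrete, here's a minimal C sketch of how an 8-bit segment shift and 4kB pages would split an address. The names (Linear286p, PageOf, OffsetOf, kSegShift, kPageShift) are mine, purely for illustration, and wrapping the result at 24 bits is an assumption rather than anything a real part defines.

```c
#include <stdint.h>

enum {
  kSegShift  = 8,   /* assumed 80286p segment shift (the 8086 uses 4) */
  kPageShift = 12   /* 4kB pages */
};

/* 24-bit linear address: segment * 256 + offset (wrap at 16MB is assumed). */
static uint32_t Linear286p(uint16_t seg, uint16_t off)
{
  return (((uint32_t)seg << kSegShift) + off) & 0xFFFFFFul;
}

/* 12-bit virtual page number, i.e. the value a TLB tag would hold. */
static uint16_t PageOf(uint16_t seg, uint16_t off)
{
  return (uint16_t)(Linear286p(seg, off) >> kPageShift);
}

/* 12-bit offset within the 4kB page. */
static uint16_t OffsetOf(uint16_t seg, uint16_t off)
{
  return (uint16_t)(Linear286p(seg, off) & 0xFFFu);
}
```

For example, segment 0x1234 with offset 0x5678 gives linear address 0x128A78, so the page is 0x128 and the offset within it is 0xA78. Note that the segment and offset have to be added before shifting, because a carry out of the low 12 bits can bump the page number.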


The VM side comprises a 12-bit translation and 12-bit tag for user code access only, and four access modes (none, code, read-only and read-write data), for a total of 26 bits per TLB entry. Assuming the same register resources as an actual 80286, the four 48-bit descriptor caches and additional VM supporting registers could provide for up to 8 TLB entries backed by a software page table (which would only need to be 8kB in size at maximum), and kernel mode could be purely physically addressed. 8 TLB entries isn't a lot, but even some DEC VAX computers only supported 8 TLBs.
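As a rough illustration of that 26-bit entry, here's a sketch in C of one way the tag, translation and access mode could be packed and searched. The bit layout, the names (TlbPack, TlbLookup, Permits, kAcc*, kReq*) and the rule that reads are allowed on both read-only and read-write pages are all my assumptions; the post only fixes the field widths.

```c
#include <stdint.h>

enum { kTlbEntries = 8 };
enum { kAccNone = 0, kAccCode = 1, kAccRo = 2, kAccRw = 3 }; /* stored access mode */
enum { kReqFetch, kReqRead, kReqWrite };                      /* kind of access attempted */

static uint32_t gTlb[kTlbEntries]; /* all zero: access "none", so nothing maps yet */

/* Assumed layout: bits 25-24 access, bits 23-12 virtual tag, bits 11-0 physical page. */
static uint32_t TlbPack(unsigned access, uint16_t tag, uint16_t phys)
{
  return ((uint32_t)(access & 3u) << 24) |
         ((uint32_t)(tag & 0xFFFu) << 12) |
         (uint32_t)(phys & 0xFFFu);
}

/* Does a stored access mode allow the requested kind of access? */
static int Permits(unsigned access, int request)
{
  switch (request) {
    case kReqFetch: return access == kAccCode;
    case kReqRead:  return access == kAccRo || access == kAccRw;
    case kReqWrite: return access == kAccRw;
    default:        return 0;
  }
}

/* Returns the physical page on a hit, or -1 where the hardware would fault. */
static int TlbLookup(uint16_t tag, int request)
{
  int i;
  for (i = 0; i < kTlbEntries; i++) {
    uint32_t e = gTlb[i];
    if (((e >> 12) & 0xFFFu) == tag && Permits((unsigned)(e >> 24) & 3u, request)) {
      return (int)(e & 0xFFFu);
    }
  }
  return -1;
}
```

With an entry of TlbPack(kAccRw, 0x128, 0x03F) loaded, a read or write of page 0x128 returns 0x03F, while an instruction fetch from the same page misses and would raise a fault.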


Enabling the MMU


The PMMU is enabled using bit 15 of the original 8086 flags register (which is defined to be 0 for the 80286 and 80386). Setting it to 1 enables the PMMU; resetting bit 14 selects physically addressed kernel mode, in which bit 13 is "don't care" and full I/O access is automatically supported.


An MMU fault pushes the access mode used and virtual page tag onto the stack and switches to physical addressing (flags.i must be 0 and flags.k must be 1 for the MMU to translate addresses and flags.i can't be changed if virtual addressing is on). All further MMU handling is in software. The MMU uses an LRU (least recently used) algorithm for replacing TLB entries (essentially a 3-bit counter): on return from the fault handler, the least recently used TLB entry gets replaced by the updated access mode and translation address.
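Here's a small C sketch of the replacement step, following the "3-bit counter" description literally. A plain counter like this behaves as round-robin replacement rather than true LRU, and the names (gTlb, gTlbHead, TlbReplace) are hypothetical.

```c
#include <stdint.h>

enum { kTlbEntries = 8 };

static uint32_t gTlb[kTlbEntries];
static uint8_t  gTlbHead;          /* 3-bit replacement counter, starts at 0 */

/* On return from the fault handler: overwrite the entry at the head,
   then advance the head, wrapping after 8 entries. */
static void TlbReplace(uint32_t entry)
{
  gTlb[gTlbHead] = entry;
  gTlbHead = (uint8_t)((gTlbHead + 1u) & (kTlbEntries - 1u));
}
```

After nine faults in a row the ninth entry lands back in slot 0, evicting the first one.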


The initial entry into user mode can be achieved by creating a system virtual page table containing translations to the current thread of execution; then setting bit 14. The following execution address causes a TLB fault, leading to the VM entry being mapped to the current physical page and execution continues. This implies at least one VM user process should be allocated to the kernel for 'booting' up user mode and User-side management (a 64kB kernel would only need 16 entries). Kernel mode support requires a kernel mode SS and SP register pair; this means that user mode is expected to provide its own settings for SS and SP.


Software Page Tables


A VM algorithm can be extremely simplistic if all we want to do is support a number of user processes, each with its own virtual memory space, while caching a fixed swap space and ignoring any virtual kernel mode. The TLB uses a round-robin algorithm and the instruction MOVT loads TLB[cl]'s physical translation from AX. A process's virtual address space has a simple organisation: a fixed region for code and read-only data, followed by a space for read/write heap memory and finally a stack region which must be <64kB (because the stack is limited to a single 64kB segment). Each entry in the VM table references a Physical PTE and because we can deduce access rights from the VM tag, we don't need to store access rights within each VPte, so the PTE limit is up to 64K pages of 4096 bytes, or 256MB, easily enough for the lifespan of the 80286p (though only 16MB is actual physical memory, the rest are swap page entries).
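To show what "deducing access rights from the VM tag" might look like, here's a hedged C sketch. The tVmLayout structure and its fields are hypothetical (they aren't the tVmVPte used in the swap code); they simply record where each fixed region ends, in pages.

```c
#include <stdint.h>

enum { kAccNone = 0, kAccCode = 1, kAccRo = 2, kAccRw = 3 };

/* Hypothetical per-process layout, in 4kB pages. */
typedef struct {
  uint16_t iCodeEnd; /* first page after the code region */
  uint16_t iRoEnd;   /* first page after the read-only data */
  uint16_t iTop;     /* first page beyond the heap and stack */
} tVmLayout;

/* Access rights follow from which region the virtual page falls in,
   so nothing needs to be stored per page. */
static unsigned AccessForPage(const tVmLayout *aLayout, uint16_t aPage)
{
  if (aPage < aLayout->iCodeEnd) return kAccCode;
  if (aPage < aLayout->iRoEnd)   return kAccRo;
  if (aPage < aLayout->iTop)     return kAccRw; /* heap, then stack */
  return kAccNone;                              /* outside the process */
}
```

The heap and stack aren't distinguished here because both are read/write; the <64kB stack limit would be a separate check on the stack region's size (at most 16 pages).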


We also assume that although there's a single user-space, an application will allocate a fixed code space + stack space and all data space is a shared, dynamically allocated and freed space. A virtual memory map looks like this:

A physical memory map also contains the dynamic memory allocations and application allocations. Because the code and stack spaces are fixed, it's simple to test for access violation by reading the page table (Page=((Seg<<8)+Addr)>>12). The rules are fairly simple. If there's a fault, then the faulting access should at least match the page's access rights; otherwise it's a real access rights violation (erroneous code). Then if the translate address is in swap space, we page it into physical memory at the next page (mod user pages), paging out the previous virtual page at that physical address. If the page was RW, then we update the TLB as Read-Only, else we update as the actual page.

A Simplistic Swap Algorithm

Although there's more to a Virtual Memory implementation, the Swap algorithm is central. Here's a simplistic swap algorithm, which supports up to 256MB of swap space and 16 processes each of which can be up to 16MB.

void VmSwap(uint16_t aAccess)
{
  uint16_t tag=aAccess&0xfff; // got the page.
  uint8_t fault=(aAccess>>12)&kTlbAccessMask;
  uint8_t realAccess=VmAccess(gVmVPte, tag); // proper access rights
  uint16_t trans;
  if(fault==kVmAccessRo && realAccess==kVmAccessRw) { 
    gVmVPte->iPages[tag]=(aAccess|=(kVmAccessRw<<12));
    return;
  }
  trans=gVmVPte->iPages[tag]; // Phys page (possibly in swap)
  if(fault!=realAccess) {
    Trap(&aAccess, &aMap); // Application faulted (a real access rights violation).
    return;
  }
  else if(trans>gVmPhys){ // access is OK and paged out; swap out next page (if needed).
    uint16_t swapOut=gVmPte[gVmPhysHead]; // vpte and vpte entry.
    tVmVPte *vPte=gVmVPteSet[swapOut>>12]; // got the process vPte.
    uint16_t swapBlk;
    swapOut&=0xfff; // each virtual table is <=4095 pages.
    if(vPte==gVmVPte) { // the swapOut page might be in the TLB.
      uint16_t tlb, tlbTag;
      for(tlb=0;tlb<kVmTlbs; tlb++) {
        __asm("mov cl,%1",tlb);
        __asm("movt ax,cl");
        __asm("mov %1,ax",tlbTag);
        if((tlbTag&0xfff)==swapOut) {
          __asm("xor ax,ax");
          __asm("movt cl,ax"); // clear the swapOut page from the TLB if so.
          break; // found it, so stop searching the TLB.
        }
      }
    }
    if(VmAccess(vPte, swapOut)==kVmAccessRw) { // write back.
      swapBlk=gVmSwapBase+gVmNextOut; // the swapout tail.
      SwapWrite(swapBlk, ((long)((gVmPhysHead)+gVmUserBase)<<20),kVmPageSize);
      vPte->iPages[swapOut]=gVmNextOut; // save swapped out location.
      gVmNextOut=gVmPte[gVmNextOut]; // Pte entry for free block points to next free.
    } // otherwise we don't need to write back.
    else { // Code and Ro pages still need to update the vPte.
      vPte->iPages[swapOut]=vPte->iRoBase+swapOut;
    }
    SwapRead(trans, ((long)((gVmPhysHead)+gVmUserBase)<<20),kVmPageSize);
    gVmPte[gVmPhysHead]=(gVmProcess<<12)|tag; // update Pte
    gVmVPte->iPages[tag]=gVmPhysHead; // update VPte to point to phys mem.
    swapBlk=gVmSwapBase+(gVmVPte->iPages[tag]<<gVmPtePerPage);
    if(++gVmPhysHead>gVmUserLim) {
      gVmPhysHead=0; // reset.
    }
  }
  aAccess=(aAccess&0xf000)|((gVmVPte->iPages[tag]&0xfff)+gVmUserBase); // Return the new Phys page
}

The PTE can do double duty: for a page in use it references a Virtual table and a given entry within it, and for a spare page it references the next free page available for modified Read/Write pages. Swap-outs for non-modifiable pages don't require any writes and therefore they never move - they can be obtained by storing them in contiguous swap blocks when the application is loaded (moving other pages out of the way if needed, and if there's no space, then the application can't load).

We have to provide a means of invalidating specific TLB entries, because it's possible that a swapped out page is currently in the TLB, because it's part of the same process and then two different tags could map to the same physical entry. Thus, instructions to load and store TLB entries (movt ax,cl and movt cl,ax) are the minimum needed.


In this VM system, dynamic memory allocations (including stack space) would allocate a read-write block in virtual memory space (which may currently be mapped into physical user space); code allocations would copy all the code to virtual memory. Similarly, deallocations would free the block in swap space. To create a new program with a given code and stack space, the heap between the end of the current code space and the additional code and data space must be free (the program must defragment the heap if needed to do this). User code can't access kernel space directly; instead kernel services are accessed via the INT interface. The Physical Page table is much smaller than the VPTE, comprising, for example, only 128 bytes for 256kB of physical memory (the IBM AT in 1984 only came with 256kB as standard), and would be smaller still given that the kernel space wouldn't be included.
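The table sizes quoted in this post all follow from 4kB pages and 16-bit table entries. Here's a tiny C sketch of that arithmetic; the helper names (PagesFor, TableBytes) and constants are mine, not part of any real API.

```c
#include <stdint.h>

enum {
  kPageSize = 4096, /* 4kB pages */
  kPteBytes = 2     /* one 16-bit entry per page */
};

/* Number of pages needed to cover a memory size in bytes (rounded up). */
static uint32_t PagesFor(uint32_t aBytes)
{
  return (aBytes + kPageSize - 1) / kPageSize;
}

/* Size in bytes of a page table with one 16-bit entry per page. */
static uint32_t TableBytes(uint32_t aBytes)
{
  return PagesFor(aBytes) * kPteBytes;
}
```

So 256kB of physical memory is 64 pages and needs a 128-byte table, a full 16MB virtual space is 4096 pages and needs an 8kB table, and 65536 16-bit-addressable swap pages of 4096 bytes come to 256MB.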


But within these limitations it can be seen that a virtual memory implementation would be relatively simple, easily possible with an early 80286p operating system.


The 80286p could also support 8086-compatible mapping, whereby the segment is only shifted 4 bits, providing a virtual memory space of 1MB (via a second flags register). The standard 80286p method for enabling the MMU and clearing the TLBs is to turn it off (by resetting the MMU flag) and then turn it on again. The TLB can have a simple 3-bit LRU head register, initialised to 0. Unmapped or access right faults lead to page faults which cause the next TLB entry to be updated with the returned physical page and access rights (the virtual page is unchanged). Thus initialising the TLB with all 0's means that no accesses will initially map correctly.

More Limitations

The original 80286 could virtualise interrupts (by providing an interrupt trap), but in this implementation, user code can't service interrupts. However, OS routines could provide mechanisms for jumping to user interrupts if needed.

The original 80286 provided mechanisms for jumping to different protection levels, but the 80286p supports only a physically addressed kernel and a virtually addressed user mode.

The original 80286 supported user I/O access, so it's possible that the 80286p could do so too on a global basis. This would allow Windows 3.1 style user-side I/O access.

The original 80286 could support thousands of processes, because every LDT (Local Descriptor Table) could be a process. The 80286p doesn't really support any processes, but the simple software implementation above supports 16. This would be a small number by the standards of Unix in the 1980s, but desktop computer operating systems such as OS/2 1.0 and Mac OS Classic supported only a limited number of applications in memory (Mac OS Classic had a shared memory space too). Extending the Physical page table to 32-bit per entry could provide for up to 65536 address spaces each with 256MB of virtual address space per VTable. However, it's unlikely this would be necessary, since the 80286 was superseded by the 80386 in 1986 and by the time computers were reaching 25% of its physical memory limitations it had been replaced by the i386 and i486 in the early 1990s.

For the same reason, although it would be possible to increase the address space of the 80286 by having separate instruction and data spaces (so the virtual address space could be up to 32MB even though the physical address space would be 16MB, by simply differentiating TLB tag entries based on code vs data access rights), there's no point, because the processor would have been a minority player by the time this could be exploited.


Conclusion

Implementing a simpler 80286 paged memory management unit would have enabled software developers to provide most of what's needed by virtual memory in an operating system, whilst providing for simple software implementations that would have better leveraged software on the 80286, supporting full compatibility with the 8086 and retaining a similar segmentation model.


In turn this would have led to a simpler 80386 implementation, accelerating the dominance of the IBM PC.