One Week Wonder: 2016

Sunday, 6 November 2016

uxForth: Unexpanded forth for a standard VIC-20. Part 2, the inner interpreter.

I'm the developer of the DIY 8-bit computer FIGnition, but it doesn't mean I'm not interested in other retro computers and the idea of developing a minimal Forth for the ancient, but cute Commodore VIC-20 is irresistable!

In the first part I talked about the appeal of the VIC-20 and how much usable RAM I thought I could squeeze out of it.

That turned out to be between 3947 bytes and 4646 bytes depending on whether we count the screen and the CPU stack. And this sounded more credible, except that I want at least 1Kb of RAM for user programs which brings me back to 2923 to 3622 bytes. A terrible squeeze after all.

There's one obvious way to tackle that: use the Token Forth model. A definitive articles covering all the trade-offs with developing Forth are in the series "Moving Forth" by Brad Rodriguez, but here, we just need to recap on the most popular Forth models.

Forth Execution Models

Forth normally represents its programs as lists of addresses which eventually point to machine code. The mechanism is handled by the inner Forth interpreter called "Next". The traditional Forth model implements what's called Indirect Threaded Code.

Here, each forth command (in blue) points to an indirect address (in green) which points to some machine code (in pink). Primitive commands in Forth (like DUP, >R, SWAP and R> here) have an indirect address which points to the next RAM location where the machine code starts. But commands written in Forth itself (like ROT) start with an indirect address which points to ENTER which implements a function call and is then followed by more Forth commands (in blue). A Forth command like this then ends in EXIT, which returns Forth execution to the next calling function (MYFUNC).

The next Forth model to consider is Direct Threaded Code. Here's the same thing:

Here, every forth command (in blue) points directly to machine code (in pink). Primitive commands are executed directly, but commands written in Forth itself (like ROT) start with a "JSR Enter" machine code instruction which saves the return address (to Forth code) on the normal stack and in the DTC Forth, this return address is used as the new Forth Instruction Pointer after pushing the old IP. We can see that DTC will normally be faster than ITC because there's less indirection.

Token Threaded Forth is essentially a byte-coded Forth, except that in the case of commands written in Forth itself, the NEXT routine uses the top bit of the token to denote an address. Thus, only a maximum of 128 tokens can be supported and only 32Kb of Forth code.

In this example, we can see that the Forth code has been reduced from 14 bytes to 8b, but there is a jump table of addresses which is the same size as the indirect entries in ITC (10b used for these entries). DTC used an additional JSR (3 bytes) for the ':' defined word, but TTC didn't need any extra bytes for the ':' definition (it uses a single bit, encoded in the $93A0 address). Here, the overhead of ITC weighs in at 24 bytes, TTC weighs in at 18 bytes and DTC weighs in at 17 bytes.

We can see that TTC could significantly reduce the size of Forth code if the forth tokens are used often enough, but traditionally a byte-coded interpreter is slower than a threaded code interpreter. uxForth won't beat a DTC Forth, so the question is whether it can compete with an ITC Forth.

Execution Timings

ITC Forth:

NEXT LDY #1

LDA (IP),Y ;Fetch

STA W+1 ;Indirect

DEY ;Addr

LDA (IP),Y ;to

STA W ;W

CLC ;Inc

LDA IP

ADC #2

STA IP ;IP.lo

BCC L54 ;If CY,

INC IP+1 ;inc IP.hi

L54 JMP IndW

IndW: .byte $6c ;JMP () opcode

W .word 0

This is the implementation from the original 6502 FIG Forth. It uses zero-page for IP and W. The indirection is achieved by jumping to an indirect Jump. It requires 41 cycles.

DTC Forth

NEXT LDY #1

LDA (IP),Y ;Fetch

STA W+1 ;Indirect

DEY ;Addr

LDA (IP),Y ;to

STA W ;W

CLC ;Inc

LDA IP

ADC #2

STA IP ;IP.lo

BCC L54 ;If CY,

INC IP+1 ;inc IP.hi

L54 JMP (W)

W .word 0

This is a simple derivation from the original 6502 FIG Forth. As before it uses zero-page for IP and W. The indirection is achieved using a simple indirection. It requires 36 cycles.

UxForth

lda (gIp),Y ;byte code.

asla ;*2 for gVecs index

;Also 'Enter' bit in Carry

iny ;inc gIp.lo

beq Next10 ;page inc?

;no page inc, fall-through.

Next5:

bcc Enter ;Handle Enter.

Next7:

sta Next7+4 ;modify following Jmp.

jmp (gVecs) ;exec byte code.

Next10:

inc gIp+1 ;inc page

bcs Next7 ;now handle token/enter.

This is the proposed UxForth implementation. The UxForth version has to handle both multiplying the token by 2 to get the index for the jump table (gVecs) and testing to see if it's a call to another Forth routine (bcc Enter). It requires 22 cycles, so we can see that it's almost twice as fast as the ITC version. This is because it has one natural advantage and uses several techniques to improve the speed:

Y is used to hold the low byte of IP, thus when we execute lda (gIP),Y , only the upper byte of gIP is used, the lower byte is always 0.
Branches are arranged so that the common case is the fall-through case. Thus when IP increments over a page boundary two jumps are needed.
We normally only have to read one instruction byte instead of two. This is the one natural advantage TTC has over ITC or DTC.
The vector is stored directly in the code (the second byte of jmp (gVecs) ).

18b vs 26b for ITC Forth and 25b for DTC forth. It's possible to use most of these techniques to improve the speed of ITC and DTC 6502 Forth, but I'm not so concerned about that, because the easiest to access VIC-20 Forth is the Datatronic Forth (which is an 8Kb ROM cartridge) and Datatronic Forth uses exactly the same version of NEXT as FIG-Forth.

Conclusion

RAM is still very tight, but we can reduce its usage by implementing a byte-coded Forth and we should find it's perhaps up to twice as fast as a traditional FIG-FORTH implementation.

In the next post we'll look at how we might map our Forth implementation to the available RAM regions!

Thursday, 3 November 2016

uxForth: Unexpanded Forth For A Standard VIC-20

I'm the developer of the DIY 8-bit computer FIGnition, but it doesn't mean I'm not interested in other retro computers and as far as it goes, the Commodore VIC-20 is one of the cutest to come out of the 1980 stables.

The VIC-20 was cute, because it had a combination of fun and dumb features. Like: a full quality 65 key keyboard - and only two cursor keys!

Or the ability to support business applications with a floppy disk drive, but only having 23 column text. Or multi-colour graphics (and even a 2-bit per pixel mode that can co-exist with a 1-bit per pixel mode) with a near complete lack of support for bitmapped graphics. Or it's 16Kb of ROM and only 5Kb of RAM (with just 3582bytes free when Basic boots).

So, the fun challenge here is to see how much of a Forth I can squeeze into the Basic, unexpanded VIC-20 given the RAM limitations. I'm pretty confident I can do this, given that a super-tiny Forth subset has been crammed into just 1Kb of 8086 code (itsy Forth). I'm aiming for something that's kinda more usable.

Dude, Where's My RAM?

The first step (and this is the topic of this blog) is to find out how much RAM we can really use. A VIC-20 boots up and proudly displays:

But it actually has 5Kb, so where has that other 1.5Kb gone? Armed with a detailed VIC-20 memory map we can see that areas of the first 1Kb have been nicked by Basic and the Kernal, which is a set of OS services abstracted from Basic and forms part of the ROM. For our purposes we don't want to use Basic, but we do want to use the Kernal, so we can read the keyboard, display to the screen and input/output between peripherals. For some of this 1Kb it's obvious which is used by Basic, but not all. So, here I decided to use the VIC-20 ROM disassembly. I first worked out that the Kernal starts at the address $E475, or thereabouts by observing that the rest of that code doesn't reference Basic. So, then I looked up all the system variables used by that section of code and found this set of addresses:

01,X 0100+1,X 0200,X 0259,Y 0263,Y 026D,Y 0277-1,X 0277

0277+1,X 0281 0282 0283 0284 0285 0286 0287

0288 0289 028A 028B 028C 028D 028E 028F

0290 0291 0292 0293 0293,Y 0294 0297 0298

0299 029A 029B 029C 029D 029E 029F 02A0

0300,X 0314 0314,Y 0315

(EABF) 0314 IRQ vector

(FED2) 0316 BRK vector

(FEAD) 0318 NMI vector

(F40A) 031A open a logical file

(F34A) 031C close a specified logical file

(F2C7) 031E open channel for input

(F309) 0320 open channel for output

(F3F3) 0322 close input and output channels

(F20E) 0324 input character from channel

(F27A) 0326 output character to channel

(F770) 0328 scan stop key

(F1F5) 032A get character from keyboard queue

(F3EF) 032C close all channels and files

(FED2) 032E user function

(F549) 0330 load

(F685) 0332 save

C533

C677

I also searched the Kernal code to find references to addresses within the Basic part of the ROM and found none, which meant that Basic sits properly on top of the Kernal. So, this tells us what areas of RAM we can use and it's as follows:

Address Range	Size	Owner	Available for Forth?
$000 .. $08F	$090	BASIC	Yes, it's Page 0 how could we avoid it :-) ?
$090 .. $0FA		KERNAL	No.
$0FB .. $0FE	$004	Nothing	Yes, not sure yet what for.
$0100..$013E		KERNAL	Unlikely (tape error log/correction buffer)
$013F..$01FF	$0C1	CPU	Yes, for the return stack
$0200..$0259	$05A	BASIC	Yes, for Forth code
$025A..$029F		KERNAL	No.
$02A0..$02FF	$060	None	Free, more Forth code.
$0300..$0313	$013	BASIC	Yes.
$0314..$03FF		KERNAL	No.
$033C..$03FB		KERNAL	Cassette buffer, maybe, but limited usage.
$03FC..$03FF	$004	Free	Some Vars?
$1000..$1DFF	$E00	BASIC	Forth code
$1E00..$1FF9	$1FA	VIC	(Screen)
$1FFA..$1FFF	$006	Free	6b, more vars?

This gives us a total of $F6B (3947) bytes, or 4453 bytes if we can use the screen, or 4140 (4646) bytes if we include the CPU stack, which of course we will.

In the next part we'll make some basic decisions about the uxForth model, this will help us decide how to use all these areas.

Sunday, 7 August 2016

Glamourising Carmageddon In a 1ºC Warmer World

A recent Guardian article nails it. We've already hit >1ºC of warming, so it's virtually impossible to stay under 1.5ºC despite the fact that the Paris Climate Agreement in December 2015 stated it as a goal.

There's an interesting incidental quote in the article:

“And by 2030 you will have to get rid of the combustion engine entirely”

Indeed, that's where it must lead and the realisation that infernal combustion engines are going to have to be phased out rather more quickly is going to dawn on us pretty soon. For example, if we want to have 0 fossil fuel cars on the road by 2050, then we have to stop making them by 2025. Norway, India and the Netherlands are already considering this.

So what this article is saying - if it doesn't mean merely ending production - is that we should already have stopped building fossil fuel cars in 2005, which means they'll be forcibly taken off the road. The fact that cities like Paris ban odd or even numbered vehicles on high pollution days is an indication of how this might take place, but whatever, they have to go and go quickly.

Yet, and this is a big yet - we are still glorifying cars in every advertising space possible: on TV in virtually every commercial break; on billboards, in papers, on facebook, on the Guardian, even over Guardian articles warning about climate change. There they are, the car companies, rubbing our faces in the pollution they're causing - because they get to advertise all the time and the counter-narrative gets almost no air-time.

And this needs to stop as well, because it's no different to tobacco adverts during the 1970s, actually it's worse because these products are a threat to the whole planet, not primarily the individuals who smoke. Deglamourise those tail-pipes and the incessant urban noise that the fossil fuel industry has immersed us in. End those adverts, it'll make it easier to end fossil fuel car production.

One Week Wonder