Saturday, 13 January 2018

EV Intentionality

We have a cute Renault Zoe EV, called Evie as it happens, and when I get a chance at traffic lights I slip it into non-ECO mode so I can leave all the hotshot BMWs / Audis / Mercedes and Jaguars in the dust as I zoom away (within the speed limit of course :-) ) !

EV Intentionality is about driving and thinking EV in such a way as to convey the genuine benefits of the technology. They're the rapidly approaching future (fuel cell cars aren't) and we're in competition with the Fossil Fuel industry who are orders of magnitude bigger than us (until their stranded assets catch up with them ;-) ).

Friends frequently ask me if it's better to (a) buy an EV now, (b) buy a hybrid or (c) drive their current car into the ground. Option (c) seems like common sense, but actually it's worse for the environment and your pocket. This is why (assuming an average ICE car driving 12703Km/year at 120g/Km):
The way to look at it is to add up your emissions over the long-term. Put simply, buying an EV involves a one-time emissions hit (the production of the car, including the extraction of its raw materials) and after that, it can be emissions-free. This assumes you'll charge it on renewable energy, because we do.

Therefore every fossil fuel mile you add now adds to your final emissions. By 2040, the EV bought in 2018 still has the same emissions, but the one bought in 2022, just 4 years later, resulted in another 60% of emissions, and the ICE kept going until 2040 resulted in nearly 4x the emissions (before the EV was eventually bought).

Let's consider what happens if you buy a Hybrid (at 100g/Km) or a Plug-in Hybrid (PHEV) (at 45g/Km) or an EV vs driving the same ICE car for as long as possible:

Basically, the EV results in lower total emissions than continuing with your ICE by 2024 (in 6 years), the PHEV manages it by 2027 (in 9 years), and the Hybrid by 2040 (22 years); in other words, a long time after its life expectancy. Looking at the life expectancy (on average by 2032, given our start date): by then, the PHEV's total emissions are 75% more than the EV's, but the Hybrid car has more total emissions than the ICE. In other words, you're very unlikely to recoup the manufacturing emissions by buying a new Hybrid car compared with driving an existing ICE into the ground, though of course it'll result in fewer emissions than buying a new ICE.
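
As a rough cross-check on the EV break-even figure, here is a minimal sketch in C. It uses this post's own numbers (12703Km/year, 120g/Km and the 8.8 tonne EV manufacturing estimate from the edit note); the function names are made up for illustration, and it assumes the EV's running emissions are zero (charged on renewables) and that the existing ICE adds no new manufacturing footprint.

```c
/* Figures quoted in this post. */
#define KM_PER_YEAR      12703.0 /* average annual distance */
#define ICE_G_PER_KM     120.0   /* average ICE tailpipe emissions */
#define EV_BUILD_TONNES  8.8     /* EV manufacturing estimate (edit note) */

/* Tonnes of CO2 emitted by driving for the given number of years. */
double driving_tonnes(double years, double g_per_km)
{
    return years * KM_PER_YEAR * g_per_km / 1.0e6; /* grams -> tonnes */
}

/* Years of avoided ICE driving needed to repay a one-off manufacturing
   footprint, for a new car that emits new_g_per_km while running.
   Only meaningful when new_g_per_km is below ICE_G_PER_KM. */
double break_even_years(double build_tonnes, double new_g_per_km)
{
    double saved_per_year = driving_tonnes(1.0, ICE_G_PER_KM - new_g_per_km);
    return build_tonnes / saved_per_year;
}
```

With these assumptions, break_even_years(EV_BUILD_TONNES, 0.0) comes out at a little under 6 years, which is in line with the 2024 figure. The Hybrid and PHEV cases need manufacturing footprints that aren't quoted here, so they're left out.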

From an intentional viewpoint though we want to promote the transformation of transport. Consider:

  • Every fossil fuel mile you drive now, is a donation to the fossil fuel industry. They're not a charity. The first question to ask is "how much of my money do I want to give them?" If it's nothing (which is the right answer), then your basic decisions are made for you.
  • EVs will come down in price over time and improve faster over time. But the rate of this depends upon how quickly we switch. If we take decades to go clean the rate of improvement will be much slower. It's what's called a market signal, which is a vote. You put your money in the market you have faith in and the market responds accordingly.
  • EV production will get cleaner over time as industrial practices decarbonise, but this will happen much more slowly than we can switch to EVs. By switching to EVs and running them on clean energy, we send another market signal: that we want a carbon-free lifestyle sooner.
And given there seems to be at least one car advert on every commercial break on TV, we're going to have to be 100% intentional for the foreseeable future :-) !

[Edit: Graphs updated to include manufacturing footprints for EVs and ICE cars based on This Guardian Article. The article provides only EV and ICE footprints; I've estimated Hybrid and PHEV manufacturing footprints based on typical CO2 emissions for the technologies, on the basis that the battery technology and/or drivetrain is what contributes to the higher manufacturing emissions, in proportion to the battery technology provided. In addition, I've assumed that manufacturing footprints will fall linearly until they fully decarbonise by 2070. These are provisional calculations until I get better information. Similarly, the study used to estimate EV manufacturing at 8.8 tonnes may assume a Tesla Model S as the standard EV, and that will not be representative across the globe - and it might not even be true for the Tesla Model S.]

Wednesday, 3 January 2018

A Listing As High As The Moon!

Did you know that the source code for the Apollo Guidance Computer would reach all the way to the moon if it was printed out?

No? Good - because it's not true and we'll do the math here. There's a whole world of computing mythology that's sprung up over the decades. A classic claim is that the computers on the Apollo spacecraft were less powerful than the computer inside a typical digital watch.

That means there's more software on that watch than you need to reach the Moon!

It's all rubbish. The Apollo Guidance Computer was relatively complex; it was a 16-bit machine that had 36KW of firmware on it. It would take a typical software engineer years to write the code that filled it, and in fact it did take a team of talented software engineers (headed by Margaret Hamilton, who coined the term software engineering) years to write its code - in assembler.

Fortunately we can see the source code for the AGC these days as it's on GitHub. Based on the Wikipedia article and assuming the Timing Pulses run at 1.024MHz, the fastest instructions would take 12 timing pulses, which gives 85.3 KIPs, and most instructions might require twice that (assuming a memory access took another 12 pulses), giving speeds between 42.7 KIPs and 85.3 KIPs (which can be verified here).
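
The speed estimate is just a division, sketched here in C. The constant and function names are my own; the 12 and 24 pulse counts are the assumptions stated above, not measured figures.

```c
#define AGC_PULSE_HZ 1024000.0 /* assumed timing pulse rate: 1.024MHz */

/* Instructions per second when every instruction takes the given
   number of timing pulses. */
double instructions_per_second(double pulse_hz, double pulses_per_instruction)
{
    return pulse_hz / pulses_per_instruction;
}
```

Calling instructions_per_second(AGC_PULSE_HZ, 12.0) gives the upper estimate and instructions_per_second(AGC_PULSE_HZ, 24.0) the lower one; divide by 1000 for KIPs.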

36KW of firmware is equivalent to about 72Kb of firmware. That means the computer could handle software about as complex as microcomputers from the year 1984, where the limits of 16-bit addressing and the shift from writing in assembler to writing in high level languages were being tested. Some microcomputers from that era did have more RAM (e.g. the ACT Sirius), but here I'm taking common home computers as the baseline.

When you look at the software in GitHub you find that there are a reasonable number of files, usually of a reasonable length. The source code there has a GitHub header of about 26 lines on every file, but the rest of it is the original software.

Let's first think about how much code might be on there. Each assembler line is probably 1 instruction, which is one word (though it might be more if they used a macro assembler). Typical listing paper had 66 lines per page, so 36,864 words would be 36,864/66 = 558.5 pages. So, how thick is a page? You can still buy listing paper, and it works out at 24.1cm for 700 pages, so the whole listing is, at a minimum, 558.5/700*24.1cm = 19.23cm high, just under half-way up to my knee.

So, at a first guess, it's not very high. But I could be badly wrong, because the real software was different for the Command module and the Lunar module; and also they would have had many comments in the code, which would have lengthened it. So, I took a look at the actual source files.

Helpfully, they have page numbers in them. Luminary099, which was the Lunar module's software, has 1510 pages, and the command module's software, in Comanche055, has 1516 pages. The total is thus 3026 pages, which at 24.1cm per 700 pages is about 104.2cm tall. That's a bit more like the diagram, but it still only goes up to my chest.
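
Both height estimates use the same pages-to-height conversion, so here is a small C sketch of it. The names are invented for illustration; the 66 lines per page and 24.1cm per 700 pages are the figures quoted above.

```c
#define LINES_PER_PAGE   66.0  /* typical listing paper */
#define STACK_PAGES      700.0 /* pages in the measured stack */
#define STACK_HEIGHT_CM  24.1  /* height of that stack */

/* Pages needed for a listing with one instruction per line. */
double pages_for_lines(double lines)
{
    return lines / LINES_PER_PAGE;
}

/* Height in cm of a stack of listing paper with that many pages. */
double height_cm(double pages)
{
    return pages / STACK_PAGES * STACK_HEIGHT_CM;
}
```

The first guess is height_cm(pages_for_lines(36864.0)), and the page-count version is height_cm(1510.0 + 1516.0).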

Again this might not quite be accurate, because some of the code could have been shared. I hope to post a future update when I've figured out how much is!

Sunday, 31 December 2017

Z180 MMu-tiny

There's surprisingly little in the way of clear explanations for Z180's MMU, which is mostly OK because as an 8-bit CPU, the Z180 isn't very popular any more.

So, here's a bit of an explanation. The Z180 is basically a Z80 with some extra built-in peripherals, one of which is a bank-switching MMU which provides the CPU with the ability to manage 1Mb of physical memory.

The MMU itself is very simplistic. It divides the normal 16-bit address space into 3 sections: a section (called Common 0) that's always mapped to the beginning of physical memory; a section (called the Banked Area) which can be mapped anywhere in the 1Mb of physical memory; and a final section (called the Common 1 area) which can also be mapped anywhere in the 1Mb of physical memory.

The two banked areas can be made to start at any 4Kb region in the logical 64Kb memory space; though the Common 1 area should be made to start after the Banked Area.

The MMU is controlled via 3 x 8-bit registers in I/O space, at addresses at 0x38 to 0x3A as follows:

CA and BA are both reset to 0xf, which means that only the top 4Kb is mapped on boot-up, and since BBR and CBR are reset to 0, then this means that the top 4Kb is mapped to 0x0f000, which is good because the MMU can't be switched off.
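
To make the translation concrete, here's a minimal sketch in C of how I understand the mapping to work. The struct and function names are mine; the register layout (CBR, BBR and then CBAR, with CA in CBAR's high nibble and BA in its low nibble) is my reading of the Z180 manual, so treat it as an assumption rather than gospel.

```c
#include <stdint.h>

/* Snapshot of the three MMU registers (I/O addresses 0x38 to 0x3A). */
typedef struct {
    uint8_t cbr;  /* base of Common Area 1, in 4Kb units */
    uint8_t bbr;  /* base of the Bank Area, in 4Kb units */
    uint8_t cbar; /* high nibble = CA, low nibble = BA */
} Z180Mmu;

/* Translate a 16-bit logical address to a 20-bit physical address. */
uint32_t z180_translate(const Z180Mmu *mmu, uint16_t logical)
{
    uint8_t page = (uint8_t)(logical >> 12);  /* which 4Kb logical page */
    uint8_t ca = (uint8_t)(mmu->cbar >> 4);   /* first page of Common Area 1 */
    uint8_t ba = (uint8_t)(mmu->cbar & 0x0f); /* first page of the Bank Area */
    uint32_t base;

    if (page >= ca) {
        base = mmu->cbr; /* Common Area 1 */
    } else if (page >= ba) {
        base = mmu->bbr; /* Bank Area */
    } else {
        base = 0;        /* Common Area 0: always the start of memory */
    }
    /* The base is added to the logical address, then cut to 20 bits. */
    return ((uint32_t)logical + (base << 12)) & 0xFFFFFu;
}
```

For example, with CA=0xc, BA=0x4, BBR=0x10 and CBR=0x20, logical 0x4000 lands at physical 0x14000 and logical 0xc000 at 0x2c000, while anything below 0x4000 is untouched.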

Zilog (and Hitachi who first implemented the scheme) intended the MMU to operate so that only the Bank Area would move during the lifetime of any Z180 process, which is why the top segment is called "Common Bank 1", rather than, say "Bank Area 2" (or '1' if we'd numbered the Bank Areas from 0).

But this scheme severely restricts the flexibility of the MMU. That's because a flexible banked system needs at least 4 moveable banks for a process. You need a non-moving banked area of data to hold the stack and key global variables. Similarly, you need a non-moving banked area of code to hold key common routines, e.g. the bank switching code or a vector to the OS. Then to be able to use the 1Mb space for larger applications you need a relocatable segment for code and another for data; which means 4 banks in total. The Z180 only provides 2 movable banks.

If I had designed the Z180 MMU I would have chosen a pure bank-switching technique where each virtually addressed 16Kb bank is independently mapped somewhere in the physical address space. What might that look like if we were confined to the equivalent set of registers as for the real Z180 MMU?

Here the registers are called the FBP (Fixed Bank Page), the DBP (the Data Bank page) and the CBP (the Code Bank Page).

Each register is 8-bits and they map the virtual address to a 22-bit physical address space (providing 4Mb) as follows:

So, the Code and Data bank pages map to the first two 16Kb banks anywhere in the physical address space and the last 32Kb normally maps to a 32Kb bank. There are two use-cases for how we might want to do the mapping. Firstly, there's the general purpose OS case. Here, we assume that it's fairly easy to map fixed code and data to consecutive banks (moving a pre-allocated moveable bank if needed). Hence we can make do with a single bank register for this purpose. The second use-case is for embedded systems where there would be a single application (in ROM). Here, the X bit can be set so that the top 16Kb can be mapped into RAM in the other half of the physical address space.

The logic involved in this scheme would be at least as simple as the original Z180, because no adder is required to compute the virtual Address + the bank register and no comparison logic is needed to determine which segment any particular virtual address belongs to. Instead, the top two virtual address bits directly index the MMU registers.

One additional feature is implemented: a RST xx instruction, including RST 0 - reset, will clear the CBP. This means that it would be possible for an application or process running from banked or the fixed code bank to execute a ROM routine - in rather the same way it works with CP/M.

Thursday, 10 August 2017

Xenophobia on the Virgin Express

I've heard a lot about the growing Brexit xenophobia being nurtured by the likes of the Daily Mail, The Sun and Express, but prior to our holiday trip to the lovely North East town of Seahouses a couple of weeks ago I'd not really witnessed any.

I mean we've seen the video of anti-immigrant or racist abuse on the Manchester Metrolink right?

So, let's get down to the specifics. On Saturday, July 22 we were on the (I believe) Virgin Intercity (though it might have been CrossCountry) to Newcastle from Birmingham New Street. I think we would be arriving at 14:46 and then catching the next train to Berwick upon Tweed.

Me and my wife had an allocated pair of aisle seats at a table as it happened and there was a couple in the window seats too. I was travelling backwards. Slightly behind me and on the opposite side window seat was a lady - clearly very pregnant - enjoying the ride.

That is, until the conductor came along and asked for her ticket. Instead she presented him with a printed A4 sheet with her booking details including a code for the ticket. She didn't know this wasn't a proper ticket, because she was from Hungary and hadn't been in the country that long.

Instead of being gracious and explaining how she could get her ticket, the ticket inspector started to get really bolshy with her; threatening her with a fine or throwing her off the train - but she didn't have any real money with her either, her partner was the one who had actually bought the ticket online.

So, I said to the inspector it was obviously an innocent mistake so couldn't she get it sorted out at the end of the journey? To which he spouted "They wouldn't allow that where she comes from!" and stormed off down the carriage like a true Daily Express reader!

Me and my wife did a double-take. So did the couple next to us at the window side. Then between us we all did a double-double take. Then we chatted with the woman for a few minutes and explained how she should be able to sort it out when she gets off the train by going to Information.

And the post-script for any xenophobe blog readers: I've no idea if they'd immediately chuck you off a train in Hungary for only having the booking information instead of an actual ticket, but who cares? Why should we define our behaviour by the worst we might imagine about another country's? To me it just sounds like a mirror image of your own inner thoughts and you can't fix that by targeting Eastern Europeans.

Saturday, 1 April 2017

Sign up to Signed Timeouts

I'm Julian Skidmore, the developer of the DIY 8-bit computer FIGnition. Most of my career has been spent doing embedded software, and timeouts have become a major bugbear of mine.

As I see it, the typical timeout mechanisms used on pretty much every low-end embedded system I've seen are terrible and most people probably don't even realise they are, so let's swallow the red pill and embrace a new way of doing them.

Let's start with typical timeout parameters for low-end systems.

On 8-bit and 16-bit embedded systems 16-bit jiffy clocks are often used, because we still want to economise on resources even in the 21st century. They typically have a 1KHz frequency (a millisecond timer), which means they wrap around about every 65.5s, and this has to be carefully considered when calculating timeouts.

On ARMs, the Arduino library and undoubtedly many other systems, 32-bit jiffy clocks are often used, which extends the range to about 4,294,967 seconds, which is roughly 1.6 months (49.7 days); so for long-running applications, wrap-around problems apply to them as well.

The Evolution Of Timeouts

Embedded programmers usually approach timeouts in a progressive way, from naive implementations to ones that make better use of system resources over time. A programmer might start off by implementing countdown timeouts in the polled part of the mainloop.

However, they then find out that the main loop isn't polled deterministically, so they move the countdowns to an interrupt routine, which does the same thing.

A timeout of this kind is a form of Signal. The timeout starts at 0, which signals to the consumer (the main loop) that a timeout has taken place (or not yet started) and to the producer (the interrupt routine), that it can't decrement it. The consumer then sets the timeout to the number of jiffy ticks desired and this signals to the consumer that it's not going to write to it until it's complete, and signals to the producer that it should decrement the countdown until it gets to 0 (so now the producer has possession of the countdown). The consumer then monitors the countdown until it reaches 0; whereupon it regains possession.

These forms of timeouts have two main strong points. Firstly, they're really simple to implement (just an assignment, a decrement, and a couple of if statements), and secondly, they have an indefinite polling window (the consumer can poll a 1ms countdown after an hour and it'll still report that it's complete). Unfortunately, they soak up CPU time for every countdown timer, even if the timer isn't running and they make it harder to write modular code (because you have to hardcode in the countdown to the interrupt, unless you use a data structure for your set of countdowns).

So, programmers then generally move onto timeouts based on simple comparisons. For example, they'll set the timeout expiry to the current time in milliseconds plus the timeout period (aPeriod) and then keep polling the milliseconds to see if it's passed the timeout:

... // The timeout is set up,
gMyTimeout=millis()+aPeriod;

... // then later in a main loop, we keep testing it:

if(millis()>=gMyTimeout) {
    // The timeout has expired.
}

The first timeout code is always constructed like this, because we see time as being linear. But of course it's buggy, because eventually millis() will get to its maximum, or near it, and when you add aPeriod, it wraps around to 0, so the first test for millis()>=gMyTimeout will think it's expired when it hasn't.
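
Here's a small C sketch of that failure, written so it can run on a desktop: the current time is passed in as a parameter instead of calling millis(), and the function names are invented for the example. It assumes a 16-bit unsigned jiffy clock.

```c
#include <stdint.h>
#include <stdbool.h>

/* Expiry time as the naive scheme computes it; the sum wraps modulo 65536. */
uint16_t naive_deadline(uint16_t now, uint16_t period)
{
    return (uint16_t)(now + period);
}

/* The naive absolute comparison. */
bool naive_expired(uint16_t now, uint16_t timeout)
{
    return now >= timeout;
}
```

If the clock reads 0xfff0 and the period is 0x20, naive_deadline() returns 0x0010, so naive_expired(0xfff0, 0x0010) is already true even though no time has passed at all.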

So, the programmer does the logical thing and incorporates the 'wrapping' issue into the algorithm by adding a flag to determine if the timeout wrapped around. Then we get something like:


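gMyTimeout=millis()+aPeriod; // Set up the timeout,
gMyTimeoutWrap=(gMyTimeout<millis()); // noting if the addition wrapped.
... // Then later in the main loop:
if((millis()>=gMyTimeout && gMyTimeoutWrap==FALSE) ||
    (millis()<=gMyTimeout && gMyTimeoutWrap==TRUE)) {
    // The Timeout flag has expired.
}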

Except this doesn't work properly either, because if, for example, gMyTimeout is close to the maximum value, then there's only a small window where millis()>=gMyTimeout even though there was no wrap and similarly, in the wrapping case, there's only a small window where millis()<=gMyTimeout.

There are similar problems if, for example we just have the first test, and a different test for when the timeout wrap is TRUE, e.g:

if(gMyTimeoutWrap==TRUE) {
    if(millis()<gMyTimeout) {
        gMyTimeoutWrap=FALSE; // we've wrapped!
    }
}
else if(millis()>=gMyTimeout){
   // We've expired!
}

The code must be polled often enough for the condition to be met to reset the wrap flag.

At this point, assuming that the embedded programmer actually becomes aware of these issues, they'll realise that a simple concept like a timeout is far more complex than they'd imagined.

They may then try another common pattern. Instead of adding aPeriod to the current time to obtain the timeout time, both the aPeriod and the original time are maintained. We then compute a timeout as follows:


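gMyTimeout=millis(); // The time at which the timeout was set up,
gMyTimeoutPeriod=aPeriod; // and how long it should last.
... // and in the main loop
if((uint16_t)(millis()-gMyTimeout)>=gMyTimeoutPeriod) { // cast keeps it 16-bit
  ... // We've timed out.
}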

In this code we solve the major issues with the previous code: by subtracting the initial timeout set time from the current time we end up calculating the relative time since gMyTimeout. It will always start out at 0 and will eventually become >= gMyTimeoutPeriod. At last we have code that isn't actually buggy!

But it's poor, very poor, because the timeout must be polled every gMyTimeoutPeriod milliseconds or the window can be missed. For example, a 1ms period means the polling must take place within a millisecond or it will look like the timeout won't expire for another 65535ms.

We ought to find a better way, and this is how:

GROT: The Golden Rules Of Timeouts

The First Golden Rule Of Timeouts is that for any circular jiffy time of n bits you always need at least n+1 bits of storage to maintain a timeout.

This is true in the first case, we had n-bits for millis() and the flag represented an extra bit of storage (assuming we managed to correct the issue). It's also true for the second case; we had n-bits for millis() and since both a timeout and the aPeriod were represented, we needed 2n bits of storage.

This rule is a direct consequence of the requirement we must have for timeouts, if a jiffy clock has a period of x before wrapping, then we need to be able to represent a time period of up to 2*x to determine a timeout:

This also means that because we have at least n+1 bits of storage for the timeout, we must perform at least n+1 bit arithmetic to calculate timeouts. Again, we can see that's true in both cases: in the first case we have some explicit code to extend the arithmetic by an extra bit and in the second case we must perform an initial subtraction (millis()-gMyTimeout) and then another subtraction will be performed for the comparison.

The second golden rule is this: timeouts must be computed as relative quantities if the jiffy clock wraps around. The primary reason why timeout calculations are buggy in the first case, even when the extra bit is added, is because absolute comparisons are made instead of relative calculations. In the second algorithm, when we compute millis()-gMyTimeout, we actually compute a relative value. For example, when gMyTimeout is initially a high value, e.g. 0xff01, then when millis()==0, the calculation will be 0-0xff01 which is 0xff, i.e. 255 milliseconds later.

Now the interesting thing about that calculation is that we're no longer strictly using unsigned arithmetic here, because 0 is much less than 0xff01 so the result is a negative number whose lower 16-bits result in a small positive number. So, what you have is a maths error, which happens to deliver the right results. In some (often older) architectures it's possible to trap on overflow; and this is what would happen here, a timeout would lead to a system error.

Signed Timeouts Are Sweet Timeouts

The observation that timeouts are relative leads neatly to this insight: timeouts should be signed, as should the jiffy clock.

It's counter-intuitive, but simply by defining, say the jiffy clock as:

volatile int16_t gJiffy; // volatile, because the timer interrupt changes it.

Incrementing it in the timer tick routine (let's assume it's 1KHz):

ISR(TimerTick) {
    // ClearTimerTickInterruptFlag
    gJiffy++; // One more millisecond has passed.
}

Let's say millis() is an int16_t:

int16_t millis(void) { return AtomicRead(gJiffy); }

Defining a timeout as:

int16_t gMyTimeout=millis()+aPeriod;

And then to test for a timeout with:

if((int16_t)(gMyTimeout-millis())<0) { // The timeout is now in the past.
    ... // We've timed out.
}

Solves every problem we've found with timeouts so far; does it neatly and cleanly; and maximises the timeout period, in this case to n-1 bits, i.e. a range of about 32.8s when the whole jiffy clock period will be about 65.5s.
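
Pulling those pieces together, here's a minimal, desktop-runnable sketch in C. The function names are hypothetical, the clock value is passed in rather than read from millis(), and the subtraction is done in unsigned arithmetic and converted back to signed so that the wrap-around doesn't depend on signed overflow (this assumes the usual two's complement conversion, which is what the common compilers do).

```c
#include <stdint.h>
#include <stdbool.h>

/* Deadline for a period of 1..32767 ticks after `now`. */
int16_t timeout_set(int16_t now, int16_t period)
{
    return (int16_t)(uint16_t)((uint16_t)now + (uint16_t)period);
}

/* Signed time left: positive while the deadline is in the future,
   negative once it's in the past. */
int16_t timeout_remaining(int16_t now, int16_t deadline)
{
    return (int16_t)(uint16_t)((uint16_t)deadline - (uint16_t)now);
}

/* Timed out once the deadline is strictly in the past (the <0 test). */
bool timeout_expired(int16_t now, int16_t deadline)
{
    return timeout_remaining(now, deadline) < 0;
}
```

A deadline set at 0x7ff0 with a period of 0x20 wraps to a negative value, yet timeout_remaining() still reports 32 ticks to go, and timeout_expired() stays true for any poll up to half the clock range after the deadline.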

Math Madness

Why is it that the timeout calculation is expressed as a subtraction tested against zero (of the form a-b<0) instead of a direct comparison (of the form a<b), when surely they're the same thing? In basic algebra we're taught that:

a<b <=> a-b<0

You simply subtract (or add) b at both sides to derive the equivalent equation. The answer is that it's only true on an infinite number line: for modulo arithmetic, both unsigned and signed, there are cases where they are not equivalent.

a       b       a-b     signed(a<b)  a-b<0  unsigned(a<b)
0x0000  0x0001  0xffff  True         True   True
0xfffe  0x0001  0xfffd  True         True   False
0x8000  0x7fff  0x0001  True         False  False
0x7fff  0x8000  0xffff  False        True   True
0x0001  0xfffe  0x0003  False        False  True
0x0001  0x0000  0x0001  False        False  False
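
The rows of the table can be reproduced with a few one-line C functions. This is only a sketch with names of my own choosing; it assumes 16-bit values and the usual two's complement conversion when an unsigned value is reinterpreted as signed.

```c
#include <stdint.h>
#include <stdbool.h>

/* a-b, truncated to 16 bits. */
uint16_t diff16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a - b);
}

/* a<b with both treated as signed 16-bit values. */
bool signed_less(uint16_t a, uint16_t b)
{
    return (int16_t)a < (int16_t)b;
}

/* a-b<0, with the 16-bit difference treated as signed. */
bool diff_negative(uint16_t a, uint16_t b)
{
    return (int16_t)diff16(a, b) < 0;
}

/* a<b with both treated as unsigned 16-bit values. */
bool unsigned_less(uint16_t a, uint16_t b)
{
    return a < b;
}
```

Feeding in a=0x8000 and b=0x7fff, for instance, gives a difference of 0x0001: signed_less() says True while diff_negative() and unsigned_less() both say False.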

This is because when you take two quantities in the range 0..x and subtract one from the other, the total range of values you can get on an infinite number line is -x to x; and therefore if you can only represent x different values, then the same values must be re-used. This explains why, when you use unsigned values for timeouts, you get a large polling window if a is very low and b is really high, but as a gets higher, the polling window becomes increasingly squeezed.

It also explains why if we want to maximise the efficiency for timeouts, we should maximise (and standardise) the polling interval, and the way to do that is to limit the timeout range to maxTimeout=x/2, and then the range of results will be -x/2 to x/2.

The Intuitive Angle

The upshot is this: using signed timeouts is the cleanest solution out. It works, because when we subtract the current time from the timeout, we obtain the timeout relative to now: a positive number if it's in the future (it hasn't timed out) or a negative number if it's in the past (it has timed out). The window is the maximum theoretically possible, because given two arbitrary times, the only way to tell if a value has timed out is if you guarantee your timeouts and polling period are within half the maximum range, 180° of your data type.


There are many ways in which software engineers perform timeouts, but only signed timeouts are clean and efficient in storage, computation and code size. We should use them as standard.

Saturday, 18 March 2017

New Record Low Arctic SIE Maximum Reached

The Arctic Sea Ice is dying. We've known this since the mid-1990s from satellite measurements available since the end of the 1970s and there is some pre-satellite evidence to show that it has been in decline since the 1960s via the early environmentalist Rachel Carson.

However, since the mid-2000s it's been accelerating. Normally the big news has been with the Sea Ice Extent minimum reached in September, but recently the decline in the Sea Ice Extent maximum in March is becoming increasingly concerning.

This year we have reached a new record low Arctic SIE Maximum, about 40,000Km2 lower than the previous record low maximum reached in 2015. This is after 6 months in 2016 where the Arctic SIE was at record low levels and even this year it has spent about 30% of the time in record low territory over and above (or should it be under and below) the record lows over that period in 2016.

The record itself was reached near the beginning of March (March 06), but because the extent can vary quite significantly up and down at the maximum point, it's not safe to call the maximum until it can be reasonably known that its peak won't be exceeded.

That point has been reached: the current extent was 13.61mKm2 as of March 16, and there is no year from 2000 to 2016 where SIE has risen by more than the 270,000Km2 that would be required for 2017 to break its current peak.

Here's the graphic.

Wednesday, 8 March 2017

uxForth: Unexpanded forth for a standard VIC-20. Part 3, the memory map

I'm the developer of the DIY 8-bit computer FIGnition, but it doesn't mean I'm not interested in other retro computers and the idea of developing a minimal Forth for the ancient, but cute Commodore VIC-20 is irresistible!

Part 1 talks about the appeal of the VIC-20 and what a squeeze it will be to fit Forth into its meagre RAM.

In Part 2 I discussed choices for the inner interpreter and found out that a token Forth model could be both compact and about as fast as DTC.

Now I'm going to allocate the various parts of the Forth system to VIC-20 memory to make the best of what's there. Some of it will be fairly conventional and some somewhat unorthodox.

(An Aside, The Slow uxForth Development Process)

From the presentation of the blog entries it looks like I'm working these things out as I'm going along. For example, it's worthwhile asking why it looks like I can leap to fairly concrete decisions about the inner interpreter or even that I think I'll be able to fit the entire system into the available space.

The simple answer is that I've already done much of the work to make this possible. I've already written the code that implements the primitives (in fact I've written, modified and rewritten it a few times as I've improved it). I've made use of the wonderful resources at 6502 org, particularly the idea of splitting the instruction pointer (called gIp in my implementation) into a page offset and using the Y register to hold the byte offset: it really does improve the performance of the core Next function.

Similarly, I've written the non-primitive code and accounted for the space. It's written in Forth with a home-brew meta-forth compiler written in 'C'. So, there will be a future blog on that too!

However, it's not a cheat as such. The code is not tested yet; nor even loaded into a real VIC-20 nor emulator (I don't have a real VIC-20 :-( ). I have real decisions to make as the blog continues, which means I can make real mistakes too and have to correct them. What I've done, really, is basically a feasibility study, so that you don't waste your time reading the blog. And of course, the whole of uxForth will be released publicly, on a GPL licence via my GitHub account.

Admittedly, it's being released slowly, a 2.75Kb program I hope to release over the course of 2017!

The Memory Map

Page 0

Page 0 is the gold dust of every 6502 system: versatile and in short supply. BASIC uses the first 0x90 bytes and the KERNAL uses the rest. We'll use all 0x90 bytes for the data stack and some key system variables:

Addr  Size  Name      Comment
$00   2     gIp       Instruction pointer, lower byte always 0.
$02   1     gTmpLo    Temporary byte
$03   1     gTmpHi    Temporary byte used for indirect access.
$04   2     gILimit   The limit for the inner-most do.. loop. uxForth (and FIGnition Forth) differ from most Forths in that the inner-most loop's values, the limit and the current value, are held in global locations. do causes the previous gILimit and gICount to be pushed to the stack; thus r is equivalent to j on other forths.
$06   2     gICount   The current loop count for the inner-most do.. loop.
$08   1     gUpState  The current compilation state.
$09   1     gUpBase   The current number base
$0a   2     gUpDp     The current dictionary pointer.
$0c   2     gUpLast   A pointer to the header of the most recent dictionary entry compiled
$0e   2     gUpTib    The pointer to the input buffer (I'm not sure if we need this)
$10   128   gDs       The data stack
$fb   2     gTmpPtr0  Spare pointer 0
$fd   2     gTmpPtr1  Spare pointer 1

Page 1

Page 1 is the return stack as you might expect. Oddly enough, we only get 192b, because the KERNAL uses $100 to $13F.

Page 2

There are 89 bytes available here, because they're normally used by BASIC, which we won't be running. I plan to use them for the byte code vectors which are:

# Name # Name # Name # Name
$00 (nop) $0b (+loop) $16 u/ $21 rp!
$01 ;s $0c 0< $17 @ $22 drop
$02 exec $0d 0= $18 c@ $23 dup
$03 (native) $0e + $19 ! $24 over
$04 (lit8) $0f neg $1a c! $25 swap
$05 (lit16) $10 and $1b r> $26 (vardoes)
$06 0 $11 or $1c >r $27 (constdoes)
$07 (0branch) $12 xor $1d r $28 inkey
$08 (branch) $13 >> $1e sp@ $29 emit
$09 (do) $14 << $1f sp! $2a at
$0a (loop) $15 * $20 rp@ $2b

The codes that are greyed out have no names in the dictionary to save space; the way you'd insert them into code would be with [ nn c, ] sequences.

Page 3 and Page 4

There are a total of 116 bytes free from $2A0 to $313; I'll fill that area with some of the actual native definitions.

The cassette buffer is at $33c to $3fb. We'll be using the cassette for storage so we can't use it for code. 

Pages 16 to 31 ish ($1000 to $1dff)

This is the area of RAM reserved for BASIC. It will contain the rest of the Forth system.

The screen RAM ($1e00 to $1ff9)

The end of RAM for an unexpanded VIC-20 is used for the screen. The plan here is to use that area for the editing space.  Instead of implementing a line editor (ACCEPT in FIG-forth and early FIGnition Forth), we use key to call the KERNAL editor and allow it to manage the editing of a line including cursor movement. Pressing Return doesn't execute the command line, instead, pressing F1 exits the editor and sets the interpretation point to the current cursor position. The end of the interpretation point is set to the end of the screen and emit is turned off until interpretation gets to the end of the screen. Importantly, pressing return doesn't start interpretation.

In addition, pressing F2 saves the screen bytes onto cassette.

This is how I'll implement storage in a fairly minimal way. By implementing save via F2 I can save a block (actually the 506 screen bytes are roughly half a traditional block), but LOAD is a normal word, so multiple blocks can be loaded (you just add load to the end of the block).

So, this is how you'd do normal editing operations. For normal words you would place the cursor near the end of the screen and edit to the end of the screen; cursor to return to the first character you want to interpret and then press F1. In a sense this is easy, because you can just press Return and then cursor up until you get there. The same method would also work if you wanted to compile a whole screen's worth of code. Load itself would reset the cursor position to [home] and then return to the interpreter, so placing a load at the end of the screen would load the next screen without any recursion. That way you'd be able to develop programs that were longer than just one screen without manual reloading.


In the memory allocation of uxForth, we've squirrelled away about 1053 bytes of RAM, embedding the line buffer in the screen and a number of system variables in page 0. We've also included 212 bytes of what we'd use for the program proper. It won't get much better than this!
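
The region sizes quoted in this post can be cross-checked from their address ranges with a couple of helper functions. This is just a sketch in C with made-up names; it assumes each byte code vector is a 2-byte address and that the byte codes run from $00 to $2b.

```c
#include <stdint.h>

/* Bytes in the inclusive address range first..last. */
uint16_t region_size(uint16_t first, uint16_t last)
{
    return (uint16_t)(last - first + 1u);
}

/* Bytes needed for a table of 2-byte vectors, one per byte code. */
uint16_t vector_table_bytes(uint16_t byte_codes)
{
    return (uint16_t)(byte_codes * 2u);
}
```

For example, region_size(0x02a0, 0x0313) gives the 116 bytes of Page 3, region_size(0x1e00, 0x1ff9) gives the 506 screen bytes, and vector_table_bytes(0x2c) comes to 88, which just fits in the 89 bytes of Page 2.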

In the next post I hope to talk in more detail about the implementation of the primitive words and the code used to test them.