Showing posts with label machine code. Show all posts
Showing posts with label machine code. Show all posts

Saturday, 7 August 2021

Fig-Forth At PC=Forty (Part 5)

FIG-Forth was a popular and very compact, public-domain version of the medium speed Forth systems programming language and environment during the early 1980s. In part 1, I talked about how to get FIG-Forth for the IBM PC running on PCjs and in part 2 I implemented a very rudimentary interim disk-based line editor. Part 3 dives into machine code routines and a PC BIOS interface, so that I could implement the screen functions I needed for my full-screen editor and in Part 4 I used them to implement that full-screen editor.

Here I'd like to explore a bit more graphics, since the PC BIOS interface can plot pixels (in any of 4 colours).

So, let's start with Plot. I get most of my BIOS programming information from wikipedia, though I've used an independent web page too. Implementing plot is just an INT10H function (where AH=24), so let's try it:

: PLOT ( CLR X Y )
  >R >R 3072 + 0 R> R>
  INT10H
;

4 VMODE takes us into 320x 200 graphics. You can still type in text, but you can't see the cursor. CLS fills the screen with a stripe - the bios call doesn't work the same way. We can create a graphics cls with:

: CLG 1536 SWAP 21760 * 0 6183 INT10H 0 0 AT ;

We can fill the screen with a colour:

: FCOL 200 0 DO I 320 0 DO OVER OVER I SWAP PLOT LOOP DROP LOOP DROP ;

So, 0 CLS 1 FCOL will then fill the screen in cyan in 37.6s. This makes the plot rate 37.6/(320*200=64000) = 1702 pixels per second, or if we exclude the non-plot functions (43µs+32µs*4)*320*200 = 10.9s for the FORTH code itself, so 26.456s or 2419 pixels per second. Writing a simple random number generator:

0 VARIABLE SEED

: RND SEED @ 1+ 75 * DUP SEED ! U* SWAP DROP ;

Means we can fill the screen with random pixels with:

: RNDPIX 200 0 DO I 320 0 DO 4 RND OVER I SWAP PLOT LOOP DROP LOOP ;

Is what we get part of the way running this after CLS. It randomly plots successive pixels with the colours black, cyan, magenta or grey.

Lines

More usefully we can draw lines. The Jupiter Ace manual gives a nice routine for drawing lines (page 79), however it uses the definition PICK which isn't available on FIG-Forth. It will also turn out that the Bresenham routine, which although it's faster in pure assembler, is slower when the instruction execution rate is much slower than division or multiplication. And that's true for the 8088 where each Forth instruction is about 30µs, but a multiplication also takes about 60µs. Thus, if the Bresenham algorithm is at least 2 instructions longer, multiplies (or divides) are faster. And the Bresenham algorithm on the Jupiter Ace takes: 14 basic loop instructions + DIAG (some of the time) = 4 instructions + SQUARE the rest of the time = 8 or 7 instructions. And Step = 12 instructions. So, that's about 14+6+12 = 32 instructions per loop. By comparison, the main loop in FIGnition is 22 instructions. On this basis we'd be able to achieve about 1500 points per second, a diagonal line across the screen would take about 0.2 seconds.

It's possible to consider the fastest potential line drawing code and base the full line drawing algorithm around that. The quickest way is to consider that again, for the longest axis, its coordinate will increment by 1 on each pass and on the other axis, some fraction of 1. So we can consider a 16-bit fraction, in the range 0..1 on that axis, which we can multiply by the longer axis. In each case we need to add an offset for each coordinate to get the final location.

: DAxis ( col grad offsetg offseth limh )
  0 DO ( col grad offg offh)
    OVER >R >R >R 2DUP ( c g c g : f h f)
    I U* SWAP DROP R> + ( c g c g*i+f : h f)
    R I + PLOT R> R> SWAP
  LOOP
;

So, in this version we need about 20 instructions and we need a second version where the x coordinates map to the do loop. The problem with this version is handling negative gradients, because using U* to generate gradients won't generate negative results in the high word. However, this can be fixed by sorting the coordinates. Consider (with a normal x-y coordinate system) a vector in the second quadrant at about 153º (a gradient of about -1/2). If we sort the coordinates so that we draw from right to left, then the y coordinates will draw upwards. Similarly a line in the 4th quadrant drawn at 288º (a gradient of 2/3), if we sort the coordinates so that we draw from top to bottom, then the x coordinates are ordered left to right. And we can achieve this by XOR'ing the DO loop coordinate by 0xffff (and adding 1 to the DO LOOP coordinate offset). Furthermore we can 'improve' the line drawing by modifying the plot routine to add an origin ( ox, oy) and set the colour ( fgCol). This gives us:

0 VARIABLE oxy 0 ,

: Oplot ( col dx dy )
  >R >R 3072 + 0 oxy 2@ R> + SWAP R> +
  INT10H
;

So, OPLOT is 4 words longer than PLOT. Also we want x in the first word of oxy and y in the second word. Then DAxis is:

0 VARIABLE XDIR

: DAXISY ( col grad limh sgn&dx sgn&dy)
  OXY >R R 2+ +! R> +!
  0 DO ( col grad)
    2DUP ( c g c g)
    I U* SWAP DROP ( c g c g*i)
    XDIR @ I XOR OPLOT ( c g )
  LOOP DROP DROP
;

: SGN 0< MINUS ;
( Here Y is the major axis. There are 4 cases,
  maj>0, min>0 = first quadrant, normal.
  maj >0, min <0 = second quadrant [ \ ]. Set ox+=dx.
               Since the gradient is always unsigned,
               a left to right draw will cover the correct x direction.
               However, y will draw from bottom to top, giving [ / ]
               So, y needs to be drawn from top to bottom too,
               XDIR=-1, oy+=dy.
  maj <0, min <0. = third quadrant, set ox+=dx and oy+=dy. That's because
                dy<0, dx<0 is / kind of line so simply moving oxy fixes
                the problem.
  maj <0, min >0 = fourth quadrant.
               XDIR=-1.
  So, if dy^dx <0, then XDIR=-1 else 0.
)
: QUADFIX ( c min maj )
  2DUP >R >R ( c min maj : min maj)
  ABS SWAP ABS SWAP ( c |min| |maj| : min maj )
  >R 0 SWAP R U/ SWAP DROP R> ( c  g |maj| : min maj)
  R> R> 2DUP XOR SGN XDIR ! ( c g |maj| min maj sgn(min^maj)!XDIR [Quadrants 2 and 4])
  OVER SGN SWAP OVER AND >R ( c g |maj| min sgn.min : maj&sgn.min)
  AND R> ( c g |maj| min&sgn.min maj&sgn.min)
;

: DAXISX ( col grad limh sgn&dy sgn&dx)
  OXY >R R +! R> 2+ +!
  0 DO ( col grad)
    2DUP ( c g c g)
    I U* SWAP DROP ( c g c g*i)
    XDIR @ I XOR SWAP OPLOT ( c g )
  LOOP DROP DROP
;

So, this is 13 words + 4 for OPLOT, Saving 3 words. Now drawing the longest line ought to take about (43µs+4*32µs+32µs*16)*320 = 0.22s. In practice, timed basic plotting rate is 27.8s=32000 pixels, or 0.278s for the 320 pixels so that's a maximum of 1151 pixels per second, which is slowish, but tolerable.


: DRAW ( col dx dy)
  2DUP OR 0= IF
    DROP DROP DROP
  ELSE
  OVER OXY SWAP OVER @ + >R ( col dx dy &OXY : X')
  2+ @ OVER + >R ( col dx dy : Y' X')
  OVER ABS OVER ABS > IF
    SWAP QUADFIX DAXISX
  ELSE
    QUADFIX DAXISY
  THEN
THEN
  R> R> OXY 2!
;

: RLINE
  3 RND 1+ ( COLOR)
  200 RND 320 RND 2DUP OXY 2!
  320 RND SWAP - SWAP 200 RND SWAP - DRAW
;

: RLINES BEGIN RLINE ?TERMINAL UNTIL ;



Circles

Our circle algorithm, on the other hand will be copied straight from the equivalent FIGnition version.

Method: we know x^2+y^2 = const. So, we start at [0,r], which gives r^2 We can go straight up, which gives:
   [x^2+[y+1]^2] - x^2-y^2 => a difference of +2y+1.
Or we can do [x-1]^2 => a diff of 1-2x.
So, the rule is that when the accumulation of 2y+1>1-2x, then we do 1-2x. By only calculating the error, we don't need to calculate r*r and therefore there is no danger of 16-bit arithmetic overflow even for radius's larger than the width of the highest screen resolution.

: NEXTP ( X Y DIFF )
 OVER DUP + 1+ + >R  ( CALC WITH INC Y )
 OVER DUP + 1 - R> 2DUP > IF
   SWAP DROP
 ELSE
   SWAP - >R SWAP 1 - SWAP R>
 THEN SWAP 1+ SWAP
;

0 VARIABLE FG

: DXYPLOT ( COL DX DY)
  >R 2DUP R OPLOT R> ;

: OCTPLOT ( CX CY DX DY)
  4 0 DO
    DXYPLOT SWAP
    DXYPLOT NEG
  LOOP
; ( -- CX CY )

: CIRC ( COL X Y R )
  >R SWAP OXY 2! ( COL : R)
  R> 0 0 >R ( COL DX=R, DY=0 : DIFF=0)
  BEGIN
   OCTPLOT R> NEXTP >R
  2DUP < UNTIL R>
  DROP DROP DROP
;

: CIRCS ( CIRCLES)
  0 DO 3 RND 1+ 160 100 I CIRC 2 +LOOP DROP DROP ;




: CIRCBM 0 DO I 3 AND 160 100 98 CIRC LOOP ;

And with a ticks routine of the form: HEX : TICKS 40 6C L@ ; DECIMAL we can time it fairly well. 20 CIRCBM draws 98*2*pi pixels per circle and takes 174 ticks. So that's 12315 pixels in 9.56s or 1288 pixels per second. Amazingly, a bit faster than line drawing!

Conclusion

Once we can choose a graphics mode and implement a plot function we can build on that with simple line-drawing and circle-drawing algorithms. An original IBM PC running FIG-Forth draws lines like a lazy 8-bit computer, but for graphics of moderate complexity, that's tolerable. The real challenge is that a classic line drawing algorithm involves quite a lot of stack variables with extensive stack shuffling, making an efficient program far more involved than the equivalent even in 8-bit assembler and that the overhead of the Forth interpreter means the Bresenham line drawing algorithm is slower than one that uses division.

Sunday, 25 July 2021

Fig-Forth At PC=Forty (Part 4)

In part 1, I talked about how to get FIG-Forth for the IBM PC running on PCjs. FIG-Forth was a popular and very compact, public-domain version of the medium speed Forth systems programming language and environment during the early 1980s. Then part 2 covered how to implement a very rudimentary disk-based line editor as a precursor to an interactive full-screen editor; while part 3 dives into machine code routines and a PC BIOS interface, because there was no real screen cursor control via the existing commands.

Let's Edit!

Now, at least we can implement a screen editor. One of the constraints I'll impose will be to keep the editor to within 1kB of source code. At first that'll be easy, because all I want to support is normal characters, cursor control, return and escape to update. However, I know that I'll probably want to add the ability to copy text from a marker point using <ctrl-c>. But even a simple decision like this raises possible issues. Consider this, if I type VLIST I find I can press <ctrl-c> to stop the listing:

However, if I type this definition:

: T1 BEGIN KEY DUP . 27 = UNTIL ;

I find that <ctrl-c> doesn't break out of T1, instead it simply displays 3 and I really do have to press <esc> to quit the routine. But it could be that Forth still checks it automatically, just not with the above sequence of commands in T1. No, from the FIG-Forth source we can see, the breakout of VLIST is simply due to it executing ?TERMINAL and exiting if any key has been pressed. So, that potential problem is sorted!

VLIST Source Code:

DB 85H
DB 'VLIS'
DB 'T'+80H
DW UDOT-5
VLIST DW DOCOL
DW LIT,80H
DW OUTT
DW STORE
DW CONT
DW AT
DW AT
VLIS1 DW OUTT ;BEGIN
DW AT
DW CSLL
DW GREAT
DW ZBRAN ;IF
DW OFFSET VLIS2-$
DW CR
DW ZERO
DW OUTT
DW STORE ;ENDIF
VLIS2 DW DUPE
DW IDDOT
DW SPACE
DW SPACE

DW PFA
DW LFA
DW AT
DW DUPE
DW ZEQU
DW QTERM
DW ORR
DW ZBRAN ;UNTIL

DW OFFSET VLIS1-$
DW DROP
DW SEMIS

The Editor Itself

The first goal in the editor is to convert y x coordinates in the current screen to a memory location in a buffer. This is a minor change from the initial part of the code in EDL:

: EDYX& ( Y X -- ADDR)
  >R 15 AND 8 /MOD SCR @ B/SCR * +
  BLOCK SWAP C/L * + R> +
;

We want to be able to constrain Y and X coordinates to within the bounds of the screen (in this case with wrap around):

: 1- 1 - ; ( oddly enough missing from FIG-Forth, but I use it quite a bit in the editor)

: EDLIM ( Y X -- Y' X')
  DUP 0< IF
    DROP 1- 0
  THEN
  DUP C/L 1- > IF
    DROP 1+ 0
  THEN
  SWAP 15 AND SWAP
  OVER 2+ OVER 4 + AT
;

The key part of a screen editor is to be able to process characters, I've picked the vi cursor keys:

: DOKEY ( R C K  )
  >R
  R 8 = IF ( CTRL-H, Left)
    1-
  THEN
  R 12 = IF ( CTRL-L, Right)
    1+
  THEN
  SWAP R 11 = IF ( CTRL-K, Up )
    1-
  THEN
  R 10 = IF ( CTRL-J, Down)
   1+
  THEN
  SWAP R 13 = IF ( CR or CTRL-M)
    64 +
  THEN
  EDLIM
  R 31 > R 128 < AND IF ( PRINTABLE)
    2DUP EDYX& R SWAP C!
    R EMIT 1+ UPDATE EDLIM
  THEN
  R>
;

Finally we want to put it all together in a top-level function:

: ED ( scr -- )
  CLS LIST 0 0 EDLIM
  BEGIN
   KEY DOKEY
  27 = UNTIL
  DROP DROP
;

Interestingly, once I'd cleared up an initial bug where the cursor wasn't advanced when I typed a character, and another where I'd missed an AND when checking for printable characters, I was able to use the editor itself to edit improvements ( namely, putting the initial cursor at the right location instead of (0,0)).

Finally, although Forth isn't always as compact as its proponents like me often claim, in fact this editor in itself uses a mere 306 bytes, probably the most compact interactive editor I've seen and there's still over half the screen left for improvements. For example, there's no support for delete ( left, space, left); for inserting a line; copying text, nor blocks. But for the moment, it's easily far more enjoyable than the line editor it replaces.

Exercise For The Reader

The biggest user-interface problem I've found with extensions to the editor has been to decide which control character to use to mark the text location for copying. To explain: many archaic screen editors, including some early word processors used a mark, edit sequence for text manipulation. The user would move the cursor to where they wanted to perform an 'advanced' edit operation; mark the initial location; then move the cursor to either where they wanted the edit operation to finish; mark the end of that edit text; then finally, possibly move the cursor to some other location and complete the edit. For example, on the Quill word processor for the Sinclair QL, you'd Type F3, 'E' (for Erase), move the cursor to where you wanted to erase a block; press enter; move to where the erase should finish (and it would highlight the text as you went along); press enter; confirm you wanted to erase it and then it would. Or in the Turbo C editor you'd Mark an initial starting location; then Mark again an ending location and then perform an operation like 'Copy' to duplicate the text, or 'Delete' or 'Move' to move the text.

Or on a BBC Micro, the BASIC editor was essentially a line editor which supported two cursor positions (!!) If you needed to manipulate a line instead of just retyping it, you'd list the line (if it wasn't on the screen), then move the cursor keys and a second cursor would appear, moving to where you wanted on the screen; while the cursor at your editing position would remain. You'd then hit COPY and it would copy from the second cursor to your editing position, advancing both cursors.

So, in my system, which is similar to how editing works on FIGnition, I'd want to Mark the position where I wanted to copy / erase from; move to where I wanted to paste or finish a delete to and then COPY / MOVE a character at a time from the source to the destination (or Erase the text).

However, the most obvious control character to use, <ctrl-m> is already used for Return, and everything else seems rather contrived. Then I thought, what happens if I use <ctrl-symbol> instead? Do they produce interesting control codes? I found out quite a number produce 0s, but some actually generate the control codes in the range 0..31 that you can't generate from <ctrl-a> to <ctrl-z>.

This is what I found out:

 Ctrl+  Code  Ctrl+  Code
 \  28  ]  29
 6  30  -  31

So, what I'd like to know is whether this is just an artefact of the simulator being used on a Mac or whether it's common to other emulators and an actual IBM PC?

Conclusion

Once I'd implemented some earlier, critical definitions it turned out to be quite easy and satisfying to write a full-screen editor. The biggest challenges were in making sure certain key presses wouldn't collide with any system behaviour and finally thinking about some user-interface decisions for some future enhancements.

The editor also nicely illustrates some key Forth aspirations: the editor turns out to be very compact (though of course it's very rudimentary too); and I was able to use it to debug and improve itself once I'd reached some critical level of functionality. It was so easy and tiny, I wonder why it wasn't a standard part of FIG-FORTH, given that it was designed in an era when cursor addressable VDUs were already the norm.

Sunday, 18 July 2021

Fig-Forth At PC=Forty (Part 3)

 In part 1, I talked about how to get FIG-Forth for the IBM PC running on PCjs. FIG-Forth was a popular and very compact, public-domain version of the medium speed Forth systems programming language and environment during the early 1980s. Then part 2 covers how to implement a very rudimentary disk-based editor as a precursor to an interactive full-screen editor.

It would be possible to implement a full-screen editor entirely using the existing word set, if it provided definitions that could control the position of the cursor on the screen and the ability to clear the screen.

Unfortunately, it's not possible to do that either by sending display control codes via EMIT, nor via any other special commands. EMIT does support some control codes, carriage return is 13 EMIT, backspace is 8 EMIT, cursor right is 9 EMIT and cursor down is 10 EMIT. 

The rest just produce graphics characters.

Let's Do Some Machine Code!

It's inevitable I'd have to get onto some machine code at some point, and it turns out, pretty early on. That means I need some useful Forth and 8086 resources.

Firstly, there's an indispensable guide to the FIG-Forth core: The Systems Guide To FIG-Forth. In it, it says you can write machine code definitions using the ;CODE command. The idea is that you'd write something of the form:

: myMachineCodeDef ;CODE opCode0 C, opCode1 C, etc... ;

But that doesn't work as I imagined. Instead I found you need to do:

CREATE myMachineCodeDef opCode0 C, opCode1 C, etc... SMUDGE

Here, CREATE generates a CFA which points to the parameter field (by default), and because FIG-FORTH is an Indirect Threaded Forth, that's the machine code that gets executed. It's not quite the only way of doing it. The Jupiter Ace's method for executing machine code is to define a CODE word which jumps to machine code in the parameter field:

DEFINER CODE DOES> CALL ;
CODE Noop 253 C, 233 C,

And Direct threaded Forths merely need to build a header without a CFA, because in these cases, the CFA is machine code itself. 

Probably the compact resource for translating 8086 instructions is the 8086 datasheet itself. I obtained a copy from Carnegie Mellon University (which incidentally did some pioneering work in parallel processors in the 1970s).

The most critical action a machine code definition must perform is to jump to the next command. My solution is to use the NOOP word whose behaviour does nothing but jump to the next word to execute. We find the CFA of NOOP and take the contents to find the first executable 8086 instruction:

' NOOP CFA @ 

NOOP is just a single byte jump instruction followed by an 8-bit displacement. Because it's a relative address, we need to add the address following the jump to the 8-bit displacement and because a displacement is a signed 8-bit integer, we need to perform a sign extension to find the true address for NEXT. Finally, we'll need the jump instruction that can handle a 16-bit displacement, which is code 233. This gives us the following, new definitions:

: SXT DUP 127 > IF 256 - THEN ;
: NEXT [ ' NOOP @ DUP 1+ C@ SXT ( 2+ ) +  ] LITERAL 233 C, HERE - , ;

A simple, obvious machine code definitions to add to FIG-FORTH is a pair of shift operations, because shifts are really common operations in systems languages, but in FIG-FORTH it seems strangely absent.

To write a workable machine code definition we also need to know what 8086 registers must be preserved in Forth and which can be overwritten. The Forth.ASM assembler code from the original FIGFORTH.ZIP file tells us that SI=IP, SP points to the parameter stack, BP points to the return stack; AX must be preserved and CS, DS, SS all point to the same segment for the Forth executable. However, DX, BX, CX, DI, ES can all be freely modified. So, the shift operations will involve popping the count from the top of the stack into CX (which can be trashed); then the value into BX (which can be trashed); shifting BX by CL and then pushing the result. This gives us:

CREATE << HEX 59 C, ( pop cx) 5B C, ( pop bx)  0D3 C, 0E3 C, ( shl bx,cl) 53 C, ( push bx)
NEXT DECIMAL SMUDGE

CREATE << HEX 59 C, ( pop cx) 5B C, ( pop bx)  0D3 C, 0EB C, ( shr bx,cl) 53 C, ( push bx)
NEXT DECIMAL SMUDGE

This means we can now e.g. multiply or divide by a power of 2 over 100 cycles (20µs) faster than before :-) .

BIOS Functions

Let's go back to cursor control now. The easiest way to do that is via the BIOS functions on an IBM PC.  It turns out all the screen control functions are INT 10H BIOS functions, so by creating a generic INT10H BIOS definition, we can then simply supply all the parameters to it in a higher level Forth definition. This function will be simple and only involves popping the registers DX through to AX; then calling INT10H.  It isn't documented, but INT10H can foul up BP.

CREATE INT10H ( AX BX CX DX --)
HEX
  05A C, ( POP DX )
  059 C, ( POP CX )
  05B C, ( POP BX)
  89 C, 0F8 C, ( MOV DI,AX mod=11 reg=111=di r/m=000=AX )
  058 C, ( POP AX )
  057 C, ( PUSH DI)
  1E C, ( PUSH DS)
  55 C, ( PUSH BP [101])
  0CD C, 10 C, ( INT10H)
  5D C, ( POP BP)
  1F C, ( POP DS)
  058 C, ( POP AX)
  NEXT
DECIMAL SMUDGE

The only real complexity is that we need to load AX, but we also need to save AX too. It doesn't matter if DI gets trashed as Forth doesn't use it.

There are now quite a number of fun things we can add that use INT10H:

: AT ( R C ) SWAP 8 <<  + >R ( DX) 512 0 R> INT10H ; ( jupiter ace command for gotoxy)
; VMODE ( n -- ) 0 0 0 INT10H ; ( 0= 40 column 2= 80 column 4=cga)
: CLS  1536 15 0 1999 INT10H 0 0 AT ;

So, we can do 0 VMODE then 1536 HEX 1E00 0 1827 DECIMAL INT10H to put the screen into a 40 column mode with yellow text on a white background.



And with these commands, we can now write a full-screen editor!

Tuesday, 14 July 2020

Toggle Booting a PIC MCU!

Before the 1970s people had to boot computers by laboriously flipping a set of toggle switches on the front of the main processor. Today, all computers have built-in boot ROMs and even embedded microcontrollers are programmed using powerful external computers.

I only consider a processor to be computer if it can support self-hosted programming, so I wondered what it would take to manually toggle in a program on an MCU with no computer and minimal logic. I've produced a video based on this blog here.



I chose a primitive 8-bit PIC, because it has a simple programming algorithm, but even so, I have to enter 56 bits per instruction flashed. It was so tedious I printed out the entire sequence and used a ruler to make sure I didn't lose my position. Here's the 11-instruction program itself:

 0 Bsf Status,5 ;01 0110 1000 0011
 1 Bcf Trisc,4  ;01 0010 0000 0111
 2 Bcf Status,5 ;01 0010 1000 0011
 3 Movlw 53     ;11 0000 0011 0101
 4 Movwf T1con  ;00 0000 1001 0000 65536 cycles at 8us/cycle=
 5 Btfss Pir1,0 ;01 1100 0000 1100 3.8s per full blink.
 6 Goto 5       ;10 1000 0000 0101
 7 Movlw 16     ;11 0000 0001 0000
 8 Xorwf PortC,f;00 0110 1000 0111 (Don't care about RMW issues)
 9 Bcf Pir1,0   ;01 0000 0000 1100
10 Goto 5       ;10 1000 0000 0101

You don't need a GOTO start at address 0: if you don't have any interrupt code you can just start your program at address 0.

The reason why it's so tedious is that programming each instruction involves four command sequences:

  1. Load the instruction into (I presume) an internal buffer.
  2. Burn the instruction into flash memory.
  3. Verify the flash memory instruction you've just programmed (always worthwhile as it's so easy to make a mistake).
  4. Increment the address of the PC. Unlike many MCU flash programming algorithms, the 8-bit PICs can only have their programming address reset (by unpowering the MCU, connecting the PIC's /MCLR signal to 12V, then powering the rest of the MCU at 5V) and then incremented. Thus making a mistake (most likely because you've left out a bit or duplicated one) means you have to start again.

In addition, each programming command expects all the data to be entered from the LSB to the MSB, so in fact I toggled in all the commands backwards, reading each line from right to left. So, the first full command really looked as follows (with '|' separating individual fields):

Bsf Status,5 |0|01 0110 1000 0011|0|000010 Bit 5 is RP0, status is 3
  (Prog)     |                     |001000 Bsf is 01 01bb bfff ffff.
  (ReadBack) |0|-- ---- ---- ----|0|000100
  (Inc)      |                     |000110

The PIC needs just two signals to program it: a clock input and data signal. My hardware is (relatively speaking) super simple.

I use an astoundingly primitive 555 timer chip in its monostable (one shot) mode to debounce the clock. All I needed was a couple of capacitors and resistors to give a 0.2s delay and it would eliminate bounce. All the information I needed came from the wikipedia page on the 555 timer.

The data button was a bit more challenging. I used a few resistors and a red LED as I needed Vdd when I pressed the button, less than 0.8V when I let go, but I also needed to enable the DAT line to drive the LED when verifying data and not exceed current limits even if I accidentally pressed the button when the DAT was trying to output. One of the downsides to this approach is that the LED is barely visible on the video, though in retrospect I think I could have halved the resistor values and it would have been fine.

Finally, I needed a 12V programming voltage for the PIC, and then used a basic 7805 voltage regulator to generate the 5V Vdd voltage for the 555 and PIC. The currents are so small I didn't need to worry about a heatsink for the 7805.

On a PIC it's not good enough just to program the flash, I needed to change the internal configuration to stop the Watchdog from resetting the PIC after 4ms and to use the internal oscillator instead of an external RC oscillator. The spec on programming the configuration is rather confusing, because the sequence to go into config mode requires the full 14-bit configuration setting, and then you have to enter it again as a normal programming instruction.

With some experimentation I got it right in the end! With a bit more detail, I started off by finding out how I could enter one instruction (after running the erase command), and then two instructions. I made two attempts to program the entire program - in the first attempt I made a mistake on the last-but-one instruction and had to start again, I videoed both of them so although the toggling sequence is complete and not spliced from multiple attempts, it wasn't the first attempt.

I had a similar problem with the configuration. It took a few attempts at that to get it right and at one point I thought I had configured it to use an external oscillator and couldn't read the flash anymore. In fact I had mis-inserted the VPP to the right of the 7805's IN pin.

So, it's possible, but not very practical to manually toggle program an MCU, but perhaps survivalist geeks might find it useful in some post-apocolyptic dystopia!