Down the Forth Rabbit Hole
I have recently stumbled down the Forth Rabbit hole. It’s been an enjoyable few weeks. I seem to rediscover Forth and its concatenative cousins every decade of my career, have a bit of a dabble and then move on. But this time I went deep.
I implemented my own Forth system. As you do. Quarter Forth. Initially following Virgil Dupras’ excellent Tumble forth tutorial which taught me details of the x86 architecture and boot process. I was very keen to explore implementation from the ground up: Using a high level language (for example Haskell, my favorite) doesn’t seem to really embrace the spirit of Forth I was learning about.
Initially I was concerned that by implementing my own Forth system, although fun and instructive, I wouldn’t get the essence of Forth – that would require coding in Forth. But soon I realized this worry was not well founded. Implementing a Forth system requires writing lots of Forth code. To begin with (following the tutorial) everything was in x86 Asm. But as my system grew, I started implementing more and more in Forth itself. This is probably obvious to any die-hard Forth-ist, but I had to discover it myself!
Self Hosting
There is something glorious about seeing how much of Forth can be implemented within itself. Perhaps the first step of this self-hosting joy is when you see how the if
/then
control structures are not special builtins, but just standard Forth words. They are marked as immediate
so they act like macros and run at compile time. The following definitions suffice in Quarter Forth:
: if
['] 0branch compile, here 0 ,
; immediate
: then
dup here swap - swap !
; immediate
The if
and then
words are run during the compilation of a word which uses them. They work together to generate code (0branch
) which conditionally branches over the sequence of words between if
and then
. (We will see this below in an example) The 0branch
word and the immediately following destination slot is generated by if
. The destination slot gets backpatched by then
which will know the target of the branch. The location of the slot to be backpatched is left on the stack at compile time; left by if
, and consumed by then
, which computes the relative offset, as required by 0branch
, and patches the dummy 0
value initially written there by if
. Phew!
We can make use of if
and then
in the definition of (
which implements support for comments; another example of self hosting. Comment support is not builtin, but is again defined within Forth, for example by the following definition:
: (
key [char] ) = if exit then recurse
; immediate
( Woo hoo. Now we have comments )
The definition of (
skips all following input characters, obtained via key
, until a closing )
is encountered. Code to conditionally skip over the call to exit
is generated by if
and then
. The looping structure is setup by recurse
which unconditionally branches back to the start of the current definition. We can use the disassembler see
to inspect the generated code.
see (
: ( key lit 29 00 = 0branch 05 00 exit branch ea ff ; immediate
We see the code generated by if
/then
is 0branch 05 00
, which skips forward when the equality test is false; continuing at branch ea ff
. The branch distance is five because it must skip over the literal 05 00
which takes two bytes and the call to exit
which takes three bytes. The unconditional branch compiled by recurse
has distance ea ff
, being the little-endian 16-bit representation of -22.
With just a little more work we could define comments which properly nest by counting the opening and closing parenthesis while skipping. We can also implement useful features like string literals (.s"
… "
) all within Forth itself. See forth.f for examples.
In Quarter Forth, the disassembler (see
) and memory dump tools (dump
/xxd
) are also written in Forth, and were for me quite substantial pieces of code to develop. Also implemented within Forth is a basic system for buffered line input, with support for backspace editing no less! But even as I was working/struggling to implement these Forth applications, I knew how much harder it would be to work at the level of x86 Asm.
Of course, the other application which can be implemented within Forth is the Forth text interpreter itself…
Next Time
In the next article
- We implement the Forth text interpreter in Forth.
- See why this is a good idea.
- Deal with the bootstrapping issue into which we run.