Variables, Constants, and Arrays
In froth: Emulate a 'Forth' Programming Environment

knitr::opts_chunk$set(echo = TRUE)
library(froth, quietly = TRUE)

Forth-like languages are nice because they don't require a lot of variables. However, sometimes there's a problem that requires saving a particular value for later usage that can't be on the stack. This is where variables, constants, and arrays come in.

Variables

Let's say we want to store a value somewhere outside of the stack. We can initialize a variable using the VARIABLE word.

fr> VARIABLE testvar
ok.

Nothing happened! That's because this word only allocates space for a variable without doing any other actions. We can now call the name of the variable at any time to push its address onto the stack. In Forth, this would be an actual memory address. froth, however, uses an abstracted memory map.

fr> test .s
[[1]]
[1] "addr <test, cell 0>"

[[2]]
NULL

ok.

We can see the address of the variable at the top of the stack. We'll come back to the cell 0 bit later.

To store a value in a variable, we use !. We can copy the value in an address to the stick using @.

fr> variable newvar
ok.
fr> 20 newvar !
ok.
fr> newvar @ .
20 ok.

Notice that for ! we push the value first, then the address. It's also important to note that the variable retains its value even after calling @.

fr> variable newvar
ok.
fr> 20 newvar !
ok.
fr> newvar dup @ . dup @ . @ .
20 20 20 ok.

@ . is a pretty common call to inspect the value of a variable. In fact, it's so common that there's a shortcut word to perform both, ?.

fr> variable newvar 20 newvar !
ok.
fr> newvar ?
20 ok.

Variables are often useful as counters, using the +! command. +! is similar to !, except that it adds to the variable rather than replacing it.

fr> variable newvar 
ok.
fr> 0 newvar !
ok.
fr> 5 0 do 1 newvar +! loop newvar ?
5 ok.

Constants

Sometimes we just need a name for a value. Variables that will never change values are called constants, and can be fittingly be defined with the CONSTANT word.

fr> 100 CONSTANT hundred
ok.
fr> hundred .
100 ok.

Constants avoid the hassle of dealing with addresses, and can be accessed slightly faster than variables. Calling the name of the variable copies its value directly onto the stack.

Arrays

Remember when I said we'd come back to that whole "cell 0" thing? By default, initializing a variable creates space for one "thing" at an address. Unlike Forth, froth creates this space at an abstracted memory location, and that space is big enough for any arbitrary R object. However, sometimes it's nice to have a variable that can store more than one value, like in the case of vectors or lists in R.

Let's start by initializing an array. The definition proceeds like in normal variables, we just have to further specify that we want the variable to be larger than a single value:

fr> VARIABLE myarray 5 CELLS ALLOT
ok.

This initializes a variable with 5 slots for things. Internally, this is the same as a length 5 list. The word ALLOT allots memory for the preceding variable, and we'll come back to CELLS in a moment.

Previously, we saw this behavior:

fr> myarray .s
[[1]]
[1] "addr <myarray, cell 0>"

[[2]]
NULL

ok.

cell 0 means that this address is pointing to the first value of the array myarray. Cells are 0-indexed in froth, so this is the first entry in the array. Now as before, we can store a value at the first cell. I'm assuming the address is still on the stack, so remember we need to call swap to ensure the top value is the address.

fr> 1 swap !
ok.
fr> myarray ?
1 ok.

Ok, now we can look at the word CELLS. Let's start by inspecting what we get by pushing CELLS to the stack:

fr> 4 CELLS .s clear
[[1]]
[1] "addr <, cell 4>"

[[2]]
NULL

ok.

It's the fourth cell of...nothing? CELLS creates a unique kind of address--it stores cell number, but it doesn't point to any variable. Words like ALLOT expect one of these objects to tell it how much space to allocate. While CELLS calls don't point to anything, we can add them to variable addresses to get to other values in the array.

fr> myarray
ok.
fr> 4 CELLS + .s
[[1]]
[1] "addr <myarray, cell 4>"

[[2]]
NULL

ok.

Notice how we've now gotten an address to myarray that's offset by 4 cells. Since the size of myarray is 5, this is the final entry in the array (indexing starts at 0, so the 5 entries are 0-4). This address works like the examples in the Variables section, so we can use !, @, +!, and ? on it:

fr> VARIABLE myarray 5 CELLS ALLOT
ok.
fr> 0 BEGIN dup 5 < WHILE dup dup myarray swap CELLS + ! 1 + REPEAT drop
ok.

Here we're putting it all together. See if you can guess what the contents of myarray now look like before reading the next section.

We start by initializing an array myarray of size 5. Then, we put 0 on the stack and begin a loop. At each iteration of the loop we:

Dup the top of the stack and check if less than 5; if not, go to (8)
Dup the top of the stack twice.
Load the address of myarray, swap it with the top duplicate
Create a cell offset equal to the top duplicate, add to the address of myarray
Store the second duplicate (from 2) at the offset location
Add 1 to the value on the stack
loop
finish loop

So what happens? When the stack is 0, we store 0 at the 0th position of myarray, then 1 at the 1st position, 2 at the second, etc. Let's print out the entire array:

fr> 5 0 do myarray i cells + ? loop
0 1 2 3 4 ok.

It's also possible to initialize an array to a specific value using the CREATE and , words. Pay very close attention to this syntax!

fr> CREATE myarray 0 , 1 , 2 , 3 , 4 ,
ok.
fr> 5 0 do myarray i cells + ? loop
0 1 2 3 4 ok.

Did you notice how we used ,? Unlike English, , is a froth word, meaning that it needs to be space separated and it acts upon the stack. This means that we must include a comma after the last element to ensure it's added to the array.

Modifying Arrays

It's often useful to be able to quickly change the values or the size of allocated arrays.

To change the values in an array, we can use the FILL word. FILL expects three items on the stack: a variable address, a CELLS offset n, and a value x. It then fills the first n cells with x.

fr> variable filltest 3 cells allot
ok.
fr> 3 0 do i filltest i cells + ! loop
ok.
fr> 3 0 do filltest i cells + ? loop
0 1 2 ok.
fr> filltest 3 cells 10 fill
ok.
fr> 3 0 do filltest i cells + ? loop
10 10 10 ok.

There's also the shortcut word ERASE, which is equivalent to 0 FILL.

fr> filltest 3 cells erase
ok.
fr> 3 0 do filltest i cells + ? loop
0 0 0 ok.

That changes values, but what if we end up with the wrong size array altogether? For that, we have two options: EXTEND and REALLOT. Both of these words have the same form; they expect the stack to be comprised of a variable address v and a value n. EXTEND will make the array at v larger by n cells, and REALLOT will change v to have exactly n cells (copying the contents of the first n cells, if any).

fr> variable myarray 3 cells allot
ok.
fr> myarray 5 cells + ?
Error: array accessed out of bounds!
fr> myarray 2 cells extend
ok.
fr> myarray 5 cells + ?
0 ok.
fr> 10 myarray 1 cells + !
ok.
fr> myarray 3 cells reallot
ok.
fr> myarray 5 cells + ?
Error: array accessed out of bounds!
fr> myarray 1 cells + ?
10 ok.

If you're ever unsure about how large your array is, use the LENGTH word to copy the length of an address to the stack, and LENGTH? to print it out.

fr> variable myarray 10 cells allot
ok.
fr> myarray LENGTH .
10 ok.
fr> myarray 20 cells reallot
ok.
fr> myarray LENGTH?
20 ok.

Execution Tokens

Variables storing values is nice, but sometimes it's useful to be able to store a function inside a variable. This is a common practice in R (e.g., any of the *apply functions), and also exists in C in the form of function pointers.

Forth also incorporates a way to do this, using a unique word to put an "execution token" on the stack. An execution token is essentially just the location of a particular type of code. When you want to execute that token, Forth runs the code at the location referenced by the execution token.

This capability is implemented in froth, but it does have a few differences to classic Forths due to how the memory is abstracted. Let's start with a simple example:

fr> : GREET ." Hello!" ;
ok.
fr> GREET
Hello! ok.
fr> ' GREET .
executable token < greet > ok.
fr> ' GREET EXECUTE
Hello! ok.

We have two new words here: ' and EXECUTE. The first of these, ' (pronounced "tick"), creates an execution token for the word that follows it, then pushes it to the stack. If we print this value out, we get executable token < greet >. Forth languages would traditionally return memory location, but froth just shows the word that the execution token corresponds to. The second word, EXECUTE, does just that: it executes the execution token at the top of the stack.

In a way, ' GREET EXECUTE is just a really roundabout way of accomplishing the same thing as just calling GREET directly. Naturally, one might wonder what the purpose of this is. We can use ' to store functions in variables so we can indirectly execute them later on. Let's see how to do this with an example:

fr> : HELLO ." Hello" ;
ok.
fr> : GOODBYE ." Goodbye" ;
ok.
fr> VARIABLE 'aloha ' HELLO 'aloha !
ok.
fr> : ALOHA 'aloha @ EXECUTE ;
ok.

I'll show what happened in a minute, but I'd encourage readers to first think about this code. What's going on here? What would running 'aloha do? Read through the code, then continue on to the explanation below.

The first two lines should be simple by now. We've defined two functions, HELLO and GOODBYE, that simply print out Hello and Goodbye (respectively). Then, we initialize a variable called 'aloha, and store an execution token for HELLO inside that variable. Finally, we define a function ALOHA that executes the execution token stored in 'aloha.

This implies two things. First, if we run ALOHA, we should expect it to call HELLO:

fr> ALOHA
Hello ok.

And indeed it does. Secondly, if we store a different function pointer in 'aloha and then call ALOHA, we should expect to see a different result:

fr> ' GOODBYE 'aloha !
ok.
fr> ALOHA
Goodbye ok.

Here lies the power of execution tokens. Using a single variable, we can call multiple functions!

I'll also note that we called our variable 'aloha ("tick-aloha"). This convention of preceding the variable name with a ' is used in Forth to denote variables that store execution pointers.

For Forth programmers

There is a one major difference here between Forth and froth. froth is completely interpreted, and thus does not distinguish between variable definition lines and non-definition lines. What this means for Forth programmers is that the froth word ' is equivalent to the Forth word ['] when used in a definition context. froth cannot look at the input stream, and will instead always look at the next word on the execution stack. I'm thinking of changing this to be more consistent with Forth in the future, but I haven't gotten to it yet.

This implications of this are demonstrated by the following:

fr> : SAY ' 'aloha ! ;
ok.
fr> SAY HELLO
Error: can't find 'aloha
fr> : COMING ' HELLO 'aloha ! ;
ok.
fr> : GOING ' GOODBYE 'aloha ! ;
ok.
fr> COMING ALOHA GOING ALOHA
Hello Goodbye ok.

In traditional Forth, the GOING definition would be : GOING ['] GOODBYE 'aloha ! ;, and SAY HELLO would not produce an error. However, this isn't the case here. ['] is also provided as an alias for '. Again, I plan to change this eventually.

Words in this chapter

VARIABLE xxx: creates a variable named xxx
! ( n addr -- ): stores the value n at address addr
@ ( addr -- n ): copies the value at addr to the stack
? ( addr -- ): prints the value of addr
+! ( n addr -- ): adds the value n to the value at addr
CONSTANT xxx (n -- ): creates a constant called xxx that stores n; xxx returns n when called
ALLOT ( addr ncells -- ): allocates ncells cells at addr
CREATE xxx y1 , y2 , ... yn ,: creates an array xxx with values y1, y2, ... yn
CELLS ( n -- ): creates a memory address offset for arrays
FILL ( addr ncells val -- ): fills ncells cells of memory beginning at addr with val
ERASE ( addr ncells -- ): fills ncells cells of memory beginning at addr with 0
REALLOT ( addr ncells -- ): reallots array at addr to have size ncells.
EXTEND ( addr ncells -- ): extends the array at addr by ncells cells
LENGTH ( addr -- len ): pushes the length of the array at addr onto the stack
LENGTH? ( addr -- ): prints the length of the array at addr
' xxx ( -- addr ): attempts to find xxx in the dictionary, and pushes an execution token for xxx to the stack if found
EXECUTE ( xt -- ): executes an execution token on top of the stack
['] xxx ( -- addr ): currently equivalent to ' for froth