What, another Forth implementation
I have known of Forth for many years, but did little with it.
Why bother? Wasn’t Forth just an HP calculator on steroids?
Then in early 1996, I read History of Programming Languages - II,
by Bergin and Gibson. The chapter on Forth hooked me and
I decided it was time to learn Forth. I decided the best
way to truly understand the language was to build an implementation
myself, so I started coding one in C++.
I was about 10 hours into this project when I realized two things: (1) the world didn’t really need another low end x86 Forth, and (2) I had been looking for a project to do in Java for a while and maybe what the world really needed was a Java implementation of Forth.
I scrapped the C++ implementation and began a Java implementation. The first version was a hack designed to unearth all the problems I would have. I finished this in late 1996. After finishing, I went back and did a design for the implementation I am working on now. I expect this design to hold up.
Design Goals
It is misleading to speak of design goals as if I sat down
and decided up front what I wanted out of Misty Beach Forth.
I didn’t. I started with a general idea of what I wanted, but
no grand plan. Today, I am implementing Misty Beach Forth to
meet the following requirements:
Run as a Java applet using neither processor- nor OS-specific code
Part of what allows untrusted Java applets to run securely on a client
machine is a multi-level security mechanism. One of the tiers of this
mechanism performs a simple proof on each method in each Java class.
Among other things, this proof checks that all the operations are type-safe.
I want a Forth that runs with, or on top of, the JVM and that passes all the
security checks so that I can run Forth programs as applets. This requirement
contributes to the difficulty in implementing Forth directly on the JVM.
Forth allows words to have variable stack effects; the Java security model does not.
Comply with the ANS Forth standard
This is self explanatory.
Run multi-threaded Forth programs One of my interests is parallel programming, and I want a Forth that allows multi-threading.
Implement better defined semantics than typical Forths
Most current Forth implementations provide no protection against
indexing off the ends of arrays and generally stomping all over
memory. In a protected operating system like Unix or Windows NT,
the damage is contained to the Forth program and other programs
continue to execute. I want to contain the damage even further
because I dislike memory corruption bugs manifesting themselves
long after the offending corruption.
Provide a pleasant development environment
My programming background is primarily in Pascal, C, C++, and Java.
All of these languages have inexpensive implementations with IDEs,
source code debuggers, profiles, etc. I want Misty Beach Forth to
provide an equally pleasant and powerful development environment.
Run at speeds comparable to native Forths
Not surprisingly, this is the hardest goal to achieve and I am
unsure whether it is actually achievable. The restrictions
provided by the JVM, combined with the fact that Java applets
are not coded to the native instruction set of the processor
they run on are two hurdles to overcome. The largest barrier,
however, is my desire to implement a Forth with more defined
semantics than typical Forths. I may have to relax the precise
semantic requirment to get even close to the speed I desire.
Restrictions of the JVM Forth is stack based. Java is stack based. At first glance, implementing Forth on the JVM appears to be no more difficult than implementing Forth on other CPUs. However, unlike most other CPUs, the JVM is designed only to run Java programs and little consideration is paid to other programming languages. In this sense, the JVM is a special purpose CPU, much like Forth and Lisp engines. The clash between programming models comes in several flavors:
Java and the JVM are strongly typed, Forth is not
Most programming languages provide typing either for variables
or values. Some provide typing for both. Forth provides typing
for neither, relying instead on the programmer to know how to
interpret blocks of bits. Table One illustrates the differences:
Table One
Language | Typing | Example | Explanation |
C | variables | int x = 5; |
Declare a block of memory large enough to hold an integer, and treat the bits as if they are an integer unless told otherwise, e.g., (float)x. Other values assigned to this variable will be converted to an integer before being assigned, e.g., x = (int)9.45;. |
Smalltalk | values | x := 1. |
Declare a variable, x, and make it refer to a new object of type Number. If you tell the runtime to treat the Number object as if it were a Set, you get an error. You can make x refer to something else, and this something else can be a Set, e.g., x = Set new.. |
Java | variables and values |
Integer x = new Integer (1); |
Declare a variable, x, and make it reference a new object of type Integer. x can only reference objects of type Integer or objects of child classes of type Integer. Trying to make x reference a Set generates an error. You cannot tell the runtime to treat Integer as if it were a Set without getting an error. |
Forth | neither | VARIABLE X |
Declare a new variable, x. Then store an integer 1 value in it. It is just as legal to store float values or pointers or characters in x. Further, the integer stored in x can be treated as a pointer, a character or a floating-point number. The Forth runtime doesn't care. |
This typing for both variables and values contributes to the security provided by Java applets. The Forth approach of treating memory as a sea of bits that the programmer can interpret as desired maps poorly onto the Java model. Adding to the mismatch, the JVM implements the strong typing called for in the Java language. Writing the sort of untyped programs allowed by Forth is legal, but means that the resulting program and implementation will not pass the Java applet verifier and will not run as a Java applet.
Java and the JVM have no pointer math
Forth, like C, is built on pointer math. Simple arrays in
Forth are pointers that are indexed using pointer arithmetic.
An example of Forth code creating an array of two elements
and storing two values in it:
CREATE TESTARRAY 2 CELLS ALLOT 220 TESTARRAY ! 340 TESTARRAY CELL+ !
This has a simple equivalent in C:
int *p = (int *) malloc(2 * sizeof(int)); *p = 220; *(p + 1) = 340;
But the Java equivelent eliminates the pointer math:
int[] p = new int[2]; p[0] = 220; p[1] = 340;
This difference is not just syntactic sugar! Arrays in Java, unlike those in C, are objects with a well defined API. In C, this sort of rewrite improves readability, but does not change the underlying operations. In Java, the array orientation is necessary because p is not a pointer to memory, but instead is a reference to an array object. This can be worked around, with tremendous effort, but the resulting Java byte codes once again will not pass the Java applet verifier and will not even run on all JVMs.
The JVM stack can be discontinuous
While Java and the JVM are stack based, in the sense that operations
take place on a stack of operands, the JVM breaks the stack up into
stack frames, one frame per function call. These stack frames are
all independent. To quote from The Java Virtual Machine Specification:
Because the stack is never manipulated directly except to push and pop frames, it may actually be implemented as a heap, and Java frames may be heap allocated. The memory for a Java stack does not need to be contiguous.This causes no trouble for programs written in the Java programming language, since the Java language has no way to express the notion of accessing variables in a caller’s stack frame. It does, however, mean that a subroutine-threaded Forth implementation using the JVM stack is effectively impossible, and any Forth implementation will be hard pressed to use the JVM stack as the data stack.
The Java applet security mechanism does not allow variable stack effects
To pass the applet-verification process, functions and blocks can not
have variable stack effects. The depth of the stack upon leaving a
function or block must be a fixed value relative to its depth upon
entering the function or block. This restriction provides yet another
hindrance to using the JVM stack as the data stack for Misty Beach Forth.
Implementation details
Since it appeared that using the JVM stack as the data stack was
not possible, I decided to build a Forth implementation roughly
comparable to native-mode Forths written in C. The data and
return stacks are simple arrays. I don’t worry about optimal
register allocation.
Some implementation details were easy to decide. Java defines
the underlying virtual machine as a 32-bit, two’s complement,
big endian machine. This is a legal ANS configuration, so that
is what Misty Beach Forth uses. The next big decision was
deciding what Cell
would look like.
Several observations drove the implementation of Cell:
The second and third observations led to the creation of a Cell
that contained an integer an array (possibly null). Misty Beach Forth cells
look like this:
public class Cell { int intValue; Cell[] arrayValue; }The implementation of +, for example, adds the
intValue
element of two cells together.
The arrayValue
element is needed to deal
with address operations. To see how, we’ll walk through
the array example from above:
CREATE TESTARRAY 2 CELLS ALLOTCreate a new dictionary entry. The dictionary entry is an array of two
Cells
.
220 TESTARRAY !Place 220 on the stack:
intValue | arrayValue |
Place TESTARRAY
on the stack:
intValue | arrayValue |
reference toTESTARRAY Cell[] |
|
Set ‘the value of TESTARRAY’ to 220. This executes Java code that looks like this:
TESTARRAY.arrayValue[TESTARRAY.intValue / 4].intValue = 220; TESTARRAY.arrayValue[TESTARRAY.intValue / 4].arrayValue = null; Top Of Stack = Top Of Stack - 2;We divide by 4 in the array index, because cells are four bytes wide.
340 TESTARRAY CELL+ !Place 340 on the stack:
intValue | arrayValue |
Place TESTARRAY
on the stack:
intValue | arrayValue |
reference toTESTARRAY Cell[] |
|
Add the size of a Cell
to the value on the top of the stack:
intValue | arrayValue |
reference toTESTARRAY Cell[] |
|
Set ‘the value of TESTARRAY + 4’ to 340. This executes Java code that looks like this:
TESTARRAY.arrayValue[TESTARRAY.intValue / 4].intValue = 340; TESTARRAY.arrayValue[TESTARRAY.intValue / 4].arrayValue = null; Top Of Stack = Top Of Stack - 2;This approach works well for reasonable Forth programs. Adding two addresses, incrementing that value and then subtracting one of the original addresses produces a valid address in most native Forth implementations. In Misty Beach Forth, it may or may not produce a valid address. Hopefully, code of this nature will be rare.
The Misty Beach Forth architecture has the notion of a Forth Engine. This engine contains the data and return stacks, is responsible for tokenizing input strings, maintains the current state (interpreting, compiling) and contains the dictionary. The engine is almost completely ignorant of the variously defined words. With some work, I expect to make the engine ignorant of all the defined words and thus completely decoupled from them.
The object-oriented nature of Java provides a framework for implementing the
Forth words. Forth words can be thought of as objects with both data and code.
Misty Beach Forth implements each Forth word as an object of a Java class
descending from a common base class, ForthWord
. Each word
contains methods to:
Table Two. Speed of stack operations
: INNER 10000 0 DO 34 DROP LOOP ; : OUTER 10000 0 DO INNER LOOP ; OUTER
Forth Implementation | Time in seconds |
Jax4th 1.25 | 144 |
Misty Beach Forth V0.30 on Netscape 3.x | 92 |
Misty Beach Forth V0.30 on Internet Explorer 3.x | 67 |
Misty Beach Forth V0.30 with Symantec JIT 2.0.54 | 59 |
Win32Forth 3.2 build 0819 | 59 |
Misty Beach Forth V0.30 on Netscape 4.x | 54 |
LMI Forth (no optimizations) | 27 |
LMI Forth (optimized with NCC and COMPILE:) | 7 |
Table Three. Speed of variable operations
VARIABLE TEMP : INNER 10000 0 DO 34 TEMP ! LOOP ; : OUTER 10000 0 DO INNER LOOP ; OUTER
Forth Implementation | Time in seconds |
Misty Beach Forth V0.30 on Netscape 3.x | 203 |
Jax4th 1.25 | 186 |
Misty Beach Forth V0.30 with Symantec JIT 2.0.54 | 144 |
Misty Beach Forth V0.30 on Internet Explorer 3.x | 133 |
Misty Beach Forth V0.30 on Netscape 4.x | 116 |
Win32Forth 3.2 build 0819 | 68 |
LMI Forth (no optimizations) | 35 |
LMI Forth (optimized with NCC and COMPILE:) | 3 |
All tests ran on a 90 MHz Pentium with 64 MB RAM running Windows NT 4.0 with no patches applied. No other applications were running.
I selected the native code Forths, based on the belief that Jax4th was a simple shareware Forth (a few hundred man hours of effort), Win32Forth was a serious shareware Forth (thousands of man hours of effort) and LMI was a serious commercial Forth implementation. What I did not realize was that, while Win32Forth is a serious shareware Forth, it emphasizes maximum functionality instead of speed. Serious shareware Forths emphasizing speed should perform much better.
The variable operation time for Misty Beach Forth (Table Three) is much worse than the stack operation time (Table 2) . This is primarily caused by the extra indirection and memory accesses introduced to provide the better semantics I want. Preliminary tests indicate that the stack operation time can be brought down to about 15 seconds (half as fast as LMI) and the variable operation time can be brought down to about 18 seconds if I relax the safety requirment and implement a few other speed optimizations. I expect to investigate this over the next few months. An environment allowing safer runtime during test and debug with the option to convert over to a faster, but more dangerous, runtime environment later might be the best mix of speed and safety.
Conclusion
I am pleasantly surprised at how fast Misty Beach Forth runs without
serious performance tuning, and expect that, with sufficient work, it
can eventually run within a factor of two of serious native-mode Forths.
As Java Just-In-Time compiler technology improves, the performance gap
may be even lower. One of the advantages that Misty Beach Forth has is
that companies like Borland, Microsoft, Sun and Symantec are pouring
millions of dollars into improving their Java Just-In-Time compilers.
As these improve, Misty Beach Forth’s speed improves as well. Misty
Beach Forth’s speed has already seen vast improvement going from
Netscape 3.x to Netscape 4.x, and future improvements seem certain.
Misty Beach Forth can be found at: http://www.mistybeach.com
References
The Java Virtual Machine, Tim Lindholm and Frank Yellin.
Addison Wesley, 1997, ISBN 0-201-63452-X
Smalltalk/V Tutorial and Programming Handbook, Digitalk, Inc. 1987 (user manual, no author or ISBN number)
History of Programming Languages - II, Thomas J. Bergin, Jr. And Richard G. Gibson, Jr. Editors, Addison Wesley, 1996, ISBN 0-201-89502-1