Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!snorkelwacker!mit-eddie!mit-amt!adamk
From: adamk@mit-amt.MEDIA.MIT.EDU (Adam Kao)
Newsgroups: comp.lang.forth
Subject: writing a C interpreter in Forth?
Message-ID: <1326@mit-amt.MEDIA.MIT.EDU>
Date: 6 Jan 90 01:08:36 GMT
Reply-To: adamk@media-lab.media.mit.edu.UUCP (Adam Kao)
Organization: MIT Media Lab, Cambridge MA
Lines: 83

I've been seriously considering extending Forth into a C interpreter. I should be careful to distinguish this from writing a traditional interpreter/compiler, such as Dojun Yoshikami and Ian Green seem to be discussing.

A traditional compiler is usually broken into modules something like the following:

    a lexical scanner that tokenizes the input
    a grammar/parser that places the tokens in a syntax tree
    a typechecker that looks at nodes in the syntax tree
    a code generator that looks at nodes in the syntax tree

For details see Aho, Sethi, and Ullman's "Dragon Book" (not its real name). Since Dojun Yoshikami and Ian Green are using these terms, I assume they are using the standard compiler architecture as a starting point.

But Forth's philosophy is that words do their own work, even compiling words. In my opinion this feature is not nearly emphasized enough. It is truly a unique feature to Forth; the only similar feature I know of is the continuation in Scheme.

The Forth compiler shows us how to implement traditional control structures with words that execute themselves, using the current state of the stacks to record where they are. There is no monolithic compiler. Compiling words "know" what they do. The traditional compiler distinction of compiler as procedures versus source code as data does not exist in Forth.

Therefore to extend Forth into a C interpreter means to use the Forth philosophy while writing a C compiler, defining words like if, {, goto, and so on. Ideally every word would look at the current state (current expression, unprocessed input stream, etc.)
and perform the correct actions possible up to that point.

There are some interesting issues that I have been thinking about:

1. The Forth "scanner" is rudimentary. Essentially it does only three things: interpret numbers, use whitespace as a separator, and make up words out of everything else. C has certain features that depend on a traditional scanner, most notably that non-alphanumeric tokens (e.g. +, ->, {) are automatically separated. Thus we must have a traditional scanner.

2. C uses infix notation. The Forth-style infix operators I have seen were all unwieldy or non-intuitive (I would love to be corrected on this). Thus, for expressions at the very least, we will need a traditional parser.

3. Actually it is misleading to try to match C tokens with Forth words. Incomplete expressions generally cannot be parsed. There is a much better match between C statements and Forth words. Both specify complete actions and so can be compiled as a unit.

This leads us to some simple design decisions. We can view ; and perhaps { } and , as fundamental separators. Upon reaching one of these, our more sophisticated scanner can eliminate whitespace and form a token list. We may still be able to associate actions with each token in this list, since they can now look ahead to the end of the statement and be guaranteed a complete expression. Thus we execute each token, with words like + leaving a reminder on some stack.

I believe C control structures can be direct copies of Forth control structures, although I have not thought deeply about it.

This is all preliminary speculation; I have not written a single line of code yet. I want to hear ideas and discussion, and I want to know if anyone has tried this before. The most helpful responses will be those that start "The right way to do it is . . . "

Every reader of this group should know what advantages Forth would bring to C. The advantages of C for Forth are somewhat harder to quantify.
Forth users may benefit from access to existing C code. Perhaps C programmers could be "lured" into a Forth environment this way, and then learn about Forth's advantages. And following the Forth tradition, I intend to make this interpreter half as big and twice as fast as any existing C interpreter or compiler! :-)

But after all, the only real justification is that it is an interesting problem. I hope it interests you, as well.

Adam
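
A rough sketch of the statement-at-a-time idea from points 1-3 above, written in Python rather than Forth purely for illustration. Everything in it is an assumption, not something from the post: the names TOKEN, PRECEDENCE, scan_statement and compile_statement are made up, only ; is treated as a separator, and only + - * / and parentheses are handled. The "reminder on some stack" is read here as a pending-operator stack, which makes this essentially Dijkstra's shunting-yard algorithm; that is one possible interpretation, not necessarily the intended design.

```python
import re

# Hypothetical token pattern: numbers, identifiers, a few multi-character
# operators, then any other single non-blank character. Leading whitespace
# is skipped, so "a+b" and "a + b" scan identically.
TOKEN = re.compile(
    r"\s*(\d+|[A-Za-z_]\w*|->|==|!=|<=|>=|&&|\|\||[^\sA-Za-z_0-9])"
)

# Assumed precedence table for the handful of operators in this sketch.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def scan_statement(source, pos=0):
    """Collect tokens up to and including the next ';'.

    Returns (tokens, new_pos). If the input ends first, returns whatever
    was collected and the length of the source.
    """
    tokens = []
    while True:
        m = TOKEN.match(source, pos)
        if m is None:
            return tokens, len(source)
        pos = m.end()
        tokens.append(m.group(1))
        if m.group(1) == ";":
            return tokens, pos


def compile_statement(tokens):
    """'Execute' each token in order, producing a postfix (Forth-order) list."""
    output = []   # operands and operators in postfix order
    pending = []  # reminders left behind by operators and '('
    for tok in tokens:
        if tok in PRECEDENCE:
            # Flush reminders that bind at least as tightly (left-associative).
            while (pending and pending[-1] != "("
                   and PRECEDENCE[pending[-1]] >= PRECEDENCE[tok]):
                output.append(pending.pop())
            pending.append(tok)
        elif tok == "(":
            pending.append(tok)
        elif tok == ")":
            while pending and pending[-1] != "(":
                output.append(pending.pop())
            if not pending:
                raise SyntaxError("unbalanced ')'")
            pending.pop()  # discard the matching '('
        elif tok == ";":
            # End of statement: every remaining reminder can now be emitted.
            while pending:
                op = pending.pop()
                if op == "(":
                    raise SyntaxError("unbalanced '('")
                output.append(op)
        else:
            output.append(tok)  # number or identifier
    return output


tokens, rest = scan_statement("a+b*(c-2); x")
print(" ".join(compile_statement(tokens)))
```

The point of the sketch is only that nothing is emitted for an operator until a later token (or the closing ;) proves the operands are complete, which is why working on a whole statement at a time sidesteps the incomplete-expression problem in point 3.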