Path: utzoo!utgpu!watmath!att!rutgers!ucsd!sdcc6!ir230
From: ir230@sdcc6.ucsd.edu (john wavrik)
Newsgroups: comp.lang.forth
Subject: Forth Implementation (long)
Keywords: Forth implementation
Message-ID: <5172@sdcc6.ucsd.edu>
Date: 17 Nov 89 01:37:23 GMT
Organization: University of California, San Diego
Lines: 543


Bill Bouma writes:

>   I fail to see what all this has to do with implementing forth in
>   assembly language? I see no reason a forth interpreter written in
>   any other language could not be made to behave identically to one
>   written in assembly! 

>   Also, I have to disagree with the rest of what was said because the
>   main thing that distinguishes forth from other languages is the 
>   parameter stack. Other languages (yes, even C) provide means to 
>   build new data types using structures. One can build new operations
>   by writing functions. A language which does not allow the user to add
>   to it would not be very useful!

I'd like to point out to everyone how this exchange began: As a person who has 
a lot of experience with Forth, I received from a "proud papa" a version of 
Forth written in 'C' (I will refer to it as XFORTH). It was lacking some 
critical words (in particular the double precision wordset) and had others 
improperly implemented (like UM*), which prevented me from testing it on 
existing applications. I went to add the missing words and fix the errors (a 
trivial task on traditional Forth's). When I tried, I found that mechanisms to 
add the words were lacking too. This inspired some thinking about the problems 
inherent in trying to implement Forth using other languages. The precise 
statement of mine was: 

# I'd be happy to see a full Standard version of Forth implemented in 'C' 
# which allows a user the same control as he enjoys in an assembly language 
# version (in particular the ability to add new primitives without 
# understanding 'C', the 'C' source code for the implementation, and how his 
# 'C' compiler works). Failing this, I can only conclude that the purpose of 
# implementing Forth in 'C' is to provide 'C' programmers with a toy version 
# of Forth. 

It's a shame that people homed in on the second sentence rather than the 
first.
                                     ----

Before we go any further, I'd like to take a vote:
   All Forth programmers who believe that "the main thing that
     distinguishes Forth from other languages is the parameter stack"
     please raise your hand.
   [see Bill, none of them do!]
        
                                     ----

   What would a 'C' programmer do if he received from a friend a program which 
used declared variables of type "long" [note for Forth programmers: this is 
what they call double precision integers] and his 'C' compiler does not 
support this type?

              a.  Ask your friend to recode her program
              b.  Buy a new 'C' compiler
              c.  Modify your 'C' compiler, even though you
                  don't have the source code
              d.  Write a note to comp.lang.c telling everyone
                  that you have a "toy" compiler
              e.  Let off steam by writing to comp.lang.forth
                  and harassing the Forth programmers

Most 'C' programmers would do b. and e. (they have too much pride to do a. or 
d.)  Almost any Forth programmer (using a traditional implementation of Forth) 
would do the equivalent of c. 

Please note that the problem is to add the data type "long" in such a way that 
it has the same status as built-in types. Treating a "long" as an array or a 
"struct" and defining functions like long_add would not do!

[to Pascal programmers: a struct is what they call a record]
[to Forth programmers: we don't have one, but by the time someone has finished 
explaining what it is, you'll have figured out how to make one]
 
You want to read, without alteration, the source code of a person who has 
"long" as a primary type. She has declarations like:
 
                              long x,y,z;
and statements like:
                               z = x + y;
or worse:
                               z = x + w;
(with w an "int").

Her compiler recognizes "long" as a primary type (and handles the attendant 
overloading of symbols and coercion), yours doesn't! 

[to 'C' programmers:  in order not to lose you for the rest of this message,
take my word for it: you can't add a new structure like this without (1) 
changing the compiler, (2) using your compiler to write a new compiler, (3) 
writing a pre-processor -- which is almost as hard as (2). Please read on and 
convince yourself later] 

                                    -------

Let's look at the task of defining a double-precision addition in Forth. Here 
is the definition in F-83 due to Henry Laxen:
 
\ 32 bit Arithmetic Operations                    11OCT83HHL
    CODE D+  (S d1 d2 -- dsum )                                     
      AX POP   DX POP   BX POP   CX POP   CX DX ADD   BX AX ADC     
      2PUSH END-CODE                                                
 
Besides being as fast as possible, this definition is quite simple. I would 
rank this as very low in difficulty -- ANY Forth programmer could do a 
definition like this. (I think Henry Laxen is a very imaginative and creative 
person -- but I don't think this is an example. While I don't teach assembly 
language in my course, any of the students who pass my course would be capable 
of doing this as one problem on a weekly homework (part of the problem being 
to learn enough about assembly language to do it).

For comparison, let's look at the task of doing this in high-level Forth.
The definition of D+ itself is simple -- we want to add the low order words 
together and then add the high order words WITH THE CARRY FROM THE LOW ORDER:

    : D+  ( d1 d2 -- d3 )  ROT >R >R  ( high order on top )  
          0 ADC  R> R> ROT  ADC  DROP  ;                     

Unfortunately, high level Forth (just like high level anything else) does not 
provide us access to the carry flag -- so we must define the word
      ADC ( n1 n2 c -- n3 c' ;;; add with carry )
which adds with a carry but does not use the hardware carry flag. In the code 
below, we have synthesized this by treating 16-bit addition as double-
precision 8-bit addition. To make things easier to understand, we introduce a 
collection of pseudo-registers to hold the low and high bytes of intermediate 
numbers. [To Forth programmers:  I know I've used too many variables, but I'm 
trying to make a point]  The word SPLIT ( n i -- ) splits the 16-bit integer n 
into high and low bytes and stores these in the i-th slots of the arrays.  The 
only other thing to explain is that << and >> are shift words provided in 
XForth:  
    <<  ( n k -- n' )  n' is n (logically) shifted left by k bits. 

CREATE LOW0  4 CELLS ALLOT                         
CREATE HIGH0 4 CELLS ALLOT                         
: 'LOW  ( i -- addr-low )  1- CELLS LOW0  + ;      
: 'HIGH ( i -- addr-high ) 1- CELLS HIGH0 + ;      
: LOW   'LOW @  ;                                  
: HIGH  'HIGH @ ;                                  

: SPLIT ( n i --  ;;; store )  >R  DUP 255 AND     
         R@ 'LOW !  8 >> R> 'HIGH !  ;             
                                                   
: ADC  ( n1 n2 c -- n3 c' ;;; add with carry )     
     >R  2 SPLIT 1 SPLIT   1 LOW 2 LOW + R> +      
     3 SPLIT   \  3 LOW has sum of low bytes,  3 HIGH has carry
     1 HIGH 2 HIGH +  3 HIGH +
     4 SPLIT   \  4 LOW has sum of high bytes, 4 HIGH has carry
     4 LOW 8 <<  3 LOW +  4 HIGH  ;                

In contrast to the low-level solution this is slow, complicated, and ugly.
(I wrote it to see what would be involved and make no claims that optimal in 
speed or beauty.) It runs about 60 times slower than the assembly language 
when both are run under F-83. It is 240 times slower when running on XForth.
It is not something an average Forth programmer can be expected to do (it 
takes a brown belt at least). 

>   The thing that bothers me about this is that you seem to be implying
>   that the user knows the host assembly language? If not, then what is
>   the difference between having your forth core written in assembly, or
>   having it written in basic? Speed!

I think that the misconception is thinking of Forth as a program which is 
written in assembly language. Most programming languages produce a program 
which is compiled to leave behind "THE RESULT". It is immaterial to someone 
who runs "THE RESULT" which language was used to produce it. The choice of
language is more a matter of convenience to the programmer.

Forth is the assembly language for a virtual machine which is mapped to an 
underlying real processor. It is a combination extensible compiler, extensible 
language, and extensible operating system WHICH IS MOSTLY WRITTEN IN ITSELF. 

What I have called "traditionally implemented Forth" is a version of Forth in 
which a great many words written in Forth itself are mixed with a few words 
written in the assembly language of the host processor. I have included the 
entire double-precision wordset for F-83 as an appendix. It has a higher 
proportion of assembly language words than normal. It should nevertheless be 
clear that the definitions of all words, both colon and code, are short and 
simple. A more typical illustration is the I/O interface (including keyboard, 
video, and files) which, in F-83, consists entirely of high level Forth words 
based upon a single code primitive: 
 
     BDOS     Load up the registers and do a DOS system call.
        return the result placed in the A register on the    
        stack.                                               

CODE BDOS  (S n fun -- m )                          
  AX POP   AL AH MOV   DX POP   33 INT   AH AH SUB  
 1PUSH END-CODE                                     

(INT for "interrupt" is a call to either the BIOS ROM or the operating
system. There are a number of books that list all the available interrupt
calls stating what they do, what has to be loaded into which registers, and
what is in which registers upon return. Any Forth programmer who knows how to 
read can integrate OS hooks into a Forth program provided that he knows which 
registers are used by Forth and how the stack is implemented. In F-83, the 
machine stack is the Forth parameter stack, BP is the return stack pointer, 
and SI is the Forth instruction pointer -- everything else is free within a 
word)

Traditional Forth is not "implemented in assembly language". It is really 
implemented in Forth -- but Forth is so closely mapped to the underlying 
processor that key words can be easily defined in terms of the host's 
assembly language. In addition to the use of assembly language, a major 
advantage of the traditional implementation is that it provides the user with 
access to the details of the implementation. The fact that a user can extend 
the compiler as part of programming has a *great* deal to do with ease of 
programming. 

It is important to understand that the Forth approach to problem solving is 
radically different than the conventional approach. In Forth we produce a 
problem-oriented application language and then attack the problem. Thus Forth 
is more a toolkit for building languages than a language.

It would be unfair to say that a great deal of Forth programming requires the 
use of assembly language or access to the implementation. But it would be fair 
to say that the 5% or 10% which does use them is responsible for many of the 
claims made by Forth programmers that they can easily achieve things that 
would be difficult (or impossible) to do in conventional languages. 

Please do not assume that it takes a great deal of assembly language knowledge 
to make major alterations in a traditionally implemented Forth. Several years 
ago, when I used Kitt Peak VAX-Forth for my class, I wanted to make it look 
like the dialect of Forth found in the first edition of Brodie's "Starting 
Forth". Kitt Peak was a traditionally implemented overgrown FIG-Forth and I 
had to make it look like Brodie's anticipation of Forth-83. The good people at 
Kitt Peak had found (in 1982) ways to make their 32-bit VAX version compatible 
with their 16-bit PDP-11 version. Their Forth included "everything but the 
kitchen sink" -- in particular floating point and double-precision floating 
point numbers and a lot of "better ideas" like prefacing all byte operations 
by "b" rather than "c"). I had to add a few missing words ( <CMOVE, I' and SP! 
as I recall ) using assembly language -- and, believe me, I do not know VAX 
assembly language! I also had to undo most of their "better ideas" -- I had to 
change their "endif" to "then", their "b@" to "c@", etc. 

Forth programmers take doing things like this for granted (I don't think I've 
ever mentioned it to anyone before). But in the world of conventional 
languages, it would be effectively the same as a programmer fixing his Turbo-C 
so that it works just like Microsoft-C. (The fact that altering a 'C' compiler 
is in fact several orders of magnitude more difficult is part of the point I'm 
making.) 

>  Why not? What do registers have to do with it? If you are writing from
>  "within the forth environment" you are writing forth, right? Then why
>  does it matter what that forth was written in?

I hope I've answered this.


                                                  John J Wavrik 
             jjwavrik@ucsd.edu                    Dept of Math  C-012 
                                                  Univ of Calif - San Diego 
                                                  La Jolla, CA  92093 


P.S.  I don't understand how this newsgroup has become split into two 
      subgroups. On the one hand we have the people who want to know how many 
      pins will be on the RTX-3000 (and Phil Koopman who replies by supplying 
      the phone number of Harris, Inc). On the other, we have 'C' programmers 
      who are intent at modifying a language they haven't bothered to 
      understand.
      
      Apparently the people who are doing interesting and portable things in 
      Forth are too busy to take the time to tell us about them and post some 
      source code. 


============================================================================
                  APPENDIX  -- F-83 DOUBLE PRECISION WORDSET
                      shadow screen comments are indented


     \ 16 bit Arithmetic Operations   Unsigned Multiply    26Sep83map
     You could write a whole book about multiplication and division,
     and in fact Knuth did.  Suffice it to say that UM* is the basic
     multiplication primitive in Forth.  It takes two unsigned 16 bit
     integers and returns an unsigned 32 bit result.  All other
     multiplication functions are derived from this primitive one.

     It probably isn't particularly fast or elegant, but that is
     because I never liked arithmetic and I stole this implementation
     from FIG Forth anyway.

     U*D is a synonym for UM*

\ 16 bit Arithmetic Operations   Unsigned Multiply    22Aug83map
CODE UM*      (S n1 n2 -- d )
  AX POP   BX POP   BX MUL   DX AX XCHG   2PUSH END-CODE

: U*D   (S n1 n2 -- d )   UM*   ;

     \ 16 bit Arithmetic Operations   Division subroutines 05MAR83HHL

     These are various subroutines used by the division primitive in
     Forth, namely U/.  Again I must give credit for them to FIG
     Forth, since if I can't even understand multiply,
     divide would be completely hopeless.


     \ 16 bit Arithmetic Operations   Unsigned Divide      05MAR83HHL
     UM/MOD
        This is the division primitive in Forth.  All other division
        operations are derived from it.  It takes a double number,
        d1, and divides by by a single number n1.  It leaves a
        remainder and a quotient on the stack.  For a clearer
        understanding of arithmetic consult Knuth Volume 2 on
        Seminumerical Algorithms.


\ 16 bit Arithmetic Operations   Unsigned Divide      22Aug83map
CODE UM/MOD   (S d1 n1 -- Remainder Quotient )
  BX POP   DX POP   AX POP   BX DX CMP   >=  ( divide by zero? )
  IF   -1 # AX MOV   AX DX MOV   2PUSH   THEN
  BX DIV   2PUSH END-CODE


     \ 16 bit Comparison Operations                        05MAR83HHL
     0=
       Returns True if top is zero, False otherwise.
     0<
       Returns true if top is negative, ie sign bit is on.
     0>
       Returns true if top is positive.
     0<>
       Returns true if the top is non-zero, False otherwise.
     =
        Returns true if the two elements on the stack are equal,
        False otherwise.
     <>   Returns true if the two element are not equal, else false.
     ?NEGATE   Negate the second element if the top is negative.


\ 16 bit Comparison Operations                        04OCT83HHL
ASSEMBLER  LABEL YES     TRUE # AX MOV   1PUSH
           LABEL NO     FALSE # AX MOV   1PUSH
CODE 0=      (S n -- f )
   AX POP   AX AX OR   YES JE   NO #) JMP END-CODE
CODE 0<      (S n -- f )
   AX POP   AX AX OR   YES JS   NO #) JMP END-CODE
CODE 0>   (S n -- f )
   AX POP   AX AX OR   YES JG   NO #) JMP END-CODE
CODE 0<>  (S n -- f )
   AX POP   AX AX OR   YES JNE  NO #) JMP END-CODE
CODE =       (S n1 n2 -- f )
   AX POP   BX POP   AX BX CMP   YES JE    NO #) JMP END-CODE
: <>         (S n1 n2 -- f )    = NOT   ;
: ?NEGATE    (S n1 n2 -- n3 )   0< IF    NEGATE   THEN   ;

     \ 16 bit Comparison Operations                        27Sep83map
     U< Compare the top two elements on the stack as unsigned
        integers and return true if the second is less than the
        first.  Be sure to use U< whenever comparing addresses, or
        else strange things will happen beyond 32K.
     U> Compare the top two elements on the stack as unsigned
        integers.  True if n1 > n2 unsigned.
     <  Compare the top two elements on the stack as signed
        integers and return true if n1 < n2.
     >  Compare the top two elements on the stack as signed
        integers and return true if n1 > n2.

     MIN     Return the minimum of n1 and n2
     MAX     Return the maximum of n1 and n2
     BETWEEN  Return true if min <= n1 <= max, otherwise false.
     WITHIN   Return true if min <= n1 < max, otherwise false.

\ 16 bit Comparison Operations                        11OCT83HHL
ASSEMBLER  LABEL YES     TRUE # AX MOV   1PUSH
CODE   U<   (S n1 n2 -- f )
   AX POP   BX POP   AX BX CMP   YES JB    NO #) JMP END-CODE
CODE   U>   (S n1 n2 -- f )
   AX POP   BX POP   BX AX CMP   YES JB    NO #) JMP END-CODE
CODE <   (S n1 n2 -- f )
   AX POP   BX POP   AX BX CMP   YES JL    NO #) JMP END-CODE
CODE >   (S n1 n2 -- f )
   AX POP   BX POP   AX BX CMP   YES JG    NO #) JMP END-CODE
: MIN   (S n1 n2 -- n3 )   2DUP > IF   SWAP   THEN   DROP   ;
: MAX   (S n1 n2 -- n3 )   2DUP < IF   SWAP   THEN   DROP   ;
: BETWEEN   (S n1 min max -- f ) >R  OVER > SWAP R> > OR NOT ;
: WITHIN   (S n1 min max -- f )   1- BETWEEN  ;


     \ 32 bit Memory Operations                            09MAR83HHL
     2@
        Fetch a 32 bit value from addr.

     2!
        Store a 32 bit value at addr.

\ 32 bit Memory Operations                            13Apr84map
CODE 2@     (S addr -- d )
   BX POP   0 [BX] AX MOV   BX INC   BX INC   0 [BX] DX MOV
   2PUSH  END-CODE

CODE 2!     (S d addr -- )
   BX POP   0 [BX] POP   BX INC   BX INC  0 [BX] POP
   NEXT  END-CODE


     \ 32 bit Memory and Stack Operations                  26Sep83map
     2DROP
        Drop the top two elements of the stack.
     2DUP
        Duplicate the top two elements of the stack.
     2SWAP
        Swap the top two pairs of numbers on the stack.  You can use
        this operator to swap two 32 bit integers and preserve
        their meaning as double numbers.
     2OVER
        Copy the second pair of numbers over the top pair.  Behaves
        like 2SWAP for 32 bit integers.
     3DUP    Duplicate the top three elements of the stack.
     4DUP    Duplicate the top four elements of the stack.
     2ROT    rotates top three double numbers.

\ 32 bit Memory and Stack Operations                  11OCT83HHL
CODE 2DROP     (S d -- )
   AX POP   AX POP   NEXT END-CODE
CODE 2DUP     (S d -- d d )
   AX POP   DX POP   DX PUSH   AX PUSH   2PUSH END-CODE
CODE 2SWAP     (S d1 d2 -- d2 d1 )
   CX POP   BX POP   AX POP   DX POP
   BX PUSH  CX PUSH  2PUSH END-CODE
CODE 2OVER      (S d2 d2 -- d1 d2 d1 )
   CX POP   BX POP   AX POP   DX POP   DX PUSH   AX PUSH
   BX PUSH  CX PUSH  2PUSH END-CODE

: 3DUP  (S a b c -- a b c a b c )        DUP 2OVER ROT   ;
: 4DUP  (S a b c d -- a b c d a b c d )  2OVER 2OVER   ;
: 2ROT  (S a b c d e f - c d e f a b )   5 ROLL  5 ROLL  ;

     \ 32 bit Arithmetic Operations                        05MAR83HHL
     D+
        Add the two double precision numbers on the stack and
        return the result as a double precision number.


     DNEGATE
        Same as NEGATE except for double precision numbers.


     S>D
        Take a single precision number and make it double precision
        by extending the sign bit to the upper half.
     DABS
        Return the absolute value of the 32 bit integer on the stack

\ 32 bit Arithmetic Operations                        11OCT83HHL
CODE D+  (S d1 d2 -- dsum )
  AX POP   DX POP   BX POP   CX POP   CX DX ADD   BX AX ADC
  2PUSH END-CODE
CODE DNEGATE  (S d# -- d#' )
  BX POP   CX POP   AX AX SUB   AX DX MOV
  CX DX SUB   BX AX SBB   2PUSH END-CODE
CODE   S>D      (S n -- d )
   AX POP   CWD   AX DX XCHG   2PUSH   END-CODE

CODE DABS   (S d# -- d# )
   DX POP DX PUSH   DX DX OR   ' DNEGATE @-T JS   NEXT  END-CODE

     \ 32 bit Arithmetic Operations                        06Apr84map
     D2*
        32 bit left shift.
     D2/
        32 bit arithmetic right shift. Equivalent to divide by 2.

     D-   Subtract the two double precision numbers.
     ?DNEGATE    Negate the double number if the top is negative.


\ 32 bit Arithmetic Operations                        06Apr84map
CODE D2*   (S d -- d*2 )
   AX POP  DX POP  DX SHL  AX RCL   2PUSH  END-CODE
CODE D2/   (S d -- d/2 )
   AX POP  DX POP  AX SAR  DX RCR   2PUSH  END-CODE

: D-    (S d1 d2 -- d3 )   DNEGATE D+   ;
: ?DNEGATE  (S d1 n -- d2 )     0< IF   DNEGATE   THEN   ;

     \ 32 bit Comparison Operations                        01Oct83map
     D0=     Compare the top double number to zero.  True if d = 0
     D=      Compare the top two double numbers.  True if d1 = d2
     DU<     Performs unsigned comparison of two double numbers.
     D<      Compare the top two double numbers.  True if d1 < d2
     D>      Compare the top two double numbers.  True if d1 > d2
     DMIN    Return the lesser of the top two double numbers.
     DMAX    Return the greater of the the top two double numbers.

\ 32 bit Comparison Operations                        01OCT83MAP
: D0=   (S d -- f )        OR 0= ;
: D=    (S d1 d2 -- f )    D-  D0=  ;
: DU<   (S ud1 ud2 -- f )   ROT SWAP 2DUP U<
   IF   2DROP 2DROP TRUE
   ELSE  <> IF   2DROP FALSE  ELSE  U<  THEN
   THEN  ;
: D<    (S d1 d2 -- f )   2 PICK OVER =
   IF   DU<   ELSE  NIP ROT DROP <  THEN  ;
: D>    (S d1 d2 -- f )    2SWAP D<   ;
: DMIN  (S d1 d2 -- d3 )   4DUP D> IF   2SWAP   THEN   2DROP ;
: DMAX  (S d1 d2 -- d3 )   4DUP D< IF   2SWAP   THEN   2DROP ;

     \ Mixed Mode Arithmetic                               27Sep83map
     This does all the arithmetic you could possibly want and even
     more.  I can never remember exactly what the order of the
     arguments is for any of these, except maybe * / and MOD, so I
     suggest you just try it when you are in doubt.  That is one
     of the nice things about having an interpreter around, you can
     ask it questions anytime and it will tell you the answer.

     *D  multiplys two singles and leaves a double.
     M/MOD  divides a double by a single, leaving a single quotient
        and a single remainder. Division is floored.
     MU/MOD  divides a double by a single, leaving a double quotient
        and a single remainder. Division is floored.


\ Mixed Mode Arithmetic                               04OCT83HHL
: *D   (S n1 n2 -- d# )
   2DUP  XOR  >R  ABS  SWAP  ABS  UM*  R>  ?DNEGATE  ;
: M/MOD   (S d# n1 -- rem quot )
   ?DUP
   IF  DUP >R  2DUP XOR >R  >R DABS R@ ABS  UM/MOD
     SWAP R> ?NEGATE
     SWAP R> 0< IF  NEGATE OVER IF  1- R@ ROT - SWAP  THEN THEN
     R> DROP
   THEN  ;

: MU/MOD  (S d# n1 -- rem d#quot )
   >R  0  R@  UM/MOD  R>  SWAP  >R  UM/MOD  R>   ;

     \ 16 bit multiply and divide                          27Sep83map

     */ is a particularly useful operator, as it allows you to
     do accurate arithmetic on fractional quantities.  Think of
     it as multiplying n1 by the fraction n2/n3.  The intermediate
     result is kept to full accuracy.  Notice that this is not the
     same as * followed by /.  See Starting Forth for more examples.

\ 16 bit multiply and divide                          04OCT83HHL
: *   (S n1 n2 -- n3 )   UM* DROP   ;
: /MOD  (S n1 n2 -- rem quot )   >R  S>D  R>  M/MOD  ;
: /     (S n1 n2 -- quot )   /MOD  NIP  ;
: MOD   (S n1 n2 -- rem )    /MOD  DROP  ;
: */MOD  (S n1 n2 n3 -- rem quot )
   >R  *D  R>  M/MOD  ;
: */    (S n1 n2 n3 -- n1*n2/n3 )     */MOD  NIP  ;