Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site utcsri.UUCP
Path: utzoo!utcsri!greg
From: greg@utcsri.UUCP (Gregory Smith)
Newsgroups: net.lang,net.lang.forth
Subject: Re: What's so good about FORTH?
Message-ID: <2993@utcsri.UUCP>
Date: Wed, 18-Jun-86 12:10:14 EDT
Article-I.D.: utcsri.2993
Posted: Wed Jun 18 12:10:14 1986
Date-Received: Wed, 18-Jun-86 12:18:36 EDT
References: <433@astroatc.UUCP> <5654@alice.uUCp>
Reply-To: greg@utcsri.UUCP (Gregory Smith)
Organization: CSRI, University of Toronto
Lines: 81
Keywords: longish, but partly asm. code.
Summary: how threaded code works

In article <5654@alice.uUCp> ark@alice.UucP (Andrew Koenig) writes:
>> FORTH appears to be the only major language that uses threaded code as the
>> primary means of expressing algorithms internally. Other languages and
>> applications do use threaded code but by no means even close to the extent
>> that FORTH does. Threaded code is used for compactness and simplicity.
>
>Have a look at the Spitbol compilers sometime.

Some DEC PDP-11 FORTRAN compilers do this too, usually as an option.
For those interested, I will try to show the mechanics of this.
Suppose the program is

        I=1
        K=350
        J=J+1
        A=B
        CALL FOO

The compiled output would be:

        .WORD   MOI$1M,I        ; move-int-1-to-mem
        .WORD   MOI$IM,350,K    ; move-int-immediate-to-mem
        .WORD   ADI$1M,J        ; add-int-1 to mem
        .WORD   MOF$MM,B,A      ; move-float-mem-to-mem (source, dest)
        .WORD   CAL$,FOO        ; call foo.

There were a *lot* of operations defined, as you can imagine.
This is executed with R4 pointing to the above list of words.
The routines used are as follows (more or less):

MOI$1M: MOV     #1,@(R4)+
        JMP     @(R4)+

MOI$IM: MOV     (R4)+,@(R4)+
        JMP     @(R4)+

ADI$1M: INC     @(R4)+
        JMP     @(R4)+

MOF$MM: MOV     (R4)+,R0        ; get source
        MOV     (R4)+,R1        ; dest
        MOV     (R0)+,(R1)+     ; move one word
        MOV     (R0),(R1)       ; move the other
        JMP     @(R4)+

CAL$:   MOV     (R4)+,R0        ; get subr. address
        MOV     R4,-(SP)        ; save threaded PC
        JSR     PC,(R0)         ; call it
        MOV     (SP)+,R4        ; get R4 back
        JMP     @(R4)+

Thanks to JMP @(R4)+ the threading overhead is minimal: one instruction
both fetches the next handler address and advances R4. However, for
integer operations, this overhead is almost half of the instructions
executed. The win comes with floating point stuff, especially double
precision and complex. The complex multiply A=B*C can be done by

        .WORD   MOC$MS,B        ; mov-cplx mem to stack
        .WORD   MUC$MS,C        ; mul-cplx mem to stack
        .WORD   MOC$SM,A        ; mov-cplx stack to mem.

Of course, the advantage is only in code size. (The definitions of
MOC$MS, MUC$MS and MOC$SM are left as an exercise for the reader :-) )

Another interesting point is that JMP @(R4)+ does not modify condition
codes, so comparison routines like CMI$MI ( CMP @(R4)+,(R4)+ / JMP @(R4)+ )
can set condition codes, to be used by a conditional threaded branch
(which does either MOV (R4),R4 if the branch is taken or TST (R4)+
if not). This would not be possible on, say, an 8080, where the code
to go to the next handler would be fairly long and would trash the
condition codes.

Sorry for those of you who don't read PDP-11. A 68K can't do anything
like JMP @(R4)+. It would have to do something like this at the end of
each handler:

        MOVEA.L (A6)+,A0        ; get next in A0
        JMP     (A0)            ; go do it.

Condition codes are still unaffected, but the threading overhead is now
two instructions per handler instead of one.

All of this is from memory, so I may have spelled a few of the threaded
names wrong. This method can obviously be used by any language, and is
especially attractive on horrible CPUs like the 6502 where in-line code
is impractical (no 16-bit regs, except PC :-( )

--
"Shades of scorpions!  Daedalus has vanished ..... Great Zeus, my ring!"
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
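
For those who don't read PDP-11, the dispatch scheme described in this
post can be roughly sketched in C. This is only an illustration under
stated assumptions, not what the DEC compilers emitted: all the C names
(cell, handler, run, run_demo, halt) are made up for the sketch, the halt
handler has no counterpart in the post, and because portable C has no
equivalent of JMP @(R4)+, each handler returns the updated thread pointer
to a small dispatch loop instead of jumping straight to the next handler.
The condition-code trick is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* One cell of the thread: a handler address or an inline operand,
   playing the role of one .WORD in the compiled output. */
typedef union cell cell;
typedef const cell *(*handler)(const cell *ip);

union cell {
    handler fn;          /* address of a handler routine       */
    int    *ivar;        /* address of an integer variable     */
    float  *fvar;        /* address of a float variable        */
    int     imm;         /* immediate integer operand          */
    void  (*sub)(void);  /* address of an ordinary subroutine  */
};

/* Each handler receives ip (the role of R4) pointing at its first
   operand and returns ip advanced past its operands. */
static const cell *moi_1m(const cell *ip) { *ip[0].ivar = 1;           return ip + 1; }
static const cell *moi_im(const cell *ip) { *ip[1].ivar = ip[0].imm;   return ip + 2; }
static const cell *adi_1m(const cell *ip) { *ip[0].ivar += 1;          return ip + 1; }
static const cell *mof_mm(const cell *ip) { *ip[1].fvar = *ip[0].fvar; return ip + 2; }
static const cell *cal(const cell *ip)    { ip[0].sub();               return ip + 1; }

/* Hypothetical terminator so the sketch can stop; not in the post. */
static const cell *halt(const cell *ip)   { (void)ip; return NULL; }

/* Dispatch loop: fetch the next handler address, advance, call it.
   This stands in for the JMP @(R4)+ at the end of every handler. */
static void run(const cell *ip)
{
    while (ip != NULL) {
        handler h = ip->fn;
        assert(h != NULL);
        ip = h(ip + 1);
    }
}

/* Demo state corresponding to I, K, J, A, B and FOO in the post. */
static int   I, K, J;
static float A, B;
static int   foo_calls;
static void  foo(void) { foo_calls++; }

/* Builds and runs the thread for: I=1, K=350, J=J+1, A=B, CALL FOO.
   Returns 1 if every variable ends up with the expected value. */
int run_demo(void)
{
    const cell thread[] = {
        { .fn = moi_1m }, { .ivar = &I },
        { .fn = moi_im }, { .imm = 350 }, { .ivar = &K },
        { .fn = adi_1m }, { .ivar = &J },
        { .fn = mof_mm }, { .fvar = &B }, { .fvar = &A }, /* source, dest */
        { .fn = cal },    { .sub = foo },
        { .fn = halt },
    };

    I = 0; K = 0; J = 41; A = 0.0f; B = 2.5f; foo_calls = 0;
    run(thread);
    return I == 1 && K == 350 && J == 42 && A == 2.5f && foo_calls == 1;
}
```

Calling run_demo() from a main() should return 1. The extra call and
return per handler is exactly the overhead that JMP @(R4)+ avoids; some
compilers offer non-standard extensions (GCC's "labels as values", for
one) that get closer to the one-instruction dispatch.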