Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site utcsri.UUCP
Path: utzoo!utcsri!greg
From: greg@utcsri.UUCP (Gregory Smith)
Newsgroups: net.lang,net.lang.forth
Subject: Re: What's so good about FORTH?
Message-ID: <2993@utcsri.UUCP>
Date: Wed, 18-Jun-86 12:10:14 EDT
Article-I.D.: utcsri.2993
Posted: Wed Jun 18 12:10:14 1986
Date-Received: Wed, 18-Jun-86 12:18:36 EDT
References: <433@astroatc.UUCP> <5654@alice.uUCp>
Reply-To: greg@utcsri.UUCP (Gregory Smith)
Organization: CSRI, University of Toronto
Lines: 81
Keywords: longish, but partly asm. code.
Summary: how threaded code works

In article <5654@alice.uUCp> ark@alice.UucP (Andrew Koenig) writes:
>> FORTH appears to be the only major language that uses threaded code as the
>> primary means of expressing algorithms internally. Other languages and
>> applications do use threaded code but by no means even close to the extent
>> that FORTH does. Threaded code is used for compactness and simplicity.
>
>Have a look at the Spitbol compilers sometime.

Some DEC PDP-11 FORTRAN compilers do this too, usually as an option.
For those interested, I will try to show the mechanics of this:

Suppose the program is

	I=1
	K=350
	J=J+1
	A=B
	CALL FOO

The compiled output would be:
	.WORD	MOI$1M,I	; move-int-1-to-mem
	.WORD	MOI$IM,350,K	; move-int-immediate-to-mem
	.WORD	ADI$1M,J	; add-int-1-to-mem
	.WORD	MOF$MM,B,A	; move-float-mem-to-mem (source, then dest)
	.WORD	CAL$,FOO	; call foo.

There were a *lot* of operations defined, as you can imagine.
This is executed with R4 pointing into the above list of words. The
handler routines are as follows (more or less):
MOI$1M:	MOV	#1,@(R4)+
	JMP	@(R4)+
MOI$IM:	MOV	(R4)+,@(R4)+
	JMP	@(R4)+
ADI$1M:	INC	@(R4)+
	JMP	@(R4)+
MOF$MM:	MOV	(R4)+,R0	; get source
	MOV	(R4)+,R1	; dest
	MOV	(R0)+,(R1)+	; move one word
	MOV	(R0),(R1)	; move the other
	JMP	@(R4)+
CAL$:	MOV	(R4)+,R0	; get subr. address
	MOV	R4,-(SP)	; save threaded PC
	JSR	PC,(R0)		; call it
	MOV	(SP)+,R4	; get R4 back
	JMP	@(R4)+
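
For anyone who would rather not read MACRO-11, here is a rough model of
the same mechanism in Python. It is a sketch, not what the compiler's
run-time library looked like: the dict `mem`, the `run` loop and the
lower-case handler names are my own stand-ins, and the loop does the
job that JMP @(R4)+ does at the end of each real handler. Operands are
listed in the order the handlers fetch them, so the float move takes
its source word first and its destination second.

```python
# Model only: a dict stands in for PDP-11 memory, names stand in for addresses.
mem = {"I": 0, "J": 41, "K": 0, "A": 0.0, "B": 2.5}
calls = []                      # lets us see that CAL$ really called FOO


def run(thread):
    pc = 0                      # index of the next word: the role R4 plays
    while pc < len(thread):     # assumption: falling off the end stops the run
        handler = thread[pc]    # this fetch-and-call replaces JMP @(R4)+
        pc = handler(thread, pc + 1)


def moi_1m(thread, pc):         # MOI$1M: move-int-1-to-mem
    mem[thread[pc]] = 1
    return pc + 1


def moi_im(thread, pc):         # MOI$IM: move-int-immediate-to-mem
    mem[thread[pc + 1]] = thread[pc]
    return pc + 2


def adi_1m(thread, pc):         # ADI$1M: add-int-1-to-mem
    mem[thread[pc]] += 1
    return pc + 1


def mof_mm(thread, pc):         # MOF$MM: source word first, then destination
    mem[thread[pc + 1]] = mem[thread[pc]]
    return pc + 2


def cal(thread, pc):            # CAL$: call an ordinary subroutine
    thread[pc]()
    return pc + 1


def foo():                      # stand-in for the FORTRAN subroutine FOO
    calls.append("FOO")


thread = [
    moi_1m, "I",                # I=1
    moi_im, 350, "K",           # K=350
    adi_1m, "J",                # J=J+1
    mof_mm, "B", "A",           # A=B
    cal, foo,                   # CALL FOO
]

run(thread)
print(mem, calls)
```

Each handler consumes its own operands from the thread and hands back
the position of the next handler word, which is all that "threaded
code" amounts to. The Python loop costs far more than one instruction
per dispatch, so this shows the structure only, not the speed.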

Thanks to JMP @(R4)+ the threading overhead is a single instruction per
handler. Even so, for the integer operations above that one instruction
is about half of the instructions executed. The win comes with floating
point stuff, especially double precision and complex. The complex
multiply A=B*C can be done by

	.WORD	MOC$MS,B	; mov-cplx mem to stack
	.WORD	MUC$MS,C	; mul-cplx mem to stack
	.WORD	MOC$SM,A	; mov-cplx stack to mem.

Of course, the advantage here is only in code size, not speed. (The
definitions of MOC$MS, MUC$MS and MOC$SM are left as an exercise for the
reader :-) )
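
For those who would rather skip the exercise, here is one guess at what
the three complex handlers do, modelled in Python rather than PDP-11
code. The semantics are an assumption read off the comments on the
.WORD lines: the first pushes a complex value from memory, the second
multiplies the top of stack by a complex value in memory, the third
pops the result back to memory. The names `mem`, `stack` and `run` are
hypothetical, and a complex number is kept as a (real, imaginary) pair.

```python
# Model only: complex values are (real, imag) pairs, names stand in for addresses.
mem = {"A": (0.0, 0.0), "B": (1.0, 2.0), "C": (3.0, 4.0)}
stack = []


def run(thread):
    pc = 0
    while pc < len(thread):
        handler = thread[pc]            # stands in for JMP @(R4)+
        pc = handler(thread, pc + 1)


def moc_ms(thread, pc):                 # MOC$MS: push complex from memory
    stack.append(mem[thread[pc]])
    return pc + 1


def muc_ms(thread, pc):                 # MUC$MS: top of stack *= complex in memory
    a, b = stack.pop()
    c, d = mem[thread[pc]]
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    stack.append((a * c - b * d, a * d + b * c))
    return pc + 1


def moc_sm(thread, pc):                 # MOC$SM: pop complex into memory
    mem[thread[pc]] = stack.pop()
    return pc + 1


run([moc_ms, "B", muc_ms, "C", moc_sm, "A"])    # A=B*C
print(mem["A"])
```

With B = 1+2i and C = 3+4i this leaves A = -5+10i. The thread is six
words long however much work the multiply handler does, which is where
the code-size saving comes from.
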
Another interesting point is that JMP @(R4)+ does not modify the condition
codes, so a comparison routine like CMI$MI (CMP @(R4)+,(R4)+ / JMP @(R4)+)
can set them for use by a following conditional threaded branch (which
does either MOV (R4),R4 if the branch is taken or TST (R4)+ if not).
This would not be possible on, say, an 8080, where the code to go to
the next handler would be fairly long and would trash the condition codes.
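
The compare-then-branch arrangement can be modelled the same way. This
is a Python sketch under assumptions: the `flags` dict stands in for
the condition codes, `bne` is a made-up name for a branch-if-not-equal
handler (I have not given the real names above), and branch targets are
list indices rather than addresses. The thing to notice is that the
dispatch loop never writes to `flags`, just as JMP @(R4)+ leaves the
condition codes alone.

```python
# Model only: `flags` stands in for the PDP-11 condition codes.
mem = {"J": 0}
flags = {"Z": False, "N": False}


def run(thread):
    pc = 0
    while pc < len(thread):
        handler = thread[pc]        # dispatch never touches `flags`
        pc = handler(thread, pc + 1)


def adi_1m(thread, pc):             # ADI$1M: add-int-1-to-mem
    mem[thread[pc]] += 1
    return pc + 1


def cmi_mi(thread, pc):             # CMI$MI: compare memory int with immediate
    diff = mem[thread[pc]] - thread[pc + 1]
    flags["Z"] = diff == 0
    flags["N"] = diff < 0
    return pc + 2


def bne(thread, pc):                # hypothetical branch-if-not-equal handler
    if not flags["Z"]:
        return thread[pc]           # taken: like MOV (R4),R4
    return pc + 1                   # not taken: like TST (R4)+


thread = [
    adi_1m, "J",                    # index 0: J=J+1
    cmi_mi, "J", 3,                 # index 2: compare J with 3
    bne, 0,                         # index 5: back to index 0 until J equals 3
]

run(thread)
print(mem["J"])
```

The loop runs three times and stops with J equal to 3. On a machine
where the dispatch sequence clobbers the flags, the compare and the
branch would have to be fused into one handler instead.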

Sorry to those of you who don't read PDP-11. A 68K can't do anything
like JMP @(R4)+. It would have to do something like this at the end of
each handler:
	MOVEA.L	(A6)+,A0	; get next in A0
	JMP	(A0)		; go do it.
Condition codes are still unaffected, but the threading overhead is
now that much bigger.

All of this is from memory, so I may have spelled a few of the threaded
names wrong.

This method can obviously be used by any language, and is especially
attractive on horrible CPUs like the 6502 where in-line code is
impractical (no 16-bit regs, except PC :-( )

-- 
"Shades of scorpions! Daedalus has vanished ..... Great Zeus, my ring!"
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg