Xref: utzoo comp.arch:17183 comp.lang.misc:5179
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!icdoc!qmw-cs!eliot
From: eliot@cs.qmw.ac.uk (Paul Davison (postmaster))
Newsgroups: comp.arch,comp.lang.misc
Subject: Re: It looks like he's at it again!
Message-ID: <2518@sequent.cs.qmw.ac.uk>
Date: 17 Jul 90 12:45:00 GMT
References: <2328@l.cc.purdue.edu> <1990Jul10.072443.4844@cs.UAlberta.CA> <37569@ucbvax.BERKELEY.EDU> <2358@l.cc.purdue.edu>
Reply-To: eliot@cs.qmw.ac.uk (Eliot Miranda)
Organization: Computer Science Dept, QMW, University of London, UK.
Lines: 156

I use threaded code in my dynamic translation Smalltalk virtual machine,
but it's written in C.  I use a single asm statement to do the 'jump to next
threaded opcode' & I use simple sed scripts on assembler code to turn C
procedures into threaded opcodes.

I've found this combination provides portability & speed.  It took half a day
to port the machine independent parts of the vm (i.e. not the graphics)
from SUN 3 (mc68k) to SUN 4 (sparc).

In straight C we could define a threaded code interpreter thus:

	void	(**tcip)();	/* threaded code instruction pointer */

	void	inner_interpreter()
	{
		do
			(**tcip++)();
		while (1);
	}
where an opcode could be written
	void	do_something()
	{
		....
		return;
	}
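
As written, the loop above never exits.  A minimal runnable sketch of the
same straight-C scheme follows, assuming a halt opcode that clears a flag.
The names running, acc, op_inc, op_double, op_halt and run_demo are
hypothetical, invented for this illustration only:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*opcode_fn)(void);

static opcode_fn *tcip;		/* threaded code instruction pointer */
static int running;		/* hypothetical: cleared by op_halt to end the loop */
static long acc;		/* hypothetical accumulator, demo only */

static void op_inc(void)    { acc += 1; }
static void op_double(void) { acc *= 2; }
static void op_halt(void)   { running = 0; }

/* Same shape as inner_interpreter above, but with an exit condition. */
static void inner_interpreter(void)
{
	assert(tcip != NULL);
	running = 1;
	do
		(**tcip++)();	/* fetch the opcode pointer, advance, call it */
	while (running);
}

/* Runs inc, inc, double, inc, halt starting from acc = 0. */
long run_demo(void)
{
	static opcode_fn program[] = {
		op_inc, op_inc, op_double, op_inc, op_halt
	};

	acc = 0;
	tcip = program;
	inner_interpreter();
	return acc;
}
```

Calling run_demo() from main executes five opcodes, and every one of them
costs a full call and return through inner_interpreter.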

The resulting system would probably spend most of its time doing return/call
pairs.
With a little extra effort (& help from a good C compiler) we can eliminate
the call/return pairs & build a conventional threaded code interpreter that
jumps from routine to routine but is still written in C (well 99.9% anyway).

Here's how it works:

I use GCC so I can declare some oft used global variables in registers:
On mc68k:
	register OOP	*stackPointer asm("a3");
	register TCODE	*tcip asm("a5");
On sparc:
	register OOP	*stackPointer asm("%g5");
	register TCODE	*tcip asm("%g7");

Threaded code is a sequence of 32-bit words organized as:
	<pointer to threaded opcode (C procedure)>
	<operand>
	<pointer to threaded opcode (C procedure)>
	<operand>

First some convenience defines:
#define TBEGIN {
#define TEND JUMPNEXT; }
All threaded opcodes begin with TBEGIN instead of { & end with TEND instead
of }.

Here's a simple threaded opcode that pushes an operand onto the vm's stack:

void	pushLit()
TBEGIN
	*++stackPointer = (OOP)*tcip++;
TEND


JUMPNEXT jumps to the next threaded opcode.  On mc68k it's defined as
#define JUMPNEXT \
	do{asm("mov.l (%a5)+,%a0; jmp (%a0)");return;}while(0)

and on sparc as
#define JUMPNEXT \
	do{asm("ld [%g7],%o0; jmpl %o0,%g0; add %g7,4,%g7");return;}while(0)

JUMPNEXT is analogous to (*tcip++)(), but jumps instead of calls.
On a hypothetical pure C machine it could be:
#define JUMPNEXT return
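
A minimal sketch of that pure C fallback, to show that the TBEGIN/TEND
opcodes stay unchanged and only the dispatch differs.  Everything not shown
earlier is an assumption made for illustration: TCODE as a union of opcode
pointer and operand, OOP as intptr_t, the stack size, the opcodes addTop and
halt, the halted flag and the driver run_pure_c.  Opcodes without an operand
are assumed to occupy a single cell:

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t OOP;		/* assumption: an object pointer fits in a word */
typedef void (*OPFN)(void);
typedef union { OPFN op; OOP operand; } TCODE;	/* assumption: one code cell */

static OOP	stack[16];
static OOP	*stackPointer;
static TCODE	*tcip;
static int	halted;		/* hypothetical: set by halt to stop the driver */

#define JUMPNEXT return		/* pure C: go back to the driver loop */
#define TBEGIN {
#define TEND JUMPNEXT; }

void	pushLit(void)
TBEGIN
	*++stackPointer = (tcip++)->operand;	/* operand follows the opcode */
TEND

void	addTop(void)		/* hypothetical: pop two values, push their sum */
TBEGIN
	OOP b = *stackPointer--;
	*stackPointer += b;
TEND

void	halt(void)		/* hypothetical */
TBEGIN
	halted = 1;
TEND

/* Driver: does with call/return what the asm JUMPNEXT does with a jump. */
OOP	run_pure_c(void)
{
	static TCODE program[] = {
		{ .op = pushLit }, { .operand = 40 },
		{ .op = pushLit }, { .operand = 2 },
		{ .op = addTop },
		{ .op = halt }
	};

	stackPointer = stack;	/* pre-increment push: slot 0 stays unused */
	tcip = program;
	halted = 0;
	while (!halted)
		(tcip++)->op();	/* tcip is advanced before the opcode runs */
	assert(stackPointer == stack + 1);
	return *stackPointer;
}
```

With this definition every opcode returns to the driver loop, so the
call/return overhead comes back; the asm versions of JUMPNEXT exist to
remove exactly that.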

So on the sparc pushLit is actually
void	pushLit()
{
	*++stackPointer = (OOP)*tcip++;
	do{
		asm("ld [%g7],%o0; jmpl %o0,%g0; add %g7,4,%g7");
		return;
	}while(0);
}

Which compiles to:
.global _pushLit
	.proc 1
_pushLit:
	!#PROLOGUE# 0
	save %sp,-80,%sp
	!#PROLOGUE# 1
	add %g5,4,%g5
	ld [%g7],%o0
	st %o0,[%g5]
	add %g7,4,%g7
	ld [%g7],%o0; jmpl %o0,%g0; add %g7,4,%g7
	ret
	restore

Since each threaded opcode is jumping to the next we don't want the prolog or
the epilog.  I apply the following sed-script to the assembler to strip them:
/^_.*:$/{n
N
N
s/	!#PROLOGUE# 0\n	save %sp,[-0-9]*,%sp\n	!#PROLOGUE# 1//
}
/	ret/d
/	restore/d


Which produces
.global _pushLit
	.proc 1
_pushLit:
	add %g5,4,%g5
	ld [%g7],%o0
	st %o0,[%g5]
	add %g7,4,%g7
	ld [%g7],%o0; jmpl %o0,%g0; add %g7,4,%g7

On the mc68k the sed-script is a little more involved but is still only 22
lines. (It's complicated because of the compiler optimizing various register
save/restore code. e.g. pushing a single register is quicker than using a
move multiple with a single bit set in the register move mask.)


All threaded opcodes run in the same stack frame.  The system is kicked off
from a C routine that calls alloca to allocate a large enough stack frame
for all threaded opcodes, e.g.:

void	Interpret()
{
	tcip = init_tcode();
	(void)alloca(1024);
	(**tcip++)();
}


The resulting system is as efficient a threaded code interpreter as one
written entirely in assembler, BUT on the sparc:
	The system is about 20,000 lines (including comments)
	All but 13 lines are ordinary C code.
	12 lines are gcc-style global register variable declarations and
	1 line defines JUMPNEXT (as above) with an asm statement.
	42 lines of sed-script in 3 files
		9 lines strip prolog/epilog from threaded opcodes
		5 lines do a peephole optimization
		28 lines restore a global register stomped on by .div & .rem

-- 
Eliot Miranda			email:	eliot@cs.qmw.ac.uk
Dept of Computer Science	Tel:	071 975 5220 (+44 71 975 5220)
Queen Mary Westfield College	ARPA:	eliot%cs.qmw.ac.uk@nsfnet-relay.ac.uk	
Mile End Road			UUCP:	eliot@qmw-cs.uucp
LONDON E1 4NS