Xref: utzoo comp.arch:17183 comp.lang.misc:5179
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!icdoc!qmw-cs!eliot
From: eliot@cs.qmw.ac.uk (Paul Davison (postmaster))
Newsgroups: comp.arch,comp.lang.misc
Subject: Re: It looks like he's at it again!
Message-ID: <2518@sequent.cs.qmw.ac.uk>
Date: 17 Jul 90 12:45:00 GMT
References: <2328@l.cc.purdue.edu> <1990Jul10.072443.4844@cs.UAlberta.CA> <37569@ucbvax.BERKELEY.EDU> <2358@l.cc.purdue.edu>
Reply-To: eliot@cs.qmw.ac.uk (Eliot Miranda)
Organization: Computer Science Dept, QMW, University of London, UK.
Lines: 156

I use threaded code in my dynamic translation Smalltalk virtual machine,
but it's written in C. I use a single asm statement to do the 'jump to
next threaded opcode', and simple sed scripts on the compiler's assembler
output to turn C procedures into threaded opcodes. I've found this
combination gives both portability and speed: porting the
machine-independent parts of the VM (i.e. not the graphics) from the
SUN 3 (mc68k) to the SUN 4 (sparc) took half a day.

In straight C we could define a threaded code interpreter thus:

    void (**tcip)();    /* threaded code instruction pointer */

    void inner_interpreter()
    {
        do (**tcip++)(); while (1);
    }

where an opcode could be written

    void do_something()
    {
        ....
        return;
    }

The resulting system would probably spend most of its time doing
call/return pairs. With a little extra effort (and help from a good C
compiler) we can eliminate the call/return pairs and build a
conventional threaded code interpreter that jumps from routine to
routine but is still written in C (well, 99.9% anyway).
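For readers who want to try the straight-C scheme before the GCC tricks, here is a minimal, portable sketch of it. It is not code from the VM described in this post: the union type cell, the opcodes addTop and halt, and the program demo are hypothetical, halt is added so the loop can terminate, and each cell is pointer-sized rather than a fixed 32-bit word. Every opcode is an ordinary C function, so dispatching it costs exactly the call/return pair mentioned above.

```c
#include <stdint.h>

/* Hypothetical cell type: each word of threaded code is either the
   address of an opcode routine or an in-line operand.  Cells here are
   pointer-sized, not the fixed 32-bit words of the VM in the post. */
typedef union cell {
    void (*op)(void);
    intptr_t lit;
} cell;

static const cell *tcip;        /* threaded code instruction pointer */
static intptr_t stack[64];      /* hypothetical VM operand stack */
static intptr_t *stackPointer;  /* points at the top-of-stack element */
static int running;             /* cleared by halt to end the loop */

/* Opcodes are plain C functions that return to the inner interpreter. */
static void pushLit(void) { *++stackPointer = (tcip++)->lit; }
static void addTop(void)  { intptr_t b = *stackPointer--; *stackPointer += b; }
static void halt(void)    { running = 0; }

/* Inner interpreter: each opcode costs one call and one return. */
intptr_t inner_interpreter(const cell *code)
{
    tcip = code;
    stackPointer = stack;       /* stack[0] is never used */
    running = 1;
    while (running)
        (tcip++)->op();         /* fetch the opcode address, advance, call */
    return *stackPointer;       /* result is left on top of the stack */
}

/* Hypothetical threaded program computing 3 + 4. */
const cell demo[] = {
    { .op = pushLit }, { .lit = 3 },
    { .op = pushLit }, { .lit = 4 },
    { .op = addTop },
    { .op = halt },
};
```

With these definitions, inner_interpreter(demo) pushes 3 and 4, adds them and returns 7. The union is a portability choice: ISO C does not define conversions between function pointers and object pointers, so keeping opcode addresses and literal operands in one union avoids casts such as (OOP)*tcip++.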
Here's how it works. I use GCC, so I can declare some oft-used global
variables in registers.

On mc68k:

    register OOP *stackPointer asm("a3");
    register TCODE *tcip asm("a5");

On sparc:

    register OOP *stackPointer asm("%g5");
    register TCODE *tcip asm("%g7");

Threaded code is a sequence of 32-bit words: the address of a threaded
opcode, followed by any in-line operands that opcode consumes, followed
by the address of the next opcode, and so on.

First, some convenience defines:

    #define TBEGIN {
    #define TEND   JUMPNEXT; }

All threaded opcodes begin with TBEGIN instead of { and end with TEND
instead of }. Here's a simple threaded opcode that pushes an operand
onto the VM's stack:

    void pushLit()
    TBEGIN
        *++stackPointer = (OOP)*tcip++;
    TEND

JUMPNEXT jumps to the next threaded opcode. On mc68k it's defined as

    #define JUMPNEXT \
        do{asm("mov.l (%a5)+,%a0; jmp (%a0)");return;}while(0)

and on sparc as

    #define JUMPNEXT \
        do{asm("ld [%g7],%o0; jmpl %o0,%g0; add %g7,4,%g7");return;}while(0)

JUMPNEXT is analogous to (*tcip++)(), but jumps instead of calling. On
the sparc the add that advances tcip sits in the delay slot of the jmpl,
so tcip is incremented before the next opcode starts. On a hypothetical
pure C machine JUMPNEXT could be:

    #define JUMPNEXT return

So on the sparc pushLit is actually

    void pushLit()
    {
        *++stackPointer = (OOP)*tcip++;
        do{
            asm("ld [%g7],%o0; jmpl %o0,%g0; add %g7,4,%g7");
            return;
        }while(0);
    }

which compiles to:

        .global _pushLit
        .proc 1
    _pushLit:
        !#PROLOGUE# 0
        save %sp,-80,%sp
        !#PROLOGUE# 1
        add %g5,4,%g5
        ld [%g7],%o0
        st %o0,[%g5]
        add %g7,4,%g7
        ld [%g7],%o0; jmpl %o0,%g0; add %g7,4,%g7
        ret
        restore

Since each threaded opcode jumps to the next, we don't want the prolog
or the epilog. I apply the following sed script to the assembler output
to strip them:

    /^_.*:$/{n
    N
    N
    s/ !#PROLOGUE# 0\n save %sp,[-0-9]*,%sp\n !#PROLOGUE# 1//
    }
    / ret/d
    / restore/d

which produces

        .global _pushLit
        .proc 1
    _pushLit:
        add %g5,4,%g5
        ld [%g7],%o0
        st %o0,[%g5]
        add %g7,4,%g7
        ld [%g7],%o0; jmpl %o0,%g0; add %g7,4,%g7

On the mc68k the sed script is a little more involved but is still only
22 lines. (It's more complicated because the compiler optimizes its
register save/restore code; e.g.
pushing a single register is quicker than using a move-multiple with a
single bit set in the register mask.)

All threaded opcodes run in the same stack frame. The system is kicked
off from a C routine that calls alloca to allocate a stack frame large
enough for all threaded opcodes, e.g.:

    void Interpret()
    {
        tcip = init_tcode();
        (void)alloca(1024);
        (**tcip++)();
    }

The resulting system is as efficient a threaded code interpreter as one
written entirely in assembler, BUT on the sparc:

    The system is about 20,000 lines (including comments).
    All but 13 lines are ordinary C code:
        12 lines are gcc-style global register variable declarations, and
         1 line defines JUMPNEXT (as above) with an asm statement.
    There are 42 lines of sed script in 3 files:
         9 lines strip the prolog/epilog from threaded opcodes,
         5 lines do a peephole optimization, and
        28 lines restore a global register stomped on by .div & .rem.

--
Eliot Miranda                   email: eliot@cs.qmw.ac.uk
Dept of Computer Science        Tel:   071 975 5220 (+44 71 975 5220)
Queen Mary Westfield College    ARPA:  eliot%cs.qmw.ac.uk@nsfnet-relay.ac.uk
Mile End Road                   UUCP:  eliot@qmw-cs.uucp
LONDON E1 4NS