Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!willett!ForthNet
From: ForthNet@willett.UUCP (ForthNet articles from GEnie)
Newsgroups: comp.lang.forth
Subject: Advanced Beginners
Message-ID: <514.UUL1.3#5129@willett.UUCP>
Date: 20 Feb 90 03:01:25 GMT
Organization: Latest link in the ForthNet chain.  (Pgh, PA)
Lines: 59

Category 2,  Topic 8
Message 34        Sun Feb 18, 1990
F.SERGEANT [Frank]           at 20:39 CST
 
 To: David Albert  Re: inner interpreter (next) vs CALL/RET on 8088/8086

 The reasons some people use LODS, AX JMP, rather than CALL/RET are 
   1. the former probably runs faster
   2. the former probably takes less space.

 I say 'probably' in #1 because I've given up thinking I know how long an
instruction takes: it depends on variables such as whether the mix of
instructions keeps the prefetch queue filled and on the actual processor (eg
V20 vs 8088). I say it in both #1 & #2 because of a number of other variables.
My Intel book for the 8088 says

      LODS, AX JMP                     CALL/RET
      LODSW   1 byte   16 cycles       memptr16 CALL  3 bytes  29 cycles
      AX JMP  2 bytes  11 cycles       RET            1 byte   20 cycles

 so, at first glance it seems that letting CS:IP thread thru the address list
takes 49 cycles per word (29 + 20) to DS:SI's 27 cycles (16 + 11). There are
a number of complicating factors if you use CALL/RET.

 1st, addressing the data stack is more difficult as you need to do at least
one pair of SP BP XCHG instructions in order to be able to use the PUSH & POP
instructions to address the data stack.

 2nd, addressing the return stack becomes easier since you no longer need to
do the SP BP XCHG for it.

 3rd, you can eliminate docol (nest), saving both time & space.

 4th, you have a smaller, faster exit (unnest), using a 1 byte RET rather
than a 2 byte address for a jump to a central routine.

 5th, each entry in the address list takes 3 bytes rather than 2 bytes.

 6th, colon definitions can (often) end with a 3 byte JMP to the final word in
the list rather than using a 3 byte CALL followed by a 1 byte RET.
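
 As a rough check on the raw dispatch figures quoted from the Intel book,
before any of the six factors above are applied, the per-word arithmetic can
be sketched as below. The cycle counts are the ones given in the post; the
function names are made up for illustration, and the sketch deliberately
ignores prefetch-queue effects, nest/unnest overhead and the SP BP XCHG
cost, so it is only the "first glance" comparison and not a real timing
model.

```python
# Cycle counts as quoted in the post for the 8088 (not independently verified).
LODSW_CYCLES = 16    # LODSW: fetch next address from the list via DS:SI
JMP_AX_CYCLES = 11   # AX JMP: jump to the fetched address
CALL_CYCLES = 29     # CALL figure quoted in the post
RET_CYCLES = 20      # RET figure quoted in the post


def next_cycles_per_word():
    """Dispatch cost per word with LODS, AX JMP (hypothetical helper)."""
    return LODSW_CYCLES + JMP_AX_CYCLES


def call_ret_cycles_per_word():
    """Dispatch cost per word with CALL/RET (hypothetical helper)."""
    return CALL_CYCLES + RET_CYCLES


def dispatch_cycles(words, per_word):
    """Raw dispatch cost for a definition of `words` words, nothing else counted."""
    return words * per_word


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n,
              dispatch_cycles(n, next_cycles_per_word()),
              dispatch_cycles(n, call_ret_cycles_per_word()))
```

 The gap grows linearly with the number of words, which is why the fixed
savings in points 3 and 4 matter most for short definitions.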

 Where is our breakeven point? I certainly don't know. I would welcome a
detailed analysis from someone(s). Let's look just at space in a colon
definition: using CALL/RET we save the 3 byte jump to nest and one byte on
unnest (a 1 byte RET instead of a 2 byte address), 4 bytes in all. Each word
in the colon definition would take 3 bytes instead of 2, costing 1 byte per
word. The space breakeven point seems to be 4 words. Unfortunately perhaps,
the code I write averages more than 4 words per colon definition, so
space-wise I seem to be better off using direct threading rather than
CALL/RET.
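
 The space count above can be written out as a small sketch. The byte sizes
come from the post; the function names and the tail_jump option (modelling
the 6th point, where the final CALL plus RET becomes a single JMP) are
assumptions added for illustration. Headers, in-line code and any other
per-definition overhead are left out.

```python
# Byte sizes as given in the post.
JMP_TO_NEST_BYTES = 3   # direct threading: jump to nest at the start of a colon definition
CELL_BYTES = 2          # direct threading: one 2 byte address per word
UNNEST_BYTES = 2        # direct threading: 2 byte address of unnest at the end
CALL_BYTES = 3          # CALL/RET: one 3 byte CALL per word
RET_BYTES = 1           # CALL/RET: 1 byte RET at the end


def direct_threaded_bytes(words):
    """Body size of a colon definition under direct threading."""
    return JMP_TO_NEST_BYTES + CELL_BYTES * words + UNNEST_BYTES


def call_ret_bytes(words, tail_jump=False):
    """Body size under CALL/RET; tail_jump drops the RET (assumed always possible)."""
    size = CALL_BYTES * words + RET_BYTES
    if tail_jump and words > 0:
        size -= RET_BYTES  # final CALL becomes a same-size JMP, RET disappears
    return size


def breakeven(tail_jump=False, limit=64):
    """Smallest word count at which CALL/RET stops being smaller."""
    for n in range(1, limit + 1):
        if call_ret_bytes(n, tail_jump) >= direct_threaded_bytes(n):
            return n
    return None


if __name__ == "__main__":
    print(breakeven())                 # plain CALL ... RET
    print(breakeven(tail_jump=True))   # with the final JMP of the 6th point
```

 Under these assumptions the two layouts tie at 4 words (13 bytes each), as
stated above, and the tail JMP only moves the tie out to 5 words.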

 And, there are even more factors to consider such as in-line code and 
optimizations vs the ease of decompiling (SEE).

 I hope to do some experimenting when I get the time.

 Obviously the best solution is to cease using Intel processors.

  -- Frank
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'