Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!willett!ForthNet
From: ForthNet@willett.UUCP (ForthNet articles from GEnie)
Newsgroups: comp.lang.forth
Subject: Advanced Beginners
Message-ID: <514.UUL1.3#5129@willett.UUCP>
Date: 20 Feb 90 03:01:25 GMT
Organization: Latest link in the ForthNet chain. (Pgh, PA)
Lines: 59

Category 2,  Topic 8
Message 34        Sun Feb 18, 1990
F.SERGEANT [Frank]           at 20:39 CST

To: David Albert
Re: inner interpreter (next) vs CALL/RET on 8088/8086

The reasons some people use LODS, AX JMP, rather than CALL/RET are:

  1. the former probably runs faster;
  2. the former probably takes less space.

I say "probably" in #1 because I've given up thinking I know how long an instruction takes: the mix of instructions affects whether the prefetch queue stays filled, and the actual processor matters too (e.g. V20 vs 8088). I say it in both #1 and #2 because of a number of other variables.

My Intel book for the 8088 says:

    Instruction      Size     Cycles
    LODSW            1 byte   16
    memptr16 CALL    3 bytes  29
    AX JMP           2 bytes  11
    RET              1 byte   20

So, at first glance, letting CS:IP thread through the address list takes 49 cycles (CALL + RET = 29 + 20), versus 27 cycles for DS:SI (LODSW + AX JMP = 16 + 11).

There are a number of complicating factors if you use CALL/RET:

  1st, addressing the data stack is more difficult, as you need at least one pair of SP BP XCHG instructions to be able to use PUSH & POP on the data stack.
  2nd, addressing the return stack becomes easier, since you no longer need the SP BP XCHG for it.
  3rd, you can eliminate docol (nest), saving both time & space.
  4th, you get a smaller, faster exit (unnest): a 1-byte RET rather than a 2-byte address for a jump to a central routine.
  5th, each entry in the address list takes 3 bytes rather than 2.
  6th, colon definitions can (often) end with a 3-byte JMP to the final word in the list rather than a 3-byte CALL followed by a 1-byte RET.

Where is our breakeven point? I certainly don't know.
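As a rough starting point for that question, here is a minimal Python sketch that only does the arithmetic with the 8088 figures from the Intel table quoted in this post. It assumes those cycle counts and sizes hold as listed, and it ignores prefetch-queue effects, the V20, and the cost of the SP BP XCHG for the data stack. For size it assumes the accounting used in the space discussion of this post: a 3-byte jump to nest and a 2-byte unnest address under direct threading, versus 3 bytes per CALL plus a 1-byte RET under CALL/RET. The `tail_jmp` option (the 6th factor) is an extra assumption of this sketch, not part of the post's own breakeven figure, and all function and constant names are hypothetical.

```python
# Back-of-the-envelope model: direct threading (DTC, LODSW + JMP AX) versus
# subroutine threading (STC, CALL/RET) on the 8088.
# Figures are (bytes, cycles) as quoted from the Intel 8088 table in the post;
# prefetch-queue effects and the SP/BP XCHG cost are deliberately ignored.

LODSW = (1, 16)
CALL = (3, 29)    # listed in the post as "memptr16 CALL"
JMP_AX = (2, 11)
RET = (1, 20)

NEST_JMP_BYTES = 3     # assumed: 3-byte jump to nest (docol) per colon def
UNNEST_ADDR_BYTES = 2  # assumed: 2-byte address of unnest ending a DTC def


def dispatch_cycles_dtc() -> int:
    """Cycles for one DTC NEXT: LODSW, then JMP AX."""
    return LODSW[1] + JMP_AX[1]


def dispatch_cycles_stc() -> int:
    """Cycles for one STC step: CALL the word, which later RETs."""
    return CALL[1] + RET[1]


def colon_bytes_dtc(n_words: int) -> int:
    """Size of a DTC colon definition holding n_words addresses."""
    return NEST_JMP_BYTES + 2 * n_words + UNNEST_ADDR_BYTES


def colon_bytes_stc(n_words: int, tail_jmp: bool = False) -> int:
    """Size of an STC colon definition holding n_words CALLs.

    With tail_jmp (hypothetical option modelling the 6th factor), the final
    3-byte CALL plus 1-byte RET is replaced by a single 3-byte JMP.
    """
    body = CALL[0] * n_words
    return body if tail_jmp else body + RET[0]


def space_breakeven(tail_jmp: bool = False) -> int:
    """Smallest word count at which STC is no longer smaller than DTC."""
    n = 1
    while colon_bytes_stc(n, tail_jmp) < colon_bytes_dtc(n):
        n += 1
    return n


if __name__ == "__main__":
    print(dispatch_cycles_stc(), dispatch_cycles_dtc())
    print(space_breakeven(), space_breakeven(tail_jmp=True))
```

Run as a script, it prints `49 27` and then `4 5`: under these assumptions CALL/RET costs 22 more cycles per dispatched word, and it stops saving space at 4 words per colon definition (5 if definitions end in a tail JMP). The time side leaves out nest/unnest overhead because the post gives no cycle figures for those routines.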
I would welcome a detailed analysis from someone(s).

Let's look just at space in a colon definition. Using CALL/RET we save the 3-byte jump to nest, and one byte for unnest (a 1-byte RET instead of a 2-byte address). That is 4 bytes saved, but each word in the colon definition takes 3 bytes instead of 2, costing 1 extra byte per word. The space breakeven point therefore seems to be 4 words. Unfortunately, perhaps, the code I write averages more than 4 words per colon definition, so space-wise I seem to be better off using direct threading rather than CALL/RET.

And there are even more factors to consider, such as in-line code and optimizations vs. the ease of decompiling (SEE). I hope to do some experimenting when I get the time. Obviously the best solution is to cease using Intel processors.

-- Frank

-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'