Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!pt.cs.cmu.edu!dsl.pitt.edu!pitt!willett!ForthNet
From: ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie)
Newsgroups: comp.lang.forth
Subject: Forth Implementation
Message-ID: <1899.UUL1.3#5129@willett.pgh.pa.us>
Date: 22 Oct 90 00:58:05 GMT
Organization: String, Scotch tape, and Paperclips. (in Pgh, PA)
Lines: 119

Category 3,  Topic 24
Message 91        Sun Oct 21, 1990
F.SERGEANT [Frank]           at 14:20 CDT

Re: register moves vs pushes & pops on 8088 etc.

I couldn't believe Jonah Thomas's results that found the push instructions to be faster than the register move instructions. There is such a huge difference in the listed number of "clocks" that I bend over backwards to use the moves instead of the push/pops. I decided I'd better test it out myself!

I tested it on an 8 MHz XT clone with a NEC V20 in place of the 8088 microprocessor. The NEC book says 2 clocks for the move instruction and 12 clocks for the push or pop instructions. So, the push & pop test ought to take approximately 6 times as long, right? Wrong. I've known for a long time that pipelining of instructions makes the timings from the book, well, maybe not completely meaningless, but far from certain.

In the following tests, I laid down a whole lot of the instructions to be tested in-line in a single code word (one word for the moves and another word for the push/pops). This makes the times for next & FOR NEXT pale into insignificance. I used the macros MOVES, and PUSHES, as they are a hell of a lot easier than coding AX BX MOV, a thousand times by hand!

It is unnecessary, I think, but I also disabled interrupts during the test, to eliminate the overhead of the timer ticks. I needn't have bothered, unless some other code re-enabled the interrupts without my realizing it. There is also the dynamic RAM refresh overhead, which I've ignored. The "scaffolding" overhead is under 1/2 of a percent.

Here's the code I used (for Pygmy Forth).
I'd be interested in the timing results on other processors.

CODE INTS-OFF ( -)  CLI,  NXT,  END-CODE  ( disable interrupts)
CODE INTS-ON  ( -)  STI,  NXT,  END-CODE  ( re-enable interrupts)

: MOVES, ( # -)  2/ FOR  BX AX MOV,  AX BX MOV,  NEXT  ;
( a macro to lay down lots of reg to reg move instructions)

: PUSHES, ( # -)  2/ FOR  AX POP,  AX PUSH,  NEXT  ;
( a macro to lay down lots of push & pop instructions)

: TESTX2  25000 FOR  TESTX1  NEXT  ;  ( about 1 second ?)
( empty loop as a "control" )

The results: 250,000,000 register to register move instructions take about 278 seconds and 250,000,000 register push and pop instructions take about 449 seconds, giving an actual ratio of 1.615:1.

So, thank god, the move instructions are faster! But not by the expected 6 to 1. The NEC book also says to allow 4 clocks per byte when the needed byte is not already in the prefetch queue. I feel like a chemist figuring out the empirical atomic ratios of molecules. Each move instruction takes 2 bytes. Each push or pop takes 1 byte. If we assume the needed bytes are NEVER present in the prefetch queue, we would add 2*4=8 clocks to the move, giving it a total of 10 clocks, and we'd add 1*4=4 clocks to the push/pop, giving it a total of 16 clocks. There's our 16 to 10, or 1.6 to 1, ratio.

I just didn't realize it was this bad. Different mixes of instructions would affect this quite a bit, apparently. At 12 clocks the 1 byte push/pop is "too fast." The prefetch queue can't keep up. What instructions do you have to run to keep the queue filled up? Maybe divide instructions. If we'd do a dummy divide instruction every other instruction, perhaps we wouldn't have prefetch faults!

Of course, I've done this test and analysis hurriedly and I may have screwed up, in which case I'll be embarrassed over my sarcasm. You'll let me know, I presume, if I'm in error?

So, a "2 cycle" instruction really takes 10 cycles.
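The clock arithmetic above can be re-checked with a short sketch. This is not part of the original Pygmy Forth test; it is a minimal Python model, and every name in it (modeled_clocks, implied_clocks, the constants) is made up for illustration. It assumes the worst case described in the post, where each instruction byte always costs 4 extra clocks, and it assumes the clone's clock is exactly 8 MHz.

```python
# Worst-case prefetch model from the post: no instruction byte is ever
# already in the prefetch queue, so each byte adds 4 clocks.
# All names are illustrative; none come from the original test code.

FETCH_CLOCKS_PER_BYTE = 4      # NEC book figure quoted in the post
CLOCK_HZ = 8_000_000           # assumption: clock is exactly 8 MHz
INSTRUCTIONS = 250_000_000     # instructions executed in each test


def modeled_clocks(book_clocks, length_bytes):
    """Book execution clocks plus a fetch penalty for every byte."""
    return book_clocks + FETCH_CLOCKS_PER_BYTE * length_bytes


def implied_clocks(seconds):
    """Clocks per instruction implied by a measured run time."""
    return seconds * CLOCK_HZ / INSTRUCTIONS


move_model = modeled_clocks(2, 2)     # register move: 2 clocks, 2 bytes
push_model = modeled_clocks(12, 1)    # push or pop: 12 clocks, 1 byte

book_ratio = 12 / 2                   # what the book timings alone predict
model_ratio = push_model / move_model
measured_ratio = 449 / 278            # seconds reported in the post

print("book ratio     ", book_ratio)
print("model ratio    ", model_ratio)
print("measured ratio ", round(measured_ratio, 3))
print("implied clocks per move     ", implied_clocks(278))
print("implied clocks per push/pop ", implied_clocks(449))
```

The model reproduces the 10 and 16 clock totals and the 1.6 to 1 ratio. If the clock really is exactly 8 MHz, the measured times work out to roughly 8.9 and 14.4 clocks per instruction, but that figure leans on the clock-speed assumption and ignores the DRAM refresh overhead, so treat it as a rough cross-check only.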
It reminds me of two things:

(1) the "$4 pizza" commercials where the cheapest $4 pizza really costs $16.95, and

(2) rating automobiles at so many EPA miles per gallon (the explanation here is that an EPA "mile" is only 3,280 feet long).

  -- Frank
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: dwp@willett.pgh.pa.us or uunet!willett!dwp