Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!spool.mu.edu!agate!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.arch
Subject: VAX EDIV remainder (was new instructions)
Message-ID: <13587@dog.ee.lbl.gov>
Date: 26 May 91 20:40:01 GMT
Article-I.D.: dog.13587
References: <9105200213.AA05095@ucbvax.Berkeley.EDU> <1991May21.191034.25980@murdoch.acc.Virginia.EDU> <25874@as0c.sei.cmu.edu>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 71
X-Local-Date: Sun, 26 May 91 13:40:01 PDT

In article <25874@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:

[to do `z = x % y; /* z gets the remainder of x divided by y */'
properly you need, assuming x in r6, y in r7, and z in r11:]

>       MOVL R6,R1              ; construct the sign-extended 64-bit ...
>       ASHQ #-32,R0,R0         ; dividend in the register pair
>       EDIV r7,r0,r2,r11

[which, as Clark Coleman pointed out in a followup article that I seem
to have lost---the original article was
<1991May21.191034.25980@murdoch.acc.Virginia.EDU> by
clc5q@hemlock.cs.Virginia.EDU (Clark L. Coleman)---should actually be

        movl    r6, r1
        ashl    #-32, r1, r0
        ediv    r7, r0, r2, r11

or perhaps, but this is slower, `movl r6, r0; ashq #-32, r0, r0']

>You might like to time THAT sequence, and rethink your post.  Or you
>could take my word for it, that when you include the cost of having
>to reserve and target into an even-odd register pair, the EDIV is
>almost always slower.

(Not even-odd, just register pair.)

According to my `VAX instruction timings (with FPA)', the original
sequence

        divl3   r7, r6, r0      # r0 = x / y ...
        mull2   r7, r0          # ... * y
        subl3   r0, r6, r11     # z = x - r0

will take:

             VAX-11/780 vs. VAX-11/750 vs. VAX-11/730
                            WITH FPA

INSTRUCTION                   780     750      730      750     730

DIVL3 Reg, Reg, Reg          9.64    8.88    16.15    1.086   0.597
MULL2 Reg, Reg               1.85    5.68    12.05    0.326   0.154
ADDL3 Reg, Reg, Reg          0.60    1.29     2.83    0.465   0.212
                            -----   -----    -----
                            12.09   15.85    31.03

while the EDIV sequence will take:

MOVL Reg, Reg                0.40    0.93     1.69    0.430   0.237
ASHL #10, Reg, Reg           2.00    4.03    11.33    0.496   0.177
EDIV Reg, Reg, Reg, Reg     11.86   11.86   100.29    1.000   0.118
                            -----   -----   ------
                            14.26   16.82   113.31

(I have assumed a barrel shifter here.)

In other words, on the 750 it is almost a wash (about 5% faster to
avoid ediv; this could easily be lost in testing---it is hard to get
accurate timings as things depend on, e.g., alignment), while on the
780 avoiding ediv is about 15% faster and on the 730, over 70% faster.

I do not have tables for anything but obsolete VAXen, and the ones I
have came with this disclaimer:

  `The following VAX instruction timings were obtained from a former
   DEC employee.  I cannot vouch for their accuracy and have no idea
   how they were obtained.'

You should use your own judgement before charging off with this as
`the answer'.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA            Domain: torek@ee.lbl.gov
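
For readers without a VAX to hand, the two ways of computing
`z = x % y' discussed above can be modelled in portable C. This is
only a sketch: the function names rem_divmulsub, rem_wide and saving
are made up for illustration and do not come from the article, and the
code says nothing about how the VAX instructions themselves behave.
It assumes C99 division (quotient truncated toward zero). The last
helper redoes the "percent faster" arithmetic from the column totals.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Remainder by divide, multiply, subtract: the shape of the
 * divl3 / mull2 / subl3 sequence.  Hypothetical helper.
 * The caller must avoid y == 0 and the pair (INT32_MIN, -1),
 * for which the 32-bit quotient overflows.
 */
int32_t rem_divmulsub(int32_t x, int32_t y)
{
    int32_t q = x / y;      /* quotient, truncated toward zero (C99) */
    return x - q * y;       /* |q * y| <= |x|, so this cannot overflow */
}

/*
 * Remainder from a widened dividend: the shape of the EDIV sequence,
 * where x is first sign-extended to 64 bits.  Hypothetical helper.
 * The caller must avoid y == 0.
 */
int32_t rem_wide(int32_t x, int32_t y)
{
    int64_t dividend = (int64_t)x;      /* sign extension to 64 bits */
    return (int32_t)(dividend % y);     /* y is promoted to 64 bits */
}

/*
 * Fraction of time saved by a sequence taking `fast' relative to one
 * taking `slow', e.g. saving(12.09, 14.26) for the 780 totals.
 */
double saving(double fast, double slow)
{
    return 1.0 - fast / slow;
}
```

Both remainder helpers take the sign of the dividend, so for example
they agree on (-7, 3) and on (7, -3). Fed the column totals, saving()
gives roughly 0.15 for the 780, 0.06 for the 750 and 0.73 for the 730.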