Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!zaphod.mps.ohio-state.edu!wuarchive!udel!haven.umd.edu!uvaarpa!murdoch!hemlock!clc5q
From: clc5q@hemlock.cs.Virginia.EDU (Clark L. Coleman)
Newsgroups: comp.arch
Subject: Re: new instructions
Message-ID: <1991May23.210519.23443@murdoch.acc.Virginia.EDU>
Date: 23 May 91 21:05:19 GMT
References: <9105200213.AA05095@ucbvax.Berkeley.EDU> <1991May21.191034.25980@murdoch.acc.Virginia.EDU> <25874@as0c.sei.cmu.edu>
Sender: usenet@murdoch.acc.Virginia.EDU
Organization: University of Virginia Computer Science Department
Lines: 92

In article <25874@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <1991May21.191034.25980@murdoch.acc.Virginia.EDU> clc5q@hemlock.cs.Virginia.EDU (Clark L. Coleman) writes:
>
>>Given the C source code statement:
>
>>    z = x % y;   /* z gets the remainder of x divided by y */
>
>>... we generate
>>the 3-instruction sequence:
>>
>>    movl r6,r1          /* Transfer quotient to r1 */
>>    clrl r0             /* Zero out upper word to form 64-bit r0/r1
>>                           register pair quotient */
>>    ediv r7,r0,r2,r11   /* Divide r0-r1 pair by r7; throw away quotient
>>                           into r2 and keep remainder in r11 */
>
>I hope not.  From the previous code fragment, it is clear you are
>expecting the remainder from SIGNED division.  If you want the same
>answer as before, the code must be
>
>    MOVL R6,R1          ; construct the sign-extended 64-bit ...
>    ASHQ #-32,R0,R0     ; dividend in the register pair
>    EDIV ... as before
>

Thanks for pointing out my error. I looked into the VAX Architecture
Handbook and it seems that you are trying to get "ASHL #-32,R1,R0" in
your second statement. "ASHQ #-32,R0,R0" takes a heck of a long time and
gives the wrong answer. "ASHL #-32,R1,R0" takes the sign bit of R1 and
fills R0 with it. Unfortunately, this seems to be the best way to sign
extend on the VAX. (The coercion instructions don't include CVTLQ ==
coerce longword to quadword, so the apparently slower pseudo-shift is
the best we can do.)
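For anyone without a VAX handy, here is a rough C sketch of why the
dividend has to be sign-extended rather than zero-extended before a
signed 64-by-32 divide. It is only a model of the idea, not VAX code:
the function names are made up for illustration, int64_t stands in for
the register pair, and it assumes C99 semantics, where % truncates
toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the zero-filled dividend: the 32-bit value is widened with
 * zeros, so a negative x is treated as a large positive number. */
int32_t rem_zero_extended(int32_t x, int32_t y)
{
    assert(y != 0);
    int64_t wide = (int64_t)(uint32_t)x;   /* upper 32 bits are zero */
    return (int32_t)(wide % y);
}

/* Model of the sign-filled dividend: the upper half holds copies of
 * the sign bit, so the 64-bit value is numerically equal to x. */
int32_t rem_sign_extended(int32_t x, int32_t y)
{
    assert(y != 0);
    int64_t wide = (int64_t)x;             /* upper 32 bits are sign bits */
    return (int32_t)(wide % y);
}
```

With x = -7 and y = 3, the sign-extended version gives -1, which is what
x % y yields in C99, while the zero-extended one gives 0. For
non-negative x the two agree.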

>You might like to time THAT sequence, and rethink your post.  Or you
>could take my word for it, that when you include the cost of having
>to reserve and target into an even-odd register pair, the EDIV is
>almost always slower.

Well, I timed the new sequence, and a little over half of my speedup
disappeared, but it is still faster by more than 10% compared to what
"cc -O" does.

As for register allocation issues, that is a complex subject on the VAX.
Registers R6 through R11 are "allocable" general-purpose registers. When
translating most source code statements, you can consider R0 through R5
available. (BTW, Robert, this little tutorial is not directed at you,
but at those new to the VAX register set.) As long as you aren't doing
weird assembly language instructions like CRC or POLY or the string
instructions, R2 through R5 are not going to be trampled by anything.
R0 and R1 are used for the return value of a function, so usage of them
has to be temporary and not live across a function call.

Really good register allocation will use R0 through R11 as much as is
legal. Simpler and poorer register allocation will only use R6 through
R11 except when a single intermediate-code statement is translated into
multiple assembly language statements, and those statements need scratch
registers that will be dead upon conclusion of the single intermediate
code operation. Thus, my code above used R0 through R2 temporarily.

The point here is that "cc -O" is doing the same thing. It generated a
sequence of 3 instructions for the remainder operation, and used R0 as
a scratch register. Thus, for simple and stupid register allocators, R0
and R1 are always available as a nice even-odd register pair for scratch
usage. (Although the VAX does not care about even-odd pairs, so I am not
sure why you mentioned them. A contiguous pair is all that is needed.)
A smarter allocator might want to avoid using "ediv" for the remainder
operation because of the need to reserve a pair of registers.
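
The pair-reservation cost can be made concrete with a small C sketch of
how an allocator might look for a contiguous scratch pair. Everything
here is hypothetical: the free-register bitmask, the name
find_free_pair, and the convention that bit n set means Rn is free are
assumptions for illustration, not how "cc" or any real VAX compiler
does it.

```c
#include <assert.h>

/* Hypothetical convention: bit n of free_mask is set when Rn is free.
 * Only R0..R11 are considered, per the register usage described above. */
enum { NUM_ALLOC_REGS = 12 };

/* Return the lower register number of the first contiguous free pair
 * (Rn, Rn+1), or -1 if no such pair is available. */
int find_free_pair(unsigned free_mask)
{
    /* The mask must not describe registers beyond R11. */
    assert((free_mask >> NUM_ALLOC_REGS) == 0);

    for (int n = 0; n + 1 < NUM_ALLOC_REGS; n++) {
        unsigned pair = 3u << n;           /* bits n and n+1 */
        if ((free_mask & pair) == pair)
            return n;
    }
    return -1;
}
```

With a mask of 0x003 (R0 and R1 free) it returns 0; with 0x005 (R0 and
R2 free) it returns -1, since those two are not adjacent.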

(A REALLY smart code generator might look to see if a pair is available
for scratch use, and generate the "ediv" code if it were, and the
"cc -O" sequence otherwise. And the first instruction of my sequence is
unnecessary if the next lower numbered register is unused at the moment;
the ASHL and EDIV can be done in place.)

The point still remains: "cc -O" produces less than optimal code that
biases instruction count analysis of the architecture. I am still
wondering how system architects handle this bias when determining the
future path of the architecture. And how it affects the famous
pronouncements about how CISCs all have umpteen never-used instructions
and umpteen more rarely-used instructions.

(BTW, I will check the timings again on the VAX 11/750. The speedup I
confirmed was on the VAX 8600. If it is different on the VAX 11/750,
that just points out that a code generator can get outdated and bias
instruction counts. So the point remains the same.)

-----------------------------------------------------------------------------
"The use of COBOL cripples the mind; its teaching should, therefore, be
regarded as a criminal offence." E.W.Dijkstra, 18th June 1975.
|||  clc5q@virginia.edu (Clark L. Coleman)