Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!steinmetz!sunray!oconnor
From: oconnor@sunray.steinmetz (Dennis Oconnor)
Newsgroups: comp.arch
Subject: Divides (Was RE: What should be in hardware but isn't)
Message-ID: <7460@steinmetz.steinmetz.UUCP>
Date: Fri, 25-Sep-87 09:54:13 EDT
Article-I.D.: steinmet.7460
Posted: Fri Sep 25 09:54:13 1987
Date-Received: Sun, 27-Sep-87 02:33:43 EDT
References: <581@l.cc.purdue.edu> <18336@amdcad.AMD.COM> <582@l.cc.purdue.edu> <6336@apple.UUCP>
Sender: root@steinmetz.steinmetz.UUCP
Reply-To: oconnor@sunray.UUCP (Dennis Oconnor)
Organization: General Electric CRD, Schenectady, NY
Lines: 80

( All ellipses ... are mine, and indicate excluded text. DMOC )

In article <6336@apple.UUCP> baum@apple.UUCP (Allen Baum) writes:
> ... Most RISC architectures have a divide step instruction, which
> is precisely what underlying microcode would use ...

Our RISC architecture here at GE has no divide-step or multiply-step.
We have a better way. More later.

> ... any hardware support in excess of this will inevitably slow
> the basic cycle down (I've been through the exercise).

No, this is not true. Cycle time is generally dependent on some set
of critical paths. Hardware that does not interact with these
critical paths has no effect, unless it creates new critical paths.
Where your critical paths lie depends heavily on implementation
technology : could be the ALU, or the register file, or the
instruction decode ...

> ... Microcode, or nanocode, has to go through all the same
> operations that assembly level code does.

Except fetching instructions, and operations resulting from the
need to handle interrupts or exceptions at arbitrary points in the
assembly code (microcode can lock exceptions out till it completes).

>... Its VERY difficult to make fixed point division run faster than
> a bit per cycle, without a LOT of hardware.
> By leaving out the
> special purpose speedup stuff, you can afford to include some VERY
> useful general purpose speedup stuff: More registers ... branch folding ...

This is not really true. If you have a fast multiplier ( which is
a good idea for many applications ) you can do division much
quicker than one cycle per bit, relatively easily, especially for
long word lengths. In fact, you can do division in something like

   C + ( multiply_latency * ( Int_Round_up( log_base2( word_length ) ) - K ) )

where C and K are small positive integer constants dependent on how
you implement your algorithm. The technique to use is Newton-Raphson
iteration with a first-guess look-up table. "The official divide
algorithm of the IBM-360/95 and Cray-1 (I think :-)" The additional
hardware needed (besides a fast multiplier) is TWIT.

>> [quote from someone else about allowing registers
>> to be accessed as memory locations]

>The original PDP-10 from DEC allowed ... Registers were the first
>16 locations in memory ... instructions could put into the registers
>and executed from them ...

Of course, the PDP10 didn't have to reorganize code, so it did not
have to deal with memory-aliasing problems.

>The ATT CRISP doesn't have any registers. But, by caching the top of
>the local frame, references to locals are effectively turned into
>register references, and you get register windows as well. You can
>index into these 'registers', byte access them, and reference them
>with short 5-bit fields in the instruction.

One of the NICE things about registers that are NOT accessible as
memory is that you can uniquely identify references to a register
based strictly on the bits in the instruction stream. This is crucial
to reorganization : you must know when registers are modified as a
limit on how uses of that register can be moved. Resolving memory
aliasing can be a difficult task, especially if post-reorganization
linking is supported. How does the CRISP reorganizer address this issue ?
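To make the "based strictly on the bits in the instruction stream"
point concrete, here is a minimal sketch in Python. It is not any
real machine's format: the 32-bit encoding (opcode, rd, rs1, rs2 as
6/5/5/5-bit fields) and the names decode, encode and can_swap are
all made up for illustration, and every instruction is assumed to
read rs1 and rs2 and write rd, with no loads or stores.

```python
from typing import NamedTuple


class Regs(NamedTuple):
    rd: int   # destination register (written)
    rs1: int  # first source register (read)
    rs2: int  # second source register (read)


def encode(opcode: int, rd: int, rs1: int, rs2: int) -> int:
    """Build a word in the hypothetical format:
    opcode[31:26] rd[25:21] rs1[20:16] rs2[15:11] unused[10:0]."""
    return (opcode << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)


def decode(word: int) -> Regs:
    """Recover the register fields from the instruction bits alone."""
    return Regs((word >> 21) & 0x1F, (word >> 16) & 0x1F, (word >> 11) & 0x1F)


def can_swap(first: int, second: int) -> bool:
    """True if two adjacent instructions may be reordered, i.e. there is
    no register dependence between them in either direction."""
    a, b = decode(first), decode(second)
    read_after_write = a.rd in (b.rs1, b.rs2)   # second reads what first writes
    write_after_read = b.rd in (a.rs1, a.rs2)   # second overwrites a source of first
    write_after_write = a.rd == b.rd            # both write the same register
    return not (read_after_write or write_after_read or write_after_write)


i1 = encode(0, 1, 2, 3)   # r1 <- r2 op r3
i2 = encode(0, 4, 5, 6)   # r4 <- r5 op r6  (independent of i1)
i3 = encode(0, 4, 1, 5)   # r4 <- r1 op r5  (reads r1, which i1 writes)
print(can_swap(i1, i2), can_swap(i1, i3))
```

The check needs nothing but the two instruction words, which is the
property that goes away once a store to a computed address might
also be hitting a register.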
Simple reorganizers ( a contradiction ) deal with memory aliasing
by forcing serialization of loads with respect to stores. If your
registers are accessible as memory and you use this scheme in
your reorganization, you wind up serializing every instruction
with respect to stores. What's the cost of this ?

>{decwrl,hplabs,ihnp4}!nsc!apple!baum (408)973-3385

--
Dennis O'Connor oconnor@sungoddess.steinmetz.UUCP ??
ARPA: OCONNORDM@ge-crd.arpa
"If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"