Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!labrea!decwrl!pyramid!voder!apple!baum
From: baum@apple.UUCP
Newsgroups: comp.arch
Subject: Re: What should be in hardware but isn't
Message-ID: <6336@apple.UUCP>
Date: Thu, 24-Sep-87 12:48:14 EDT
Article-I.D.: apple.6336
Posted: Thu Sep 24 12:48:14 1987
Date-Received: Sat, 26-Sep-87 13:43:23 EDT
References: <581@l.cc.purdue.edu> <18336@amdcad.AMD.COM> <582@l.cc.purdue.edu>
Reply-To: baum@apple.UUCP (Allen Baum)
Organization: Apple Computer, Inc.
Lines: 63

--------
[]
>In article <582@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
>Olson greatly underestimates the number of RISC instructions needed to do
>even a fair job.

I don't think that Olson is underestimating anything. Most RISC
architectures have a divide step instruction, which is precisely what
the underlying microcode would use. Furthermore, in order to get
signed/unsigned variations, microcode has to do the same kinds of
conditional operations that a RISC would have to do. It is a mistake to
assume that a RISC would be slower at these than a microcoded engine;
some RISC machines (Acorn ARM, HP Spectrum) have support for conditional
operations. Furthermore, any hardware support in excess of this will
inevitably slow the basic cycle down (I've been through the exercise).

> I have not seen any remotely efficient bit-handling hardware on any machine.

Check out the HP Spectrum.

> To do unsigned
>multiplication with only signed multiplication available requires that
>2 conditional additions must be done after the multiplication; as machines
>get faster conditional operations are bad except in nanocode. Unsigned
>division is so complicated that one introduces other inefficiencies instead.

Again, you make the mistake of believing that nanocode is somehow
magically faster or more efficient than a well-designed instruction
set. Wrong.
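
To make the two operations under discussion concrete, here is a minimal
C sketch, assuming 32-bit operands. All of the names (signed_mul,
unsigned_mul_via_signed, divide_step, unsigned_div) are made up for
illustration and do not correspond to any particular machine's
instructions or microcode. It shows the two conditional additions that
turn a signed product into an unsigned one, and a restoring divide step
that produces one quotient bit per iteration:

```c
#include <stdint.h>

/* Hypothetical primitive: the only multiply available, signed 32x32 -> 64.
   The uint32_t -> int32_t casts at the call site assume two's complement. */
static int64_t signed_mul(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Unsigned 32x32 -> 64 multiply built on the signed one.
   Reading a as signed subtracts 2^32 from it when its top bit is set,
   so the signed product is short by 2^32 * b in that case (and likewise
   2^32 * a when b's top bit is set). Only the high word needs fixing. */
uint64_t unsigned_mul_via_signed(uint32_t a, uint32_t b)
{
    uint64_t p  = (uint64_t)signed_mul((int32_t)a, (int32_t)b);
    uint32_t hi = (uint32_t)(p >> 32);
    uint32_t lo = (uint32_t)p;

    if (a & 0x80000000u) hi += b;   /* first conditional addition  */
    if (b & 0x80000000u) hi += a;   /* second conditional addition */

    return ((uint64_t)hi << 32) | lo;
}

/* One restoring divide step: shift the rem:quo pair left one bit,
   trial-compare against the divisor, and record one quotient bit. */
static void divide_step(uint32_t *rem, uint32_t *quo, uint32_t divisor)
{
    *rem = (*rem << 1) | (*quo >> 31);
    *quo <<= 1;
    if (*rem >= divisor) {
        *rem -= divisor;
        *quo |= 1u;
    }
}

/* Unsigned 32/32 divide: 32 steps, one quotient bit each.
   The caller must ensure d != 0. */
uint32_t unsigned_div(uint32_t n, uint32_t d, uint32_t *r)
{
    uint32_t rem = 0, quo = n;
    for (int i = 0; i < 32; i++)
        divide_step(&rem, &quo, d);
    *r = rem;
    return quo;
}
```

Whether this sequence is issued as instructions or stepped through by a
microsequencer, the same shifts, compares, and conditional adds occur.
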
Microcode, or nanocode, has to go through all the same operations that
assembly level code does. While special purpose data paths can be
included to make the sign correction run faster, that is all they are:
special purpose. They can't be used for anything else, they may have
the effect of making everything else run slower, and making division
run a cycle or two faster will have no noticeable effect on
performance. It's VERY difficult to make fixed point division run
faster than a bit per cycle without a LOT of hardware. By leaving out
the special purpose speedup stuff, you can afford to include some VERY
useful general purpose speedup stuff: more registers, perhaps, or
branch folding logic a la the ATT CRISP.

>
>BTW, there is an address modification procedure which is missing on all
>machines I have seen except the UNIVAC's. That is to consider the register
>file as a memory block and allow indexing on it. Another missing procedure
>is to enable the register file to be treated as a block of memory so that
>bytes or short words can be addressed. These two operations can be combined
>on a byte-addressable machine.

The original PDP-10 from DEC allowed that. The registers were the first
16 locations in memory. Originally this was because registers were real
expensive, so hardware registers were an expensive (but effective)
speedup option; without the option, register references went to real
memory. This came back to bite them in the later KL models, because
instructions could be put into the registers and executed from them.
While this was a real speedup hack on the older models, it slowed down
the newer ones.

The ATT CRISP doesn't have any registers. But, by caching the top of
the local frame, references to locals are effectively turned into
register references, and you get register windows as well. You can
index into these 'registers', byte access them, and reference them with
short 5-bit fields in the instruction.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385