Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ncar!ames!oliveb!sun!dgh!dgh
From: dgh%dgh@Sun.COM (David Hough)
Newsgroups: comp.arch
Subject: Re: Bandwidth and RISC vs. CISC
Message-ID: <100891@sun.Eng.Sun.COM>
Date: 24 Apr 89 23:49:46 GMT
References: <38853@bbn.COM> <423@bnr-fos.UUCP> <17417@cup.portal.com> <39049@bbn.COM>
Sender: news@sun.Eng.Sun.COM
Lines: 81

In article <39049@bbn.COM>, slackey@bbn.com (Stan Lackey) writes:
> In article <100524@sun.Eng.Sun.COM> dgh%dgh@Sun.COM (David Hough) writes:
> >In article <38971@bbn.COM>, slackey@bbn.com (Stan Lackey) writes:

> >Such rounding is no more expensive than biased rounding
> >on a system that is required to provide directed rounding modes as well.

> Having to detect EXACTLY .5 is a bottleneck in terms of transistor
> count, design time, and diagnostics. The extra execution time may not
> affect overall cycle time, but the RISC guys say that any added
> hardware increases cycle time (they usually use it in the context of
> instruction decode).

EXACTLY .5 is no harder than correct directed rounding. You have to
(in principle) develop all the digits, propagate carries, and remember
whether any shifted off were non-zero. Division and sqrt are simplified
by the fact that EXACTLY .5 can't happen.

> Note: It's prealigning a denormalized operand before a multiplication
> that REALLY hurts.

This event is rare enough that it needn't be as fast as a normal
multiplication, so it's OK to slow down somewhat by holding the CPU,
but not so rare that you want to punt to software. By throwing enough
hardware at the problem you can make it as fast as the normal case.
I don't advocate that but that's my understanding of what the Cydra-5
did.

Interestingly enough, the early drafts of 754 specified that default
handling of subnormal numbers be in a "warning mode" and that the more
expensive "normalizing mode" be an option. This was with
highly-pipelined implementations very much in mind.
However a gang of early implementers from Apple managed to talk a
majority of the committee into making the normalizing mode the default.
The normalizing mode is easier to understand and easier to implement
in software. Warning mode is a lot cheaper to pipeline, however.
I was part of the gang but I've since had opportunity to repent at
leisure.

> Lots of valid uses of IEEE features listed. I didn't mean that IEEE
> was bad or useless, it's just that it was architected when CISC was
> the trend, and it shows. Especially after my own efforts in an IEEE
> implementation, I am glad to see from this posting and others that at
> least a few users can make use of the features.

Remember IEEE 754 and 854 are standards for a programming environment.
How much of that is to be provided by hardware and how much by software
is up to the implementer; in contrast RISC is a hardware design
philosophy. The MC68881 is probably the best-known attempt to put
practically everything in the hardware so the software wouldn't screw
it up as usual. The Weitek 1032/3 and their descendants and competitors
are examples of minimal hardware implementations that support complete
IEEE implementations once appropriate software is added. Evidently the
first generations of such chips were too minimal; for instance nowadays
everybody has correctly-rounded division and sqrt in hardware, rather
than software, on chips intended for general-purpose computation.

> I think the RISC
> implementers should have a RISC-style floating point standard, though.

There's a very minimalist floating-point standard, that of S. Cray,
which is very cheap to implement entirely in hardware (compared to
other standards at similar performance levels). The only hard part is
writing the software that uses it. So far no other hardware
manufacturers have seen fit to adopt Cray arithmetic. IBM 370
architecture has been more widely imitated but not because of any
inherent wonderfulness for mathematical software.
DEC VAX floating-point architecture is well defined and a number of
non-DEC implementations are available. But divide and sqrt are no
easier than IEEE, and IEEE double precision addition and multiplication
are available now in one or two cycles on some implementations.

Does anybody still think there would be an advantage to VAX, 370, or
Cray floating-point architecture for a PC or workstation?

David Hough

dhough@sun.com
na.hough@na-net.stanford.edu
{ucbvax,decvax,decwrl,seismo}!sun!dhough
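
A minimal sketch of the rounding point made earlier in this post (that
detecting EXACTLY .5 takes no more information than correct directed
rounding), assuming a non-negative integer significand whose low bits
are being shifted off. The function name round_shifted and the mode
names are hypothetical, chosen only for illustration; real hardware
works on fixed-width registers, not Python integers.

```python
def round_shifted(sig: int, shift: int, mode: str = "nearest_even") -> int:
    """Drop the low `shift` bits of a non-negative significand and round.

    Illustrative only: `mode` is "nearest_even" (round to nearest, ties
    to even) or "up" (directed rounding away from zero, which is toward
    +infinity for a positive value).
    """
    if sig < 0 or shift < 1:
        raise ValueError("expects sig >= 0 and shift >= 1")

    kept = sig >> shift                         # bits that survive
    round_bit = (sig >> (shift - 1)) & 1        # first bit shifted off
    # "Sticky" bit: were any of the remaining shifted-off bits non-zero?
    sticky = (sig & ((1 << (shift - 1)) - 1)) != 0

    if mode == "up":
        # Directed rounding must know whether ANY dropped bit was set.
        return kept + 1 if (round_bit or sticky) else kept

    if mode == "nearest_even":
        # Exactly .5 is simply round_bit set with sticky clear; the tie
        # is then broken toward the even neighbour.
        if round_bit and (sticky or (kept & 1)):
            return kept + 1
        return kept

    raise ValueError("unknown mode: " + mode)
```

Both modes consume the same two facts, the first bit shifted off and
whether anything below it was non-zero, so under these assumptions the
tie test adds only a check of the low bit of the kept result.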