Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!oliveb!sun!dgh!dgh
From: dgh%dgh@Sun.COM (David Hough)
Newsgroups: comp.arch
Subject: Re: Bandwidth and RISC vs. CISC
Summary: ieee floating point is not CISC or RISC
Message-ID: <100524@sun.Eng.Sun.COM>
Date: 22 Apr 89 02:27:03 GMT
References: <38853@bbn.COM> <423@bnr-fos.UUCP> <17417@cup.portal.com> <38971@bbn.COM>
Sender: news@sun.Eng.Sun.COM
Lines: 97

In article <38971@bbn.COM>, slackey@bbn.com (Stan Lackey) writes:
>
> I have a real problem with anything that includes IEEE floating point
> AND calls itself a RISC. IEEE FP violates every rule of RISC; it has
> features that compilers will never use (rounding modes), features that
> are rarely needed that slow things down (denormalized operands), and
> features that make things complex that nobody needs (round-to-even).
> I'd really like to see someone stand up and say, "Boy, the IEEE
> round-to-even is much more accurate than DEC's round .5 up. I have an
> application right here that proves it." Or, "Gradual underflow is
> much better. I have an application that can be run in single precision
> that would need to be run double precision without it."

This is certainly the position that DEC took through the IEEE 754 and
854 meetings. For better or worse, however, all RISC chips that I'm
aware of that have hardware floating point support implement IEEE
arithmetic more or less fully.

The anomaly here, of course, is that common scientific applications
that, by dint of great effort, have been debugged to the point of
running efficiently unchanged on IBM 370, VAX, and Cray, run about as
well but not much better on IEEE systems, since they don't exploit any
specific feature of any particular arithmetic system. Sometimes they
run slower if they underflow a lot in situations that don't matter, AND
the hardware doesn't support subnormal operands and results efficiently.
This is properly viewed as a shortcoming of the hardware/software
system that purportedly implements IEEE arithmetic: even on synchronous
systems you have to be able to hold the FPU for cache misses and page
faults, so similarly you should be able to hold the CPU for exception
misses in the FPU that take a little longer to compute. On asynchronous
CISC systems like 68881 or 80387 this isn't a problem, but they are
slower in the non-exceptional case, which is why RISC systems are
mostly synchronous.

Conversely, however, programs that take advantage of IEEE arithmetic,
usually unknowingly, don't work nearly as well on 370, VAX, or Cray,
where simple assumptions like

	if (x != y) /* then it's safe to divide by (x-y) */

no longer hold.

> an application that can be run in single precision
> that would need to be run double precision without [gradual underflow].

There will never be such an example that satisfies everyone, since you
never "need" any particular precision. After all, any integer or
floating-point computation is fabricated out of one-bit integer
operations. It's just a matter of dividing up the cleverness between
the hardware and the software.

What you CAN readily demonstrate are programs (written entirely in one
precision) that are no worse affected by underflow than by normal
roundoff, PROVIDED that underflow be gradual. Demmel and Linnainmaa
contributed many pages of such analyses to the IEEE deliberations and
to subsequent proceedings of the Symposia on Computer Arithmetic
published by IEEE-CS.

Of course if you are sufficiently clever you can use higher precision,
explicitly if provided by the compiler or implicitly otherwise, to
produce robust code in the face of abrupt underflow or even Cray
arithmetic. Many mathematical software experts are good at this, but
most regard this as an evil only made necessary by hardware, system,
and language designs that through ignorance or carelessness become part
of the problem rather than part of the solution.

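
To make the (x != y) assumption above concrete, here is a minimal
sketch in Python. It assumes Python floats are IEEE doubles with
gradual underflow, which holds on common platforms but is an assumption
here. The helper flush_to_zero is hypothetical: it only models abrupt
underflow in software and is not a description of how the 370, VAX, or
Cray actually behave in detail.

```python
import sys

TINY = sys.float_info.min  # smallest positive normal double (2**-1022)


def flush_to_zero(v):
    """Hypothetical model of abrupt underflow: a nonzero result smaller
    in magnitude than the smallest normal number is replaced by zero."""
    if v != 0.0 and abs(v) < TINY:
        return 0.0
    return v


def difference(x, y, gradual=True):
    """Return x - y, optionally passed through the abrupt-underflow model."""
    d = x - y
    return d if gradual else flush_to_zero(d)


# Two distinct numbers just above the underflow threshold.
x = 1.5 * TINY
y = 1.0 * TINY

d_gradual = difference(x, y, gradual=True)   # subnormal, exactly 0.5 * TINY
d_abrupt = difference(x, y, gradual=False)   # flushed to zero by the model

if __name__ == "__main__":
    print("x != y:", x != y)
    print("gradual underflow: x - y =", d_gradual)
    print("abrupt underflow:  x - y =", d_abrupt)
    print("TINY / (x - y) with gradual underflow:", TINY / d_gradual)
```

With gradual underflow the difference of the two distinct numbers is a
nonzero subnormal, so the division is safe. Under the flush-to-zero
model the same subtraction yields zero even though x != y, and the
guard no longer protects the division.
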
Not all code is compiled. For instance, there is a great body of theory
and practice in obtaining computational error bounds in computations
based on interval arithmetic. Interval arithmetic is efficient to
implement with the directed rounding modes required by IEEE arithmetic,
but you can't write the implementation in standard C or Fortran. In
integer arithmetic, the double-precise product of two single-precise
operands, and the single-precise quotient and remainder of a
double-precise dividend and single-precise divisor, are important in a
number of applications such as base conversion and random number
generation, but there is no way to express the required computations in
standard higher-level languages.

As to rounding halfway cases to even, the advantage over biased
rounding is perhaps most simply understood by the observation that
1+(eps/2) rounds to 1 rather than 1+eps. The "even" result is more
likely to be the one you wanted if you had a preference. Such rounding
is no more expensive than biased rounding on a system that is required
to provide directed rounding modes as well. It's not the bottleneck on
any hardware IEEE implementation of which I'm aware.

I have heard that adder carry propagate time and multiplier array size
are the key constraints with a floating-point chip; hardware experts
will correct me if I'm wrong. Memory bandwidth tends to be the key
constraint on overall system performance unless floating-point division
and sqrt dominate. The last describes a minority of programs, but they
are quite important in some influential circles.

David Hough

dhough@sun.com
na.hough@na-net.stanford.edu
{ucbvax,decvax,decwrl,seismo}!sun!dhough