Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!oliveb!sun!dgh!dgh
From: dgh%dgh@Sun.COM (David Hough)
Newsgroups: comp.arch
Subject: Re: Bandwidth and RISC vs. CISC
Summary: ieee floating point is not CISC or RISC
Message-ID: <100524@sun.Eng.Sun.COM>
Date: 22 Apr 89 02:27:03 GMT
References: <38853@bbn.COM> <423@bnr-fos.UUCP> <17417@cup.portal.com> <38971@bbn.COM>
Sender: news@sun.Eng.Sun.COM
Lines: 97

In article <38971@bbn.COM>, slackey@bbn.com (Stan Lackey) writes:
> 
> I have a real problem with anything that includes IEEE floating point
> AND calls itself a RISC.  IEEE FP violates every rule of RISC; it has
> features that compilers will never use (rounding modes), features that
> are rarely needed that slow things down (denormalized operands), and
> features that make things complex that nobody needs (round-to-even).
> I'd really like to see someone stand up and say, "Boy, the IEEE
> round-to-even is much more accurate than DEC's round .5 up.  I have an
> application right here that proves it."  Or, "Gradual underflow is
> much better.  I have an application that can be run in single precision
> that would need to be run double precision without it."

This is certainly the position that DEC took through the IEEE 754 and 854
meetings.  For better or worse, however, all RISC chips that I'm aware of
that have hardware floating point support implement IEEE arithmetic
more or less fully.

The anomaly here, of course, is that common scientific applications that,
by dint of great effort, have been debugged to the point of running
efficiently unchanged on IBM 370, VAX, and Cray, run about as well but
not much better on IEEE systems since they don't exploit any specific
feature of any particular arithmetic system.  Sometimes they run slower
if they underflow a lot in situations that don't matter, AND the
hardware doesn't support subnormal operands and results efficiently.
This is properly viewed as a shortcoming of the hardware/software
system that purportedly implements IEEE arithmetic: even on synchronous
systems you have to be able to hold the FPU for cache misses and page
faults, so you should similarly be able to hold the CPU for exceptional
cases in the FPU that take a little longer to compute.  On CISC systems
with asynchronous coprocessors like the 68881 or 80387 this isn't a
problem, but those coprocessors are slower in the non-exceptional case,
which is why RISC systems are mostly synchronous.

Conversely, however, programs that take advantage of IEEE arithmetic,
usually unknowingly, don't work nearly as well on 370, VAX, or Cray,
where simple assumptions like 

	if (x != y) /* then it's safe to divide by (x-y) */

no longer hold.
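A minimal C sketch of that guard follows. It assumes IEEE 754 double arithmetic with gradual underflow enabled. That is the usual default, but some compilers' fast-math options may switch on flush-to-zero. The helper names are hypothetical. With subnormals, two distinct doubles always have a nonzero difference, so the guard alone protects the division. Under abrupt underflow, the difference between DBL_MIN and its successor would flush to zero.

```c
#include <float.h>

/* Hypothetical helper: the guard from the text. Returns a / (x - y) when
   x != y, otherwise `fallback`. With IEEE gradual underflow the divisor
   can never be zero inside the branch; under flush-to-zero it can. */
static double divide_by_difference(double a, double x, double y, double fallback)
{
    if (x != y) {
        volatile double d = x - y;   /* volatile forces a runtime subtraction */
        return a / d;
    }
    return fallback;
}

/* Difference of two adjacent doubles at the underflow threshold.
   DBL_MIN * DBL_EPSILON is exactly 2^-1074, the smallest subnormal,
   so x is the immediate successor of y = DBL_MIN. */
static double threshold_difference(void)
{
    volatile double y = DBL_MIN;
    volatile double x = DBL_MIN + DBL_MIN * DBL_EPSILON;
    return x - y;   /* 2^-1074 with gradual underflow; 0 if flushed */
}
```

If the program runs with flush-to-zero in effect, `threshold_difference()` returns 0 even though x and y compare unequal. That is the 370/VAX/Cray-style failure the assumption above runs into.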

> an application that can be run in single precision
> that would need to be run double precision without [gradual underflow].

There will never be such an example that satisfies everyone since
you never "need" any particular precision.  After all, any integer
or floating-point computation is fabricated out of one-bit integer
operations.  It's just a matter of dividing up the cleverness between
the hardware and the software.  What you CAN readily demonstrate
are programs (written entirely in one precision)
that are no worse affected by underflow than by normal
roundoff, PROVIDED that underflow be gradual.  Demmel and Linnainmaa
contributed many pages of such analyses to the IEEE deliberations
and to subsequent proceedings of the Symposia on Computer
Arithmetic published by IEEE-CS.  Of course, if you are sufficiently
clever, you can use higher precision (explicitly if the compiler
provides it, implicitly otherwise) to produce code that is robust in
the face of abrupt underflow or even Cray arithmetic.  Many mathematical
software experts are good at this, but most regard it as an evil made
necessary only by hardware, system, and language designs that, through
ignorance or carelessness, become part of the problem rather than part
of the solution.
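The property behind such analyses can be checked directly. With gradual underflow, the gap between adjacent doubles just below DBL_MIN equals the gap just above it. A result that lands in the subnormal range is therefore rounded with the same absolute error as one just above the threshold.

The sketch below assumes IEEE doubles with subnormals, and its helper names are hypothetical. It contrasts gradual underflow with a simulated abrupt underflow, in which every nonzero value below DBL_MIN is replaced by zero.

```c
#include <float.h>

/* Hypothetical helper simulating abrupt underflow: flush tiny values to 0. */
static double flush_to_zero(double v)
{
    return (v > -DBL_MIN && v < DBL_MIN) ? 0.0 : v;
}

/* Spacing of doubles in [DBL_MIN, 2*DBL_MIN): 2^-1022 * 2^-52 = 2^-1074. */
static double gap_above_threshold(void)
{
    return DBL_MIN * DBL_EPSILON;
}

/* Spacing just below DBL_MIN: DBL_MIN minus the largest subnormal. */
static double gap_below_threshold(void)
{
    volatile double largest_subnormal = DBL_MIN - DBL_MIN * DBL_EPSILON;
    return DBL_MIN - largest_subnormal;
}

/* Count the nonzero values reached by repeatedly halving DBL_MIN.
   With gradual underflow these are the subnormals 2^-1023 .. 2^-1074,
   i.e. DBL_MANT_DIG - 1 of them; with abrupt underflow there are none. */
static int halvings_before_zero(int abrupt)
{
    volatile double v = DBL_MIN;
    int count = 0;
    for (;;) {
        double h = v * 0.5;
        if (abrupt)
            h = flush_to_zero(h);
        v = h;                 /* volatile store rounds to double */
        if (v == 0.0)
            break;
        ++count;
    }
    return count;
}
```

Under abrupt underflow, the gap below the threshold is DBL_MIN itself, 2^52 times the spacing just above it. That jump is what makes abrupt underflow behave unlike ordinary roundoff.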

Not all code is generated by compilers.  For instance, there is a great body of theory
and practice in obtaining computational error bounds in computations
based on interval arithmetic.  Interval arithmetic is
efficient to implement with the directed rounding modes required by IEEE
arithmetic, but you can't write the implementation in standard C or
Fortran.  In integer arithmetic, the double-length product of two
single-length operands, and the single-length quotient and remainder
of a double-length dividend and single-length divisor, are important
in a number of applications such as base conversion and random number
generation, but there is no way to express the required computations
in standard higher-level languages.
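The two integer primitives are easy to express today with the C99 fixed-width types:
- The 32 x 32 -> 64-bit product: widen an operand to `uint64_t` before multiplying.
- The 64 / 32 -> 32-bit quotient and remainder: divide a 64-bit dividend by a 32-bit divisor. This is valid only when the quotient fits, i.e. when the high half of the dividend is less than the divisor, mirroring the precondition of typical hardware divide instructions.

The sketch below uses hypothetical names throughout. It applies the first primitive to the Park-Miller "minimal standard" multiplicative congruential generator (multiplier 16807, modulus 2^31 - 1). It applies the second to one step of base conversion: dividing a multi-word number by 10 to peel off its lowest decimal digit.

```c
#include <stddef.h>
#include <stdint.h>

/* 32 x 32 -> 64-bit product. */
static uint64_t mul_32x32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 64 / 32 -> 32-bit quotient and remainder. Precondition: hi < d,
   so the quotient fits in 32 bits. */
static uint32_t divmod_64by32(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem)
{
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
}

/* Park-Miller minimal standard generator: x <- 16807 * x mod (2^31 - 1).
   The 64-bit product of two 32-bit values cannot overflow. */
static uint32_t minstd_next(uint32_t x)
{
    return (uint32_t)(mul_32x32(16807u, x) % 2147483647u);
}

/* Divide a big number (base 2^32 limbs, most significant first) in place
   by a small divisor d and return the remainder. The running remainder
   is always < d, so divmod_64by32's precondition holds at every step. */
static uint32_t bignum_divmod_small(uint32_t *limbs, size_t n, uint32_t d)
{
    uint32_t rem = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t r;
        limbs[i] = divmod_64by32(rem, limbs[i], d, &r);
        rem = r;
    }
    return rem;
}
```

Repeatedly calling `bignum_divmod_small(limbs, n, 10)` until the number is zero yields its decimal digits from least to most significant. Seeding `minstd_next` with 1 produces 16807, 282475249, 1622650073, and so on.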

As to rounding halfway cases to even, the advantage over biased
rounding is perhaps most simply understood from the observation
that, with eps the gap between 1 and the next larger number, 1+(eps/2)
rounds to 1 rather than 1+eps.  The "even" result is more likely to be
the one you wanted if you had a preference.
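The observation is easy to check on any machine whose double arithmetic rounds to nearest-even by default, taking eps to be C's DBL_EPSILON (the gap between 1 and the next larger double). Both cases below are exact ties, and both go to the neighbor whose last significand bit is 0:
- 1 + eps/2 lies halfway between 1 and 1 + eps, and rounds down to 1.
- 1 + 3*eps/2 lies halfway between 1 + eps and 1 + 2*eps, and rounds up to 1 + 2*eps.

Biased rounding would round both up. The function names are hypothetical. `volatile` forces each sum to be rounded to double even on hardware that keeps intermediates in wider registers.

```c
#include <float.h>

/* 1 + eps/2: a tie between 1 (even) and 1 + eps (odd). */
static double one_plus_half_eps(void)
{
    volatile double one = 1.0;
    volatile double r = one + DBL_EPSILON / 2;
    return r;
}

/* (1 + eps) + eps/2: a tie between 1 + eps (odd) and 1 + 2*eps (even). */
static double one_plus_three_half_eps(void)
{
    volatile double a = 1.0 + DBL_EPSILON;
    volatile double r = a + DBL_EPSILON / 2;
    return r;
}
```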

Such rounding is no more expensive than biased rounding
on a system that is required to provide directed rounding modes as well.  
It's not the bottleneck on any hardware IEEE implementation of which I'm aware.
I have heard that adder carry propagate time and multiplier array size 
are the key constraints with a floating-point chip; hardware experts
will correct me if I'm wrong.  Memory bandwidth tends to be the key constraint
on overall system performance unless floating-point division and sqrt
dominate.  The last describes a minority of programs but they are quite
important in some influential circles.
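Round-half-even costs essentially nothing once directed rounding must be supported anyway. To see why, model how a rounding step decides whether to add one ulp to a truncated significand. It needs three bits:
- lsb: the last kept bit.
- guard: the first discarded bit.
- sticky: the OR of all remaining discarded bits.

The sketch below is a hypothetical software model, not the logic of any particular chip, and the mode names are made up. Compared with biased rounding, round-half-even adds only the term `(sticky || lsb)`. The directed modes already need the sign and the OR of guard and sticky.

```c
#include <stdbool.h>

enum rounding {                 /* hypothetical mode names */
    ROUND_NEAREST_EVEN,
    ROUND_NEAREST_BIASED,       /* DEC-style "round .5 up" in magnitude */
    ROUND_TOWARD_ZERO,
    ROUND_TOWARD_POS_INF,
    ROUND_TOWARD_NEG_INF
};

/* Should the magnitude be incremented by one unit in the last place?
   `negative` is the sign of the result. */
static bool round_increment(enum rounding mode, bool negative,
                            bool lsb, bool guard, bool sticky)
{
    bool inexact = guard || sticky;
    switch (mode) {
    case ROUND_NEAREST_EVEN:   return guard && (sticky || lsb);
    case ROUND_NEAREST_BIASED: return guard;
    case ROUND_TOWARD_ZERO:    return false;
    case ROUND_TOWARD_POS_INF: return inexact && !negative;
    case ROUND_TOWARD_NEG_INF: return inexact && negative;
    }
    return false;
}
```

In this model, 1+(eps/2) has lsb = 0, guard = 1, and sticky = 0. Round-half-even therefore leaves the significand alone and yields 1, while biased rounding increments it to 1+eps.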

David Hough

dhough@sun.com   
na.hough@na-net.stanford.edu
{ucbvax,decvax,decwrl,seismo}!sun!dhough