Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ncar!ames!oliveb!sun!dgh!dgh
From: dgh%dgh@Sun.COM (David Hough)
Newsgroups: comp.arch
Subject: Re: Bandwidth and RISC vs. CISC
Message-ID: <100891@sun.Eng.Sun.COM>
Date: 24 Apr 89 23:49:46 GMT
References: <38853@bbn.COM> <423@bnr-fos.UUCP> <17417@cup.portal.com> <39049@bbn.COM>
Sender: news@sun.Eng.Sun.COM
Lines: 81

In article <39049@bbn.COM>, slackey@bbn.com (Stan Lackey) writes:
> In article <100524@sun.Eng.Sun.COM> dgh%dgh@Sun.COM (David Hough) writes:
> >In article <38971@bbn.COM>, slackey@bbn.com (Stan Lackey) writes:

> >Such rounding is no more expensive than biased rounding
> >on a system that is required to provide directed rounding modes as well.  

> Having to detect EXACTLY .5 is a bottleneck in terms of transistor
> count, design time, and diagnostics.  The extra execution time may not
> affect overall cycle time, but the RISC guys say that any added
> hardware increases cycle time (they usually use it in the context of
> instruction decode).

Detecting EXACTLY .5 is no harder than correct directed rounding.  You
have to (in principle) develop all the digits, propagate carries, and
remember whether any of the digits shifted off were non-zero.  Division
and sqrt are simplified by the fact that EXACTLY .5 can't happen.
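
To make that concrete, here is a rough sketch in Python, not taken from
any particular chip.  The names round_shift and quotient_ties and the
mode strings are made up for illustration, and significands are modelled
as plain non-negative integers.  The first function shows that the
directed modes need the same "were any discarded bits non-zero?"
information that the tie test needs; the second checks by brute force,
for one small precision, that a quotient of two p-bit significands never
lands exactly halfway.

```python
from fractions import Fraction

# Hypothetical helper: drop the low `shift` bits (shift >= 1) of a
# magnitude `sig` and round what is left in the requested mode.
def round_shift(sig, shift, mode="nearest-even", negative=False):
    kept = sig >> shift
    discarded = sig & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    inexact = discarded != 0          # the "sticky" information
    if mode == "nearest-even":
        # Exactly .5 rounds up only when the kept part is odd.
        round_up = discarded > half or (discarded == half and (kept & 1) == 1)
    elif mode == "toward-positive":
        round_up = inexact and not negative
    elif mode == "toward-negative":
        round_up = inexact and negative
    elif mode == "toward-zero":
        round_up = False
    else:
        raise ValueError(mode)
    return kept + 1 if round_up else kept

# Hypothetical check: count pairs of p-bit significands whose exact
# quotient is a rounding tie at p bits.
def quotient_ties(p):
    ties = 0
    lo, hi = 1 << (p - 1), 1 << p
    for a in range(lo, hi):
        for b in range(lo, hi):
            q = Fraction(a, b)
            # Scale so the integer part has p+1 bits: p kept bits plus a guard bit.
            scaled = q * (1 << p) if q >= 1 else q * (1 << (p + 1))
            # A tie means the guard bit is 1 and nothing follows it.
            if scaled.denominator == 1 and scaled.numerator % 2 == 1:
                ties += 1
    return ties
```

For example, round_shift(0b10110, 2) is a tie with an odd kept part and
rounds up, while round_shift(0b10010, 2) is a tie with an even kept part
and stays put.  quotient_ties(6) only covers one small precision, so it
is a sanity check rather than a proof.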

> Note: It's prealigning a denormalized operand before a multiplication
> that REALLY hurts.

This event is rare enough that it needn't be as fast as a normal
multiplication, so it's OK to slow down somewhat by holding the CPU,
but not so rare that you want to punt to software.  By throwing enough
hardware at the problem you can make it as fast as the normal case.
I don't advocate that, but that's my understanding of what the Cydra-5 did.
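
The prealignment step being discussed can be sketched as follows, again
in Python.  This is only a model under assumptions of mine: the name
prenormalize is hypothetical, a value is represented as an integer
significand times a power of two (sig * 2**exp), and the default
precision of 24 bits is chosen to match single precision.  The
data-dependent shift is the part that costs extra hardware or an extra
cycle.

```python
# Hypothetical model: a value is sig * 2**exp with integer sig >= 0.
# Shift a denormalized significand left until it occupies `precision`
# bits, adjusting the exponent so the value is unchanged.
def prenormalize(sig, exp, precision=24):
    if sig == 0:
        return 0, exp                  # zero has no leading 1 to align
    shift = precision - sig.bit_length()   # leading-zero count at this width
    if shift <= 0:
        return sig, exp                # already normalized
    return sig << shift, exp - shift
```

For instance, prenormalize(0b101, -126) shifts left by 21 places and
returns (0b101 << 21, -147), which represents the same value.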

Interestingly enough, the early drafts of 754 specified that default
handling of subnormal numbers be in a "warning mode" and that the more
expensive "normalizing mode" be an option.  This was with highly-pipelined
implementations very much in mind.  However, a gang of early implementers
from Apple managed to talk a majority of the committee into making the
normalizing mode the default.  The normalizing mode is easier to understand
and easier to implement in software.  Warning mode is a lot cheaper to
pipeline, however.  I was part of the gang, but I've since had opportunity
to repent at leisure.

> Lots of valid uses of IEEE features listed.  I didn't mean that IEEE
> was bad or useless, it's just that it was architected when CISC was
> the trend, and it shows.  Especially after my own efforts in an IEEE
> implementation, I am glad to see from this posting and others that at
> least a few users can make use of the features.

Remember IEEE 754 and 854 are standards for a programming environment.
How much of that is to be provided by hardware and how much by software
is up to the implementer; in contrast, RISC is a hardware design philosophy.
The MC68881 is probably the best-known attempt to put practically
everything in the hardware so the software wouldn't screw it up as usual.
The Weitek 1032/3 and their descendants and competitors are examples of
minimal hardware implementations that support complete IEEE
implementations once appropriate software is added.  Evidently the first
generations of such chips were too minimal; for instance, nowadays
everybody has correctly-rounded division and sqrt in hardware, rather
than software, on chips intended for general-purpose computation.

> I think the RISC
> implementers should have a RISC-style floating point standard, though.

There's a very minimalist floating-point standard, that of S. Cray,
which is very cheap to implement entirely in hardware (compared
to other standards at similar performance levels).
The only hard part is writing the software that uses it.  So far
no other hardware manufacturers have seen fit to adopt Cray arithmetic.
The IBM 370 architecture has been more widely imitated, but not because
of any inherent wonderfulness for mathematical software.  The DEC VAX
floating-point architecture is well defined, and a number of non-DEC
implementations are available.  But divide and sqrt are no easier than
in IEEE, and IEEE double precision addition and multiplication are
available now in one or two cycles on some implementations.
Does anybody still think there would be an advantage to VAX, 370, or Cray
floating-point architecture for a PC or workstation?


David Hough

dhough@sun.com   
na.hough@na-net.stanford.edu
{ucbvax,decvax,decwrl,seismo}!sun!dhough