Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!wuarchive!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!haven!adm!lhc!usenet
From: usenet@nlm.nih.gov (usenet news poster)
Newsgroups: comp.arch
Subject: Re: Why FP at all? (was: Re: Killer Micro II)
Message-ID: <1990Sep8.221853.12579@nlm.nih.gov>
Date: 8 Sep 90 22:18:53 GMT
References: <STEPHEN.90Sep5000536@estragon.uchicago.edu> <14900015@hpdmd48.boi.hp.com>
Reply-To: states@tech.NLM.NIH.GOV (David States)
Organization: National Library of Medicine, Bethesda, Md.
Lines: 46

In article <14900015@hpdmd48.boi.hp.com> 
sritacco@hpdmd48.boi.hp.com (Steve Ritacco) writes (and quotes):
>> putting integrated floating point into a silly little workstation like
>> a Sparc or an 80486 machine is serious overkill ...
>
>> Is it so absurd to suggest, in sum, that exposing separate mantissa
>> and exponent to the optimiser might result in *speedup* due to
>> constant propagation and expression-rearrangement

The chained multiply-add FP hardware in processors like the IBM RS/6000
effectively does this: the product is not rounded before the add.  The
marginal gain from deferring resolution of the exponent for longer than
every other operation is going to be small.
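
To make the single-rounding point concrete, here is a minimal Python
sketch.  It is not how the hardware works.  It emulates a fused
multiply-add by computing a*b + c exactly with fractions.Fraction and
rounding once, and compares that with a separate multiply and add,
which rounds twice.  The function names and input values are
illustrative choices only.

```python
from fractions import Fraction

# Illustrative emulation only; function names are hypothetical.

def separate_mul_add(a: float, b: float, c: float) -> float:
    """Multiply, round to double, then add and round again (two roundings)."""
    product = a * b        # first rounding
    return product + c     # second rounding

def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: form a*b + c exactly, round once.

    Fraction arithmetic on float inputs is exact; converting the
    final Fraction back to float is the single rounding step.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

a = 1.0 + 2.0**-27          # a*a = 1 + 2**-26 + 2**-54 exactly
c = -(1.0 + 2.0**-26)
print(separate_mul_add(a, a, c))  # 0.0: the 2**-54 term is lost in the first rounding
print(fused_mul_add(a, a, c))     # 5.551115123125783e-17, i.e. 2**-54
```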

>> while at the same
>> time increasing expressivity by allowing an INDEPENDENT choice of
>> mantissa and exponent sizes?
>
>Very true; who needs IEEE format anyway?

The market for FP is in simulation and modeling, everything from
stockbrokers running econometric models to chemists looking at
molecules.  IEEE format has proven to be a reasonable balance that lets
you write general-purpose tools that work over a wide range of input
values.  Between 32-bit integers, 32/64-bit FP, and the occasional
algorithmic tweak, the vast majority of data can be represented
reasonably well.  Custom fixed-point formats have a place in DSP, where
performance is critical and you have the advantage of knowing exactly
where the input data comes from and what values are acceptable.
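
As an illustration of such a format, here is a minimal sketch of Q15
arithmetic, a common 16-bit fixed-point convention where values in
[-1, 1) are stored as integers scaled by 2**15.  The helper names and
the rounding and saturation choices are assumptions of this sketch;
real DSPs differ in how they round and saturate.

```python
Q = 15                                      # fractional bits in a Q15 value
Q_MIN, Q_MAX = -(1 << 15), (1 << 15) - 1    # int16 range

def to_q15(x: float) -> int:
    """Convert a float to Q15, rounding to nearest (ties to even) and saturating."""
    return max(Q_MIN, min(Q_MAX, round(x * (1 << Q))))

def from_q15(q: int) -> float:
    return q / (1 << Q)

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values.

    The raw product has 30 fractional bits; add half an LSB, shift back
    to 15 fractional bits, and saturate to the int16 range.
    """
    product = a * b                              # Q30 intermediate
    rounded = (product + (1 << (Q - 1))) >> Q    # round to nearest, back to Q15
    return max(Q_MIN, min(Q_MAX, rounded))

half = to_q15(0.5)                       # 16384
print(from_q15(q15_mul(half, half)))     # 0.25
print(q15_mul(Q_MIN, Q_MIN))             # 32767: (-1.0)*(-1.0) saturates
```

Because the programmer knows the inputs stay in [-1, 1), the whole
exponent apparatus disappears and each multiply is one integer
multiply, one add, and one shift.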

>Give me a processor capable of
>doing a few arithmetic instructions in a single cycle, with a single
>cycle multiply, and I think you've got it.  

A lot of the "superscalar" marketing hype really comes down to an FP
coprocessor issuing in parallel with the integer unit.  Take away the
load/store operations in a superscalar RISC (on a CISC those would have
been part of the instruction anyway) and the FPU, and what have you got
left?  Roughly one integer op per cycle.
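
The issue-slot argument can be sketched with a toy model, assuming an
in-order machine that issues at most one instruction per unit class
(integer, FP, memory) each cycle and ignores dependencies and
latencies.  The unit classes and instruction streams are hypothetical,
chosen only to show that total ops/cycle rises when FP and load/store
work is mixed in, while integer ops/cycle stays at one.

```python
from typing import List

def count_cycles(stream: List[str]) -> int:
    """Cycles needed to issue `stream` on a toy in-order machine.

    Hypothetical rule: each cycle issues instructions in program order
    until the next one needs a unit class already used this cycle.
    Dependencies and latencies are ignored.
    """
    cycles = 0
    i = 0
    while i < len(stream):
        used = set()
        while i < len(stream) and stream[i] not in used:
            used.add(stream[i])
            i += 1
        cycles += 1
    return cycles

def int_ops_per_cycle(stream: List[str]) -> float:
    return stream.count("int") / count_cycles(stream)

integer_only = ["int"] * 8
mixed = ["int", "fp", "mem"] * 4
for name, s in [("integer only", integer_only), ("mixed", mixed)]:
    print(name, len(s) / count_cycles(s), int_ops_per_cycle(s))
# integer only 1.0 1.0
# mixed 3.0 1.0
```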

>Let's use all the FPU silicon
>to do more needed operations and good floating point could fall out anyway.

Given comparable levels of integration for the integer unit (IPU) and
the FPU, I have yet to see software emulation of FP that comes anywhere
close to the speed of a hardware FPU.
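
To see why, here is a minimal Python sketch of a software
single-precision multiply working on raw IEEE 754 bit patterns.  It
assumes normal inputs and a normal result (no zeros, subnormals,
infinities, NaNs, overflow or underflow) and rounds to nearest, ties to
even.  Even this stripped-down path needs unpacking, an integer
multiply, normalization, rounding and repacking, all of which a
hardware FPU does in one instruction.  The helper names are
hypothetical.

```python
import struct

def f32_bits(x: float) -> int:
    """Round x to single precision and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def soft_fmul32(a: int, b: int) -> int:
    """Software multiply of two single-precision bit patterns.

    Sketch only: normal inputs and normal result assumed; no special
    cases are handled.  Rounds to nearest, ties to even.
    """
    sign = (a ^ b) & 0x80000000
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    ma = (a & 0x7FFFFF) | 0x800000       # restore hidden bit: 24-bit significand
    mb = (b & 0x7FFFFF) | 0x800000
    exp = ea + eb - 127                  # biased exponent, before normalization
    prod = ma * mb                       # in [2**46, 2**48)
    if prod & (1 << 47):                 # significand product in [2, 4)
        exp += 1
    else:                                # in [1, 2): shift leading 1 up to bit 47
        prod <<= 1
    mant = prod >> 24                    # keep the top 24 bits
    rem = prod & 0xFFFFFF                # discarded bits decide the rounding
    half = 1 << 23
    if rem > half or (rem == half and (mant & 1)):
        mant += 1
        if mant == 1 << 24:              # rounding carried out of the significand
            mant >>= 1
            exp += 1
    return sign | (exp << 23) | (mant & 0x7FFFFF)

print(bits_f32(soft_fmul32(f32_bits(1.5), f32_bits(2.5))))   # 3.75
```

A complete emulator must also handle the special cases this sketch
skips, and each one adds more branches to the integer path.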

David States