Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!wuarchive!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!haven!adm!lhc!usenet
From: usenet@nlm.nih.gov (usenet news poster)
Newsgroups: comp.arch
Subject: Re: Why FP at all? (was: Re: Killer Micro II)
Message-ID: <1990Sep8.221853.12579@nlm.nih.gov>
Date: 8 Sep 90 22:18:53 GMT
References: <14900015@hpdmd48.boi.hp.com>
Reply-To: states@tech.NLM.NIH.GOV (David States)
Organization: National Library of Medicine, Bethesda, Md.
Lines: 46

In article <14900015@hpdmd48.boi.hp.com> sritacco@hpdmd48.boi.hp.com
(Steve Ritacco) writes (and quotes):
>> putting integrated floating point into a silly little workstation like
>> a Sparc or an 80486 machine is serious overkill ...
>
>> Is it so absurd to suggest, in sum, that exposing separate mantissa
>> and exponent to the optimiser might result in *speedup* due to
>> constant propagation and expression-rearrangement

The chained multiply-and-add FP hardware in processors like the IBM
RS/6000 effectively does this. The marginal gain from deferring
resolution of the exponent past more than every other operation is
going to be small.

>> while at the same
>> time increasing expressivity by allowing an INDEPENDENT choice of
>> mantissa and exponent sizes?
>
>Very true, who need IEEE format anyway.

The market is in simulation and modeling: everything from stockbrokers
running econometric models to chemists looking at molecules. IEEE
format has proven to be a reasonable balance that lets you write
general-purpose tools that function over a wide range of input values.
Between 32-bit integer, 32/64-bit FP, and an occasional algorithmic
tweak, the vast majority of data can be reasonably well represented.
Custom fixed-point formats have a place in DSP, where performance is
critical and you have the advantage of knowing exactly where the input
data is coming from and what values will be acceptable.

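To make the fixed-point versus IEEE trade-off concrete, here is a minimal
sketch. It assumes a Q15 format (16-bit signed, 15 fractional bits) as a
stand-in for a "custom fixed point format"; that choice, and the helper
names to_q15, from_q15, to_float32 and rel_error, are illustrative
assumptions and not anything from the thread.

```python
import struct

Q15_SCALE = 1 << 15  # assumed format: 1 sign bit, 15 fractional bits


def to_q15(x):
    """Quantize x to a signed 16-bit Q15 integer, saturating outside [-1, 1)."""
    n = int(round(x * Q15_SCALE))
    return max(-Q15_SCALE, min(Q15_SCALE - 1, n))


def from_q15(n):
    """Convert a Q15 integer back to a real value."""
    return n / Q15_SCALE


def to_float32(x):
    """Round x to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rel_error(approx, exact):
    """Relative error of approx with respect to a nonzero exact value."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    # Inputs near full scale, well below it, and far outside the Q15 range.
    for x in (0.5, 0.3, 3.0e-4, 1234.5):
        q = from_q15(to_q15(x))
        f = to_float32(x)
        print(f"{x:>10g}  Q15 rel err {rel_error(q, x):.2e}"
              f"  float32 rel err {rel_error(f, x):.2e}")
```

Under these assumptions, Q15 does well when the input sits near full
scale, loses relative precision as values shrink, and saturates once the
input leaves [-1, 1). Single precision keeps roughly the same relative
error across all four inputs, which is the "wide range of input values"
point above.
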
>Give me a processor capable of
>doing a few arithmetic instructions in a single cycle, with a single
>cycle multiply, and I think you've got it.

A lot of the "superscalar" marketing hype is really just FP
coprocessors. Take away the load/store operations in a superscalar RISC
(those used to be part of the CISC instruction anyway) and the FPU, and
what have you got left? About one op/cycle.

>Lets use all the FPU silicon
>to do more needed operations and good floating point could fall out anyway.

At matching levels of integration for the IPU and FPU, I have yet to
see software emulation of FP that comes anywhere close to the speed of
a hardware FPU.

David States
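
As a rough illustration of why software emulation struggles to keep up,
here is a minimal sketch of a floating-point add done with integer
operations only. It assumes a toy format with a 24-bit signed mantissa
and an unbounded exponent, truncates instead of rounding, and ignores
infinities, NaNs and denormals; the names unpack, pack, normalize and
soft_add are hypothetical, not any real emulation library.

```python
import math

MANT_BITS = 24  # assumed mantissa width, same as IEEE single's significand


def unpack(x):
    """Split x into (mantissa, exponent) with x ~= mantissa * 2**exponent."""
    if x == 0.0:
        return 0, 0
    frac, exp = math.frexp(x)  # x = frac * 2**exp, 0.5 <= |frac| < 1
    return int(frac * (1 << MANT_BITS)), exp - MANT_BITS


def pack(value):
    """Turn a (mantissa, exponent) pair back into a Python float."""
    mant, exp = value
    return math.ldexp(mant, exp)


def normalize(mant, exp):
    """Bring the mantissa back to MANT_BITS bits, truncating low bits."""
    if mant == 0:
        return 0, 0
    sign = -1 if mant < 0 else 1
    mag = abs(mant)
    shift = mag.bit_length() - MANT_BITS
    if shift > 0:
        mag >>= shift       # result grew: drop low bits
    else:
        mag <<= -shift      # cancellation: shift back up
    return sign * mag, exp + shift


def soft_add(a, b):
    """Add two unpacked values: compare, align, add, renormalize."""
    (ma, ea), (mb, eb) = a, b
    if ma == 0:
        return b
    if mb == 0:
        return a
    if ea < eb:             # make a the operand with the larger exponent
        ma, ea, mb, eb = mb, eb, ma, ea
    shift = ea - eb
    # Align the smaller operand, truncating toward zero.
    mb = -((-mb) >> shift) if mb < 0 else mb >> shift
    return normalize(ma + mb, ea)


if __name__ == "__main__":
    print(pack(soft_add(unpack(1.5), unpack(2.25))))
```

Even this stripped-down version needs a compare, a variable shift, an
add, a leading-bit search and another shift for a single addition, before
any rounding or special-case handling. A hardware FPU does all of that in
dedicated logic.
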