Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!spool.mu.edu!uunet!zephyr.ens.tek.com!tektronix!percy!littlei!intelhf!ichips!ichips!colwell
From: colwell@pdx023.ichips (Robert Colwell)
Newsgroups: comp.arch
Subject: Re: IEEE arithmetic (Goldberg paper)
Message-ID:
Date: 3 Jun 91 12:07:34 GMT
Article-I.D.: pdx023.COLWELL.91Jun3130734
References: <9106010224.AA28532@ucbvax.Berkeley.EDU>
Sender: news@ichips.intel.com (News Account)
Organization: Intel Corp., Hillsboro, Oregon
Lines: 42
In-Reply-To: jbs@WATSON.IBM.COM's message of 1 Jun 91 00:47:30 GMT

In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:

   Path: ichips!iWarp.intel.com!ogicse!dali.cs.montana.edu!uakari.primate.wisc.edu!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!ucbvax!WATSON.IBM.COM!jbs

   Henry Spencer says:

      My understanding is that the funnier-looking features, in
      particular the infinities, NaNs, and signed zeros, mostly cost
      essentially nothing.

   I believe this statement is incorrect. Infs and nans complicate
   the design of the floating point unit. Even if there is ultimately
   no performance impact, this added design effort represents a very
   real cost. Perhaps some hardware designers could comment.

Ok, I designed the square root/divide hardware in the first generation
of Multiflow's machines. I'm mostly on Henry's side -- you do need to
detect inputs that can cause wrong answers (square root of negative,
0/0, etc.), and it does take extra hardware and design time, but it
isn't that hard, and it doesn't affect the cycle time.

Of the "nanocode" that controlled this floating point unit, something
like half dealt with getting single/double divide/sqrt results, and the
other half looked for exceptions, rounding, and the like. Getting the
exceptions right was harder than getting the basic functionality right,
and was tougher to validate.
We found four bugs in this functional unit over the two years it was
in use, and three were in exceptions, not basic functionality.

But it still isn't all that hard: I spent three months coming up with
the basic design from a standing start. And since there was a standard,
I could run industry standard validation suites against it to build my
own confidence in it. To me, that alone proved the worth of the
standard.

Getting the denorms right is not easy, and it does affect performance,
especially in a statically scheduled machine, where the compiler was
expecting the result in cycle N, but your floating point unit just
figured out in cycle N-1 that it generated an underflow and must now
do some additional processing. This means that the floating point unit
must be able to stall the machine, something it previously didn't need
to do, and stall is inevitably on the critical timing path.

Bob Colwell  colwell@ichips.intel.com  503-696-4550
Intel Corp.  JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124
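
For readers who want something concrete, here is a minimal Python sketch of the kind of operand screening discussed above: square root of a negative, 0/0, and results that land in the denormal range. It is an illustration only, not the Multiflow nanocode or any real hardware design. The function names (classify_sqrt, classify_divide, is_subnormal) and the category labels are hypothetical, made up for this example, and it models IEEE 754 double precision in software.

```python
import math
import sys


def classify_sqrt(x: float) -> str:
    """Label a square-root operand before any real computation happens."""
    if math.isnan(x):
        return "nan-operand"      # NaN in, NaN out
    if x < 0.0:
        return "invalid"          # sqrt of a negative (including -inf) gives NaN
    if x == 0.0 or math.isinf(x):
        return "exact"            # sqrt(+0) = +0, sqrt(-0) = -0, sqrt(+inf) = +inf
    return "normal"


def classify_divide(a: float, b: float) -> str:
    """Label a divide a / b before any real computation happens."""
    if math.isnan(a) or math.isnan(b):
        return "nan-operand"
    if (a == 0.0 and b == 0.0) or (math.isinf(a) and math.isinf(b)):
        return "invalid"          # 0/0 and inf/inf give NaN
    if b == 0.0:
        # inf / 0 is an exact infinity; only finite nonzero / 0 raises the flag
        return "exact" if math.isinf(a) else "divide-by-zero"
    return "normal"


def is_subnormal(x: float) -> bool:
    """True if x is a nonzero double below the smallest normal magnitude."""
    return 0.0 < abs(x) < sys.float_info.min


if __name__ == "__main__":
    print(classify_sqrt(-1.0))
    print(classify_divide(0.0, 0.0))
    print(is_subnormal(sys.float_info.min / 2))
```

The screening functions look only at the operands, which is why that work can be done up front. The is_subnormal check needs the result, so it matches the denormal case where the unit learns about the underflow late.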