Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!rice!titan!preston
From: preston@titan.rice.edu (Preston Briggs)
Newsgroups: comp.arch
Subject: Re: Pipelined FP add
Message-ID: <3740@brazos.Rice.edu>
Date: 14 Dec 89 19:53:14 GMT
References: <241@dg.dg.com> <33570@hal.mips.COM>
Sender: root@rice.edu
Reply-To: preston@titan.rice.edu (Preston Briggs)
Distribution: na
Organization: Rice University, Houston
Lines: 47

In article <33570@hal.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>In article <241@dg.dg.com> uunet!dg!chris (Chris Moriondo) writes:
> >
> >Anyone care to speculate as to how much pipelining add would win/lose
> >in terms of useful overlap in FP codes versus increased latency?
> >
>
>Several articles were posted to comp.arch this spring, talking about the
>perceived benefits of a pipelined FP adder in the Motorola MC88K risc.

Well, long pipelines will usually increase latency.
Additionally, they'll increase FP performance, given some effort
by the compiler. Choosing between the two will just be a design
decision, probably based on the target applications.

For a workstation that *I* would use (to compile and edit things
like compilers and editors), FP performance isn't very important.
But when people really want FP performance, they can usually manage
to avoid task switching too often.

What'll happen when people get more used to cheap FP?
Probably more FP in everyday code, particularly graphics display.

>I dimly recall that the last time around, they were beginning to feel
>that the 88K's shared register file -- same regs for integer and FP
>operands -- required large numbers of read and write ports to make
>FP programs run quickly. The other design alternative, separate
>integer regs from the FP regs, has lots of ports already since it's
>2 copies of the hardware.

The point about read/write ports is good.

I used to believe in a single (large) set of multi-purpose registers
controlled by the compiler.
This scheme would seem to be less wasteful of resources (when you need
more FP regs, spill more integers, and vice-versa, whichever works out
cheapest for particular programs).

Nowadays, I've come around to the view that extra resources for the
moments of peak demand (triply-nested loops and such) are better.
In particular, I prefer separate FP and integer register sets.
Locally, friends are working on loop transformations that will
use *all* the available FP registers profitably. If we've got to
balance the FP register pressure against the integer pressure,
then we've added another hard issue to optimization.

Of course, it's extra hardware and it'll cost money.

Preston Briggs
preston@titan.rice.edu
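
A small illustration of the "effort by the compiler" that a pipelined
FP adder needs, and of why such transformations eat FP registers.
This is only a sketch under assumptions that are not in the post: the
function names are made up, and the unroll factor of 4 stands in for an
assumed adder latency, not a figure for the 88K or any other machine.
Note too that splitting the sum changes the order of the additions, so
the rounded result can differ from the serial loop.

```c
#include <stddef.h>

/* One accumulator: each add depends on the one before it, so a
   pipelined adder cannot overlap them and latency dominates. */
double sum_serial(const double *a, size_t n)
{
    double s = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: up to four adds can be in flight
   at once, at the cost of four live FP registers instead of one.
   The factor 4 is an assumption made for illustration only. */
double sum_unrolled4(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n % 4 != 0 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

The second version trades registers for overlap: the deeper the
pipeline, the more accumulators it takes to keep the adder busy, which
is one way the FP register pressure mentioned above shows up.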