Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!rpi!uwm.edu!rutgers!rochester!pt.cs.cmu.edu!gandalf.cs.cmu.edu!lindsay
From: lindsay@gandalf.cs.cmu.edu (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Register Count
Message-ID: <11538@pt.cs.cmu.edu>
Date: 9 Jan 91 05:45:14 GMT
References:
Organization: Carnegie Mellon Robotics Institute
Lines: 35

In article pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>Statistics exist that show that almost any
>expression and a majority of code sequences can, on a single threaded
>CPU (one with one ALU), be compiled without spills with FOUR spare
>registers[1]. To add abundant inter-expression caching we need another
>four.

This directly contradicts recent experience with optimizing
compilers. You appear to have counted floating point values (and
assumed no unrolling or software pipelining) and taken that as the
contents of the register bank. Aside from the generated-in address
computations, there are uses such as scope uplinks, subroutine
parameter lists, loop control, and the like. And those are just the
conventional uses, not the aggressive ones.

>My favourite solution would be to have multiple stacks. How many? FOUR
>is the answer, because it is exceedingly rare that an expression
>contains as many independent threads of computation. A superscalar
>machine may attach independent ALUs to each stack, or even specialize
>them[4].

>[4] Indeed in practice superscalar implementations find it exceedingly
>hard to find degrees of microparallelism greater than two or three,

That's two or three things _per-clock_, on a _pipelined_ machine.

You are planning to operate more than one stack per clock? In a
pipelined fashion, so that each stack can be re-referenced while an
outstanding computation hasn't yet written back its result? How are
the multiple consequences resolved?
--
Don		D.C.Lindsay .. temporarily at Carnegie Mellon Robotics