Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!rpi!uwm.edu!rutgers!rochester!pt.cs.cmu.edu!gandalf.cs.cmu.edu!lindsay
From: lindsay@gandalf.cs.cmu.edu (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Register Count
Message-ID: <11538@pt.cs.cmu.edu>
Date: 9 Jan 91 05:45:14 GMT
References: <PCG.91Jan8175401@odin.cs.aber.ac.uk>
Organization: Carnegie Mellon Robotics Institute
Lines: 35

In article <PCG.91Jan8175401@odin.cs.aber.ac.uk> 
	pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>Statistics exist that show that almost any
>expression and a majority of code sequences can, on a single threaded
>CPU (one with one ALU), be compiled without spills with FOUR spare
>registers[1].  To add abundant inter-expression caching we need another
>four.

This directly contradicts recent experience with optimizing
compilers. You appear to have counted floating point values (and
assumed no unrolling or software pipelining) and taken that as the
contents of the register bank.

Aside from the compiler-generated address computations, there are uses
such as scope uplinks, subroutine parameter lists, loop control, and
the like. And those are just the conventional uses, not the aggressive
ones.
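
A rough sketch of that accounting, in Python, for illustration only. It
uses Sethi-Ullman-style labelling to count the registers a single
expression needs, assuming a load/store machine where every leaf must
first be loaded into a register. It then adds the values that stay live
across a loop around that expression. The loop, the array names, and the
`loop_overhead` tally are hypothetical, and a real compiler might fold or
strength-reduce some of them away.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Expression tree node; a leaf has no children."""
    op: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def regs_needed(n: Node) -> int:
    """Sethi-Ullman-style label: registers needed to evaluate n with no spills.

    Assumption: load/store machine, so every leaf costs one register.
    """
    if n.left is None and n.right is None:
        return 1
    if n.right is None:                 # unary operator reuses its operand's register
        return regs_needed(n.left)
    l = regs_needed(n.left)
    r = regs_needed(n.right)
    # Equal demands: one result must be held while the other side is evaluated.
    return l + 1 if l == r else max(l, r)


def leaf(name: str) -> Node:
    return Node(name)


def binop(op: str, left: Node, right: Node) -> Node:
    return Node(op, left, right)


# Hypothetical loop body:  x[i] = (a[i] + b[i]) * (c[i] + d[i])
expr = binop("*",
             binop("+", leaf("a[i]"), leaf("b[i]")),
             binop("+", leaf("c[i]"), leaf("d[i]")))
expr_regs = regs_needed(expr)

# Values live across iterations (assumed: arrays arrive as pointers,
# nothing is folded into immediates).
loop_overhead = {
    "array base addresses (x, a, b, c, d)": 5,
    "loop index i": 1,
    "loop bound n": 1,
}
total_regs = expr_regs + sum(loop_overhead.values())

print("expression alone:", expr_regs)
print("with loop overhead:", total_regs)
```

Under these assumptions the expression alone fits in three registers,
but the loop around it wants ten, before any scope uplinks, parameter
lists, or unrolling are counted.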

>My favourite solution would be to have multiple stacks. How many? FOUR
>is the answer, because it is exceedingly rare that an expression
>contains as many independent threads of computation. A superscalar
>machine may attach independent ALUs to each stack, or even specialize
>them[4].
>[4] Indeed in practice superscalar implementations find it exceedingly
>hard to find degrees of microparallelism greater than two or three,

That's two or three things _per-clock_, on a _pipelined_ machine. You
are planning to operate more than one stack per clock? In a pipelined
fashion, so that each stack can be re-referenced while an outstanding
computation hasn't yet written back its result? How are the multiple
consequences resolved?
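
To illustrate the hazard in question, here is a toy model in Python of a
single in-order stack with a multi-cycle ALU. It is a sketch under
stated assumptions, not a description of the proposed machine: one
instruction issues per cycle, a push makes its value ready one cycle
later, and an operation whose operand has not yet been written back
simply stalls (an interlock, which is only one possible resolution). The
function name `simulate` and the instruction encoding are made up for
this example.

```python
def simulate(program, latency):
    """Return (total_cycles, stall_cycles) for a one-stack, in-order machine.

    program: list of "push" or "op" (binary operation on the top two slots).
    latency: cycles between issuing an "op" and its result being written back.
    Each stack slot records the cycle at which its value becomes ready.
    """
    stack = []
    cycle = 0
    stalls = 0
    for instr in program:
        if instr == "push":
            stack.append(cycle + 1)      # assumed: pushed value ready next cycle
            cycle += 1
        else:
            b = stack.pop()
            a = stack.pop()
            issue = max(cycle, a, b)     # wait until both operands are ready
            stalls += issue - cycle
            stack.append(issue + latency)
            cycle = issue + 1
    return cycle, stalls


# (a + b) * (c + d) in postfix: a b + c d + *
program = ["push", "push", "op", "push", "push", "op", "op"]

for lat in (1, 3):
    total, stalled = simulate(program, lat)
    print(f"latency {lat}: {total} cycles, {stalled} stalled")
```

With single-cycle operations nothing stalls. With a three-cycle ALU the
final multiply has to wait for the second add to write back, because on
a stack the next operation nearly always consumes the result just
produced.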


-- 
Don		D.C.Lindsay .. temporarily at Carnegie Mellon Robotics