Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!cbmvax!daveh
From: daveh@cbmvax.commodore.com (Dave Haynie)
Newsgroups: comp.arch
Subject: Re: registerless architecture
Keywords: cache
Message-ID: <17212@cbmvax.commodore.com>
Date: 8 Jan 91 04:35:34 GMT
References: <1990Nov12.145410.29035@cs.cmu.edu> <56084@brunix.UUCP> <1990Nov14.064225.14406@caliban.uucp> <1990Nov21.004355.212@noose.ecn.purdue.edu>
Reply-To: daveh@cbmvax.commodore.com (Dave Haynie)
Organization: Commodore, West Chester, PA
Lines: 54

In article <1990Nov21.004355.212@noose.ecn.purdue.edu> hankd@dynamo.ecn.purdue.edu (Hank Dietz) writes:
>As to all the comments about needing only cache, I've said it before
>and I'll say it again....  Registers help because:

>[1]	They are fast
>[2]	Register refs don't interfere with memory data path
>[3]	You never miss (i.e., have static timing for schedules)
>[4]	Register names are shorter than addresses

>A conventional cache gets you only benefit [1]; however, ambiguously
>aliased references (array elements and pointer targets) are
>effectively managed by a cache whereas they require frequent flushing
>from registers.  If you want all the benefits, you need both....

Well, it seems to me that if you built a registerless machine right, you
could pick up a few more points.  A good cache is fast these days.  So
let's have three: one for data, one for instructions, one to replace actual
registers.  So we've got [1].

As for [2], registers do interfere with a memory path -- when they are swapped
to main memory during a context swap.  So if we have a good-sized register
cache, in many cases we avoid that interference not only during task execution,
but across task swaps as well.  Like a Harvard machine, only with three internal
data paths rather than two.  I guess you have to decide how the register cache
actually works during program execution -- one could treat each virtual
register as on a normal fixed-register machine, but it would probably make as
much sense to make it act like a register window machine.  In today's silicon,
you could have a 4-8K register cache with multi-way set associativity.

Number [3] is something of an issue -- with a task swap on a conventional
machine, you "miss" only on task boundaries.  Here, you miss the first time
you access a register, but never again, at least until your task is swapped out
and back in, in which case you may miss, but even that's not guaranteed.
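
To make the miss behavior concrete, here is a rough sketch of a register
cache as a toy simulation.  Everything in it is an assumption for
illustration: the class name, the 256-set by 4-way geometry (4K if a register
word is 4 bytes), LRU replacement, and the word addresses used for the two
tasks.  None of it describes a real machine.

```python
from collections import OrderedDict


class RegisterCache:
    """Toy set-associative cache of register words, LRU replacement per set."""

    def __init__(self, num_sets=256, ways=4):
        # 256 sets x 4 ways x 4-byte words = 4K (assumed geometry)
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Touch the register word at addr; return True on a hit."""
        lines = self.sets[addr % self.num_sets]
        if addr in lines:
            lines.move_to_end(addr)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used word
        lines[addr] = True
        return False


cache = RegisterCache()
# Task A touches 16 registers based at word address 0x1000 (hypothetical).
first_touch = [cache.access(0x1000 + r) for r in range(16)]
second_touch = [cache.access(0x1000 + r) for r in range(16)]
# Task B runs with its own 16 registers based at 0x2000 (hypothetical).
other_task = [cache.access(0x2000 + r) for r in range(16)]
# Task A is swapped back in; its registers were never evicted.
after_swap = [cache.access(0x1000 + r) for r in range(16)]
```

In this run task A misses on each first touch, hits on every later access, and
still hits after task B has run, because both tasks fit in the cache together.
With more tasks or a smaller cache, some of those post-swap accesses would
miss instead.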

Number [4] is solved by making all working register references relative to a
real register, which points to the base of register space.  The time to add
in the offset from the base pointer can be hidden in the CPU pipeline if 
there's a dedicated adder for this purpose.
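
A minimal sketch of that base-relative naming, assuming a 5-bit register
field and a made-up base pointer value (both hypothetical, as are the
function and constant names):

```python
REG_FIELD_BITS = 5  # assumed width of a register field in an instruction


def register_address(base_pointer, reg_field):
    """Map a short register name onto register space via the base pointer."""
    if not 0 <= reg_field < (1 << REG_FIELD_BITS):
        raise ValueError("register field does not fit in the instruction")
    # In hardware this add would be done by the dedicated adder in the pipeline.
    return base_pointer + reg_field


base = 0x42000                     # hypothetical base of register space
r7 = register_address(base, 7)     # the instruction only encodes the "7"
```

The instruction still carries only a short field, just as it would on a
conventional register machine; the wide address exists only after the add.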

Still, with all that said, I'm not sure this puppy buys you much over the
conventional approach, and it does make the design of the CPU more complex.
It would definitely cut down on the context swap time, and does have the
interesting property of making the number of logical registers used in a
task definable by the OS, or even the application if you split things into
user and supervisor/kernel space.
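
As a rough illustration of those last two points, the sketch below treats a
context swap as nothing more than reloading the base pointer, with the OS
choosing how many registers each task gets.  The limit check, the class, and
the task names are my own assumptions for the example, not part of any real
design.

```python
class RegisterSpace:
    """Toy OS-side allocator giving each task its own slice of register space."""

    def __init__(self):
        self.next_free = 0
        self.tasks = {}          # task name -> (base, register count)
        self.base_pointer = 0    # the one real register
        self.limit = 0           # assumed bound on the current task's registers

    def create_task(self, name, nregs):
        # The OS decides how many logical registers this task gets.
        self.tasks[name] = (self.next_free, nregs)
        self.next_free += nregs

    def switch_to(self, name):
        # Context swap: reload base and limit, copy no register contents.
        self.base_pointer, self.limit = self.tasks[name]

    def resolve(self, reg):
        if not 0 <= reg < self.limit:
            raise IndexError("register outside this task's allocation")
        return self.base_pointer + reg


space = RegisterSpace()
space.create_task("editor", 16)       # hypothetical small task
space.create_task("raytracer", 64)    # hypothetical register-hungry task
space.switch_to("raytracer")
addr = space.resolve(40)              # base 16 + register 40
```

Whether the old task's registers later get pushed out to memory is then up to
the register cache, not the swap itself.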

>						-hankd


-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
	"Don't worry, 'bout a thing. 'Cause every little thing, 
	 gonna be alright"		-Bob Marley