Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!cbmvax!daveh
From: daveh@cbmvax.commodore.com (Dave Haynie)
Newsgroups: comp.arch
Subject: Re: registerless architecture
Keywords: cache
Message-ID: <17212@cbmvax.commodore.com>
Date: 8 Jan 91 04:35:34 GMT
References: <1990Nov12.145410.29035@cs.cmu.edu> <56084@brunix.UUCP> <1990Nov14.064225.14406@caliban.uucp> <1990Nov21.004355.212@noose.ecn.purdue.edu>
Reply-To: daveh@cbmvax.commodore.com (Dave Haynie)
Organization: Commodore, West Chester, PA
Lines: 54

In article <1990Nov21.004355.212@noose.ecn.purdue.edu> hankd@dynamo.ecn.purdue.edu (Hank Dietz) writes:
>As to all the comments about needing only cache, I've said it before
>and I'll say it again....  Registers help because:
>
>[1] They are fast
>[2] Register refs don't interfere with memory data path
>[3] You never miss (i.e., have static timing for schedules)
>[4] Register names are shorter than addresses
>
>A conventional cache gets you only benefit [1]; however, ambiguously
>aliased references (array elements and pointer targets) are
>effectively managed by a cache whereas they require frequent flushing
>from registers.  If you want all the benefits, you need both....

Well, it seems to me that if you built a registerless machine right,
you could pick up a few more points.  A good cache is fast these days.
So let's have three: one for data, one for instructions, and one to
replace the actual registers.  That gets us [1].

As for [2], registers do interfere with the memory path -- when they
are swapped to main memory during a context swap.  So with a good-sized
register cache, in many cases we avoid interference not only during
task execution but across context swaps as well.  It's like a Harvard
machine, only with three internal data paths rather than two.
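
As a rough illustration of the context-swap point, here's a toy model
of a register cache in Python.  It's only a sketch under assumptions
of mine: the names (RegisterCache, memory_traffic, demo), the LRU
replacement, and the idea of tagging each entry with a task ID are
all made up for the example, and a real design would be set
associative hardware rather than a dictionary.

```python
from collections import OrderedDict


class RegisterCache:
    """Toy register cache shared by all tasks (hypothetical design).

    Entries are tagged with (task_id, reg_no), so switching tasks does
    not by itself force any register to be copied out to main memory.
    """

    def __init__(self, capacity):
        self.capacity = capacity      # number of register entries held (>= 1)
        self.lines = OrderedDict()    # (task_id, reg_no) -> value, oldest first
        self.backing = {}             # evicted entries; stands in for main memory
        self.memory_traffic = 0       # transfers between cache and backing store

    def _touch(self, key):
        """Make sure `key` is resident and mark it most recently used."""
        if key in self.lines:
            self.lines.move_to_end(key)
            return
        # Miss: reload the value if it was evicted earlier, else start at 0.
        if key in self.backing:
            value = self.backing.pop(key)
            self.memory_traffic += 1
        else:
            value = 0
        self.lines[key] = value
        # Over capacity: write the least recently used entry back.
        if len(self.lines) > self.capacity:
            old_key, old_value = self.lines.popitem(last=False)
            self.backing[old_key] = old_value
            self.memory_traffic += 1

    def write(self, task_id, reg_no, value):
        key = (task_id, reg_no)
        self._touch(key)
        self.lines[key] = value

    def read(self, task_id, reg_no):
        key = (task_id, reg_no)
        self._touch(key)
        return self.lines[key]


def demo(capacity):
    """Run two 16-register tasks back to back, then return to the first."""
    rc = RegisterCache(capacity)
    for r in range(16):
        rc.write(1, r, r * 10)        # task 1 fills its registers
    for r in range(16):
        rc.write(2, r, r + 100)       # "context swap": task 2 just starts using its own
    value = rc.read(1, 5)             # swap back to task 1
    return value, rc.memory_traffic
```

With demo(64) both tasks' registers stay resident and memory_traffic
stays at zero.  With demo(16) the second task pushes the first one's
registers out, and the conventional save/restore cost comes back in
the form of cache misses.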

I guess you have to decide how the register cache actually works
during program execution -- one could treat each virtual register as
on a normal fixed-register machine, but it would probably make as much
sense to make it act like a register window machine.  In today's
silicon, you could have a 4-8K register cache with multi-way set
associativity.

Number [3] is something of an issue.  On a conventional machine, you
"miss" only on task boundaries, when the registers get swapped.  Here,
you miss the first time you access a register, but never again, at
least until your task is swapped out and back in, in which case you
may miss, but even that's not guaranteed.

Number [4] is solved by making all working register references
relative to a real register, which points to the base of register
space.  The time to add in the offset from the base pointer can be
hidden in the CPU pipeline if there's a dedicated adder for this
purpose.

Still, with all that said, I'm not sure this puppy buys you much over
the conventional approach, and it does make the design of the CPU more
complex.  It would definitely cut down on the context swap time, and
it does have the interesting property of making the number of logical
registers used in a task definable by the OS, or even by the
application if you split things into user and supervisor/kernel space.

>	-hankd

-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
	"Don't worry, 'bout a thing. 'Cause every little thing,
	 gonna be alright"		-Bob Marley