Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!caliban!caliban!ig
From: ig@caliban.uucp (Iain Bason)
Newsgroups: comp.arch
Subject: Re: registerless architecture
Keywords: cache
Message-ID: <1990Nov14.064225.14406@caliban.uucp>
Date: 13 Nov 90 17:47:46 GMT
References: <1990Nov12.145410.29035@cs.cmu.edu> <56084@brunix.UUCP>
Sender: ig@caliban.uucp (Iain Bason)
Reply-To: ig@caliban.UUCP (Iain Bason)
Organization: none
Lines: 106

curtis>In article <56084@brunix.UUCP> cgy@cs.brown.edu (Curtis Yarvin) writes:
scott>In article <1990Nov12.145410.29035@cs.cmu.edu> spot@WOOZLE.GRAPHICS.CS.CMU.EDU (Scott Draves) writes:
tom>In article tom@ssd.csd.harris.com (Tom Horsley) writes:
herman>In article <2731@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
mark>In article <1990Nov13.011231.4899@rice.edu> foo@titan.rice.edu (Mark Hall) writes:

scott>Has anyone every thought about or done a registerless architecture?
scott>registers, after all, are just a sort of cache, another level in the
scott>memory hierarchy. but a fixed size, hard-wired one.
curtis>
curtis>...registers can still be made a bit faster; no association or anything necessary
curtis>(this goes unless you are one of those direct-mapped cache people). This
curtis>capability isn't much used in practice, though - generally both register
curtis>and cache hits take one clock cycle.
curtis>

I agree here. One other point is that if you're doing a stack cache,
and all your instructions use indexing off the stack pointer, you have
to add the index to the stack pointer. That is going to take > 0 time.

scott>context switch is fast and easy, there's nothing but CCR, PC, and FP.
curtis>
curtis>Ah, but no... you have to flush your cache anyway, you don't really
curtis>gain anything here.

This is not entirely true. It doesn't take much hardware to add a
(small) process-id tag to cache lines. Then cache flushing can take
place in the background, while the CPU does useful work.
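
To make the process-id tag idea concrete, here is a rough sketch in C
of a toy direct-mapped cache whose lines carry a small owner tag. This
is only an illustration under assumptions of mine: the cache size
(NLINES), the 8-bit pid, line-granular addresses, and the names
cache_access and lines_owned are all made up, and write-back of dirty
lines is not modeled. The point is just that a context switch
invalidates nothing up front; lines belonging to another process are
replaced one at a time, as they are needed.

```c
#include <stdbool.h>
#include <stdint.h>

#define NLINES 8u              /* hypothetical cache size, in lines */

struct line {
    bool     valid;
    uint8_t  pid;              /* small process-id tag (8 bits assumed) */
    uint32_t tag;              /* upper address bits */
};

struct line cache[NLINES];     /* zero-initialized: all lines invalid */

/* Look up line-granular address `addr` on behalf of process `pid`.
 * Returns true on a hit.  On a miss the line is refilled for `pid`,
 * replacing whatever was there (possibly another process's data). */
bool cache_access(uint8_t pid, uint32_t addr)
{
    struct line *l = &cache[addr % NLINES];
    uint32_t tag = addr / NLINES;

    if (l->valid && l->pid == pid && l->tag == tag)
        return true;

    l->valid = true;
    l->pid = pid;
    l->tag = tag;
    return false;
}

/* Count how many valid lines currently belong to `pid`. */
int lines_owned(uint8_t pid)
{
    int n = 0;
    for (uint32_t i = 0; i < NLINES; i++)
        if (cache[i].valid && cache[i].pid == pid)
            n++;
    return n;
}
```

If process 1 touches addresses 0 through 7 and a handler running as
pid 2 then touches only addresses 0 and 1, lines_owned(1) is 6 and
process 1 still hits on address 5 when it resumes.
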
In some cases (e.g., a simple interrupt handler) only a few cache lines
will be flushed before the CPU returns to the interrupted process.

tom>Once a long long time ago in a universe far far away I worked on a compiler
tom>for a new machine that was going to be registerless because, as the engineers
tom>said, "cache is just as fast as registers anyway".
tom>
tom>By the time we got to the point where they were ready to cancel the project
tom>the engineers had taken to pleading with the compiler writers to come up
tom>with some way to allocate variables in locations such that frequently used
tom>variables would be in spots that didn't get cache collisions with other
tom>frequently used variables...
tom>
tom>There is a common technique for doing something like this in compilers. It
tom>is called "register allocation". Unfortunately, it is orders of magnitude
tom>more difficult to do when there are no registers...

Most (maybe all? anyone know?) C compilers will allocate local
variables on the stack. Hardware can certainly be designed to cache a
stack (all you have to do is avoid collisions from contiguous memory; I
would think this would be the normal way to do a cache). The compiler
could create new locals and just pretend they are registers (although
I'm sure there would be smarter ways to optimize for the architecture).
I expect many languages other than C can also be made to allocate local
variables on the stack. Lisp might be tough, and Smalltalk, but then
they usually are on any architecture.

A compiler for a machine like this would obviously be different. For
instance, I imagine "register" coloring would be difficult to do when
the number of "registers" is variable. You have to take into account
the fact that other routines may have data in the cache, and only
allocate space if you think it will save this routine more time than it
will cost other routines.

tom>spot> any thoughts on this? stupid idea, or the wave of the future? :)
tom>
tom>Stupid idea (that's your phrase, not mine :-).

This is far from clear.

herman>Only a 9-bit field relative to a pointer? One of the stupid (in my opinion)
herman>things about the 86-class machines is the 16 bit field relative to a pointer,
herman>and more than one such field could be active.

I don't think Scott is proposing to limit *all* indexes to 9 bits.
Look at it this way: most CPUs limit you to 5-bit indexes into their
register files, but they let you use larger indexes into memory.

herman>Indirect addressing and addressing relative to registers is extremely
herman>important; to replace registers with cache intelligently would require
herman>allowing arbitrary depth of indirection, which is not a bad idea.

Gaaak! Please banish the thought from your mind. I believe one company
(Data General?) had a hell of a time trying to do virtual memory with
such a "feature". Apparently it was almost never used, anyway.

mark> Just for a sense of history: the TI 9900 (and 99000 I believe) were
mark> also registerless. They never made it very big in the marketplace.
mark>
mark> (this is almost folklore to me, so correct me if I am wrong. It has
mark> been a long time since I looked at a chip spec. Any chip spec.)

I believe you are correct, although I'd never even heard of the 99000.
"They never made it very big" is being charitable. Speaking of which,
does anyone remember the Fairchild F8?

-- 
Iain Bason
..uunet!caliban!ig
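
P.S. On the cache collisions in Tom's story, and my remark that a
stack cache only has to avoid collisions from contiguous memory: here
is a minimal sketch in C, assuming a direct-mapped cache. The
parameters (16-byte lines, 64 lines, so 1 KB in all) and the function
names are made up for illustration and describe no particular machine.
Two addresses fight over a line when they share an index but differ in
tag; a contiguous stack frame cannot collide with itself unless it
spans more lines than the cache has.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 16u     /* assumed line size */
#define NUM_LINES  64u     /* assumed line count (1 KB cache) */

/* Which cache line an address maps to. */
uint32_t line_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_LINES;
}

/* The upper address bits stored alongside the line. */
uint32_t line_tag(uint32_t addr)
{
    return addr / (LINE_BYTES * NUM_LINES);
}

/* True if a and b compete for the same line: same index, different tag. */
bool collides(uint32_t a, uint32_t b)
{
    return line_index(a) == line_index(b) && line_tag(a) != line_tag(b);
}

/* True if a contiguous region of `size` bytes starting at `base`
 * contains two addresses that collide with each other.  Consecutive
 * lines map to consecutive indices, so this happens only when the
 * region spans more than NUM_LINES lines. */
bool frame_self_collides(uint32_t base, uint32_t size)
{
    if (size == 0)
        return false;
    uint32_t first = base / LINE_BYTES;
    uint32_t last  = (base + size - 1) / LINE_BYTES;
    return (last - first + 1) > NUM_LINES;
}
```

With these numbers, two hot variables at 0x1000 and 0x1400 (exactly
1 KB apart) collide, while a 256-byte frame at 0x7000 is collision-free
within itself. Whether it collides with some other routine's data is a
separate question, and that is the part the compiler would have to
guess at.
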