Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!rupert!pcg
From: pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.lang.misc
Subject: Re: function calls
Message-ID:
Date: 28 Mar 90 19:13:01 GMT
References: <29551@amdcad.AMD.COM> <6076@brazos.Rice.edu>
Sender: pcg@aber-cs.UUCP
Followup-To: comp.lang.misc
Organization: Coleg Prifysgol Cymru
Lines: 93
In-reply-to: preston@titan.rice.edu's message of 26 Mar 90 06:55:46 GMT
Posting-Front-End: GNU Emacs 18.47.1 of Wed Mar 15 1989 on rupert (berkeley-unix)

In article <6076@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:

   On many (most?) machines, 2 uses is enough to justify keeping
   a value in register.

Only if you look at the relative cost of refetching vs. keeping in the
register. Speeding up a program by 0.001 percent by caching values that
are going to be reused only a few dozen or even a few thousand times
(when a machine's speed is rated in millions of instructions per second)
is pointless, and registers do not come free. You usually want to
allocate a register only to values that account for a significant
fraction of the program's memory accesses. The others simply don't
matter.

Which fraction of the total memory accesses you consider large enough
to warrant register'ing a value is of course dependent on the cost of
reloading from memory; if you have an architecture or a language that
makes this very expensive, you will want to cache more values, but then
it is all your fault, sheesh.

   >It is not difficult to produce examples of fairly common pieces
   >of code where on many machines register caching worsens performance.

   Of course, we can produce plenty of examples where many registers
   are helpful.

Yes, but only for very special codes or architectures.

   Further, can you post an example of some sort?
The obvious example of such an oddity is argument passing on
traditional general purpose architectures; it may be worse than
pointless to pop an argument from the stack. That's why they have stack
offset address modes...

   CPU's outrun memory. They have been for years, and memory isn't
   catching up. Hence the development of caches, multi-level caches,
   wide data busses, and large register sets.

Von Neumann's bottleneck? Who said bottleneck? The guy from Thinking
Technologies down the hall? :->

   >My aversion to large register caches

   You like "regular" caches but not register caches...

But regular (pseudo) associative caches are far less "stiff" than
statically managed ones. On general purpose architectures and codes, of
course. If you can otherwise predict access patterns, "registers" win
(can I say "vector codes"?).

   Aren't registers just the top of the memory hierarchy?

Yes and no. They are the interface to functional units, and/or caches.
The current mania for general purpose registers seems to obscure this.
There are machines that only have specialized registers, i.e. one per
functional unit (X index registers, Y integer accumulators, Z floating
registers, where X, Y, Z are the numbers of the respective functional
units), and a cache, usually organized as a stack, on top of each of
these. Note that such a machine has a number of interesting advantages
from the language implementor's point of view; for example, to return
to the issue from which our thread started, procedure calls, and
funargs, and coroutines, and any form of context switch, become much
easier/faster.

My own dream machine is a machine with multiple cached stacks, each
stack served by a functional unit. Some hints are available that in
typical code you only have four overlappable threads of computation,
because you often find that you only need four spills to compute most
expressions. Well, a machine with four accumulators and cached stacks
behind them is my idea of a high code density superscalar.
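
To make the "four" figure above a little more concrete, here is a
minimal sketch of one common variant of Sethi-Ullman labelling, which
counts how many registers (or accumulators) an expression tree needs if
no temporary is ever stored to memory. It is an illustration only, not
something from the original thread: it assumes a load/store machine
where every operand is first loaded into a register, it assumes the
"four" refers to temporaries per expression, and the names Node,
regs_needed and balanced are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical expression-tree node; a leaf has no children."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def regs_needed(n: Optional[Node]) -> int:
    """Registers needed to evaluate the subtree without spilling.

    Assumption: load/store machine, so each leaf costs one register.
    """
    if n is None:
        return 0
    if n.left is None and n.right is None:
        return 1                      # a leaf must be loaded somewhere
    l = regs_needed(n.left)
    r = regs_needed(n.right)
    if l == r:
        return l + 1                  # one extra register holds the first result
    return max(l, r)                  # evaluate the needier side first


def balanced(depth: int) -> Node:
    """Complete binary expression tree with 2**depth leaves."""
    if depth == 0:
        return Node()
    return Node(balanced(depth - 1), balanced(depth - 1))


if __name__ == "__main__":
    # (a + b) * (c - d)
    expr = Node(Node(Node(), Node()), Node(Node(), Node()))
    print(regs_needed(expr))          # 3
    # a + (b + (c + d)): a chain stays cheap however long it gets
    chain = Node(Node(), Node(Node(), Node(Node(), Node())))
    print(regs_needed(chain))         # 2
    # a complete tree with 8 leaves is the smallest one that needs 4
    print(regs_needed(balanced(3)))   # 4
```

Under these assumptions a tree built from unary and binary operators
needs a fifth register only once it has at least 16 leaves arranged as
a complete binary tree, so four accumulators cover a wide range of
expressions.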
My favourite examples are CRISP and NOVIX.

   They are rather more subject to control from software, but I'd
   think that was a plus.

If you know access patterns in advance...

   I'd rather see the systems extended so that the other layers of the
   hierarchy were also under software control (prefetches to cache,
   fetches around cache, prefetching pages of virtual memory, and so
   forth).

There is an uncertain tradeoff between generality, reliability and
complexity of hardware vs. software here. A particularly difficult
issue. My own rule is that complexity that is not supported by very
big, sure advantages is to be regarded with the utmost suspicion,
whether it manifests itself as the number of instructions, address
modes, registers, optimizers, and so on...
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk