Path: utzoo!attcan!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Compilers taking advantage of architectural enhancements
Message-ID:
Date: 14 Oct 90 18:00:09 GMT
References: <1990Oct9> <3300194@m.cs.uiuc.edu> <1990Oct11.223224.26604@rice.edu>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 124
Nntp-Posting-Host: odin
In-reply-to: aglew@crhc.uiuc.edu's message of 12 Oct 90 03:28:01 GMT

On 12 Oct 90 03:28:01 GMT, aglew@crhc.uiuc.edu (Andy Glew) said:

[ ... some comments on large numbers of registers being useful ... ]

Bah, as usual. If you use them as a static cache, yes. But isn't a
dynamic cache as good and less trouble?

Yes, if you don't use a load-store architecture. In a load-store
architecture, addressing a line in the cache takes an extra load or store
instruction, and potentially a delay slot; addressing a line in the
register bank takes just a wide register-number field in the current
instruction.

Note that some compilers are indeed starting to treat cache lines as
registers, by scheduling code for optimal cache reference patterns.

So, given that one wants an intermediate cache between main memory and
the input and output ports of the CPU's functional units, there are three
alternatives:

1) a dynamic cache addressed with aliases of main memory addresses;
2) a static cache in a separate, much smaller, address space;
3) a cache organized as multiple stacks, where only the tops of the
   stacks have addresses.

If your architecture can address the main memory address space
"efficiently", 1) is better than 2); if it cannot, 2) is better than 1).
In all cases, IMNHO, 3) is better than either, because, like 1), it is
dynamic and, like 2), it does not require long addresses.

aglew> I agree with you --- I really don't understand why heterogenous
aglew> register files are so hard to handle.
aglew> But homogenous register files are one thing that compiler people
aglew> have gone into rhapsodies wrt. RISC about.

That's actually not difficult to comprehend, IMNHO, once you realize that
registers, as currently (mis)understood, actually perform two completely
different functions (at least two; there are others):

1) inputs and outputs to functional units ("accumulators");
2) a statically managed cache ("temporaries").

The first function means that registers are essentially the entry and
exit ports of a queueing network. To generate efficient code for a
queueing network you must analyze the flows into it, or do something
equivalent. This seems harder than just considering function 2), which is
already hard enough.

aglew> Here's one example: the Intel 80x86 is basically a heterogenous
aglew> register file machine. Specific registers were tied to the
aglew> outputs and inputs of specific functional units in the original
aglew> hardware. Compiler people hated targetting this architecture,
aglew> and there are very few compilers that can produce machine code
aglew> comparable to hand-coded assembly on this architecture.

Oh yes. But this is simply because current compiler technology is mostly
based on the belief that registers are there only to be a statically
managed cache. That is why we have ridiculous things like graph coloring,
which minimizes *static* costs (e.g. code size) rather than run time,
unless there are so many registers that essentially all values, including
the dynamically important ones, have a chance of ending up in a register.

There are plenty of research papers showing that:

1) for a single functional unit, the number of dynamically important
   values is very small;
2) large numbers of registers are useful under graph coloring, and on
   machines that have a huge gap between register file and cache.

The two sets of results can only be reconciled by observing that:

* vector, superscalar, etc. machines in effect have multiple functional
  units;
* graph coloring wastes a large number of registers on dynamically
  unimportant values, and load/store architectures have a huge gap
  between register file and cache.

Not surprising, eh?

I reckon that fully specialized registers (e.g. input-only and
output-only registers that map directly onto functional unit ports) are
best, and that temporaries ought not to be cached in registers. I would
like a more dataflow-like architecture, in which the input and output
ports of the functional units (and perhaps their respective delays) are
directly exposed, and separate. Caching, IMNHO, ought to be performed
using multiple cached stacks, or at any rate using dynamic caching (e.g.
as in the i386/i486, where the on-chip cache is almost a large
associative register bank).

Naturally, exposing the functional units and their input and output
ports (hints of VLIW here) means that the number of architecturally
visible ports varies with the number of functional units in different
implementations. This is a problem anyhow; one can solve it in several
ways, e.g.:

0) recompiling for different implementations;
1) lengthening the instruction word (VLIW);
2) register renaming (RS/6000);
3) dynamic queuing (MU dataflow).

You may argue that 0) is not a solution, but consider: it is probably the
best way to take advantage of the specific features of a particular
implementation. Solution 1) is a slightly easier way of doing 0).
Solution 2) ensures binary portability, but I don't see how it could work
over a wide range of numbers of functional units. Solution 3) is
guaranteed to exploit any number of functional units nearly optimally,
but requires sophisticated hardware.

aglew> But heterogenous register files are much easier to make fast.

That is because, if you choose one of solutions 0)-2) above, you do not
need logic that maps the register file, used as a static cache, onto the
input and output ports of the functional units.
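Solution 3) can be made a little more concrete with a toy model. The Python sketch below is an illustration under stated assumptions, not a description of how the MU dataflow machine actually works: each operation (a hypothetical `Op` record) names the operations whose results it consumes, the functional units are identical and fully pipelined (each accepts one new operation per cycle), and every cycle any waiting operation whose operands are ready is issued to a free unit. The scan over the whole waiting queue each cycle stands in for the associative matching that makes the hardware "sophisticated"; note that the same operation list runs unchanged on any number of units.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One operation in a hypothetical dataflow program (not a real ISA)."""
    name: str
    srcs: tuple[str, ...] = ()  # names of the ops whose results this op consumes
    latency: int = 1            # cycles until the result can be consumed


def dataflow_schedule(ops: list[Op], n_units: int) -> int:
    """Cycles to run `ops` on `n_units` identical, fully pipelined units,
    issuing each cycle any waiting op whose operands are ready."""
    if n_units < 1:
        raise ValueError("need at least one functional unit")
    names = {op.name for op in ops}
    if len(names) != len(ops):
        raise ValueError("op names must be unique")
    for op in ops:
        missing = [s for s in op.srcs if s not in names]
        if missing:
            raise ValueError(f"{op.name} reads undefined ops {missing}")

    done_at: dict[str, int] = {}  # op name -> cycle its result becomes available
    waiting = list(ops)           # the queue of ops not yet issued
    # For an acyclic graph, every op has issued well before this many cycles.
    limit = sum(op.latency for op in ops) + len(ops)
    cycle = 0
    while waiting:
        if cycle > limit:
            raise ValueError("dependency cycle: some ops can never become ready")
        # "Associative match": each waiting op checks whether its sources are done.
        ready = [op for op in waiting
                 if all(s in done_at and done_at[s] <= cycle for s in op.srcs)]
        for op in ready[:n_units]:  # each unit accepts one new op per cycle
            done_at[op.name] = cycle + op.latency
            waiting.remove(op)
        cycle += 1
    return max(done_at.values(), default=0)


if __name__ == "__main__":
    # (a*b + c*d) * (e*f + g*h): four independent multiplies, two adds, one multiply
    expr = [Op("m1"), Op("m2"), Op("m3"), Op("m4"),
            Op("s1", ("m1", "m2")), Op("s2", ("m3", "m4")),
            Op("r", ("s1", "s2"))]
    for n in (1, 2, 4, 8):
        print(n, "units:", dataflow_schedule(expr, n), "cycles")
```

On that expression the sketch needs 7, 4, 3 and 3 cycles with 1, 2, 4 and 8 units: extra units pay off only until the expression's available parallelism (here, four independent multiplies) is used up.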

Arguably, solution 3) is so flexible that its potential complexity/speed
disadvantage can be offset by adding extra functional units, even though
there are hints that the inherent parallelism of many applications does
not call for many functional units (my rule of thumb is '4').
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk