Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!sun-barr!sun!chiba!khb
From: khb@chiba.Sun.COM (Keith Bierman - SPD Languages Marketing -- MTS)
Newsgroups: comp.arch
Subject: Re: Register usage
Message-ID: <107834@sun.Eng.Sun.COM>
Date: 2 Jun 89 18:59:26 GMT
References: <978@aber-cs.UUCP>
Sender: news@sun.Eng.Sun.COM
Reply-To: khb@sun.UUCP (Keith Bierman - SPD Languages Marketing -- MTS)
Organization: Sun Microsystems, Mountain View
Lines: 93

In article <978@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>...
>It must be repeated for the Nth time that this is only true if spill
>minimization is of paramount importance; if you look at speed, then most
>spills avoided by global optimizers with large register sets don't make much
>of a difference.

And for the (N+1)th time, I guess, it must be repeated that the class
of machines for which minimizing spills matters is quite interesting
(and getting more so all the time). Consider the paper by Hsu, Dehnert,
and Bratt in ASPLOS-III on the Cydra 5, as an example.

>
>    Upshot: modern compilers can employ as many registers are you can design
>    in.
>
>But pointlessly... And even old compilers can just register everything in

One can, but old compilers tend to do a very bad job of it.

>sight; if there are many registers, then using an optimizer is not very
>important. The hard work, as we have just discussed, is to cache *only* the
>variables that matter, for *only* the section of code where they matter (and

Which, with software pipelining, trace scheduling, and similar
techniques, can be quite a long time indeed. On the Cydra 5 a memory
write was assumed to take 26 cycles, and two memory references could be
initiated EVERY clock, as well as two integer operations and two FP
operations. Since the FP operations required more than one cycle, the
instruction scheduling was quite interesting. With respect to register
"live time", a given register might be required several loop iterations
into the future.
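To put rough numbers on the Cydra 5 figures above: with a 26-cycle
memory latency and two memory references started every clock, up to
26 x 2 = 52 references can be outstanding at once, and each one needs
somewhere to land. The C sketch below only does that arithmetic. The
function names, and the initiation interval of 4 cycles used in the
note after it, are illustrative assumptions and not figures from this
article.

```c
#include <assert.h>

/* Hypothetical helper: number of values that must be held at once to
   keep a pipe full = latency in cycles * operations issued per cycle. */
unsigned values_in_flight(unsigned latency_cycles, unsigned issues_per_cycle)
{
    assert(latency_cycles > 0 && issues_per_cycle > 0);
    return latency_cycles * issues_per_cycle;
}

/* Hypothetical helper: how many loop iterations pass before a value
   requested now is available, given the loop's initiation interval
   (cycles between the starts of successive iterations). Rounds up. */
unsigned iterations_ahead(unsigned latency_cycles, unsigned initiation_interval)
{
    assert(initiation_interval > 0);
    return (latency_cycles + initiation_interval - 1) / initiation_interval;
}
```

values_in_flight(26, 2) gives 52. With an assumed initiation interval
of 4 cycles, iterations_ahead(26, 4) gives 7, i.e. a value requested
now is consumed about seven iterations later, and its register is tied
up for that whole stretch.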
>this can be done by the programmer using "register" in C, or by the compiler

Which is often ignored by the compiler, simply because programmers
cannot reasonably guess how the compiler will unroll, split, and
otherwise contort their code (try it on your Multiflow Trace/28 for a
bunch of codes and show us what is produced!).

>when fed with either "representative" profile data, or with calculations or
>estimations of where hot spots lie). This tipically requires many less
>registers than minimizing spills regardless of whether they are expensive
>ones or not.

Doing a "spill" (i.e. running out of interconnect) on the Cydra 5 meant
your loop ran 10x slower. This is not acceptable to most programmers.

>    Naive rationale for infinite (as long as they are free) registers:
>                                  ^^^^^^^^^^^^^^^^^^^^^^^^
>
>Unfortunately they are not free; more registers make the system stiffer, in

True. Which is why folks build windows, small (32) register files, file
pointers (AMD, Gould), and other stuff.

>that they do raise the cost of multithreading, which is where os technology
>is finally heading (Mach, Os/2, etc...), and they do have costs in real
>estate and even, possibly, cycle time lengthening (Cray's law). You only
>need a handful of register to capture most of the benefit of expression
>optimization, and another to capture most of the benefits of intra statement
>optimization (whether you do it via "register" in C or leave it to the
>compiler).

Your assertion about multithreading is quite true; it is here to stay.
It is far from clear that transputer-type designs (tiny machine, fast
communication) will win out over somewhat "chunkier" designs. But the
assertion that a handful of registers is sufficient on high-performance
machines is simply not borne out. All of Seymour's machines have a
bunch (don't forget those vector registers), and this is NECESSARY for
those long pipes (superpipelining?) ... and it is just as true for
software pipelining.
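As a small illustration of why long pipes want many registers, here is
a sketch in C of a dot product unrolled by hand with four independent
partial sums. The unroll factor of 4 and the name dot4 are assumptions
made for illustration; a real compiler would pick the factor from the
pipe depth. Each partial sum has to sit in its own register for the
whole loop, so a deeper pipe means more of them.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product with four independent accumulators. A single accumulator
   makes every add wait for the previous one; four let up to four adds
   overlap in a pipelined FP unit, at the cost of four live registers.
   Note: summing in this order can round differently from a plain loop. */
double dot4(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    assert(n == 0 || (x != NULL && y != NULL));

    /* Main unrolled body: four independent multiply-add chains. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Cleanup for the last n % 4 elements. */
    for (; i < n; i++)
        s0 += x[i] * y[i];

    return (s0 + s1) + (s2 + s3);
}
```

Counting the two array pointers, the index, and the loaded operands,
even this toy loop already wants more than a "handful" of registers,
before any software pipelining is applied.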
>Large register banks are only justified for special purpose machines
>(vector, VLIW, superscalar) where the only thing that matters is raw speed
>in processing batched numeric codes where there is an inherent high degree
>of parallelism in the algorithms employed.

Multiflow claims that they eat "general" code just fine. As Mashey has
pointed out, there are several superscalar projects running around ...
and business codes, database codes, and windowing systems benefit from
that kind of parallelism just like numeric codes (although writing the
code in C makes it much harder to extract the parallelism).

Keith H. Bierman      |*My thoughts are my own. Only my work belongs to Sun*
It's Not My Fault     |  Marketing Technical Specialist    ! kbierman@sun.com
I Voted for Bill &    |  Languages and Performance Tools.
Opus  (* strange as it may seem, I do more engineering now     *)