Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!wuarchive!rice!news From: preston@ariel.rice.edu (Preston Briggs) Newsgroups: comp.arch Subject: Re: Register Count Message-ID: <1991Jan14.163739.10786@rice.edu> Date: 14 Jan 91 16:37:39 GMT References: <11566@pt.cs.cmu.edu> Sender: news@rice.edu (News) Organization: Rice University, Houston Lines: 98 pcg@cs.aber.ac.uk (Piercarlo Grandi) writes: >It can happen, even to the best; a lot of serious and important people >that publish papers on "optimizing" compilers expound the funny view >that things like graph coloring optimize speed and not space. I don't >know who's the joke on. > >The joke is that graph coloring minimizes the *number* of spills, not >their *cost* in time. Most spills have negligible cost in time, because >they are not in critical paths, and are not executed frequently. The idea of managing registers/memory via graph coloring is an old one (at least back to '61). The basic idea is to minimize the register requirements (as in "find a minimal coloring"). Spilling isn't a part of graph coloring; it's a part of register allocation. The advantage of graph coloring is that it provides a simple abstraction from the complexities of register allocation on various interesting machines. The reason it works well (and it _does_) is that many machines have enough registers to run most programs. Lots of people complain about the difficulty of implementation and suggest some simpler alternative. My experience is that a graph coloring allocator is not that hard to build, the simpler alternatives are not trivial to build, and that a global register allocator (like graph coloring) smokes a local register allocator. >If an "optimizer" does not attempt to minimize the *dynamic* cost of >spills, by estimating the frequency of reuse of values either by >feedback profiling or by heuristics, it will only achieve optimal timing >by pure chance, or by having so many registers to play with that the >cost of spills is zero because there aren't any (virtually all the so >called "optimizers" for RISC machines). Optimality sounds like a fine reason to have lots of registers, especially since optimal choices by the compiler are undecidable. Further, every register allocator based on graph coloring that I've read about (and I read aggressively in this area) uses heuristics to estimate the dynamic cost of spills. In addition, some use of profiling info. Spill choices are always based on these dynamic estimates. >I had hoped that more people understood what David Wall and Hank Dietz ... >Yes, probably reading the relevant literature is a better use of your >time. Reread David Wall and company a million times... Or maybe the >David Wall in my world is a pop singer in yours :-). I'm not sure exactly which Wall or Dietz papers you're refering to; they both publish extensively. I'll talk some about Wall. Wall has papers in Sigplan 86 and Sigplan 88 discussing global (in this case meaning interprocedural) register allocation at link time (the 88 paper compares with various forms of register windows). The papers are interesting for many reasons; however, Grandi's emphasis is (probably) the small number of registers required for reasonable performance and the small benefits achieved with a large number of registers (the 86 paper reports results for 8, 32, and 52 registers). It's important to be a little sceptical though, and look for holes (reasons why his measurements might not apply to particular situations). For his 86 paper, I wonder about: 1) The balance of his machine (cost of add versus cost of load). In the 5 years since the 86 paper, spills have gotten more expensive. Consider also various forms of low-level parallelism (pipelines and such). These aren't recent inventions, but they change the shape of the code. 2) His optimizer. It's not mentioned beyond "The compiler generates good code that does not load a variable twice in the same basic block..." Sounds like value numbering to me. Optimization needs registers. More powerful optimization needs more registers (and the payoff is faster programs). Even if he used a good optimizer, improvements in the last 5 years tend to make his numbers conservative (e.g., global value numbering, fancier pipeline scheduling, dependence analysis and loop restructuring). 3) His benchmarks. Do they resemble anything you're interested in? 4) He doesn't talk about allocating addressing temporaries to registers. Again, I assume it's because of minimal optimization. But temporaries are an important consumer of registers (and their reuse is an important source of improvement). 5) He doesn't do subsumption (coalescing) when coloring. This saves copy instructions (cycles!) but sometime requires additional registers. Therefore, his results may understate the improvement and the registers requirements. Reading Wall's work is valuable. His numbers are interesting and informative. But I think they should be interpreted as showing trends, not exact register requirements. Preston Briggs