Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!wuarchive!rice!news
From: preston@ariel.rice.edu (Preston Briggs)
Newsgroups: comp.arch
Subject: Re: Register Count
Message-ID: <1991Jan14.163739.10786@rice.edu>
Date: 14 Jan 91 16:37:39 GMT
References: <PCG.91Jan10162301@odin.cs.aber.ac.uk> <11566@pt.cs.cmu.edu>
Sender: news@rice.edu (News)
Organization: Rice University, Houston
Lines: 98

pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

>It can happen, even to the best; a lot of serious and important people
>that publish papers on "optimizing" compilers expound the funny view
>that things like graph coloring optimize speed and not space. I don't
>know who's the joke on.
>
>The joke is that graph coloring minimizes the *number* of spills, not
>their *cost* in time. Most spills have negligible cost in time, because
>they are not in critical paths, and are not executed frequently.

The idea of managing registers/memory via graph coloring is an old one
(at least back to '61).  The basic idea is to minimize the register
requirements (as in "find a minimal coloring").  Spilling isn't a part of
graph coloring; it's a part of register allocation.

The advantage of graph coloring is that it provides a simple abstraction
from the complexities of register allocation on various interesting
machines.  The reason it works well (and it _does_) is that many
machines have enough registers to run most programs.

Lots of people complain about the difficulty of implementation and
suggest some simpler alternative.  My experience is that a graph
coloring allocator is not that hard to build, the simpler alternatives
are not trivial to build, and that a global register allocator
(like graph coloring) smokes a local register allocator.

>If an "optimizer" does not attempt to minimize the *dynamic* cost of
>spills, by estimating the frequency of reuse of values either by
>feedback profiling or by heuristics, it will only achieve optimal timing
>by pure chance, or by having so many registers to play with that the
>cost of spills is zero because there aren't any (virtually all the so
>called "optimizers" for RISC machines).

Optimality sounds like a fine reason to have lots of registers,
especially since optimal choices by the compiler are undecidable.

Further, every register allocator based on graph coloring that I've
read about (and I read aggressively in this area) uses heuristics to
estimate the dynamic cost of spills.  In addition, some use of profiling
info.  Spill choices are always based on these dynamic estimates.

>I had hoped that more people understood what David Wall and Hank Dietz
...
>Yes, probably reading the relevant literature is a better use of your
>time. Reread David Wall and company a million times... Or maybe the
>David Wall in my world is a pop singer in yours :-).

I'm not sure exactly which Wall or Dietz papers you're refering to;
they both publish extensively.  I'll talk some about Wall.

Wall has papers in Sigplan 86 and Sigplan 88 discussing global
(in this case meaning interprocedural) register allocation at link 
time (the 88 paper compares with various forms of register windows).  
The papers are interesting for many reasons; however, Grandi's
emphasis is (probably) the small number of registers required
for reasonable performance and the small benefits achieved
with a large number of registers (the 86 paper reports results
for 8, 32, and 52 registers).  It's important to be a little
sceptical though, and look for holes (reasons why his measurements
might not apply to particular situations).

For his 86 paper, I wonder about:
1) The balance of his machine (cost of add versus cost of load).
   In the 5 years since the 86 paper, spills have gotten more
   expensive.

   Consider also various forms of low-level parallelism (pipelines
   and such).  These aren't recent inventions, but they change
   the shape of the code.

2) His optimizer.  It's not mentioned beyond "The compiler generates
   good code that does not load a variable twice in the same basic
   block..."  Sounds like value numbering to me. Optimization needs
   registers.  More powerful optimization needs more registers (and
   the payoff is faster programs).

   Even if he used a good optimizer, improvements in the last 5 years
   tend to make his numbers conservative (e.g., global value numbering,
   fancier pipeline scheduling, dependence analysis and loop restructuring).

3) His benchmarks.  Do they resemble anything you're interested in?

4) He doesn't talk about allocating addressing temporaries to registers.
   Again, I assume it's because of minimal optimization.  But temporaries
   are an important consumer of registers (and their reuse is an
   important source of improvement).

5) He doesn't do subsumption (coalescing) when coloring.  This saves
   copy instructions (cycles!) but sometime requires additional
   registers.  Therefore, his results may understate the improvement
   and the registers requirements.

Reading Wall's work is valuable.  His numbers are interesting and
informative.  But I think they should be interpreted as showing trends,
not exact register requirements.

Preston Briggs