Path: utzoo!attcan!sobmips!uunet!cs.utexas.edu!tut.cis.ohio-state.edu!purdue!bu-cs!mirror!ima!esegue!compilers-sender
From: preston@rice.edu (Preston Briggs)
Newsgroups: comp.compilers
Subject: Re: Graph coloring register allocation
Message-ID: <1989Nov16.145318.6207@esegue.segue.boston.ma.us>
Date: 16 Nov 89 14:53:18 GMT
References: <1989Nov15.010209.11362@esegue.segue.boston.ma.us>
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: Preston Briggs <preston@rice.edu>
Organization: Rice University, Houston
Lines: 81
Approved: compilers@esegue.segue.boston.ma.us

rfg@ics.uci.edu writes:
>preston@rice.edu writes:
>>I don't know which is more effective.  I suspect (with no evidence) that
>>Chow's method will be better on constrained machines (not many registers) and
>>Chaitin's better otherwise.  Neither seems worthwhile without some global
>>optimization to provide grist for the mill.

>Chow's method is always at least as good as Chaitin's graph coloring
>(and sometimes better).  

You need a little explanation here.  I don't think anyone's published
a comparison, so what's your justification?

I spoke briefly with James Larus this summer, and he was curious too.
(see "Register Allocation for the Spur Lisp Compiler"
	Larus and Hilfinger
	Sigplan Compiler Construction Conference, 1986
 for a description of his version of Chow's allocator.)

Chow's claims in his paper are a little inaccurate.  For example,
he claimed Chaitin didn't account for loop nesting level; but loop
nesting depth *is* accounted for.  I expect Chow was referencing only
Chaitin's 1st paper, though his bibliography mentions the 2nd.

The priority-based coloring heuristic (just the coloring heuristic, not the
entire register allocation scheme) is less effective than Chaitin's coloring
heuristic (makes more spill code) according to Bernstein, et al. in Sigplan
89.  Additionally, we've published an improved version of Chaitin's
heuristic (also Sigplan 89), so the gap may be larger.  Chow's advantage is
live range splitting, and even that advantage has been cut down by the
"clean spilling" ideas of Berstein and friends.

The big win for Chaitin is the coalescing (subsumption) phase he uses just
before coloring (after computing the interference graph).  Chaitin searches
for register-to-register copies where the source and destination do not
interfere and eliminates the copy, renaming one live range to another
globally.  Chow's can't easily be extended since he computes his
interferences based on basic blocks.  (two live ranges connected by a copy
will always interfere since they are live in the same basic block.)

Additionally, since Chow's interferences are computed only to the nearest
basic block, only one live range per block can be given a color.  Chaitin's
interference graph is accurate to the (very low level) intermediate language
statement level; therefore, a single color (register) can be used for many
live ranges in a single block.

>It is likely that it will only be significantly
>better on machines *without* lots of registers.

Right.  Chow will be better (maybe) when things are really tight.
*Significantly* is the question.

>If you have a zillion registers,
>you never spill with either method, so they're equal in that case.

Chaitin won't spill, but Chow will in some cases.  A very long live range,
with very few references, will be split and some portions spilled.
Additionally, the lack of coalescing in Chow causes different code (extra
copies).  Finally, the coarser interference graph and inferior coloring
heuristic produce worse colorings.  Practically (less than a zillion
registers), this means that Chow will spill when Chaitin can still color.

On the other hand, the coarser interference graph is much smaller
which suggest that Chow's method will be quicker.  However, Chaitin's
isn't too bad.

I also wonder if Chow's method works over bizzare register sets (overlapping
integer and floats, base registers, instructions returning register pairs,
paired double precision floats, ad naseum)?  Chow mentions these as areas
for future research, but I've read no results.  I'll state without proof
that Chaitin's method extends naturally to handle all of the above cases.

If anyone's feeling really energetic, implement both, with all the known
improvements, and try them out on a variety of machines.  (and tell us what
happened!)

Preston Briggs
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.