Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!snorkelwacker!spdcc!ima!esegue!compilers-sender
From: dgb@cs.washington.edu (David Bradlee)
Newsgroups: comp.compilers
Subject: register allocation
Keywords: chow, chaitin
Message-ID: <1989Nov22.183501.6735@esegue.segue.boston.ma.us>
Date: 22 Nov 89 18:35:01 GMT
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: dgb@cs.washington.edu (David Bradlee)
Organization: U of Washington, Computer Science, Seattle
Lines: 139
Approved: compilers@esegue.segue.boston.ma.us

This is a response to Preston Briggs' article from a couple of days
ago.  It's from a friend with whom I've worked at Microsoft.

***************************************************
From microsoft!bobs@beaver Tue Nov 21 15:34:39 1989
Reply-to: bobs@bobs (Bob Scheulen)

Hi - A friend of mine sent me a copy of one of your news postings
because I've been looking at both Chatin & Chow.  I don't read news
so appologies if I'm out of date on the discussion.

Backgroud:
We (MS) have implemented a scheme like Chow's for a 8086/286 C compiler.
There are a few major differences between us and Chow: first, instead 
of assuming the live range is the whole procedure, we split a variable
into its true lifetimes (see note below about what I mean by that).
Secondly, we don't split ranges at all, we just give up (except we have
a special hack for loops...).

Definition of a lifetime: the set of all definitions and references to
a variable which MUST be the same variable (and so must get the same
register).  This corresponds the the definition of liveness used in
global optimization (in fact we discover the lifetime by tracing thru
all contiguous blocks where the variable is live until no more can be added
--yes I know this is ineffecient).  Under this definition of lifetime,
a variable could be split into its separate lifetimes in the user source
(eg. a user variable "i" could be change to "i0", "i1" etc.).  I point
this out, because it does not agree with either Chatin or Chow.  I admit
I don't understand the implications (or reason) for Chatin's definition.
I am not sure I even understand what Chatin means, but he apparently limits
a live range to those uses which only one definition reaches.  I assume
this means that uses which can be reached by multiple definitions are in
their own live range.  One would think this could create tons of copies,
but maybe thats why I don't understand why subsumption is good for much.
Apologies if this didn't make much sense.

Specific comments:

>>The big win for Chaitin is the coalescing (subsumption) phase he uses ...

In terms of coloring, it would seem that all subsumption can do is make
things less colorable, because it is effectively extending the live
range.  For example, isn't it possible that the first pseudo register
(the src of the copy) can be colored while the other can't?
I also don't understand where all these copies come from (except possible
as a result of the definition of live range Chatin uses..see above 
disscussion).  The other obvious cases are silly copies generated by
the code generator because it needs to preserve the original register
contents.  We solve this problem by a lot of special case code in the
code generator so that it never generates useless copies.  If we didn't
do this we would get extra demand on the scratch register (remember that
Chow partitions the register set) which could cause extra spills, but
I think a smart peephole would be able to clean that up.  It would
seem that Chatin could so subsumtion after coloring (assuming I'm right
that subsumption has no positive effect on colorability).

>>If you have a zillion registers,
>>you never spill with either method, so they're equal in that case.
>
>Chaitin won't spill, but Chow will in some cases.  A very long live range,
>with very few references, will be split and some portions spilled.
>Additionally, the lack of coalescing in Chow causes different code (extra
>copies).  Finally, the coarser interference graph and inferior coloring
>heuristic produce worse colorings.  Practically (less than a zillion
>registers), this means that Chow will spill when Chaitin can still color.

If you had a zillion registers Chow won't spill.  Chow does not spilt
ranges until after coloring fails (Chow P.81).
Chow claims to be able to deal with smaller than basic block granularity,
by just splitting the basic block in sub blocks (see Chows thesis P.73).
I certainly wouldn't defend this as a good solution. Chatin's fine-grained
interference is clearly better.
As previously discussed, as far as I know Chow doesn't need coalescing.

In terms of working with a machine with few registers, I think Chatin
has the advantage because he doesn't partition the register set.
Chatin (or any post-generation coloring) allows you to look at the
whole problem of allocation at once. The other alternative is McCusisk's
variable partition, or Catells prepass to guess what scratch registers
the codegen will need.  Neither of these alternatives account for
the possibility of different code sequences you can generate based
on the availability of extra scratch registers (because Catell regenerates,
he could use extra registers, but he has no cost analysis to
determine whether to leave extra registers free).  Note that this
argument ignores the problem of constrained register assignment.

The fact that the register usage is constrained (on the 86-family) is a 
problem no matter which technique you use.  Our solution in the framework
of Chow was twofold: first you must constrain each nodes to
interfere with whatever fixed registers its range interferes with.
This is no different that the solution proposed by Chatin.
The other part is to give each range its "best" register.  I didn't
write the code, so I can't say how much of this we do, but clearly
Chow has a slight advantage here because register are allocated in
"maximum benefit" order (so the maximum benefit variable gets it first
choice of registers).  An obvious hack to Chatin would be to add
an extra pass to reassign the colors in the same maximum benefit order.
Or maybe you could always push nodes on the stack in spillcost order
(even if they're unconstrained)..or..I'm not familiar enough with
Chatin to really know.  Even if you did this you still have to deal
with two problems: do you assume you have an infinite number of
registers in each constraint class, and if you do how do you correct
the code when you have to spill (which on the 86/286 is often).
Our reason for using Chow in the first place was because this problem
seemed very difficult.

On the 386 where the register set is small and the constraint problem
isn't so bad, Chatin may well be a good choice.  I belive GCC (Gnu) uses
some kind of post-generation coloring (if they're really like Davidson-
Fraser, then it's probably Frieberghouse's use count scheme).  In the
little fiddling I did with GCC it generated quite good code.  From
my point of view the advantage of Chatin (or any post-generation scheme) 
is in avoiding partitioning and giving more accurate cost analysis.
The latter is because, the 386 being a CISC machine has complex addressing
modes and alternate instruction sequences that demand different number
of registers, so determining the benefit of register assigment by
scanning il (as we do with Chows pre-generation scheme) make cost 
analysis difficult.

I think the Chow .VS. Chatin could be summarized as:
Chow:	pre-generation(color on intermedite, partitioned register set)
	goal directed allocation
	uses range splitting
Chatin: post-generation (color on instructions, uniform register usage)
	goal directed spilling.
	range is true live range, but no other splitting done.
And the questions become:
Which method has better information to do the cost analysis?
Which method deals best with the "registers affect the code, code affects 
  the registers" probem?
When does spilling give sub-optimial results?
What is the effect of range splitting?
etc.

		- Bob Scheulen
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.