Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!snorkelwacker!spdcc!ima!esegue!compilers-sender From: dgb@cs.washington.edu (David Bradlee) Newsgroups: comp.compilers Subject: register allocation Keywords: chow, chaitin Message-ID: <1989Nov22.183501.6735@esegue.segue.boston.ma.us> Date: 22 Nov 89 18:35:01 GMT Sender: compilers-sender@esegue.segue.boston.ma.us Reply-To: dgb@cs.washington.edu (David Bradlee) Organization: U of Washington, Computer Science, Seattle Lines: 139 Approved: compilers@esegue.segue.boston.ma.us This is a response to Preston Briggs' article from a couple of days ago. It's from a friend with whom I've worked at Microsoft. *************************************************** From microsoft!bobs@beaver Tue Nov 21 15:34:39 1989 Reply-to: bobs@bobs (Bob Scheulen) Hi - A friend of mine sent me a copy of one of your news postings because I've been looking at both Chatin & Chow. I don't read news so appologies if I'm out of date on the discussion. Backgroud: We (MS) have implemented a scheme like Chow's for a 8086/286 C compiler. There are a few major differences between us and Chow: first, instead of assuming the live range is the whole procedure, we split a variable into its true lifetimes (see note below about what I mean by that). Secondly, we don't split ranges at all, we just give up (except we have a special hack for loops...). Definition of a lifetime: the set of all definitions and references to a variable which MUST be the same variable (and so must get the same register). This corresponds the the definition of liveness used in global optimization (in fact we discover the lifetime by tracing thru all contiguous blocks where the variable is live until no more can be added --yes I know this is ineffecient). Under this definition of lifetime, a variable could be split into its separate lifetimes in the user source (eg. a user variable "i" could be change to "i0", "i1" etc.). I point this out, because it does not agree with either Chatin or Chow. I admit I don't understand the implications (or reason) for Chatin's definition. I am not sure I even understand what Chatin means, but he apparently limits a live range to those uses which only one definition reaches. I assume this means that uses which can be reached by multiple definitions are in their own live range. One would think this could create tons of copies, but maybe thats why I don't understand why subsumption is good for much. Apologies if this didn't make much sense. Specific comments: >>The big win for Chaitin is the coalescing (subsumption) phase he uses ... In terms of coloring, it would seem that all subsumption can do is make things less colorable, because it is effectively extending the live range. For example, isn't it possible that the first pseudo register (the src of the copy) can be colored while the other can't? I also don't understand where all these copies come from (except possible as a result of the definition of live range Chatin uses..see above disscussion). The other obvious cases are silly copies generated by the code generator because it needs to preserve the original register contents. We solve this problem by a lot of special case code in the code generator so that it never generates useless copies. If we didn't do this we would get extra demand on the scratch register (remember that Chow partitions the register set) which could cause extra spills, but I think a smart peephole would be able to clean that up. It would seem that Chatin could so subsumtion after coloring (assuming I'm right that subsumption has no positive effect on colorability). >>If you have a zillion registers, >>you never spill with either method, so they're equal in that case. > >Chaitin won't spill, but Chow will in some cases. A very long live range, >with very few references, will be split and some portions spilled. >Additionally, the lack of coalescing in Chow causes different code (extra >copies). Finally, the coarser interference graph and inferior coloring >heuristic produce worse colorings. Practically (less than a zillion >registers), this means that Chow will spill when Chaitin can still color. If you had a zillion registers Chow won't spill. Chow does not spilt ranges until after coloring fails (Chow P.81). Chow claims to be able to deal with smaller than basic block granularity, by just splitting the basic block in sub blocks (see Chows thesis P.73). I certainly wouldn't defend this as a good solution. Chatin's fine-grained interference is clearly better. As previously discussed, as far as I know Chow doesn't need coalescing. In terms of working with a machine with few registers, I think Chatin has the advantage because he doesn't partition the register set. Chatin (or any post-generation coloring) allows you to look at the whole problem of allocation at once. The other alternative is McCusisk's variable partition, or Catells prepass to guess what scratch registers the codegen will need. Neither of these alternatives account for the possibility of different code sequences you can generate based on the availability of extra scratch registers (because Catell regenerates, he could use extra registers, but he has no cost analysis to determine whether to leave extra registers free). Note that this argument ignores the problem of constrained register assignment. The fact that the register usage is constrained (on the 86-family) is a problem no matter which technique you use. Our solution in the framework of Chow was twofold: first you must constrain each nodes to interfere with whatever fixed registers its range interferes with. This is no different that the solution proposed by Chatin. The other part is to give each range its "best" register. I didn't write the code, so I can't say how much of this we do, but clearly Chow has a slight advantage here because register are allocated in "maximum benefit" order (so the maximum benefit variable gets it first choice of registers). An obvious hack to Chatin would be to add an extra pass to reassign the colors in the same maximum benefit order. Or maybe you could always push nodes on the stack in spillcost order (even if they're unconstrained)..or..I'm not familiar enough with Chatin to really know. Even if you did this you still have to deal with two problems: do you assume you have an infinite number of registers in each constraint class, and if you do how do you correct the code when you have to spill (which on the 86/286 is often). Our reason for using Chow in the first place was because this problem seemed very difficult. On the 386 where the register set is small and the constraint problem isn't so bad, Chatin may well be a good choice. I belive GCC (Gnu) uses some kind of post-generation coloring (if they're really like Davidson- Fraser, then it's probably Frieberghouse's use count scheme). In the little fiddling I did with GCC it generated quite good code. From my point of view the advantage of Chatin (or any post-generation scheme) is in avoiding partitioning and giving more accurate cost analysis. The latter is because, the 386 being a CISC machine has complex addressing modes and alternate instruction sequences that demand different number of registers, so determining the benefit of register assigment by scanning il (as we do with Chows pre-generation scheme) make cost analysis difficult. I think the Chow .VS. Chatin could be summarized as: Chow: pre-generation(color on intermedite, partitioned register set) goal directed allocation uses range splitting Chatin: post-generation (color on instructions, uniform register usage) goal directed spilling. range is true live range, but no other splitting done. And the questions become: Which method has better information to do the cost analysis? Which method deals best with the "registers affect the code, code affects the registers" probem? When does spilling give sub-optimial results? What is the effect of range splitting? etc. - Bob Scheulen -- Send compilers articles to compilers@esegue.segue.boston.ma.us {spdcc | ima | lotus}!esegue. Meta-mail to compilers-request@esegue. Please send responses to the author of the message, not the poster.