Path: utzoo!utgpu!watmath!att!ima!esegue!compilers-sender
From: lee@sq.sq.com (Liam Quin)
Newsgroups: comp.compilers
Subject: dynamic optimisation -- summary
Message-ID: <1989Nov28.031958.13486@esegue.segue.boston.ma.us>
Date: 28 Nov 89 03:19:58 GMT
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: Liam Quin <lee@sq.sq.com>
Organization: Compilers Central
Lines: 163
Approved: compilers@esegue.segue.boston.ma.us

I asked whether there were any compilers (esp. on Unix) which took into
account statistics gathered from profiling runs when optimising programs.

For example, if a program does the "else" branch of an if-else 98.7% of the
time (on the test runs), it makes sense to re-order the code so that the
extra jump happns only on the "if" branch.  Again, registers can be assigned
to the most heavily used registers.
I also asked if anyone thought it would be worth adding to GNU gcc.

Here are the replies I received (there were also requests for information):

| From:	Sam Midkiff <uiucdcs!uicsrd.csrd.uiuc.edu!midkiff>
| Organization: Center for Supercomputing Research and Development,
| 		Univ. of Illinois


| The only thing remotely related to this that I've heard of is some work done
| at IBM TJ Watson, on the PTOOL project.  They are (or are planning on) using
| some timing information from actual runs to assist in scheduling code in
| later runs.  Note that PTOOL is a parallelizing compiler.  If you are
| interested I'll try and come up with a reference.

| @@

| From:	Ge' Weijers <uiucdcs!sci.kun.nl!ge>

| A friend of mine is working on his masters thesis, and he has done some related
| work for a language called CDL2. This language is open-ended, and about
| the only thing the language contains itself is flow-of-control. It uses
| a macro mechanism for everything else. e.g. the addition is defined as

| FUNCTION add + >x + >y + z> =
| 	"$L	addl3 " x "," y "," z.

| for a VAX. (the + means 'affix' or 'parameter', not addition)

| The language system performs optimisations like inline substitution,
| (user-indicated, or if the procedure is trivial or called only once)
| code replication, global register allocation (and I mean GLOBAL, the whole
| program) etc. The language is quite minimalist, but this makes exhaustive
| optimisation possible.

| The optimisation tried by this friend was to record which branches were taken
| in a program, and (after a number of runs) to reorganise the program in such
| a way that conditional branch instructions are most likely not to perform
| a jump, but to continue directly with the following instruction. He was a
| little disappointed to find that he could only gain about 10% at the most
| with this kind of optimisation, and this for already highly optimised code.

| For RISC processors using a one-instruction delay before jumping this
| result might even be negative as on some processors a successful jump
| is faster than an unsuccessful one if the delay slot can be filled with
| a useful instruction.

| It might be useful to measure which statements are evaluated most, and then
| use these statistics to influence the generation of spill code in a
| graph-colouring register allocator. I don't know of any research in this
| direction though. (Usually the body of a loop is supposed to be evaluated
| N times, which the register allocator then uses as the evaluation count).

| I hope the stuff above has been of some use to you.

| Ge' Weijers                                    Internet/UUCP: ge@cs.kun.nl
| Faculty of Mathematics and Computer Science,   (uunet.uu.net!cs.kun.nl!ge)
| University of Nijmegen, Toernooiveld 1         
| 6525 ED Nijmegen, the Netherlands              tel. +3180612483 (UTC-2)

| @@

| From:	uiucdcs!Princeton.EDU!mg (Michael Golan)
| Specifically, I really wonder if anyone has an algorithm for
| register allocation
| under "not equal prbability" of branches (i.e. if-then-else). Note that the
| info might be supplied by the programmer, not only actual runs, e.g.
| if x<0 then unlikely do ....
| else
| if x=1.344 then v.unlikely do 
| else usually do ....
| I have been playing with this idea for a while now, but I am only reading old
| articles right now, trying to figure something no one did yet for a thesis.
|  Michael

| @@
| From:	"Saumya K. Debray" <uiucdcs!cs.arizona.edu!debray>

| See the paper by Wall in SIGPLAN-86.


| @@
| From:	Sam Midkiff <uiucdcs!uicsrd.csrd.uiuc.edu!midkiff>

| The only handy reference I have to the IBM work is:

|   Allen, Burke, Bytron, Ferrante, Hsieh, and Sarkar, "A framework for
|   determining useful parallelism," Inter. Conf. on Supercomputing, 1988.

| I think all of the above are members of Fran Allen's compiler group at the IBM 
| T.J. Watson Research Ctr., Yorktown Heights, New York.  I suspect if you
| can get your hands on a copy of the above, it will have other references.

| V. Sarkar seems to be the main person on the problem, but much of their 
| research there seems to be a group effort.

| Hope it helps.

| Sam Midkiff
| midkiff@uicsrd.csrd.uiuc.edu

| @@
| >From: chip%soi@harvard.harvard.edu (Chip Morris)


| Our firm is currently working on a prototype implementation of Mike
| Karr's thesis, "Code Generation By Coagulation"  (see the 1984
| Compilers Conference proceedings for a summary paper) that uses
| profiles as in integral part of compilation.  We presently have a
| code generator working from a simple intermediate language.  It could
| be adapted to C or Pascal front-ends.

| Our code generator is set up to make virtually all of its optimization
| decisions based on an incremental cost calculation that uses the
| program's profile and detailed knowledge of the cost of the target
| machine's instruction set.  Register allocation, code ordering (to
| minimize jumps), and storage management (e.g. access in memory or load
| into register?) are a few of the issues handled by cost computation.
| Our register allocator can do as well or better than a C programmer using
| register declarations.  We also have a variety of schemes, some
| implemented, some in design, that reduce or eliminate penalties for
| using procedure calls.  

| Our work is DARPA-funded and public domain, so anyone who is
| interested should contact me and I'll put him/her on our mailing list
| for technical reports and papers.
| >--
| Chip Morris, Senior Engineer
| Software Options, Inc., 22 Hilliard St., Cambridge MA 02138  (617) 497-5054
| chip%soi@harvard.harvard.edu or  ...!harvard!soi!chip

| @@

| From:	"Saumya K. Debray" <uiucdcs!cs.arizona.edu!debray>

| [Liam wrote]
| > I'm sorry, I don't have SIGPLAN here (whilst visiting Canada).
| > I wonder if I could trouble you to tell me a little more about Wall's
| > paper from SIGPLAN-86?

| In Wall's system, the compiler doesn't do register allocation -- it
| produces code annotated with information so that the linker can do
| global register allocation.  The linker also takes profile information
| into account when doing register allocation.

| > --skd

Lee

Liam R. Quin, Unixsys (UK) Ltd [note: not an employee of "sq" - a visitor!]
lee@sq.com (Whilst visiting Canada from England, until Christmas)
utai!anduk.uucp!lee (after Christmas)
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.


Brought to you by Super Global Mega Corp .com