Path: utzoo!utgpu!watmath!att!ima!esegue!compilers-sender From: lee@sq.sq.com (Liam Quin) Newsgroups: comp.compilers Subject: dynamic optimisation -- summary Message-ID: <1989Nov28.031958.13486@esegue.segue.boston.ma.us> Date: 28 Nov 89 03:19:58 GMT Sender: compilers-sender@esegue.segue.boston.ma.us Reply-To: Liam Quin Organization: Compilers Central Lines: 163 Approved: compilers@esegue.segue.boston.ma.us I asked whether there were any compilers (esp. on Unix) which took into account statistics gathered from profiling runs when optimising programs. For example, if a program does the "else" branch of an if-else 98.7% of the time (on the test runs), it makes sense to re-order the code so that the extra jump happns only on the "if" branch. Again, registers can be assigned to the most heavily used registers. I also asked if anyone thought it would be worth adding to GNU gcc. Here are the replies I received (there were also requests for information): | From: Sam Midkiff | Organization: Center for Supercomputing Research and Development, | Univ. of Illinois | The only thing remotely related to this that I've heard of is some work done | at IBM TJ Watson, on the PTOOL project. They are (or are planning on) using | some timing information from actual runs to assist in scheduling code in | later runs. Note that PTOOL is a parallelizing compiler. If you are | interested I'll try and come up with a reference. | @@ | From: Ge' Weijers | A friend of mine is working on his masters thesis, and he has done some related | work for a language called CDL2. This language is open-ended, and about | the only thing the language contains itself is flow-of-control. It uses | a macro mechanism for everything else. e.g. the addition is defined as | FUNCTION add + >x + >y + z> = | "$L addl3 " x "," y "," z. | for a VAX. (the + means 'affix' or 'parameter', not addition) | The language system performs optimisations like inline substitution, | (user-indicated, or if the procedure is trivial or called only once) | code replication, global register allocation (and I mean GLOBAL, the whole | program) etc. The language is quite minimalist, but this makes exhaustive | optimisation possible. | The optimisation tried by this friend was to record which branches were taken | in a program, and (after a number of runs) to reorganise the program in such | a way that conditional branch instructions are most likely not to perform | a jump, but to continue directly with the following instruction. He was a | little disappointed to find that he could only gain about 10% at the most | with this kind of optimisation, and this for already highly optimised code. | For RISC processors using a one-instruction delay before jumping this | result might even be negative as on some processors a successful jump | is faster than an unsuccessful one if the delay slot can be filled with | a useful instruction. | It might be useful to measure which statements are evaluated most, and then | use these statistics to influence the generation of spill code in a | graph-colouring register allocator. I don't know of any research in this | direction though. (Usually the body of a loop is supposed to be evaluated | N times, which the register allocator then uses as the evaluation count). | I hope the stuff above has been of some use to you. | Ge' Weijers Internet/UUCP: ge@cs.kun.nl | Faculty of Mathematics and Computer Science, (uunet.uu.net!cs.kun.nl!ge) | University of Nijmegen, Toernooiveld 1 | 6525 ED Nijmegen, the Netherlands tel. +3180612483 (UTC-2) | @@ | From: uiucdcs!Princeton.EDU!mg (Michael Golan) | Specifically, I really wonder if anyone has an algorithm for | register allocation | under "not equal prbability" of branches (i.e. if-then-else). Note that the | info might be supplied by the programmer, not only actual runs, e.g. | if x<0 then unlikely do .... | else | if x=1.344 then v.unlikely do | else usually do .... | I have been playing with this idea for a while now, but I am only reading old | articles right now, trying to figure something no one did yet for a thesis. | Michael | @@ | From: "Saumya K. Debray" | See the paper by Wall in SIGPLAN-86. | @@ | From: Sam Midkiff | The only handy reference I have to the IBM work is: | Allen, Burke, Bytron, Ferrante, Hsieh, and Sarkar, "A framework for | determining useful parallelism," Inter. Conf. on Supercomputing, 1988. | I think all of the above are members of Fran Allen's compiler group at the IBM | T.J. Watson Research Ctr., Yorktown Heights, New York. I suspect if you | can get your hands on a copy of the above, it will have other references. | V. Sarkar seems to be the main person on the problem, but much of their | research there seems to be a group effort. | Hope it helps. | Sam Midkiff | midkiff@uicsrd.csrd.uiuc.edu | @@ | >From: chip%soi@harvard.harvard.edu (Chip Morris) | Our firm is currently working on a prototype implementation of Mike | Karr's thesis, "Code Generation By Coagulation" (see the 1984 | Compilers Conference proceedings for a summary paper) that uses | profiles as in integral part of compilation. We presently have a | code generator working from a simple intermediate language. It could | be adapted to C or Pascal front-ends. | Our code generator is set up to make virtually all of its optimization | decisions based on an incremental cost calculation that uses the | program's profile and detailed knowledge of the cost of the target | machine's instruction set. Register allocation, code ordering (to | minimize jumps), and storage management (e.g. access in memory or load | into register?) are a few of the issues handled by cost computation. | Our register allocator can do as well or better than a C programmer using | register declarations. We also have a variety of schemes, some | implemented, some in design, that reduce or eliminate penalties for | using procedure calls. | Our work is DARPA-funded and public domain, so anyone who is | interested should contact me and I'll put him/her on our mailing list | for technical reports and papers. | >-- | Chip Morris, Senior Engineer | Software Options, Inc., 22 Hilliard St., Cambridge MA 02138 (617) 497-5054 | chip%soi@harvard.harvard.edu or ...!harvard!soi!chip | @@ | From: "Saumya K. Debray" | [Liam wrote] | > I'm sorry, I don't have SIGPLAN here (whilst visiting Canada). | > I wonder if I could trouble you to tell me a little more about Wall's | > paper from SIGPLAN-86? | In Wall's system, the compiler doesn't do register allocation -- it | produces code annotated with information so that the linker can do | global register allocation. The linker also takes profile information | into account when doing register allocation. | > --skd Lee Liam R. Quin, Unixsys (UK) Ltd [note: not an employee of "sq" - a visitor!] lee@sq.com (Whilst visiting Canada from England, until Christmas) utai!anduk.uucp!lee (after Christmas) -- Send compilers articles to compilers@esegue.segue.boston.ma.us {spdcc | ima | lotus}!esegue. Meta-mail to compilers-request@esegue. Please send responses to the author of the message, not the poster. Brought to you by Super Global Mega Corp .com