Path: utzoo!attcan!uunet!wuarchive!usc!ucsd!rutgers!psuvax1!husc6!spdcc!ima!esegue!compilers-sender From: burley@world.std.com (James C Burley) Newsgroups: comp.compilers Subject: Re: Aggressive optimization Keywords: optimize, question Message-ID: Date: 20 Oct 90 07:16:29 GMT References: <1458@exodus.Eng.Sun.COM> <13405:Oct1800:22:5690@kramden.acf.nyu.edu> <2301@wn1.sci.kun.nl> <1990Oct18.223231.22315@esegue.segue.boston.ma.us> Sender: compilers-sender@esegue.segue.boston.ma.us Reply-To: burley@world.std.com (James C Burley) Organization: The World Lines: 156 Approved: compilers@esegue.segue.boston.ma.us In-Reply-To: brnstnd@kramden.acf.nyu.edu's message of 18 Oct 90 17:18:00 GMT In article <1990Oct18.223231.22315@esegue.segue.boston.ma.us> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: Does anyone have a compiler that can introduce a non-linear intermediate expression and reduce around it? If so, I'm impressed. How advanced is the symbolic algebra system included in the compiler? Can it take advantage of range constraints, so that if it knows that x is between 0 and 7, it might set up a table to calculate ((1<<(x+2))-1)/10 quickly? Can it manipulate floor and ceiling expressions? Can it introduce invariants to figure out range constraints? These are all part of that single, fundamental optimization technique. I know, compilers are improving. Some optimizers fully automate the dullest, most machine-dependent part of optimization---viz., figuring out how often loops or branches are executed in practice, seeing how long machine instructions take, and deciding on that basis whether a given optimization is worthwhile. I really appreciate that. I won't stop hand optimization because of it. I'm glad to see someone remind us of these issues. Machine-implemented optimizations are desirable for handling cases we've decided are not only straightforward to implement via a set of rules (however complicated the code might get), but (and this point often is overlooked) take all their "inputs" from the source code ALONE (though some aggressive optimizations also take "input" from the "omniscient", and sometimes incorrect, person who wrote the optimization, as in "assume this particular case never happens, even though it is valid in the language", and which typically can be selectively disabled via compiler switches). For example, as Dan suggests above, a compiler that performs range analysis on data sets (not even just variables, but things like "X never exceeds Y by more than 50 and is never less than Y" by looking at the source code) is a desirable thing indeed, but even if it existed it wouldn't be enough. Why? Because in a case where X and Y are input from the outside (a file or the user), there may be constraints on them that are not expressed in the source code (and for which such expression is difficult or impossible in the source language -- C's "assert" might be an example of how to do it). (Whether the code should check for violations of the constraints and produce error messages, however, is entirely a "user interface" issue, ultimately: consider reading a file produced by a companion program, for example, where the constraints may already have been checked. When compiling the program reading the file, one cannot assume the compiler could also read the program writing the file to derive the constraints: not only is it hard, but in the real world, that program's source code may not be available! And it's executable image may be unreadable also, so forget about disassembling!) What I think is really needed is not only a compiler (here I refer to the total package of translating, including linking and debugging) that can recognize and deal with the kinds of optimizations Dan talks about, but that is easy to interact with, converse with, and to which the programmer can easily explain other constraints and point out other optimizations worth trying. I've been imagining such a beast for many years and hope to start actually building one, so I've done some thinking about it. It would be very important to make such interaction easy, so I think the interface would have to be as much easier to deal with compare to, say, the Macintosh GUI as that GUI is compared to JCL. That is, the interface must be capable of understanding the concepts behind presenting information -- avoiding overloading, having a variety of methods of presentation (graphical, text, combinations, and refinements thereof) available and some ability to choose the appropriate method for displaying a given set of data, and so on. Some of the data it might display would include that traditionally thought of, i.e. sample input data for the program and/or the various incarnations it has as it goes through transformations within the program. But the compiler would also need to be able to display various interpretations of the program itself: a representation of the data flow through a given portion of code, a picture of where various key data sets are kept in memory (global, local, register; virtual memory, wired/locked memory, or fast static RAM for a system that can exploit such differences; whatever), which implementation model is chosen for the optimized version of a given module, and so on. Various views of the same code/data could be selected by the programmer, but more importantly, the programmer would be able to interact with these views to refine or annotate the information. Refining the information might be useful when the optimizer picks a general optimization but has at its disposal several refined versions of that optimization, each of which goes faster than the general version. The programmer could provide a direct suggestion that a particular refinement is or should be valid, and the system, without having had to search all the possible refinements itself, could verify (or trust) that the chosen refinement is indeed correct. Annotating the information is useful when the programmer wants to note that, for example, "WORKARRAY should not be in global virtual memory" while working through the allocation views of the data of the program, and this annotation is forwarded to other views (such as the source code) so, later, the programmer can easily find operations on the pertinent item that might explain how it ended up that way. The annotation might even be active (vs. a passive "comment") in that it might "yell" at the programmer when a change he/she makes satisfies its constraint: for example, after making a change, a message might be sent saying "Congratulations! WORKARRAY is now in fast static RAM". Further, this system would keep information on the program in a persistent fashion so the information would not have to be entered again and again. To do that, it would be best for the system to serve as the "source editor", and be capable of recognizing when its current categorizations of a given chunk of code (at whatever level of granularity) were no longer applicable due to changes in the code, and forwarding information on changes to other chunks of code that held a "vested interest" in the changed code. Finally, this system would not spring forward whole from someone's forehead; it would have to have an underlying kernel language on which the system as a whole could be incrementally built. In fact, I don't envision the kernel as a compiler or anything in particular; it would more resemble something like a hybrid of EMACS and Smalltalk, but very stripped down and carefully planned. A library of often-used components involving compilation, optimization, editing, and such would become ubiquitous just as has happened with those two systems, and this library would grow. But many programmers would wish to, and could easily, extend the system as desired to take into account their own special needs (target hardware, programming languages, and so on). All of the source code of the system should thus be made available to anyone who wants it (a la the Free Software Foundation's GNU Public License). The output of the system? Whatever is needed: assembly, C, C++, are examples, but I don't see any limitations here. In fact, one could possibly output final executables when doing global compilation and optimization, meaning that some pretty neat optimizations some of us wish we could stick in the linker we're using would be doable, given that we'd "rewritten" the linker to run under this proposed system. I think this approach may be one of the more fundamental changes in optimization technology over the next decade or so: not because it introduces any sexy new optimizations per se, but because it permits programmers to easily pick apart their own creations and put them back together, without having to rewrite the source code if desired. Such a system might allow the industry to move back to powerful languages that "hide" potent but often expensive operations in innocuous-looking constructs (a problem with lanaguges like PL/I that C lovers like to point out, and which is valid on a practical level), since the expense would be easily discoverable during the "optimization discovery" process offered by the proposed system. I won't get into how such a system might also provide an easier means for debugging systems (even in "optimized" mode), writing source code (harking back to the "what if we had a language of languages?" thread we had earlier), or improving verifiability. As with optimization, the system itself would break no new ground in these areas, but should provide a more fertile field for implementing and discovering improvements in them. After I get through with GNU Fortran, I'd like to start trying to build such a system, so I hope that if anyone knows why it can't work, they'll let me know before then! (-: (Seriously, either this is a great idea that I have to do someday, or there's something for me to learn from its failure, either way I decided pursuing this and other ideas was more important than drawing a full-time paycheck and working on the same old thing, which is why I left my job over a year ago...so is this idea worth pursuing as one of my next projects? After watching the newsgroups for a while, this seems to be a good one to ask!) James Craig Burley, Software Craftsperson burley@world.std.com -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.