Path: utzoo!attcan!uunet!lll-winken!ames!mailrus!csd4.milw.wisc.edu!uxc!uxc.cso.uiuc.edu!uxg.cso.uiuc.edu!uicbert.eecs.uic.edu!wilson From: wilson@uicbert.eecs.uic.edu Newsgroups: comp.lang.smalltalk Subject: dynamic compilation, tentative opti Message-ID: <67500004@uicbert.eecs.uic.edu> Date: 13 Jan 89 01:49:00 GMT Lines: 216 Nf-ID: #N:uicbert.eecs.uic.edu:67500004:000:10312 Nf-From: uicbert.eecs.uic.edu!wilson Jan 12 19:49:00 1989 I am looking for references to advanced techniques in dynamic compilation and tentative optimization. (Is anybody out there working on such things? Is anybody doing anything beyond the Deutsch-Schiffman compiled-method caching and inline caching of method lookups?) In particular, I'm interested in systems that compile specific *versions* of methods/procedures that handle polymorphic arguments. (Especially systems that keep more than one version around, and systems that proceed to do interesting optimizations based on the particular combinations of types that it actually encounters.) The closest thing I've found to what I'm looking for is a paper by E.J. Van Dyke in the Hewlett-Packard Journal in July 1977 ("A Dynamic Incremental Compiler for an Interpretive Language," pp. 17-24). Van Dyke describes Hewlett-Packard's APL/3000, which compiles each procedure the first time it encounters it. It initially compiles a very specific version, which assumes that the arguments will be of exactly the same type next time (e.g., that array bounds will be the same). The The compiled code contains a short prologue of "signature code" that checks to see if the assumptions are correct. If not, the code "breaks" and calls the compiler to recompile it. It is then recompiled with fewer assumptions (e.g., array type but not array bounds). I posted my own ideas for dynamic compilation and tentative optimization on the Scheme list a while back, which is how I got referred to Van Dyke. I've tagged those postings onto the end of this one for anyone who's interested. (They contain some basic references to dynamic compilation articles. I'm really looking for more advanced techniques. The big question for me is what has happened in tentative optimization in the last ten years.) I'm told T.C.Miller's 1978 thesis from Yale is also relevant, but have not received it from Yale yet. It's called "Tentative Compilation: a Design for an APL Compiler" (Yale U. DCS 78-CS-013, May 1978.) ------------------------------------- First posting: ------------------------------------- I've been thinking about using dynamic compilation for Scheme, and wondering if anybody else is. Have people pretty much abandoned the idea of dynamic compilation, and if so, was it premature? Some of the advantages might be: 1) portable images could still run fast 2) an optimizer could be called in to optimize those chunks of code that are ACTUALLY executed a bunch of times. (rather than just optimizing the time/space tradeoff between interpreted and compiled code, it could be used to adjust compile time/run time tradeoffs.) 3) Transparent, automatic compilation and optimization. In particular, I'm thinking about a scheme in which inlining would be done automatically, but dependencies would be recorded. If somebody changed the value of an inlined procedure or constant, the code depending on those values would be invalidated and dynamically recompiled at runtime. (The mechanism for this depends on near-lexical-scoping. Procedures that are never redefined in a lexically apparent way could be inlined. Primitives that can modify things (like first-class environments) in non-lexically-apparent ways would have to check for dependencies.) Anybody care to comment on this idea? In particular, is there some irreducible cost of dynamic compilation that I don't know about? Any advantages? (It seems to me that you might dynamically determine what types of arguments frequently-executed procedures were frequently called with. You could then compile special versions for the common combinations. A large-grained check would dispatch to the proper version, within which most type checking could be eliminated using type inference. Would this be workable/worthwhile?) ---------------------------------------- second posting: ---------------------------------------- There seems to be some interest out there, so here's a bit more detail on those dynamic compilation ideas: * When a chunk of code is dynamically compiled (not optimized yet), it is given a little prologue that increments a count every time it is executed. When the count exceeds a threshold, the code is recompiled with a much higher level of optimization. The most optimized code would not have this prologue to slow it down, since it wouldn't need it anymore. Highly optimized code might be less subject to being discarded as well. * The business about noticing common types of arguments and compiling specially optimized versions could be handled similarly. A little prologue would check every nth execution for types, rather than every execution. Most times, it would just increment a counter. Very highly optimized code could forego the checking and counting entirely, assuming it had already learned all of the likely argument type combinations. (Actually, every nth time it would change n a little to avoid rhythmic problems.) * Something like this might also be useful for trace scheduling. Every now and then you execute an "instrumented" version of the code that collects data on common execution patterns. Any comments? Does anybody know of any systems that use dynamic compilation? I know a couple of Smalltalks do it, and now I'm told that MIT Scheme has a somewhat similar cache-with- dependencies mechanism for variable lookups. Does anybody have any opinions on the expected efficiency of such a system? It seems to me that it could be quite good. It could also be advantageous in that it would not optimize a lot of the code. In particular, it could keep things in a more debuggable form than something that optimizes more stupidly, without much performance penalty. (Especially since frequently modified things will never get heaviliy optimized.) I only have a few references on dynamic compilation, none of which go into anything advanced. If you have others, please send them to me and I'll pass them on to interested parties. (Besides the ones in Andy Gordon's posting, there was "An Evaluation of Throw-away Compiling" in Software Practice and Experience, vol. 13 (1983), pp. 241-249.) [ NOTE: here are the citations from Andy Gordon's posting: [1] A. D. Gordon, "How to Breed Hybrid Compiler/Interpreters", Technical report ECS--LFCS--88--50, Laboratory for the Foundations of Computer Science, Computer Science Dept., University of Edinburgh, Edinburgh, EH9 3JZ, Scotland. April, 1988. [2] P. J. Brown, "Throw-away compiling", Software P & E, 6, pages 423--434, 1976. [3] J.R. Low, "Automatic data structure selection: an example and overview," CACM, 21(5), pages 376--385, May 1978. [4] R. J. Dakin and P. C. Poole, "A mixed code approach", The Computer Journal, 16(3), pages 219--222, 1973. ] I have not gotten any responses indicating that dynamic compilation has been used to control levels of optimization or to recompile when optimizer assumptions are violated. [ NOTE: after I posted this originally, I did find the two references mentioned above, thanks to Greg Jaxon at U. of Illinois. ] --------------------------------------- third posting: --------------------------------------- To motivate the discussion of dynamic optimization, here's a simple example of why I think it might be worthwhile. Suppose you have a language in which you can have homogeneous collections like array-of-integer. (Or an implementation that lets you declare them.) Suppose also that the language is basically dynamically typed, and that the declared types are attached to the objects at runtime. So an array of integer would have a header saying so. (That is, an array-of-integer will always contain only integers, but you can't look at a particular point in the source code and tell what class(es) an argument could belong to. So some objects are "statically" typed in that their components don't change types, but the code is not amenable to static type analysis -- you can always call a procedure with any kind of argument you want. If types can be created at runtime, then the set of possible types at many points in the code could be infinite.) Now suppose we define a procedure that processes collections, looping to apply some operation to its elements. A dynamic optimizing system would generate a generic version of the code that checks the types of the arguments and then dispatches to an optimized version for those types. If it has never seen that combination of types before, a new version is created on the fly. So the first time this generic code sees an array-of-integer, it compiles a version of the looping routine that assumes each element is an integer, omitting tag checks, unrolling the loop, etc. The generic code must always check the collection's type when it is executed, but the compiled code it dispatches to needn't. (If it's not a homogeneous collection, it simply dispatches to a normal version with type checking.) The nice thing about this is that the code is still generic from the programmer's point of view, and type checking is not abandoned. (Unlike systems in which declarations tell the compiler to make assumptions at a particular point in the code.) My guess is that for most programs, most routines are only ever given one kind of argument, so there wouldn't be an explosion of versions. (You can always use a cache and throw away infrequently used versions.) At least for this example, the optimization would be transparent and safe. If somebody changes (at another point in the code) the type of some object that is passed to the optimized routine, the dynamic check will ensure that an appropriate version is generated. Now the question is how general this system could be, using more sophisticated type inference than recognizing homogeneous collections. And can it be combined with nifty trace-scheduling techniques to precompute paths through often-executed chunks of code and seriously optimize them? ----------------------------------------------------------------------- Paul R. Wilson Human-Computer Interaction Laboratory U. of Illin. at C. EECS Dept. (M/C 154) wilson@uicbert.eecs.uic.edu Box 4348 Chicago,IL 60680