Path: utzoo!attcan!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!cornell!uw-beaver!uw-june!pardo From: pardo@june.cs.washington.edu (David Keppel) Newsgroups: comp.lang.misc Subject: Re: Machine-independent intermediate languages Summary: 3 of them, the key is where/how you throw away info. Keywords: IR languages virtual machine Message-ID: <6004@june.cs.washington.edu> Date: 9 Oct 88 18:49:07 GMT References: <853@goofy.megatest.UUCP> <905@sword.bellcore.com> <5933@june.cs.washington.edu> <30151@oliveb.olivetti.com> <815@super.ORG> Reply-To: pardo@cs.washington.edu (David Keppel) Organization: U of Washington, Computer Science, Seattle Lines: 88 >[ if IR doesn't support your langauge's feature, it hurts ] ( This is a slight formalization of something I've "known" for a while but had never put together. I hope this is useful to everybody). My compilers prof introduced me to the ideas of "union IR" and "intersection IR". A union IR has primitive descriptions for every primitive in all of the languages that it is supposed to support. An intersection IR has primitives for only those primitives common to all languages. There are a couple things to get out of this: * The union...intersection IR describes a range of primitives to support langauges. These in turn define a language. * A machine-independent IR spec says nothing about the primitives supported by the machine. Thus you may be throwing out efficient primitives available on some machine. * A full union IR is hard to build and maintain and is probably quite inefficient. * An instersection IR requires mapping complex operators (e.g., the APL "transpose" operator) down to a concatentaion of simple primitives (e.g., function call, nested loops). Unless the simple operations can be re-recognized as the complex operation, many efficient compilation techniques cannot be used. Note that gcc, for instance, has 2 IR's: one is a machine-independent parse-tree construction, and one is a *very* machine dependent register transfer langauge. The parse-tree IR is very source-language dependent, while the machine-dependent IR depends very little on the source language. Thus we have (about) 3 languages (and thus 3 virtual machine specifications): source source -> machine-independent translator parse trees semantic analysis machine-independent -> machine dependent IR translator register transfer language optimization code generation In an ideal world, we create a machine-independent specification of a machine-dependent RTL. Note that GNU has chosen to ignore segment-register segmented architectures in thiers. We also need to define a machine-independent but language-*dependent* parse tree representation. * To write a new language you must create a new machine-dependent tree representation or bend your language to fit an existing one (e.g., you can probably parse Pascal on to a Modula-(n) representation pretty easily, but the C representaion would need to be extended to support nested procedures). The more information you need to pass on to the back end (e.g., value ranges for integer variables), the better a fit you need between the source and the parse language. You could probably wind up with one parse language for Ada, C, Modula, Pascal, but would have a different language for Lisp, Scheme, T, etc. * The source language to parse language translator and the semantic analyzer must be seperate for each language. * Translators from parse language to register transfer language can be shared among languages that use a common parse language. * A single port of the register transfer language ports all languages, independent of which (machine-independent) parse language is used. This discussion originally came up in the context of "I want a machine-independent `distribution' IR that can't recreate the source." Note that neither of the above languages (parse or RTL) fill this bill. The problem is that the parse language, by definition, contains enough information to recreate the original code. By the time you've gotten to the machine-generation language, you've thrown out a lot of this info, but (I assert, no proof is forthcoming) that you can only (safely, and without throwing away information needed for optimization) throw away information from the parse language in a machine-dependent way. I hope this is as enlightening to you as it was to me... ;-D on ( Machine-independent salad ) Pardo -- pardo@cs.washington.edu {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo