Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!mailrus!utah-gr!uplherc!esunix!bpendlet
From: bpendlet@esunix.UUCP (Bob Pendleton)
Newsgroups: comp.arch
Subject: Re: Software Distribution
Message-ID: <958@esunix.UUCP>
Date: 26 Aug 88 16:29:14 GMT
References: <1988Aug23.180420.28483@utzoo.uucp>
Organization: Evans & Sutherland, Salt Lake City, Utah
Lines: 118

From article <1988Aug23.180420.28483@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> In article <2793@hubcap.UUCP> mrspock@hubcap.UUCP (Steve Benz) writes:
>>> Things like data-type sizes often have to be decided before

Usually you can get away with specifying the radix of the data and the
minimum number of digits required. Sometimes you need to specify the
maximum number of digits as well. For example "short int x;" (a
somewhat ambiguous declaration) can be translated into the portable
form "x: static allocated signed binary min 16", or "char *name" can
be represented as "name:stack allocated machine_pointer ASCII", or
more loosely as "name:machine_pointer signed binary min 7 max 9". I
think you can get the feel from these examples. The translator would
translate declarations into constraints on the valid representations
of the declared items.

>>> the intermediate representation is generated, even if the details of the
>>> code generation get deferred...
>>
>>I'm not sure what Henry talks about is really that big a problem,
>
> Is the layout of structs in memory decided before or after the intermediate
> representation is generated? What about the results of "sizeof"? How is
                                                          ^^^^^^

The layout of structs must be done by the machine specific code
generator, NOT by the translator. "sizeof" becomes a symbolic
expression that can be evaluated by the code generator, but not by the
translator. In one system I wrote, all data size computations were
done in the linker. Worked out very well.

> "varargs" handled? And so forth.
  ^^^^^^^^^

Now that looks hard, for a minute.
The general rule is that hardware dependent problems must be pushed
through to the hardware dependent code generator. The machine
independent code for a varargs call could look something like this:

	vararg_block
	    code for arg 1
	    code for arg 2
	    .
	    .
	    .
	    code for arg n
	vararg_end N
	call what_ever

How did the translator find out it was a varargs call? By looking at
the declaration of the procedure and/or the way it was used. It's
important to remember that this intermediate language must be usable
by ALL programming languages, not just C.

> If you try to build a completely machine-
> independent "intermediate" form, I think you will end up with something that
> looks very much like a tokenized version of the source. This might or might
> not be satisfactory for the original purposes, but an intermediate represen-
> tation (in the usual sense of the word) it's not.

Off the top of my head I can think of two different intermediate forms
that could be used for this. Each includes a symbol table; I hope you
include a symbol table as part of your usual sense of the phrase
"intermediate form."

One is a simple reverse polish form of the source program. The
operations can be generic like "+", and the operands can be indexes
into the symbol table. This form can be converted directly to code or
into a more "normal" intermediate form by symbolic execution of the
RPN. The intermediate values generated during symbolic execution can
be constant values, registers, all sorts of things. By using fairly
complex patterns to decide how to "execute" an operator, this approach
can give you a surprisingly good quick and dirty code generator or a
very machine specific intermediate form suitable for machine specific
optimizations.

Another possible intermediate form is good old quads. A quad specifies
2 operands (well... sometimes just 1), an operation and a destination.
The operands and destinations can be other quads or variables.
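
To make the two forms concrete, here is a minimal sketch in Python of
symbolically executing an RPN token list to produce quads. Everything
named here (`Quad`, `rpn_to_quads`, `to_tree`, the token spelling, and
the choice to refer to an earlier quad by its list index) is an
assumption made for illustration, not something specified in the post.
Symbols are plain strings standing in for symbol-table indexes.

```python
from typing import List, NamedTuple, Optional, Union

# Hypothetical encoding: a str names a symbol-table entry,
# an int is the index of an earlier quad whose result is used.
Operand = Union[str, int]


class Quad(NamedTuple):
    op: str
    arg1: Operand
    arg2: Optional[Operand]   # None for one-operand quads
    dest: Optional[str]       # variable name, or None if referenced by index


BINARY_OPS = {"+", "-", "*", "/"}


def rpn_to_quads(rpn: List[str]) -> List[Quad]:
    """Symbolically execute RPN tokens, emitting one quad per operator."""
    stack: List[Operand] = []
    quads: List[Quad] = []
    for tok in rpn:
        if tok in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            quads.append(Quad(tok, left, right, None))
            stack.append(len(quads) - 1)      # result is "quad number k"
        elif tok == "=":
            value = stack.pop()
            target = stack.pop()
            if not isinstance(target, str):
                raise ValueError("assignment target must be a symbol")
            quads.append(Quad("=", value, None, target))
        else:
            stack.append(tok)                 # plain symbol: just push it
    return quads


def to_tree(quads: List[Quad], operand: Operand):
    """Rebuild the nested expression tree rooted at a symbol or quad index."""
    if isinstance(operand, str):
        return operand
    q = quads[operand]
    if q.op == "=":
        return ("=", q.dest, to_tree(quads, q.arg1))
    return (q.op, to_tree(quads, q.arg1), to_tree(quads, q.arg2))


# a = b + c * d, written in reverse polish
quads = rpn_to_quads(["a", "b", "c", "d", "*", "+", "="])
tree = to_tree(quads, len(quads) - 1)
```

The stack here holds only symbols and quad indexes; a real code
generator could push constants or registers instead, as described
above. `to_tree` walks the flat list back into a nested expression.
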
That is, quads are a way of representing a parse tree in a nice flat
file. Actually, both of these are simple ways to represent parse trees
in flat files. Both forms make it possible to recover the original
parse tree. You can do machine independent optimization on both forms
(though I think it's easier with quads). You can also do machine
independent linking of these forms.

The problems are just not that big.

I used to spend a lot of time thinking about this kind of thing. My
senior project, oh these many years gone by, was the design of a
language for writing machine independent LISP interpreters. I've
looked very carefully at the PSL work at the University of Utah, since
I was there when a lot of it was being done.

> -- 
> Intel CPUs are not defective, | Henry Spencer at U of Toronto Zoology
> they just act that way.       | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

I just got back from Xhibition, where someone from OSF said they are
planning to establish a standard for a portable intermediate language.
Nice to see that the market is finally growing up enough to need
something like this.

Imagine being able to buy a program, take it home and pop it into the
drive, wait a few minutes while a machine specific version is created
from the machine independent version on the disk, and then just use
it. The only thing you have to worry about is whether or not your
machine has enough horsepower to run the program well.

Will it ever happen? I doubt it, but it sure would be nice.

			Bob P.
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet
Alternate:     utah-cs!esunix!bpendlet
        I am solely responsible for what I say.