Path: utzoo!attcan!uunet!lll-winken!ames!pasteur!ucbvax!decwrl!decvax!ima!compilers-sender
From: think!compass!worley@EDDIE.MIT.EDU (Dale Worley)
Newsgroups: comp.compilers
Subject: Why can't we build a C compiler?
Message-ID: <3149@ima.ima.isc.com>
Date: 19 Dec 88 16:52:19 GMT
Sender: compilers-sender@ima.ima.isc.com
Reply-To: think!compass!worley@EDDIE.MIT.EDU (Dale Worley)
Lines: 82
Approved: compilers@ima.UUCP

I think that part of the problem is that C is not all that well
defined.  There are numerous tricky spots in the language that (until
the advent of ANSI C) were not standardized.  Consider the different
tricks that were used to concatenate and stringize tokens.  These were
so horrible that the ANSI committee eliminated them entirely and
invented the # and ## operators out of whole cloth.

Another cute one is the following:

    typedef int x;
    struct x { x x; int counter; };

Is this legal?  According to Harbison and Steele, "structure and union
field names are in a different overloading class than objects and
typedef names".  As I read this, it means that the fourth occurrence of
"x" above is legitimate.  Am I right?  Who knows?

Again, the requirement that a declaration govern not the entire
containing block, but only the portion of the block following the
declaration, leads to tricky points.  In Ada this convention led to many
paragraphs defining exactly where the declared object became
accessible.  In C the question was simply ignored.

Etc. etc.

Compounding this is that the de-facto standard, Kernighan and Ritchie,
is not written as a genuine language reference manual, but as a
tutorial.  In practice, C has been defined by "what the compiler does",
which leads to numerous ambiguities and inconsistencies.

Compare the definition of C to that of Algol 68 -- In Algol 68 the
compilation task is relentlessly well-defined, although it's not always
clear that it's *possible*.
But a compiler for a language of C's complexity, defined with the
formality of Algol 68, would be a class project, not an engineering
feat.

Finally, if after hundreds of attempts we can't build a little 10,000
line utility for ourselves, why in the world do we think we can build
all the programs we work on every day?  We are certainly kidding the
folks that pay us and we're also doing a pretty good job of kidding
ourselves.

Actually, the poor customer is doing OK -- He has to get the work out
the door, and a compiler that is 99.5% correct is far more useful to
him than none at all.

But this leads to a concept:  Not only should we design a language to
be easily comprehensible to the user, but also easily comprehensible to
the compiler.  (These goals should be synergistic, since things that
are hard to parse are likely to be hard to read as well.)  Some
guidelines are:

Similar-looking tokens should not have different grammatical uses
depending on how they are declared.  Examples are C identifiers
(typedef names and objects) and Algol 68 bold-words (mode names and
operators).

A declaration should be effective throughout the entire block in which
it appears, rather than starting at the point of declaration.  This
makes it impossible to write a one-pass compiler, but it simplifies the
definition of the language semantics, and makes it *far* easier to
formalize the semantics.

Avoid features that can only be defined at the lower levels of
abstraction (e.g., tokenization, parsing).  For instance, the C
preprocessor is *impossible* to define except as a pre-pass before
parsing.  This makes it hard to build, e.g., an incremental compiler
for C.  (Unfortunately, the preprocessor is really great for making
code portable.  There is something that should be studied here...)

Dale

--
Not, of course, the opinions of my employer.
Dale Worley, Compass, Inc.
mit-eddie!think!compass!worley
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request
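
For readers who want to try the C points raised in the article, here is
a minimal sketch, assuming an ANSI C (C89 or later) compiler.  It takes
the typedef/struct example from the post and adds two small cases for
the point-of-declaration rule and for the typedef-versus-object
question.  The function names, the variable n, and the identifier T are
illustrative assumptions and do not come from the original post.  As
far as I understand the ANSI name-space rules, tags, members, and
ordinary identifiers are kept apart, which is why the first case
compiles.

```c
/* Sketch only: the function names, n, and T are illustrative and not
   from the original post.  Assumes an ANSI C (C89 or later) compiler. */

/* 1. The example from the post.  The tag "x", the member "x", and the
      typedef name "x" live in separate name spaces. */
typedef int x;
struct x {
    x x;                /* first x: the typedef name; second x: the member */
    int counter;
};

int member_and_typedef(void)
{
    struct x s;
    s.x = 3;
    s.counter = 4;
    return s.x + s.counter;
}

/* 2. A declaration governs only the part of the block that follows it. */
int n = 10;

int point_of_declaration(void)
{
    int before = n;     /* still refers to the file-scope n */
    int n = 20;         /* from here on, n means the local object */
    return before + n;
}

/* 3. The tokens "T * p" parse differently depending on how T is declared. */
typedef int T;

int star_as_declaration(void)
{
    T * p;              /* T is a typedef name: declares p as pointer to int */
    p = 0;
    return p == 0;
}

int star_as_expression(void)
{
    int T = 6, p = 7;   /* this T hides the typedef inside the block */
    return T * p;       /* so here "T * p" is a multiplication */
}
```

Called from a main(), the four functions should return 7, 30, 1, and 42
respectively.  The third case is the one that forces a C parser to
consult the symbol table before it can decide what a statement is.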