Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site sftig.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxn!mhuxm!sftig!lr
From: lr@sftig.UUCP (L.Rosler)
Newsgroups: net.lang.c
Subject: Re: Is this correct action for the c compiler/preprocessor ??
Message-ID: <621@sftig.UUCP>
Date: Sat, 16-Nov-85 00:27:37 EST
Article-I.D.: sftig.621
Posted: Sat Nov 16 00:27:37 1985
Date-Received: Sat, 16-Nov-85 09:29:08 EST
References: <2667@brl-tgr.ARPA> <689@ucsfcgl.UUCP> <198@opus.UUCP> <776@cyb-eng.UUCP>
Organization: AT&T Bell Laboratories, Summit, NJ
Lines: 93

> The question was whether the C preprocessor should substitute for an
> occurrence of a macro formal within a string within the body of the
> macro...
>
> > Being able to insert literal text in strings is very useful.
>
> The fact that a feature is "useful" is not sufficient argument that it is
> correct.
>
> the definition that most of
> us use these days (K&R) says one thing:
>	Text inside a string or a character constant is not subject to
>	replacement.
> ...which is pretty explicit, but the compiler that a lot of us use
> substitutes inside strings.  I would like to have an authoritative
> definition and a correct compiler in accord with the definition.
> --
> Dick Dunn

Having been involved in many aspects of this fiasco, I'll give a capsule
history.

The original C preprocessor, designed and implemented by Mike Lesk of
AT&T Bell Labs for the PDP-11, did not substitute inside strings (hence,
the disclaimer in K&R).

The preprocessor distributed with VAX UN*X, hence picked up by UCBerkeley,
was implemented by John Reiser. In addition to being much faster than the
original, it included many "features" which were documented only in a file
/usr/src/cmd/cpp/README, dated August 25, 1978 (after the publication of
K&R). The file is still there, though updated -- look and see!
Among the features included without a great deal of review were the
"magic disappearing comment" used to glue tokens together (despite K&R
p. 179 "...comments...serve to separate tokens") and the issue at hand of
substituting within strings (and character constants, for that matter,
though no one seems to pay much attention to this part of the issue). The
only justification for the latter seems to be K&R p. 207: "Each occurrence
of an identifier mentioned in the formal parameter list of the definition
is replaced by the corresponding token string from the call."

When I championed these features before the ANSI X3J11 C Committee (most
of whom had implemented a preprocessor according to the K&R description,
not the UN*X code), I first had to convince the Committee that they were
useful. Several UN*X headers and Alan Feuer's "The C Puzzle Book" helped
here.

But I could not convince the Committee that the way the features were
implemented was acceptable, despite the tons of code that incorporated
them. Reliance on undocumented (what README file?!?) capabilities of a
particular implementation which contravened the clear sense of the de
facto standard did not fall under the purview of the Committee's goal of
not breaking existing "valid" code.

Several syntaxes were proposed, some of which were as simple to implement
as a new directive "#defines," meaning in THIS macro, substitute for
identifiers inside strings. But they all foundered on the simple point
that there ARE no identifiers inside strings! Strings and identifiers are
each "tokens," and writing a grammar to parse strings into tokens was
considered too outrageous. (Note that "tokens" can turn up in surprising
places:

	#define PRINT(s) printf("%s", s)

produces remarkable results on UN*X compilers.)
So the Committee resorted to invention:

	# identifier

meaning "stringize" the argument token-string substituted for the
identifier; and

	token1 ## token2

meaning concatenate the two tokens nearest the ## after all other
substitutions. The latter will be easy to substitute mechanically for
/**/, but the former will require some work. Each of them has some
advantages over the UN*X way, not the least of which is that they don't
do violence to the rest of the language.

Even though I'm not happy with the idea of standards committees inventing
solutions that invalidate existing solutions, I buy into this case. As
Henry Spencer warns, don't use the UN*X features, and wait for the ANSI
Standard to provide better ways.

Sorry to be so long-winded, but this history HAD to be told.

Larry Rosler, AT&T Information Systems
(Editor, ANSI X3J11 C Standards Committee)
ihnp4!attunix!lr, 201-522-5086