Path: utzoo!attcan!uunet!fernwood!decwrl!wuarchive!brutus.cs.uiuc.edu!uakari.primate.wisc.edu!uflorida!novavax!hcx1!tom
From: tom@ssd.harris.com (Tom Horsley)
Newsgroups: comp.std.c
Subject: Re: Macro names imbedded in pp-numbers [repost]
Message-ID: <TOM.89Nov20072452@hcx2.ssd.harris.com>
Date: 20 Nov 89 12:24:52 GMT
References: <11134@riks.csl.sony.co.jp> <31615@watmath.waterloo.edu>
	<11647@smoke.BRL.MIL>
Sender: news@hcx1.UUCP
Organization: Harris Computer Systems Division
Lines: 48

>As I recall the committee sentiment, it wasn't felt that this slightly
>over-generous glomming onto source characters for pp-numbers posed a
>serious practical problem, and it did drastically simplify that part
>of the preprocessor.  The trade-off seemed worthwhile.

I am sorry, I can't watch this discussion passively anymore. This is simply
wrong. I was one of the first to complain to the committee about this bug.
The reason I noticed it was that I was writing a tokenizing pre-processor as
the standard was under development. In my implementation, I did not find
*ANY* simplification that pp-numbers provided.

When you reach phase 7, you have to have the ability to lex only legal
numbers to determine if the conversion of a pp-token to a token is correct.
By requiring you to match illegal tokens in early phases, then in a later
phase determine if the token is actually legal, the scanner is considerably
*COMPLICATED*, NOT SIMPLIFIED! There are more states required to recognize
gibberish first, then legal numbers later than there would have been if you
only had to recognize legal numbers in the first place.
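
For readers following along, here is a minimal sketch of the pp-number rule
as the standard's grammar states it (digit, or "." digit, followed by any run
of digits, letters, underscores, periods, and e/E followed by a sign). The
name pp_number_length is hypothetical and is not taken from any
implementation mentioned in this thread.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the pp-number starting at s, or 0 if
   no pp-number starts there.  Grammar assumed (C89):
     pp-number: digit | . digit | pp-number digit | pp-number nondigit
              | pp-number e sign | pp-number E sign | pp-number .        */
static size_t pp_number_length(const char *s)
{
    size_t n;

    if (isdigit((unsigned char)s[0]))
        n = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        n = 2;
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[n];

        /* A sign is swallowed only directly after e or E, whether or
           not the text so far looks like a floating constant. */
        if ((c == '+' || c == '-') && (s[n - 1] == 'e' || s[n - 1] == 'E'))
            n++;
        else if (isalnum(c) || c == '_' || c == '.')
            n++;
        else
            break;
    }
    return n;
}
```

With this rule "0x1F+1" yields the pp-number "0x1F" followed by "+" and "1",
but "0xE+1" is consumed whole as a single pp-number, which a later phase then
has to reject because it is not a valid constant.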

My proposed change to the standard called for a pp-token to be the longest
sequence of characters that would match a valid legal token prefix (or a
single character that does not match any legal token). This is
unambiguously defined in the standard and would have actually been a
simplification, since it would not require a separate definition of
pp-tokens and real tokens.
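
A rough sketch of how that could look for numeric constants only, under one
reading of the proposal (take the longest run that is itself a legal
constant). The names legal_number_length and int_suffix are hypothetical,
and the octal-digit restriction on constants such as 089 is deliberately
not checked here.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: skip an optional integer suffix (u/U and l/L in
   either order) starting at s[n]; returns the new length. */
static size_t int_suffix(const char *s, size_t n)
{
    if (s[n] == 'u' || s[n] == 'U') {
        n++;
        if (s[n] == 'l' || s[n] == 'L') n++;
    } else if (s[n] == 'l' || s[n] == 'L') {
        n++;
        if (s[n] == 'u' || s[n] == 'U') n++;
    }
    return n;
}

/* Hypothetical helper: length of the longest legal C89 integer or
   floating constant starting at s, or 0 if none starts there. */
static size_t legal_number_length(const char *s)
{
    size_t n = 0, digits = 0, frac = 0, m;
    int is_float = 0;

    /* Hexadecimal integer: 0x or 0X followed by at least one hex digit. */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')
        && isxdigit((unsigned char)s[2])) {
        n = 3;
        while (isxdigit((unsigned char)s[n])) n++;
        return int_suffix(s, n);
    }

    while (isdigit((unsigned char)s[n])) { n++; digits++; }

    /* Fraction: needs a digit on at least one side of the period. */
    if (s[n] == '.') {
        m = n + 1;
        while (isdigit((unsigned char)s[m])) { m++; frac++; }
        if (digits + frac == 0) return 0;
        n = m;
        is_float = 1;
    }
    if (digits + frac == 0) return 0;

    /* Exponent: accepted only if at least one digit follows the sign. */
    if (s[n] == 'e' || s[n] == 'E') {
        m = n + 1;
        if (s[m] == '+' || s[m] == '-') m++;
        if (isdigit((unsigned char)s[m])) {
            while (isdigit((unsigned char)s[m])) m++;
            n = m;
            is_float = 1;
        }
    }

    if (is_float) {
        if (s[n] == 'f' || s[n] == 'F' || s[n] == 'l' || s[n] == 'L') n++;
        return n;
    }
    return int_suffix(s, n);
}
```

Under this sketch "0xE+1" splits as "0xE", "+", "1", the way it reads on the
page, and the same routine serves both for preprocessing and for the final
conversion to a token.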

The committee response to this was that it would allow too much stuff that
appears to be gibberish lexically to actually be a legitimate C program. I
consider this to be the lamest excuse I have ever heard. After all, when
hasn't gibberish been legal C? And it is a particularly lame excuse when the
alternative the committee selected makes code that looks like perfectly
ordinary (formerly) legal C, illegal instead.

If the committee wants to justify this by saying that they were in a hurry
to get the standard out, they didn't notice the problem with pp-numbers
until it was too late and they would have had to do another round of public
review, delaying the standard again, and they didn't think the problem was
serious enough to take that hit, then I might agree, but for God's sake,
*DON'T* try to claim that it simplifies things...

(Of course the standard wound up being delayed by stupidity anyway, but
that's another story...)
--
=====================================================================
domain: tahorsley@ssd.csd.harris.com  USMail: Tom Horsley
  uucp: ...!novavax!hcx1!tahorsley            511 Kingbird Circle
      or  ...!uunet!hcx1!tahorsley            Delray Beach, FL  33444
======================== Aging: Just say no! ========================