Path: utzoo!attcan!uunet!husc6!mailrus!uflorida!novavax!hcx1!hcx2!tom
From: tom@hcx2.SSD.HARRIS.COM
Newsgroups: comp.std.c
Subject: Re: 0x47e+barney not considered C
Message-ID: <120200002@hcx2>
Date: 1 Jul 88 12:21:00 GMT
References: <120200001@hcx2>
Lines: 134
Nf-ID: #R:hcx2:120200001:hcx2:120200002:000:6316
Nf-From: hcx2.SSD.HARRIS.COM!tom    Jul  1 08:21:00 1988


jss@hector.UUCP writes:

>Although I am not a member of the committee, I have seen many of
>their working drafts and do not believe that there was ever any such
>definition [longest valid prefix].

You are absolutely right, there never was. As I said, it was a rash
assumption on my part: the area of "What is a token?" needed a
definition, and that was the only one I could imagine that made any
sense. The area definitely needed defining, and preprocessing numbers
are certainly better than the complete lack of definition that existed
before. [Feel free to consider this an apology.]

>The rule you suggest is a bad one because it does not provide an
>easy way to extend the syntax of numbers.  For example, if I have an
>implementation with a "long long" type  I might want to allow 6LL as
>an integer constant of that type. Under the "longest legal prefix"
>rule I can't. It gets tokenized as "6L" "L".  

This does not make sense. If I am extending C with new features and I
make LL a valid suffix, then I have changed the definition of what a
valid token *is*, so I would not tokenize 6LL as 6L L, but as 6LL.
Obviously, I may have problems if some twisted programmer somewhere
has previously written code which tries to take advantage of the 6L L
tokenization by making L a macro that expands to +5 or something, but
I am not sure I care much, because on the other side of the coin, I
can probably port your code that uses 6LL to my system by just
defining L to the empty string.
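
To make the 6LL point concrete, here is a minimal sketch of a "longest
valid prefix" scanner for decimal integer constants. The function name
int_constant_len and the allow_ll flag are hypothetical, and only the
L and LL suffixes are modelled, so treat it as an illustration rather
than anyone's actual lexer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the longest valid decimal integer
 * constant at the start of s.  Only the suffixes "L" and, when
 * allow_ll is nonzero, "LL" are modelled; real C has more suffixes. */
size_t int_constant_len(const char *s, int allow_ll)
{
    size_t n = 0;

    assert(s != NULL);
    while (isdigit((unsigned char)s[n]))
        n++;
    if (n == 0)
        return 0;               /* no digits, so no constant here */
    if (s[n] == 'L') {
        n++;
        if (allow_ll && s[n] == 'L')
            n++;                /* extended grammar: "LL" is one suffix */
    }
    return n;
}
```

With allow_ll off, the scanner stops after "6L" and leaves a separate
"L" behind; with it on, the whole of "6LL" is one token, because the
definition of a valid token has changed.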

And gwyn@brl-smoke.UUCP writes:

>In case you haven't received the official response document
>yet (which is possible due to confusion between CBEMA and X3J11 as to
>who was supposed to do what),

Yeah, you're right, I haven't gotten it yet, although I did get a nice
letter telling me that I haven't gotten it yet.

>We were uncomfortable with preprocessing behavior that
>could parse ``garbage'' into a sequence that contained an identifier,
>which is then macro-replaced to form a ``sensible'' statement.

I guess I just don't care what happens to ``garbage'' when the
alternative definition in the standard turns ``sensible'' C into
``garbage''.

>Why do you think it so important for "0x47e" to be considered a
>preprocessing number token?  Just what is it that needs "fixing"?
>Is it that "0x47e" is supposed to be split into preprocessing tokens
>"0" and "x47e" (the second of which may be subject to macro
>replacement!) and in translation phase 7 they are not said to be
>spliced back together into a single (regular) token, so that it is
>impossible for an integer constant "0x47e" to ever be seen after
>phase 6?  If so, that does seem to me to be a problem, but it has
>nothing to do with "+barney" or with the final "e" on the constant;

Goodness gracious no. My proposal of longest valid prefix would parse
0x47e+barney into 0x47e + barney NOT 0 x47e + barney. The point is
that the definition of preprocessing numbers calls 0x47e+barney a
SINGLE token. This means it will be treated as a single unit all the
way up until it is converted to a token.  The standard says that
behavior is undefined if a pp-token cannot be converted to a token.
This (presumably) gives an implementation the right to convert the
single pp-token "0x47e+barney" into the three tokens "0x47e" "+"
"barney", but the major problem is that "barney" might have been a
macro, and by then it is too late to expand it.
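
The single-token behavior can be sketched with a small scanner. It
assumes the pp-number rule is: start with a digit, or a period
followed by a digit, then absorb digits, letters, underscores,
periods, and any e or E followed by a sign. The name pp_number_len is
hypothetical, and this is a sketch of that rule, not text from the
draft.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the pp-number at the start of s
 * under the rule described above, or 0 if none starts there. */
size_t pp_number_len(const char *s)
{
    size_t n;

    assert(s != NULL);
    if (isdigit((unsigned char)s[0]))
        n = 1;                          /* digit */
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        n = 2;                          /* . digit */
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[n];

        if ((c == 'e' || c == 'E') && (s[n + 1] == '+' || s[n + 1] == '-'))
            n += 2;                     /* e/E followed by a sign */
        else if (isalnum(c) || c == '_' || c == '.')
            n++;                        /* digit, nondigit, or period */
        else
            break;
    }
    return n;
}
```

Given "0x47e+barney", this scanner consumes all twelve characters: the
"e+" is absorbed as an exponent marker, so "barney" never becomes a
separate identifier that macro replacement could see.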

Obviously, there is no reason to actually remove pp-numbers at this
point in the evolution of the standard; it would be too big a change.
But I do feel that the definition should be changed to allow something
that is currently perfectly legal C to remain legal. The real problem
is that it is actually fairly hard to write the grammar so that it
works. For what it is worth, here is an attempt:

           pp-hex-prefix:       (the two chars that can start a hex number)
                   0 x
                   
           pp-not-hex-prefix:   (the things that can start other numbers)
                   . digit
                   digit .
                   digit digit
                   digit e sign
                   digit E sign
                   digit nondigit-except-x
                   
           pp-number:           (single digits, or hex or not-hex pp numbers)
                   digit
                   pp-hex-prefix
                   pp-not-hex-prefix
                   pp-hex-prefix pp-hex-suffix
                   pp-not-hex-prefix pp-not-hex-suffix
                   
           pp-not-hex-suffix:
                   digit
                   . digit
                   pp-number digit
                   pp-number nondigit
                   pp-number e sign
                   pp-number E sign
                   pp-number .
           
           pp-hex-suffix:
                   digit
                   pp-number digit
                   pp-number nondigit

I think that does it, but I can't be sure. A pp-number is now a number
that starts with "0x" and is followed by any number of digits or
letters, or it is a number that starts with something other than "0x"
and contains all the stuff the current pp-number definition has
(decimal points, e+, E+, e-, E-, letters).

With this definition 0x47e+barney will now parse as 0x47e + barney and
6LL will still parse as 6LL.
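
The intent of the grammar above can be sketched as a scanner that
branches on the prefix. The name proposed_pp_number_len is
hypothetical, and the code follows the prose description (a "0x"
prefix followed by digits or letters only, anything else scanned the
old way) rather than the productions line by line, so it is an
approximation under that assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the pp-number at the start of s
 * under the proposed rule, or 0 if none starts there. */
size_t proposed_pp_number_len(const char *s)
{
    size_t n;

    assert(s != NULL);

    /* Hex branch: lowercase "0x" only, as in pp-hex-prefix above.
     * No periods and no e+/e- handling in this branch. */
    if (s[0] == '0' && s[1] == 'x') {
        n = 2;
        while (isalnum((unsigned char)s[n]) || s[n] == '_')
            n++;
        return n;
    }

    /* Non-hex branch: same shape as the existing pp-number rule. */
    if (isdigit((unsigned char)s[0]))
        n = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        n = 2;
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[n];

        if ((c == 'e' || c == 'E') && (s[n + 1] == '+' || s[n + 1] == '-'))
            n += 2;                     /* exponent with sign */
        else if (isalnum(c) || c == '_' || c == '.')
            n++;
        else
            break;
    }
    return n;
}
```

Under this sketch the scan of "0x47e+barney" stops after "0x47e",
leaving "+" and "barney" as separate pp-tokens, while "6LL" and
"1e+5" are still taken whole.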

It sure seems like it ought to be possible to simplify this grammar.
Do I hear any alternate definitions? Or should the committee just
leave the grammar the way it is and stick in some language about
leading 0x not allowing the e+ e- stuff?  Also I took out '.' as long
as I was splitting the definition by prefix, but if someone has a good
reason to leave '.'s in hex pp-numbers I don't much care. It is the e+
that causes all the trouble. I would really like this fixed for the
final standard and something that could be considered an editorial
change is probably the only thing that stands a chance.

P.S. I am glad to see that stuff I post can actually make it out to
the net.

=====================================================================
    usenet: tahorsley@ssd.harris.com  USMail: Tom Horsley
compuserve: 76505,364                         511 Kingbird Circle
     genie: T.HORSLEY                         Delray Beach, FL  33444
======================== Aging: Just say no! ========================