Path: utzoo!attcan!uunet!husc6!mailrus!uflorida!novavax!hcx1!hcx2!tom
From: tom@hcx2.SSD.HARRIS.COM
Newsgroups: comp.std.c
Subject: Re: 0x47e+barney not considered C
Message-ID: <120200002@hcx2>
Date: 1 Jul 88 12:21:00 GMT
References: <120200001@hcx2>
Lines: 134
Nf-ID: #R:hcx2:120200001:hcx2:120200002:000:6316
Nf-From: hcx2.SSD.HARRIS.COM!tom    Jul  1 08:21:00 1988

jss@hector.UUCP writes:
>Although I am not a member of the committee, I have seen many of
>their working drafts and do not believe that there was ever any such
>definition [longest valid prefix].

You are absolutely right, there never was. As I said, it was a rash
assumption on my part: the area of "What is a token?" needed a
definition, and that was the only one I could imagine that made any
sense. The area definitely needed defining, and preprocessing numbers
are certainly better than the complete lack of definition that existed
before. [Feel free to consider this an apology].

>The rule you suggest is a bad one because it does not provide an
>easy way to extend the syntax of numbers. For example, if I have an
>implementation with a "long long" type I might want to allow 6LL as
>an integer constant of that type. Under the "longest legal prefix"
>rule I can't. It gets tokenized as "6L" "L".

This does not make sense. If I am extending C with new features and I
make LL a valid suffix, then I have changed the definition of what a
valid token *is*, so I would not tokenize 6LL as 6L L, but as 6LL.
Obviously, I may have problems if some twisted programmer somewhere has
previously written code which tries to take advantage of the 6L L
tokenization by making L a macro that expands to +5 or something. But I
am not sure I care much, because on the other side of the coin, I can
probably port your code that uses 6LL to my system by just defining L
to the empty string.

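To make the 6LL argument concrete, here is a minimal sketch in C of a
"longest valid prefix" rule for decimal integer constants. It is only
an illustration under stated assumptions: the function name, the two
suffix tables, and the restriction to decimal digits and upper-case
suffixes are all invented for this example and come from no draft of
the standard.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any draft): length of the longest
   prefix of s that is a decimal integer constant, where the accepted
   suffixes are supplied by the caller.  Returns 0 if s does not start
   with a digit. */
static size_t longest_valid_prefix(const char *s,
                                   const char *const *suffixes,
                                   size_t nsuffixes)
{
    size_t digits = 0, best_suffix = 0, i;

    while (isdigit((unsigned char)s[digits]))
        digits++;
    if (digits == 0)
        return 0;

    /* Pick the longest suffix from the table that matches here. */
    for (i = 0; i < nsuffixes; i++) {
        size_t len = strlen(suffixes[i]);
        if (len > best_suffix &&
            strncmp(s + digits, suffixes[i], len) == 0)
            best_suffix = len;
    }
    return digits + best_suffix;
}

/* Assumed suffix sets: upper case only, to keep the sketch short. */
static const char *const base_suffixes[] = { "L", "U", "UL", "LU" };
static const char *const extended_suffixes[] =
    { "L", "U", "UL", "LU", "LL" };
```

With base_suffixes the scan of "6LL" stops after "6L", leaving a
separate "L" that could be a macro name. With extended_suffixes the
same rule takes all of "6LL", because the extension changed what a
valid token is. The rule itself did not have to change, only the table.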
And gwyn@brl-smoke.UUCP writes:
>In case you haven't received the official response document
>yet (which is possible due to confusion between CBEMA and X3J11 as to
>who was supposed to do what),

Yeah, you're right, I haven't gotten it yet, although I did get a nice
letter telling me that I haven't gotten it yet.

>We were uncomfortable with preprocessing behavior that
>could parse ``garbage'' into a sequence that contained an identifier,
>which is then macro-replaced to form a ``sensible'' statement.

I guess I just don't care what happens to ``garbage'' when the
alternative definition in the standard turns ``sensible'' C into
``garbage''.

>Why do you think it so important for "0x47e" to be considered a
>preprocessing number token? Just what is it that needs "fixing"?
>Is it that "0x47e" is supposed to be split into preprocessing tokens
>"0" and "x47e" (the second of which may be subject to macro
>replacement!) and in translation phase 7 they are not said to be
>spliced back together into a single (regular) token, so that it is
>impossible for an integer constant "0x47e" to ever be seen after
>phase 6? If so, that does seem to me to be a problem, but it has
>nothing to do with "+barney" or with the final "e" on the constant;

Goodness gracious no. My proposal of longest valid prefix would parse
0x47e+barney into

        0x47e + barney

NOT

        0 x47e + barney

The point is that the definition of preprocessing numbers calls
0x47e+barney a SINGLE token. This means it will be treated as a single
unit all the way up until it is converted to a token. The standard says
that behavior is undefined if a pp-token cannot be converted to a
token. This (presumably) gives an implementation the right to convert
the single pp-token "0x47e+barney" into the three tokens "0x47e" "+"
"barney", but the major problem is that "barney" might have been a
macro. It is now too late to expand it.

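The effect described above can be reproduced with a small scanner. The
following is a sketch in C, not text from the draft: it assumes the
pp-number rule is "a digit, or a period followed by a digit, then any
run of digits, letters, underscores, periods, and e or E followed by a
sign". The name pp_number_length is made up for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the pp-number at the start of s
   under the rule assumed above, or 0 if s does not start one. */
static size_t pp_number_length(const char *s)
{
    size_t i;

    if (isdigit((unsigned char)s[0]))
        i = 1;                      /* digit */
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        i = 2;                      /* . digit */
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[i];

        if ((c == 'e' || c == 'E') &&
            (s[i + 1] == '+' || s[i + 1] == '-'))
            i += 2;                 /* pp-number e sign, E sign */
        else if (isalnum(c) || c == '_' || c == '.')
            i += 1;                 /* pp-number digit, nondigit, . */
        else
            break;
    }
    return i;
}
```

Given "0x47e+barney" this scanner consumes all twelve characters: the
"e+" is taken as an exponent-style pair, after which "barney" is just
more letters. Given "0x47f+barney" it stops after "0x47f", so the "+"
and "barney" become separate pp-tokens and "barney" can still be
macro-expanded. The only difference between the two inputs is the hex
digit that happens to be an "e".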
Obviously, there is no reason to actually remove pp-numbers at this
point in the evolution of the standard; it would be too big a change.
But I do feel that the definition should be changed to allow something
that is currently perfectly legal C to remain legal. The real problem
is that it is actually fairly hard to write the grammar so that it
works. For what it is worth, here is an attempt:

pp-hex-prefix:          (the two chars that can start a hex number)
        0 x

pp-not-hex-prefix:      (the things that can start other numbers)
        . digit
        digit .
        digit digit
        digit e sign
        digit E sign
        digit nondigit-except-x

pp-number:              (single digits, or hex or not-hex pp numbers)
        digit
        pp-hex-prefix
        pp-not-hex-prefix
        pp-hex-prefix pp-hex-suffix
        pp-not-hex-prefix pp-not-hex-suffix

pp-not-hex-suffix:
        digit
        . digit
        pp-number digit
        pp-number nondigit
        pp-number e sign
        pp-number E sign
        pp-number .

pp-hex-suffix:
        digit
        pp-number digit
        pp-number nondigit

I think that does it, but I can't be sure. A pp-number is now either a
number that starts with "0x" and is followed by any number of digits or
letters, or a number that starts with something other than "0x" and
contains all the stuff the current pp-number definition has (decimal
points, e+, E+, e-, E-, letters). With this definition 0x47e+barney
will now parse as 0x47e + barney and 6LL will still parse as 6LL.

It sure seems like it ought to be possible to simplify this grammar. Do
I hear any alternate definitions? Or should the committee just leave
the grammar the way it is and stick in some language about leading 0x
not allowing the e+ e- stuff? Also, I took out '.' as long as I was
splitting the definition by prefix, but if someone has a good reason to
leave '.'s in hex pp-numbers I don't much care. It is the e+ that
causes all the trouble.

I would really like this fixed for the final standard, and something
that could be considered an editorial change is probably the only thing
that stands a chance.

P.S.
I am glad to see that stuff I post can actually make it out to the net.

=====================================================================
    usenet: tahorsley@ssd.harris.com  USMail: Tom Horsley
compuserve: 76505,364                         511 Kingbird Circle
     genie: T.HORSLEY                         Delray Beach, FL 33444
======================== Aging: Just say no! ========================