Path: utzoo!utgpu!watmath!clyde!ima!haddock!karl
From: karl@haddock.ima.isc.com (Karl Heuer)
Newsgroups: comp.std.c
Subject: Re: ANSI C token set (including $ and @)
Keywords: ANSI C, token set
Message-ID: <11383@haddock.ima.isc.com>
Date: 10 Jan 89 19:18:15 GMT
References: <11343@haddock.ima.isc.com> <1858@zell.cs.vu.nl>
Reply-To: karl@haddock.ima.isc.com (Karl Heuer)
Organization: Interactive Systems, Boston
Lines: 70

In article <1858@zell.cs.vu.nl> leendert@cs.vu.nl () writes:
>In article <11343@haddock.ima.isc.com> karl@haddock.ima.isc.com writes:
>> Let's see if I've got this straight yet.
>>
>>o `$' is required to scan as a separate pp-token, despite existing practice
>>  making it an optional identifier-character.
>
>Yes. The syntax of an identifier is [the pattern /[_a-zA-Z][_a-zA-Z0-9]*/].
>
>Whether the '$' should be scanned as a separate pp-token depends on the source
>character set.

In the environment I'm thinking of, `$' should be legal in strings (where it
represents the same symbol in the execution character set), hence it must be a
member of the source character set, and by 3.1 it scans as a pp-token.

>>o Hence, certain features of DEC and APOLLO implementations cannot be
>>  conforming.
>
>I don't know about DEC or APOLLO, but if they allow things like described
>above their implementations are not strictly conforming (perhaps there is
>a flag -pedantic as with the GNU C compiler ?).

`Strictly conforming' is an attribute of programs, not implementations. An
implementation is either ANSI C, or it isn't. According to the rules, accepting
`$' in an identifier seems to yield a non-ANSI implementation.

>>o DEC and APOLLO, through their representatives on X3J11, are aware of the
>>  above and accept it. Their ANSI C implementations, if any, will not use
>>  `$' in identifiers.
>
>Depends on their policy. They are free to add features. Perhaps they will
>make a flag (if $ is the only nonconforming aspect).

Hmm, assuming they do, I wonder if they'll follow Doug's suggestion of turning
off __STDC__ whenever `$' is enabled.

>>o Non-English letters, which are clearly not usable in a strictly conforming
>>  program, are in fact not usable in *any* conforming program, for the same
>>  reasons that apply to `$'.
>
>The basic source set, the set in which source files are written, does not
>contain $, umlaut, accent grave, etc. The strings however, may contain these
>characters (depending on the size of the character representation you could
>use single or multibyte character strings).

The source character set is used both inside and outside of string literals;
characters within string literals (or character constants) are mapped to the
execution character set as they are tokenized. For the purposes of this
discussion, I'm assuming that the source and execution character sets are
identical, and that they contain `$' and/or non-English letters in addition to
the minimal character set of 2.2.1.

>>o The international community is aware of this and accepts it.
>
>Yep, why not ?

Because the users can't use their native languages to name their variables.
Doesn't it bother you that you can't have a variable named `IJspret' with a
proper ligature instead of separate letters? It bothers me, and I don't even
have any plans to use such a feature.

(Actually, the problem occurs even in English; I once had a set of constants
named DONT_xxx to selectively suppress individual features of a large system.
I didn't worry about the lack of an apostrophe, because (a) there's nothing to
be done about it, since the symbol is already in use, and (b) the meaning was
clear without it. The correct use of the apostrophe seems to be declining in
American English anyway. But that's a topic for a different group.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
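
For concreteness, here is a minimal sketch of the scanning rule discussed
above: an identifier matches /[_a-zA-Z][_a-zA-Z0-9]*/, and any other
non-white-space character (such as `$') comes out as a token of its own. The
names is_identifier and count_tokens are hypothetical, invented for this
illustration. The sketch assumes an ASCII-like character set in which the
letters are contiguous, and it deliberately ignores pp-numbers, string
literals and multi-character operators, so it is not a real C scanner.

```c
#include <stddef.h>

/* Character classes of the pattern, written out by hand so the result
 * does not depend on the locale.  Assumes letters are contiguous (ASCII). */
static int is_ident_start(char c)
{
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int is_ident_char(char c)
{
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Hypothetical helper: 1 if the whole string matches
 * [_a-zA-Z][_a-zA-Z0-9]*, otherwise 0. */
int is_identifier(const char *s)
{
    if (!is_ident_start(*s))
        return 0;
    for (s++; *s != '\0'; s++)
        if (!is_ident_char(*s))
            return 0;
    return 1;
}

/* Hypothetical helper: number of tokens in s under the simplified rule.
 * Identifiers are taken greedily; every other non-white-space character
 * counts as one token by itself. */
size_t count_tokens(const char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        if (*s == ' ' || *s == '\t' || *s == '\n') {
            s++;                      /* white space separates tokens */
            continue;
        }
        if (is_ident_start(*s)) {
            while (is_ident_char(*s)) /* stops at '\0', '$', etc. */
                s++;
        } else {
            s++;                      /* single-character token */
        }
        n++;
    }
    return n;
}
```

Under this rule is_identifier("sys$open") is 0 and count_tokens("sys$open") is
3 (`sys', `$', `open'), which is the reading that makes `$' in identifiers
non-conforming. An implementation that treats `$' as an identifier-character
would see a single token instead.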