Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83 based; site hou2d.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!houxm!hou2d!osd
From: osd@hou2d.UUCP (Orlando Sotomayor-Diaz)
Newsgroups: mod.std.c
Subject: mod.std.c Digest V10#3
Message-ID: <698@hou2d.UUCP>
Date: Wed, 18-Sep-85 18:59:58 EDT
Article-I.D.: hou2d.698
Posted: Wed Sep 18 18:59:58 1985
Date-Received: Thu, 19-Sep-85 06:55:18 EDT
Organization: AT&T Bell Labs, Holmdel NJ
Lines: 255
Approved: osd@hou2d.UUCP

From: Orlando Sotomayor-Diaz (The Moderator)

mod.std.c Digest            Wed, 18 Sep 85       Volume 10 : Issue 3

Today's Topics:
        Comments on draft C standard (General)
        Comments on Section B
        Comments on Section C
----------------------------------------------------------------------

Date: Mon, 9 Sep 85 16:48:16 mdt
From: ihnp4!alberta!myrias!cg (Chris Gray)
Subject: Comments on draft C standard (General)
To: alberta!ihnp4!cbosgd!std-c

Why not add some definitions and use them throughout:

    charspace:  any number of ' ', '\t', '\b'
    linespace:  one of '\n', '\r', '\v', '\f'
    whitespace: any amount of charspace intermixed with comments
                (which are allowed to contain linespace)

Now, are there any places in C where charspace is allowed, but
whitespace isn't? Should there be? (My definitions don't match with the
draft's, but at least they are consistent.) (My intent is that the
preprocessor grammar use linespace at the end of its productions.)
(discussed a bit July 16)

------------------------------

Date: Mon, 9 Sep 85 16:48:16 mdt
From: ihnp4!alberta!myrias!cg (Chris Gray)
Subject: Comments on Section B
To: alberta!ihnp4!cbosgd!std-c

B.1.1.2 Translation phases

Wouldn't it be better to NOT delete all newline backslash sequences,
but rather to specify those places where a backslash token followed by
a newline token can be deleted (macro bodies, macro calls, strings)?
The current definition allows them inside keywords, identifiers,
character constants, preprocessor lines, etc.
This flexibility doesn't buy anything.

The only use I can see for separating steps 3 and 4 is the special
parsing of #include file names using angle brackets. What am I missing
which requires character constants, string literals, and comments to be
done specially (other than that they allow newlines in them)? Also,
given that #include has to be fudged anyway, why not allow the rules
that some older compilers did, such as the file name (including
delimiters) extending from the first occurrence of the opening
delimiter to the LAST occurrence of the closing delimiter? Thus I could
say #include .3> and get file name B0:.3 which might be valid (and
maybe even needed) on some weird system.

The same special processing is needed for #pragma's as well. This
should be stated under step 3.

Step 6 mentions newline characters. What newline characters? After
step 4 there aren't any.

In step 6, the current rules indicate that adjacent string literals are
concatenated. Do we really intend that to happen if, by some chance (or
due to a programmer that should be shot), the last token in a #include
file is a string and the first token on the line after the #include is
another string? A compiler will need some sort of indication of file
transitions in order to produce useful error messages, so disallowing
this shouldn't be much of a burden.

Step 6's retokenization is a bit unclear. In order to retokenize the
source, it must first (conceptually at least) be untokenized. To
preserve meaning, some pairs of tokens must have spaces added between
them, but tokens concatenated by ## explicitly don't have this done to
them. Perhaps the step could be reworded to say that character
sequences resulting from token concatenation are retokenized according
to the normal tokenization rules.

Another unclear aspect is that of exactly what happens when two tokens
are concatenated - if the input tokens (perhaps coming from macro
expansion) were 100L and 33L, I gather the result is NOT 10033L.
Tokenization is often an information-losing process. It might be better
to state exactly which combinations are supported for ## and what they
do. (E.g. what does 33L ## 25 yield? Does the size of the target ints
affect what happens (does the tokenizer have to distinguish between a
number being long because it doesn't fit the target int vs. having 'L'
on the end)?)

B.2.1 Character sets

Perhaps this should state that other things that look like trigraphs
are not, and do not produce any error messages. (People who use things
like "p < 0 ??????" would be upset, otherwise.)

B.2.2 Character display semantics

If you're gonna make backspacing past the beginning of a line
undefined, then printing past the end of the line should be as well.
Thus the first paragraph should end in something like " if there is a
next position on the current line, else the effect is undefined".
(mentioned ~ Jun 30)

------------------------------

Date: Mon, 9 Sep 85 16:48:16 mdt
From: ihnp4!alberta!myrias!cg (Chris Gray)
Subject: Comments on Section C
To: alberta!ihnp4!cbosgd!std-c

C.1 Lexical Elements

Types of tokens not including white-space conflicts with B.1.1.2,
which talks about white-space tokens.

C.1.2 Identifiers - semantics

Just when identifiers defined as macros are replaced by their bodies is
a lot more complicated than stated. The replacement can be inhibited by
the "defined" construct and by the fact that the macro name is being
produced by its own expansion.

C.1.2.5 Types

Page 17, near top. Are not unions also classed as aggregates?

C.1.4 String literals - semantics

"Adjacent string literals" should be defined better. Consider:

    #define BLAH "hello" "there"

I would imagine the intent is that the strings are NOT concatenated.
(It all works better if it's explicitly stated that string
concatenation isn't done until AFTER preprocessing.)
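
The BLAH question can be checked against the rules as they were later
standardized. What follows is only a sketch, assuming a C89-or-later
compiler, where adjacent string literals are joined in a translation
phase that comes after macro expansion; the 1985 draft under discussion
may have read differently. STR, XSTR, blah_spelling and
blah_is_concatenated are hypothetical names introduced for
illustration.

```c
#include <string.h>

#define BLAH "hello" "there"

/* Hypothetical helpers: XSTR expands its argument, then STR turns the
   resulting token sequence into one string literal. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Spelling of the macro body after expansion: still two separate
   string-literal tokens, shown with their quotes escaped. */
const char *blah_spelling(void) { return XSTR(BLAH); }

/* Assumption: C89+ semantics. The two literals are joined only after
   preprocessing, so the expanded macro behaves as one literal here. */
int blah_is_concatenated(void) { return strcmp(BLAH, "hellothere") == 0; }
```

Under those assumptions the macro body stays as two tokens throughout
preprocessing, and the concatenation happens only afterwards, which is
the ordering the comment above asks to have stated explicitly.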
C.8 Syntax & Constraints

Again, the discussion concerning the newline character is not
appropriate, since, according to section B.1.1.2 there won't be any
when preprocessing is done (they have been turned into newline TOKENs).

C.8 Semantics

Given that tokenization has been done, and that tokenization removed
all sign of comments, it would appear that comments are allowed before
the '#' and between it and the preprocessor command.

C.8.1 Source file inclusion

The form

    # identifier new-line

is stated to allow the identifier to expand into either form (".." or
<..>). Given that the macro body was tokenized just like everything
else, the second form is impossible. E.g.

    #define STANDARDINCLUDE ...
    #include STANDARDINCLUDE

would result in trying to process

    #-token include <-token stdio .-token h >-token

Also, given that an identifier must be expanded, it's no harder to
allow a macro call with parameters.

C.8.2 Macro Replacement

In the third last paragraph on page 62 - "white space preceding the
first token or following the last token is deleted." What does this
mean? I conclude that the intent is that the string generated should
have no space before the generated representation of the first token or
after the representation of the last token. What happens if one of the
tokens is a string - are its quotes conceptually removed and its body
used in the generated string, or are its quotes escaped and included in
the generated string? The third alternative (which is easy if the
preprocessing is done using only characters, and not tokens) is to
effectively retokenize the resulting character sequence - what was in a
string before is now outside of one - ughh!

Are newline tokens allowed in the parameter list? They need not be,
given that backslash-newline pairs were previously deleted, but see my
earlier comment regarding that.

The last paragraph on page 62 and the first on page 63 are unclear. How
about:

"...
The token sequence resulting from the macro expansion can be divided
into two parts - those tokens coming directly from the macro body, and
those coming from the macro parameters. Macro calls in the former are
expanded only if they are calls to a different macro. All macro calls
in the latter are expanded. After all such replacements have taken
place...
... to a single token. The result is not reprocessed as a preprocessing
directive, even if it resembles one."

This is still a bit unclear. Is the intent that macro calls that are
recursive through other macros be not expanded? Also, what if an inner
macro call is generated as a result of concatenating the two kinds of
replacement tokens - what rules does the expansion follow?

Again, token concatenation should be explained in more detail. For
example, newline tokens must not be concatenated with anything (it
doesn't make sense). I suggest putting in a table of those that ARE
supported; e.g.

    string string   (yields string) (redundant)
    char   char     (yields string)
    string char     (yields string)
    char   string   (yields string)
    string int      (yields string, use decimal form, no 'L' or 'U')
    char   int      (yields string)
    int    int      (decimal forms, some sort of rules for handling
                     'L' and 'U' combinations and for out-of-range
                     problems)
    id     id       (yields id)
    id     int      (yields id)

I vote for not allowing operator concatenation, and other funny things
that only lead to unreadable, unportable programs. Also note that
allowing id ## string => id can result in illegal ids. This may in fact
be useful, but has implications for external character sets, etc.

Is there any reason for restricting the '#' enstringing (?) operator to
inside macro bodies?

The example given on page 63:

    #define f(x) f(a * (x))

adds a lot to the complexity of macro expansion and results in nothing
except unreadable code. Programmers who use it should be shot.
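
For readers unsure what the page-63 example does, here is a small
sketch of its behaviour under the rule later adopted in ANSI C, where a
macro's own name is not replaced again while its expansion is being
rescanned. That the draft under discussion behaves identically is an
assumption. The function body, the value of a, and the names
scaled_call and plain_call are hypothetical, chosen only for
illustration.

```c
static int a = 3;

/* The real function, defined before the macro exists so that its
   definition is not itself rewritten. */
static int f(int v) { return v + 1; }

/* The self-referential macro from the page-63 example. */
#define f(x) f(a * (x))

/* Expands exactly once, to f(a * (n)); the inner f is left alone
   because it came from f's own expansion. */
int scaled_call(int n) { return f(n); }

/* A parenthesized name is not followed by '(' and so is not treated
   as a macro call; this reaches the function directly. */
int plain_call(int n) { return (f)(n); }
```

With a equal to 3, scaled_call(2) evaluates f(6) and plain_call(2)
evaluates f(2), so the same spelling f means two different things
depending on how it is written, which is the readability problem being
raised.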
A much more readable form of modifying a function call would be:

    #define FUNC(x) func(a * (x))

Here at least the reader of the program has some warning that calls to
FUNC may not be quite what they seem to be. (For much the same reason,
I am opposed to ANY macro names that contain lower case letters. Thus,
I would suggest that define ERRNO, not errno. The others I won't argue
about too much, since they seem destined to become part of the
language, and the programmer/reader must be aware of all of them.)

C.8.4 Line control

Why not allow macro expansion here as well? It's conceivable that some
processor might put out #line directives in standard positions, but
that in some cases it doesn't have a new value to give and would want
the effect to be nil. In that case, allowing

    #line __LINE__ __FILE__

would be nice. In fact, allowing full macros with parameters here is no
harder than similar things on #include.

------------------------------

End of mod.std.c Digest - Wed, 18 Sep 85 18:58:36 EDT
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.