Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!henry
From: henry@utzoo.UUCP (Henry Spencer)
Newsgroups: net.lang.c,net.unix-wizards
Subject: summary of C-standards workshop at Usenix
Message-ID: <4013@utzoo.UUCP>
Date: Sat, 30-Jun-84 20:59:02 EDT
Article-I.D.: utzoo.4013
Posted: Sat Jun 30 20:59:02 1984
Date-Received: Sat, 30-Jun-84 20:59:02 EDT
Organization: U of Toronto Zoology
Lines: 263

The following is an informal report on what was said at the C Standards workshop at Usenix. The workshop essentially consisted of a presentation by Larry Rosler (of the ANSI C effort) plus question-and-answer afterwards. I apologize to Larry for any errors in the following. (Incidentally, he deserves a vote of thanks from everyone who attended the session. He flew in from the East Coast, at considerable inconvenience, basically just to give that talk.)

The ANSI C standards effort is X3J11. It's split into three subcommittees: environment, library, and language. Rosler is chairman of the language subcommittee.

The environment subcommittee is wrestling with a whole mess of very fuzzy things about how C relates to its surroundings. Alone of the three subcommittees, this one has no existing document to work from, so they're sort of feeling their way. Among the things they're trying to cope with are how a C program gets run (tentatively "main(argc, argv)", but the question of environment variables is very difficult on non-Unix systems) and how to resolve problems with European character sets.

The library subcommittee is working from chapters 2 and 3 of the Unix manual. Most of chapter 2 is gone because it's Unix-dependent, although a few things like "signal" are still there. Most of chapter 3 is still present: stdio, chars and strings, memory allocation, basic math functions (nobody feels like standardizing the Bessel functions!). They are looking at things like error handling in the math library.
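
To make the "main(argc, argv)" convention and the surviving chapter 3 pieces a little more concrete, here is a sketch in present-day ISO C, not anything taken from the draft. The helper name join_args is hypothetical; it takes its parameters in the same shape main receives them and uses only string handling and memory allocation from the library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins argv[1] .. argv[argc-1] with single spaces,
   the way a main(argc, argv) program might see its command line.
   Returns a malloc'd string the caller must free, or NULL if out of memory. */
char *join_args(int argc, char *argv[])
{
    size_t len = 1;                     /* room for the terminating '\0' */
    char *out;
    int i;

    assert(argc >= 0 && argv != NULL);

    for (i = 1; i < argc; i++)
        len += strlen(argv[i]) + 1;     /* the argument plus a separator */

    out = malloc(len);
    if (out == NULL)
        return NULL;

    out[0] = '\0';
    for (i = 1; i < argc; i++) {
        if (i > 1)
            strcat(out, " ");
        strcat(out, argv[i]);
    }
    return out;
}
```

A program would call it as join_args(argc, argv) from inside main(int argc, char *argv[]). Anything beyond argc and argv, such as environment variables, is the part the environment subcommittee finds hard to pin down on non-Unix systems, so the sketch deliberately avoids it.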

The language subcommittee is the one that all the following detail is about. Their basic goals are:

    - portability
    - preservation of the "spirit of C", i.e. the ability to get right down into the bits if you want
    - minimizing the impact on existing valid programs
    - formalizing proven enhancements (emphasis on "proven")
    - producing precise but readable documents

The specific approach to that last item is to tidy up and tighten up the existing C Reference Manual. The idea of defining C by means of a formal mathematical definition was discussed, but it was rejected on the grounds that the audience for a definition written in English is several orders of magnitude larger.

They've started from the System V.2 C Reference Manual. There have been three major areas of change in that since the "white book":

1. Long identifiers. The problem with Berklix-style arbitrary-length names is that they break existing tools and file formats. The breakage is much less severe if one simply cranks up the limit instead of making it infinite. Internal names (including preprocessor names) are now significant to 31 characters. External names are, alas, significant only to 6 characters, and case is not significant in them; this cannot be improved without making the standard incompatible with most non-Unix object-module formats.

2. Void and enum. "void" is the type returned by a function that doesn't return a value. You can also cast things to "void" to throw away an unwanted value. The keyword is also used in a couple of other places, discussed later, to avoid having to introduce too many new keywords (any of which has the potential to break existing programs). Enums are as in V7; improvements to permit things like ordering comparisons (>=, etc.) on enums are still being thought about.

3. Structure/union improvements. Structure assignment, passing, and returning are as in V7. Structure comparison isn't there, at least not so far.

    Member names are now local to the particular structure, instead of all being in a global name space; this means that you have to be more careful about getting the type of (e.g.) the left-hand side of "->" correct, or the compiler will object.

The committee has introduced three major changes since the V.2 CRM:

A. Function-argument type declaration and checking. Instead of just saying "extern int fread();", you can now say:

        extern int fread(char *, int, int, FILE *);

    so the compiler can do proper type checks. In the event of a type mismatch, the same conversions as for the assignment operator apply. (Hooray, no more casting NULL pointers!) Variable-argument functions like printf can be declared like:

        extern int printf(char *,);

    It is admitted that the comma is not all that conspicuous, and that this syntax makes it impossible to declare a function which has *only* variable arguments. These things are, of necessity, compromises. [Please note that neither Larry Rosler nor I necessarily *like* all the things I'm reporting.]

    There is an ambiguity when it comes to declaring no-argument functions, since "extern int rand();" looks like an old-style declaration which doesn't say anything about the arguments. The convention for this is:

        extern int rand(void);

    which means "no parameters".

B. "const". A new keyword (sigh) which is used to mark things that are read-only, with run-time assignments forbidden. These things might be put in ROM or in text space. Some examples, with notes:

        const float pi = 3.14159;

    This is a real, live, named constant, which will show up in the symbol table (unlike #defines).

        const short yacctable[1000] = { ... };

    An obvious case.

        const char *p;      /* pointer to constant */
        char *const q;      /* constant pointer to something */

    Illustrating two different uses: the first is a pointer that can be changed but can't be assigned through; the second is a pointer that can be assigned through but can't be changed. It is agreed that the syntax is less than ideal.

    Note that const is *not* a storage class; it is part of the type.

        extern char *strcpy(char *, const char *);

    This tells the compiler that strcpy doesn't change its second argument.

C. Single-precision arithmetic. If all operands in an expression are float, the compiler is allowed (not required!) to evaluate it in float rather than double arithmetic. The choice is explicitly implementation-dependent. Casts can be used to force evaluation in double. Numeric constants, e.g. "1.0", are double, *not* float! This last isn't ideal, but trying to fix it invariably makes life much more complex.

    The original double-only rule was partly a concession to the pdp11, partly just plain simpler, but partly a way of avoiding multiple versions of all the library routines. With declarations of function argument types, the last problem is pretty much fixed. All the library functions in the standard want "full width" types, so that if you don't declare them, you're still safe.

Some lesser issues:

I. "Promiscuous" pointer assignments are illegal. You must use casts when mixing pointer types or mixing ints with pointers.

II. "void *" is a new kind of pointer, which cannot be dereferenced but can be assigned to any other type of pointer without a cast. The idea here is that "char *" is no longer required to be the "universal" pointer type which can point to anything. So, for example, the declaration of fread earlier really should read:

        extern int fread(void *, int, int, FILE *);

    (People with machines where all pointers have the same representation: don't complain. You are lucky. Others aren't.)

III. "volatile" (the choice of name is tentative) acts like "const" in the syntax, but with different semantics. It means that the data in question is "magic" in some way (e.g. device registers) and that compilers should not optimize references to such things. This resolves a long-standing problem with writing optimizing compilers for C.

IV. "signal" is in the library.
    This means that reentrancy is explicitly part of C.

V. The preprocessor is part of the language. The committee has opted for a simple and clean definition, which does not perpetuate the implementation accidents of some of the existing preprocessors. There are some minor improvements, like permitting space before the "#".

Some trivial additions:

i. Hexadecimal string escapes. [Retch.]

        "Here's an ESC \x1b "

ii. String constant concatenation. Two string *constants* occurring adjacent to each other in the source are considered concatenated. Note that this applies to constants only. Among other minor things, this makes string continuation across line boundaries less ugly.

iii. "unsigned char", "unsigned short", and "unsigned long" are all part of the language. Plain "char" is *not* required to be signed or unsigned (requiring either would make efficient implementations impossible on some machines). The question of a "char-sized int" type, of whatever syntax, has not yet been resolved.

iv. The unary + operator. Same conversions and type restrictions as unary -. Does nothing. This is partly for consistency with other languages, and partly for consistency with things like "atof". (At the moment, "+3.14" is valid when atoffed from a string but not when compiled into a program!)

v. Initialization of unions and automatic aggregates. The latter is just removal of an existing restriction. The former is tricky; there is *no* clean way to define it. The committee has opted to do something not necessarily good, but simple: the type of the initializer is that of the lexically first member.

vi. The selection expression of a "switch" can be of any integer type. (E.g. it can be a "long".)

vii. #elif. An added bit of preprocessor syntax, to simplify using #if's like a "switch".

Some things are gone:

01. The "entry", "asm", and "fortran" keywords. (Although the last two will probably be mentioned in a "recognized extensions" appendix.)

02. "long float" is no longer a synonym for "double". Nobody ever used it.
    There was discussion of using "long float" and "long double" to cope with machines having more than two floating-point types, but conversions and such are an unknown swamp in such a case, and the committee decided not to try.

03. 8 and 9 are not octal digits.

04. Pointer-integer conversions are now strictly type-checked, as I mentioned earlier.

05. The following code fragment is illegal:

        foo(parm)
        int parm;
        {
            int parm;
            ...

    Some compilers interpret such a situation as nested scopes, so the inner declaration hides the outer one. In this particular case, this seems both useless and dangerous. The scope of the arguments of a function is now identical to that of the local declarations, so this is a duplicate declaration and illegal.

06. Nothing is said about the alignment of bitfields, not even the K&R guarantee that they don't straddle word boundaries.

07. Some existing compilers permit taking the address of a variable declared "register" if the variable is not in fact placed in a register. This is now outlawed; "register" and the unary "&" operator don't mix.

All in all, the current draft standard doesn't sound too bad to me. I will be getting a copy of it shortly, and may have some more comments at that time. A number of things are still unsettled. The committee's (very tentative) notion of schedule is a final draft for public comment by the end of the year, and a real standard by the end of next year. [Sound of crossing of fingers.]

Comments on this should *not* be addressed to me; I'm just an interested observer, not a participant. Write to:

        Lawrence Rosler
        Supervisor, Language Systems Engineering Group
        AT&T Bell Laboratories
        Summit, NJ  USA

No, I don't have a network address for him.
--
                                Henry Spencer @ U of Toronto Zoology
                                {allegra,ihnp4,linus,decvax}!utzoo!henry
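
For readers who want to see several of the language additions described above side by side, here is a sketch in present-day ISO C. It is not a reproduction of the 1984 draft. It exercises a (void) parameter list, const and void * in a prototype, variable arguments, structures passed and returned by value, a union initializer, a switch on a long, #elif, a hex escape with string concatenation, unary +, a cast to void, and a volatile flag set from a signal handler. All the function, type, and macro names (sample_value, copy_bytes, sum_ints, classify, WORD_BITS and so on) are made up for illustration. One visible difference: the final standard spelled variable arguments as ", ..." rather than the "(char *,)" form reported above.

```c
#include <assert.h>
#include <signal.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* "(void)": a function taking no parameters, like rand above. */
int sample_value(void)
{
    return +7;                  /* unary +: accepted, and does nothing */
}

/* const on the source, void * on both ends: any object pointer converts
   to and from void * without a cast. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;     /* no cast needed from void * */
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    while (n-- > 0)
        *d++ = *s++;
}

/* Variable arguments, using the ", ..." syntax and <stdarg.h>. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Structures passed and returned by value. */
struct point { int x, y; };

struct point add_points(struct point a, struct point b)
{
    struct point r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    return r;
}

/* A union initializer applies to the first member ("l" here). */
union number { long l; double d; };
const union number forty_two = { 42 };

/* The switch expression is a long; unmatched values reach the final return. */
const char *classify(long n)
{
    switch (n) {
    case 0L:      return "zero";
    case 100000L: return "big";
    }
    return "other";
}

/* #elif chains #if tests like a switch. WORD_BITS is a made-up macro. */
#define WORD_BITS 32
#if WORD_BITS == 16
#define WORD_NAME "short word"
#elif WORD_BITS == 32
#define WORD_NAME "long word"
#else
#define WORD_NAME "unusual word"
#endif

/* Adjacent string constants are concatenated. Splitting after \x1b also
   ends the escape: written as one literal, "\x1babc" would be read as a
   single over-long hex escape. */
const char banner[] = "Here's an ESC \x1b" "abc, then " WORD_NAME;

/* volatile: the handler changes this behind the compiler's back. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void)sig;                  /* cast to void: value deliberately unused */
    got_signal = 1;
}

/* Installs the handler, raises the signal, and reports whether it ran. */
int signal_round_trip(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return 0;
    got_signal = 0;
    raise(SIGINT);
    return got_signal == 1;
}
```

signal_round_trip relies on the guarantee that raise does not return until the handler has finished. The flag is declared volatile sig_atomic_t because it is written inside a handler, which is exactly the kind of "magic" data that item III describes.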