Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!lll-crg!hoptoad!gnu
From: gnu@hoptoad.uucp (John Gilmore)
Newsgroups: comp.lang.c,comp.std.internat
Subject: draft ANSI standard: trigraphs rear their ugly heads again
Message-ID: <1381@hoptoad.uucp>
Date: Tue, 2-Dec-86 01:27:54 EST
Article-I.D.: hoptoad.1381
Posted: Tue Dec 2 01:27:54 1986
Date-Received: Tue, 2-Dec-86 06:27:28 EST
Organization: Nebula Consultants in San Francisco
Lines: 84
Xref: mnetor comp.lang.c:200 comp.std.internat:4

[This is posted to comp.lang.c because mod.std.c seems to be dead.
Love those mod groups!]

The committee did not want to tie C to ASCII.  Fair enough.  What they
did was require that all the relevant characters be in the character
set (section 2.2.1), but not say anything about their character
encoding.  In fact, you could compile source code in ASCII to run on a
machine that uses EBCDIC in the runtime environment.  This is great.

The problem is that they went ahead to try to define a way to represent
all the relevant characters in all the ISO code sets used in Europe.
Since various countries reuse #, [, {, }, ], \, |, ~, and ^ as letters
and such, they have defined three-character sequences that can be used
to represent these characters.

Now, these are major characters in the language.  The preprocessor
prefix #.  The block structuring construct { }.  The array subscripter
[ ].  And the ultimate escape character \, as well as a bunch of
logical ops.

My question is this.  Is a C program that is written in plain old
ASCII, using the above characters, portable?  Is it a "strictly
conforming program"?  Is every ANSI standard C compiler in the world
required to read in such a program and translate it properly?

Next question.  Is a C program that uses local letters outside
character strings (e.g. as letters in French or Swedish identifiers)
portable?  Is it a "strictly conforming program"?
Are there ANY C compilers anywhere in the world which will read in such
a program and translate it properly?

My preliminary answers are:

C programs that use ASCII characters had damn well better be strictly
conforming, or every C program in the world is broken.

C compilers on European machines could support the national letters in
identifiers and such, but any program that used this feature would not
be portable.

Since a European C compiler which supported using the local characters
AS LETTERS would encourage unportable code, it would be better to make
European C compilers which did not support using the local characters
as letters.  This is tough, but are we trying to be nice or are we
trying to encourage portability?  Since the specific intent of the
standard is to promote portability, features in the standard which
encourage the generation of nonportable code should be questioned.
Newly introduced features discouraging portability should be removed.

Now.  If European C compilers do not support using the local characters
as letters, and don't support using them as ASCII punctuation, everyone
in Europe will be forced to write their code using trigraphs.  Of
course, any code written in North America or the UK will use ASCII
characters, so the Europeans will have to write a program to translate
the imported {, }, etc into trigraphs.

I think that a better solution is for the European compilers to support
these character codes to mean what they mean in ASCII.  Now imported
sources can be compiled directly.  Also, Europeans would have the
choice of editing the ASCII sources rather than using trigraphs.  The
programs will look funny on local terminals, but I don't see how it can
be harder to read a program filled with local letters as punctuation,
than it can be to read a program that looks like:

	??=include <stdio.h>
	main(argc, argv)
		int argc;
		char **argv;
	??<
		char buf??(??) = "Hello, world!??/r??/n";
		if (feof(stdin) ??!??! argc != 0) ??<
			printf(buf);
		??>
	??>

Since the trigraphs are even uglier than the alternative, and since
European compilers will not be able to use those character codes for
anything else, there is no need for introducing the trigraphs.

"The X3J11 charter clearly mandates the committee to *codify common
existing practice*" (emphasis theirs -- Rationale, pg. 1).  The
committee's justification for ignoring common practice here is too
weak.  The trigraphs should be removed.
-- 
John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   jgilmore@lll-crg.arpa
"I can't think of a better way for the War Dept to spend money than to
subsidize the education of teenage system hackers by creating the Arpanet."
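
For concreteness, the "program to translate the imported {, }, etc into
trigraphs" mentioned in the post could look roughly like the sketch below.
It is only an illustration: the name to_trigraphs and its buffer-based
interface are assumptions made here, not anything taken from the draft.
The table holds the nine trigraphs (two question marks followed by one of
= ( / ) ' < ! > -) and stores only the third character of each, so that
the source itself contains no trigraph sequences.

```c
#include <stddef.h>

/* Each trigraph is two question marks followed by `third`;
   `ascii` is the single character it stands for. */
struct trigraph { char third; char ascii; };

static const struct trigraph trigraphs[] = {
    {'=', '#'}, {'(', '['}, {'/', '\\'}, {')', ']'}, {'\'', '^'},
    {'<', '{'}, {'!', '|'}, {'>', '}'},  {'-', '~'}
};

/* Hypothetical helper (name and interface are assumptions):
   copy `in` to `out`, spelling each of the nine characters as its
   trigraph.  Returns 0 on success, -1 if `out` is too small. */
int to_trigraphs(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    for (; *in != '\0'; in++) {
        char third = '\0';
        size_t i;

        /* Look the character up in the table. */
        for (i = 0; i < sizeof trigraphs / sizeof trigraphs[0]; i++) {
            if (trigraphs[i].ascii == *in) {
                third = trigraphs[i].third;
                break;
            }
        }

        if (third != '\0') {
            /* Need three bytes now plus room for the final NUL. */
            if (n + 3 >= outsize)
                return -1;
            out[n++] = '?';
            out[n++] = '?';
            out[n++] = third;
        } else {
            /* Ordinary character: copy it through unchanged. */
            if (n + 1 >= outsize)
                return -1;
            out[n++] = *in;
        }
    }

    if (n >= outsize)
        return -1;
    out[n] = '\0';
    return 0;
}
```

Every matching character is rewritten, including those inside string
literals and comments.  That matches the draft's rule that trigraphs are
replaced everywhere in the source, but the sketch makes no attempt to
handle question marks already present in the input.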