Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!eecae!netnews.upenn.edu!rutgers!cmcl2!adm!smoke!gwyn
From: gwyn@smoke.BRL.MIL (Doug Gwyn )
Newsgroups: comp.lang.c
Subject: Re: trigraphs (was Why are character arrays special)
Message-ID: <9650@smoke.BRL.MIL>
Date: 14 Feb 89 20:19:40 GMT
References: <19742@uflorida.cis.ufl.EDU> <225800126@uxe.cso.uiuc.edu> <1875@dataio.Data-IO.COM> <15941@mimsy.UUCP>
Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB) )
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 52

In article <15941@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>	Do you want to have trigraphs available?
>If the user answers `yes', the next prompt is:
>	Why?

I'd be about the last person to defend trigraphs as a technical element
of the C language, as anyone who has attended X3J11 meetings could
confirm. However, by now I've heard the official party line enough
times that I think I can answer questions about this "feature".

Trigraphs are intended as a means of portably transmitting maximally
portable C programs between systems with potentially different
character sets. Because separate preprocessors, data transmission
protocols, etc. were outside the charter of X3J11 but nevertheless
the Committee desired to ensure this degree of source code
portability, they agreed that the minimal ISO character set
requirements could be taken as the basis for such source code
transfer. Because C traditionally uses symbols not in the ISO base
character set, some substitutes for such symbols, that could be
expressed entirely within the ISO base set, had to be found. The ??*
form of trigraphs was chosen as the least problematic of all suggested
alternatives.

The important practical point is that C programmers are NOT expected
to use trigraphs when they type in their source code, and they should
not see trigraphs when displaying source code on any device on common
modern computing systems. Trigraphs are intended for program
interchange only.
(Quite honestly, I doubt that everyone in X3J11 originally had this
notion, but it appears to be the current party line.)

Note that trigraphs may best be dealt with by a separate translator,
ideally a separate program that could practically be skipped except
the first time that code is imported from another site. The
translator could be officially defined as part of one's
Standard-conforming implementation, but in practice used only for
validation testing and for translating imported source code.

One can imagine circumstances in which some such translation would
always be necessary, for example in some existing European character
set environments. An extra level of translation (having nothing to do
with trigraphs) is allowed in translation phase 1 to deal with such
environments, which are beyond the scope of X3J11 or indeed any
programming language standards group. In fact the C source code
character "x" need not look anything like a Roman "X" as stored,
displayed, or manipulated externally, and it can occupy any number of
bytes in external storage. Therefore, even in character sets lacking
a representation for the letter "x" it is possible to devise an
encoding for C program source that might contain instances of source
code character "x".

Fortunately the ISO base set includes all the traditional C
alphanumerics, just not all its special symbols such as "\". Thus in
some ISO environments, "\" and other special C source symbols must be
mapped into external encodings. Trigraphs were an attempt to
standardize this mapping for ISO-based systems.

Looking back at the consequent noise and confusion, I think many X3J11
members now wish we hadn't tried to "pioneer" in this area.
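
To make the "separate translator" idea concrete, here is a minimal
sketch in C of the core of such a program. It assumes the nine
trigraph sequences of the C Standard as published (??= ??( ??/ ??)
??' ??< ??! ??> ??-, standing for # [ \ ] ^ { | } ~). The function
name untrigraph and the table names tri_third and tri_repl are
hypothetical, chosen only for this illustration; this is not taken
from any X3J11 document or existing implementation.

```c
#include <stddef.h>
#include <string.h>

/* Assumed mapping: each trigraph is two question marks followed by one
   character of tri_third; tri_repl holds the replacement at the same
   index.  The sequences are not spelled out in this source, so that a
   compiler with trigraphs enabled does not rewrite them. */
static const char tri_third[] = "=(/)'<!>-";
static const char tri_repl[]  = "#[\\]^{|}~";

/* Hypothetical helper: copy src to dst, replacing every trigraph by the
   single character it stands for.  dst must have room for at least
   strlen(src) + 1 bytes.  Returns the number of characters written,
   not counting the terminating null. */
size_t untrigraph(char *dst, const char *src)
{
    size_t n = 0;

    while (*src != '\0') {
        const char *hit;

        if (src[0] == '?' && src[1] == '?' && src[2] != '\0'
            && (hit = strchr(tri_third, src[2])) != NULL) {
            dst[n++] = tri_repl[hit - tri_third];   /* recognized trigraph */
            src += 3;
        } else {
            dst[n++] = *src++;                      /* ordinary character */
        }
    }
    dst[n] = '\0';
    return n;
}
```

A filter built around this would read imported source a line at a
time, pass each line through untrigraph, and write the result, so
that the compiler proper never sees a trigraph. Two question marks
followed by any other character are left alone. The opposite
direction (replacing # [ \ ] ^ { | } ~ by trigraphs before sending
code to another site) would use the same two tables with their roles
swapped.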