Path: utzoo!attcan!uunet!nbires!ico!rcd
From: rcd@ico.ISC.COM (Dick Dunn)
Newsgroups: comp.lang.c
Subject: trigraphs in X3J11
Keywords: bizarre
Message-ID: <5215@ico.ISC.COM>
Date: 20 May 88 01:24:25 GMT
Organization: Interactive Systems Corp, Boulder, CO
Lines: 76

I've talked to a few people about this, but I'd like to see if there's more
info floating around.

Background: "Trigraphs" in dpANS C are a way of avoiding the problems of
character-set restrictions, by introducing 3-character replacements for
those characters which are required for C but do not exist in the ISO 7-bit
set. For example, if your character set doesn't have braces {}, you can
use ??< and ??> to denote them. The behavior is as if trigraphs were
replaced by the corresponding single characters in a prepass to the
compiler, *including* replacement within strings. All trigraphs begin with
"??". The draft standard seems to be written in such a way that a compiler
MUST accept these trigraph sequences.

I'm perplexed on a couple of points here.

1. Replacement within strings: This is a change to the existing language.
It breaks existing programs. I looked through existing source code that we
have here and found several programs which get broken or significantly
altered. Here's an example--sanitized, but typical of what can happen.
Suppose you now have:

	printf("bad status ??<%x>??--device %n\n", st, dev);

What you're going to get, according to the draft standard, is something
that has the effect of:

	printf("bad status {%x>~-device %n\n", st, dev);

Point: The sequence "??" is not at all rare. Why was it chosen as the
introducer? (I think people who start getting messages about using
`/dev/tty^ are going to be confused.) Note also that it is common practice
to use "?" in initializing strings where the "?" positions will be replaced
at execution time.
Pity the poor programmer who sets up something like:

	char ta[] = "/tmp/d?????/a", tb[] = "/tmp/d?????/b";

and discovers (eventually) that these strings are each two characters
shorter than they used to be; if he tries to replace the ?s, he'll write
off the ends of the strings!

NOW, before you light 'em up and blast me, YES, I realize it's a hard
problem. There aren't many safe character sequences to use--and YES, I
know that you can't use backslash because that's one of the
possibly-missing characters. What I don't understand is why it was decided
to introduce a brand-new (I assume) mechanism which breaks existing code.

2. Replacement in program text: My philosophical objections to replacement
of trigraphs within a program are much less...but I wonder who might ever
use them. Is there any precedent for these sequences? Is there any reason
to think they'll be used? Let's take another (slightly contrived but
realistic) example here--I'll construct a piece of code which says,
roughly, "If the first character of `line' is a sharp or percent, call
function prepro to handle the rest of the line, then increment linect".
We would now write this as:

	if (line[0]=='#' || line[0]=='%') {
		prepro(&line[1]);
		linect++;
	}

Replacing all the nasty characters with corresponding trigraphs gives:

	if (line??(0??)=='??=' ??!??! line??(0??)=='%') ??<
		prepro(&line??(1??));
		linect++;
	??>

I submit that this will produce code which is so near to unreadable that
there is virtually no prospect of the mechanism ever seeing significant
use. If you believe that, you have to wonder why every standard compiler
should have to carry the extra baggage. If you don't believe that, I'd
like to see some real evidence to show that programmers might use it.

A general question: Has the trigraph mechanism been tried out, in real
practice, anywhere prior to the introduction in X3J11? If so, I'd like to
hear about how it's worked out.
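
For anyone who wants to check the string examples above by hand, here is a
minimal sketch of the prepass described under Background. It assumes the
nine sequences in the draft (third characters = ( / ) ' < ! > - standing for
# [ \ ] ^ { | } ~) and a simple left-to-right scan. The function names
trigraph_replacement and replace_trigraphs are hypothetical, made up for
this illustration; this is not how any particular compiler does it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the character that follows two question
   marks, return the single character the sequence stands for, or 0 if
   the three characters do not form a trigraph. */
static char trigraph_replacement(char third)
{
    static const char thirds[]       = "=(/)'<!>-";
    static const char replacements[] = "#[\\]^{|}~";
    const char *hit;

    if (third == '\0')
        return 0;               /* strchr would otherwise match the terminator */
    hit = strchr(thirds, third);
    return hit != NULL ? replacements[hit - thirds] : 0;
}

/* Hypothetical helper: copy src to dst, replacing trigraphs as it goes.
   dst needs room for strlen(src) + 1 bytes, since the text never grows.
   Returns the length of the text written to dst. */
static size_t replace_trigraphs(const char *src, char *dst)
{
    char *start = dst;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        char r = 0;

        /* src[1] is checked first, so src[2] is never read past the end. */
        if (src[0] == '?' && src[1] == '?')
            r = trigraph_replacement(src[2]);
        if (r != 0) {
            *dst++ = r;         /* three source characters become one */
            src += 3;
        } else {
            *dst++ = *src++;    /* ordinary character, copied unchanged */
        }
    }
    *dst = '\0';
    return (size_t)(dst - start);
}
```

When feeding it test input from a C file, write each question mark as \?
so that a trigraph-aware compiler leaves the test itself alone. Run on the
text of the /tmp/d example, the sketch turns the 13 characters between the
quotes into 11, ending in a backslash followed by the letter a. A compiler
would then read that pair as an escape sequence, so the array it finally
builds is shorter still.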
--
Dick Dunn      UUCP: {ncar,cbosgd,nbires}!ico!rcd       (303)449-2870
   ...Never attribute to malice what can be adequately explained by stupidity.