Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!lll-lcc!lll-winken!uunet!mcvax!ukc!reading!cf-cm!cybaswan!iiit-sh
From: iiit-sh@cybaswan.UUCP (Steve Hosgood)
Newsgroups: comp.std.c
Subject: Character Sets (was Re: trigraphs)
Message-ID: <373@cybaswan.UUCP>
Date: 28 Apr 89 23:02:51 GMT
References: <4623@freja.diku.dk> <12.UUL1.3#5077@aussie.UUCP> <2469@ogccse.ogc.edu>
Reply-To: iiit-sh@cybaswan.UUCP (Steve Hosgood)
Organization: Institute for Industrial Information Technology
Lines: 62

Several people have been talking about Trigraphs recently. Danes, Swedes,
Icelanders and others have discussed at length whether or not they (the
potential beneficiaries of such a scheme) actually *want* or *need* the
damn things anyway.

Now IMHO, we're seeing here the consequences of restricting the world's
computer users to a 7-bit coding system originally designed just for
American English. Surely it would be better for ANSI to formally scrap
the concept of 7-bit coding and move to better things?

As I understand it, the reason for 7 bits in the old days was so that the
character and a parity bit would fit into a byte. These days though, as
far as I know, all ACIA chips will happily send 8 bits and parity - though
most people disable the parity anyway!

I've got an article in front of me from Scientific American in 1983(ish),
though I don't know the exact date as it's a photocopy. Anyway, it's pages
82 thru' 93, written by Joseph D. Becker of Xerox Corporation, and is
entitled "Multilingual Word Processing".

It seems a lot of work has been done on beating the problems of handling
the world's languages by means of switching of character sets. Xerox seem
chiefly interested in word processing, but it's obvious that the same
ideas could be used in E-mail, and presumably language source-code as
well.

[ ** in case you didn't see the article **
The idea is that you define 8-bit alphabets, and reserve the character
0xFF to indicate "next byte is an alphabet identifier".
This allows you to switch from one character set to another in mid-text
very easily. I get the feeling that the alphabets are designed to have
shared sections, so that the codes 0x00 thru 0x7F print the same in the
'Roman/Hebrew' set as they do in the 'Roman/Esperanto' set for instance.
Obviously the several alphabets needed for Chinese will not have any
commonality with the Roman stuff though. ]

I don't think you'd have to go as far as switched character sets to solve
the problem of dealing with *most* of the Northern European and North
American languages. Just look at the IBM-PC character set for instance.
However it would be nice to think ahead a bit and allow for the Greeks,
Russians, Chinese and Japanese.

The result of moving in this direction would be that people with old
Danish terminals would see the unrepresentable characters on screen as
trigraphs, and would type them as such, but the trigraphs are a local
product of the computer's TTY handler. What would appear in the
source-code file would be the 8-bit Northern Europe/USA code for '{' or
whatever he wanted.

If someone in the USA wanted to use a 'yen' symbol, he'd have to type a
trigraph for it, which would cause an alphabet-shift code to appear in
the source file to cater for it. Someone in Japan reading that file would
just see a 'yen' symbol.

OK, well it's *far* too late for such ideas to be submitted to X3J11 now,
but did anyone mention it in the early days, *before* it was too late?
Actually, it's not an X3J11 problem if you put responsibility for
trigraphs into the TTY handler. Whose problem would it be?
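
For anyone who wants the 0xFF alphabet-shift idea in concrete form, here
is a minimal C sketch of a decoder. It is my own illustration, not code
from the Becker article: the names ALPHABET_SHIFT, struct tagged_char and
decode_shifted are hypothetical, and the only rule taken from the text
above is that 0xFF means "next byte is an alphabet identifier". The
starting alphabet and the handling of a trailing 0xFF are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; only the 0xFF rule comes from the post. */
#define ALPHABET_SHIFT 0xFF   /* "next byte is an alphabet identifier" */

struct tagged_char {
    unsigned char alphabet;   /* alphabet in force when the code was read */
    unsigned char code;       /* 8-bit code within that alphabet */
};

/*
 * Decode `len` bytes from `in` into at most `max_out` tagged characters.
 * `start_alphabet` is the alphabet assumed before any shift is seen
 * (an assumption; the post does not say what the default would be).
 * Returns the number of characters written. A trailing 0xFF with no
 * identifier byte after it is treated as truncated and ignored.
 */
size_t decode_shifted(const unsigned char *in, size_t len,
                      unsigned char start_alphabet,
                      struct tagged_char *out, size_t max_out)
{
    unsigned char current = start_alphabet;
    size_t n = 0;
    size_t i = 0;

    assert(in != NULL && out != NULL);

    while (i < len && n < max_out) {
        if (in[i] == ALPHABET_SHIFT) {
            if (i + 1 >= len)
                break;             /* truncated shift sequence */
            current = in[i + 1];   /* switch alphabets; emits no character */
            i += 2;
        } else {
            out[n].alphabet = current;
            out[n].code = in[i];
            n++;
            i++;
        }
    }
    return n;
}
```

Feeding it the bytes 'a', 0xFF, 0x02, 'b' with a starting alphabet of 0
gives two characters: 'a' tagged with alphabet 0 and 'b' tagged with
alphabet 2. Note that in this sketch 0xFF can never appear as an ordinary
code within an alphabet, which is the price of reserving it as the shift.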

------------------------------------------------+------------------------------
Steve Hosgood BSc,                              | Phone (+44) 792 295213
Image Processing and Systems Engineer,          | Fax (+44) 792 295532
Institute for Industrial Information Technology,| Telex 48149
Innovation Centre, University of Wales, +-------+ JANET: iiit-sh@uk.ac.swan.pyr
Swansea SA2 8PP                         | UUCP: ..!ukc!cybaswan.UUCP!iiit-sh
----------------------------------------+-------------------------------------
           My views are not necessarily those of my employers!
    "Traditional Japanese Theatre? Just say Noh" - not Nancy Reagan