Xref: utzoo comp.emacs:4624 comp.lang.c:14042 comp.sys.ibm.pc:21156
Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!decwrl!labrea!agate!helios.ee.lbl.gov!lll-tis!oodis01!uplherc!sp7040!obie!wsccs!terry
From: terry@wsccs.UUCP (Every system needs one)
Newsgroups: comp.emacs,comp.lang.c,comp.sys.ibm.pc
Subject: Re: Programming and international character sets.
Summary: hard; very very very hard.
Message-ID: <774@wsccs.UUCP>
Date: 10 Nov 88 03:49:14 GMT
References: <532@krafla.rhi.hi.is> <8804@smoke.BRL.MIL> <207@jhereg.Jhereg.MN.ORG> <621@quintus.UUCP>
Lines: 46

In article <621@quintus.UUCP>, ok@quintus.uucp (Richard A. O'Keefe) writes:
> In article <207@jhereg.Jhereg.MN.ORG> mark@jhereg.MN.ORG (Mark H. Colburn) writes:
> >In article <8804@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) ) writes:
> >>In article <532@krafla.rhi.hi.is> kjartan@rhi.hi.is (Kjartan R. Gudmundsson) writes:
> >>>How difficult is it convert american/english programs so that they can
> >>>be used to handle foreign text? [etc.]
>
> Xerox have supported a 16-bit character set (XNS) for years.
> Some of the surprises mentioned by Mark Colburn have been no news
> to Interlisp-D programmers for a long time.
>
> The kludges being proposed for C & UNIX just so that a sequence of
> "international" characters can be accessed as bytes rather than pay
> the penalty of switching over to 16 bits are unbelievable.

First of all, there are too many 8-bit character models available:
all of the ISO models, DEC Multinational, 7-bit replacement sets,
Wang-PC international sets, and IBM-PC International sets.  There is
no way to consolidate them without mapping, and that's so device
dependent it isn't funny.  Consider your termcap growing by at least
128 characters per entry... and that assumes there is no need for
multiple GS/GE strings, which there may be, since some terminals
require more than one additional character set.
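
To make the mapping problem concrete, here is a rough sketch in C of
what translating between just two of those 8-bit models looks like.
The names map_entry, cp437_to_latin1 and pc_to_iso are made up for
illustration and are not from any real library, and the table lists
only six of the 128 high codes (IBM-PC code page 437 on one side,
ISO 8859-1 on the other).

```c
#include <stddef.h>

/* Hypothetical illustration, not a real library: translate one byte of
 * IBM-PC (code page 437) text into ISO 8859-1.  Only a few accented
 * letters are listed; a usable table needs all 128 high codes. */
struct map_entry {
    unsigned char from;   /* code on the IBM-PC side */
    unsigned char to;     /* code on the ISO 8859-1 side */
};

static const struct map_entry cp437_to_latin1[] = {
    { 0x81, 0xFC },  /* u-umlaut  */
    { 0x82, 0xE9 },  /* e-acute   */
    { 0x84, 0xE4 },  /* a-umlaut  */
    { 0x87, 0xE7 },  /* c-cedilla */
    { 0x94, 0xF6 },  /* o-umlaut  */
    { 0xA4, 0xF1 },  /* n-tilde   */
};

unsigned char pc_to_iso(unsigned char c)
{
    size_t i;
    size_t n = sizeof cp437_to_latin1 / sizeof cp437_to_latin1[0];

    if (c < 0x80)
        return c;        /* 7-bit ASCII is common to both sets */
    for (i = 0; i < n; i++)
        if (cp437_to_latin1[i].from == c)
            return cp437_to_latin1[i].to;
    return '?';          /* high code missing from this partial table */
}
```

With this table, pc_to_iso(0x82) gives 0xE9, the e-acute on the ISO
side.  Pass a byte through unmapped and the same 0xE9 comes out as a
capital theta on the PC.  That is one table for one pair of sets in
one direction; every further pairing needs its own.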

Second, vi in the US strips the 8th bit out, and is therefore not
usable for programming international (8-bit) characters using either
model.

Problems with 16 bit characters:

o	The Xerox model is 16-bit and only valid for bitmapped displays,
	like Mac, and we all know how slowly that scrolls.

o	All of the current software would break without extensive rewrite.

o	The internal overhead in a non-message passing operating system
	(most of them) is so high that it's ridiculous.

o	Think of pipes and all file I/O going half as fast.

o	Think of your hard disks shrinking to half their size... source
	files, after all, are text.

terry@wsccs