Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!decwrl!sun!pitstop!sundc!seismo!uunet!mcvax!ukc!acorn!john
From: john@acorn.co.uk (John Bowler)
Newsgroups: comp.windows.x
Subject: Re: 8 bits per char
Summary: xterm emits VT102 - not ISO-LATIN-1
Message-ID: <722@acorn.co.uk>
Date: 23 Feb 89 19:27:23 GMT
References: <8902211720.AA14715@internal.apple.com>
Organization: Acorn Computers Limited, Cambridge, UK
Lines: 38

In article <8902211720.AA14715@internal.apple.com>, alan@APPLE.COM (Alan Mimms) writes:
> This is an impassioned plea for people NOT to strip "that annoying parity
> bit" when dealing with characters translated from keyboard events.  This
> "parity" bit is really a valid part of the character!  Many character
> sets (INCLUDING ISO Latin 1) REQUIRE all 256 possible values to be
> representable.  For example, if a user wants a "O-umlaut" (or O-diaeresis),
> he won't get one when he's talking to a client that strips the high order
> bit -- he'll get a "V" instead.  xterm is guilty of this.  Hopefully,
> this will change with the next release (x11r4?).
>

But xterm neither receives nor transmits ISO-Latin-1 characters - it
receives keycodes, and it transmits the codes defined in the VT102's
table of transmitted codes (well documented in a VT102 manual... if you
can find one).  The latter are basically ASCII (7 bit) codes with some
variations for national character sets.  The VT220 can transmit 8 bit
codes (corresponding to the complete DEC multinational character set)
but I don't believe the VT102 supports these (I don't have a VT102
manual :-(.

If xterm receives a keycode whose current keysym mapping does not fall
into the DEC multinational character set I don't see what it can do
about it.  Even if it transmits an 8 bit character (say from the upper
half of the multinational character set), in UN*X the tty will normally
clobber the eighth bit anyway.
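
To make the arithmetic concrete, here is a minimal sketch of what
clearing the eighth bit does to ISO Latin-1 bytes.  It is not xterm's or
the tty driver's actual code; the helper name strip_high_bit is
hypothetical, and the 0x7F mask is simply the usual way of keeping the
low seven bits.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-clean channel would."""
    return bytes(b & 0x7F for b in data)


# O-diaeresis is 0xD6 and the pound sign is 0xA3 in ISO Latin-1.
latin1 = "\u00d6\u00a3".encode("latin-1")

stripped = strip_high_bit(latin1)

# 0xD6 & 0x7F == 0x56 ('V'), 0xA3 & 0x7F == 0x23 ('#')
print(stripped.decode("ascii"))  # prints "V#"
```

So the O-diaeresis from the quoted article really does come out as "V",
and plain 7 bit ASCII passes through unchanged.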

As an experiment I switched my xterm pty into raw mode and typed the £
key on my keyboard (this generates a keycode with a suggested keysym of
XK_sterling).  I regretted it - it would seem that some 8 bit character
did, in fact, get through, because my csh promptly died (csh uses the
``spare'' bit in input characters while parsing the line; if it receives
a byte with the top bit set it screws up :-(.

This area is a total mess - but it's not X's fault - and a real solution
would be a major change to most of the computer world's preconceived
ideas.  After all, what use is an extra bit if you want to transmit
Chinese or Japanese characters?  So how can you say what is the ``valid
part'' of an arbitrary character stream?  Surely that is a matter for
the two programs at either end, or for international standards (and the
only really accepted standard - ASCII - says that there are only 7 bits
in a character).

John Bowler (jbowler@acorn.co.uk)