Path: utzoo!attcan!uunet!mcvax!kth!enea!maxim!prc
From: prc@maxim.ERBE.SE (Robert Claeson)
Newsgroups: comp.windows.x
Subject: Re: 8 bits per char
Message-ID: <508@maxim.ERBE.SE>
Date: 25 Feb 89 14:56:13 GMT
References: <8902211720.AA14715@internal.apple.com> <722@acorn.co.uk>
Organization: ERBE DATA AB
Lines: 98

In article <722@acorn.co.uk>, john@acorn.co.uk (John Bowler) writes:
> In article <8902211720.AA14715@internal.apple.com>, alan@APPLE.COM (Alan Mimms) writes:
> > This is an impassioned plea for people NOT to strip "that annoying parity
> > bit" when dealing with characters translated from keyboard events.  This
> > "parity" bit is really a valid part of the character!  Many character
> > sets (INCLUDING ISO Latin 1) REQUIRE all 256 possible values to be
> > representable.  For example, if a user wants a "O-umlaut" (or O-diaeresis),
> > he won't get one when he's talking to a client that strips the high order
> > bit -- he'll get a "V" instead.  xterm is guilty of this.  Hopefully,
> > this will change with the next release (x11r4?).

> But xterm neither receives nor transmits ISO-Latin-1 characters - it
> receives keycodes, it transmits the codes defined by the VT102 transmitted
> codes (well documented in a VT102 manual... if you can find one).
> The latter are basically ASCII (7 bit) codes with some variations for national
> character sets.

Oh well. Why not make provisions for mapping the received keycodes into
the different national 7-bit character sets (there's an ISO standard for
them) instead of just assuming that everything is ASCII and chopping off
the eighth bit under the assumption that it is garbage or zero?

> The VT220 can transmit 8 bit codes (corresponding to the complete
> DEC multinational character set) but I don't believe the VT102 supports
> these (I don't have a VT102 manual :-(.

True, but DEC MCS isn't ISO 8859/1. The two are much the same, but not
identical, so you would need to map between 8859/1 and DEC MCS if you
wrote a VT220 emulator.

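To make "chopping off the eighth bit" concrete, here is a minimal sketch
of what happens to Alan's O-umlaut. It is not xterm's actual code. It
assumes the character arrives as a single ISO 8859/1 byte, and the name
strip_high_bit is made up for illustration.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-only program would."""
    return bytes(b & 0x7F for b in data)


# U+00D6 is O-umlaut and U+00F6 is o-umlaut; in ISO 8859/1 they are
# the single bytes 0xD6 and 0xF6.
latin1 = "\u00d6\u00f6".encode("latin-1")

# 0xD6 & 0x7F == 0x56 ('V') and 0xF6 & 0x7F == 0x76 ('v').
stripped = strip_high_bit(latin1)
print(stripped.decode("ascii"))  # prints "Vv"
```

Plain 7-bit ASCII passes through unchanged, which is why the damage goes
unnoticed by anyone who only ever types ASCII.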

> If xterm receives a keycode whose current keysym mapping does not fall into
> the DEC multinational character set I don't see what it can do about it.

Ignore it. Don't chop off the eighth bit to get some random ASCII
character.

> Even if it transmits an 8 bit character (say from the upper half of
> the multinational character set), in UN*X the tty will normally clobber
> the eighth bit anyway.

If by UNIX you mean the one that comes from the company that holds the
UNIX trademark, it won't. If your vendor sets ISTRIP by default, just
unset it. If, however, by UNIX you mean the one called 4.xBSD or 2.xBSD,
you're right. They screwed it up. They assumed that everything useful is
7 bits. Maybe it was, but that's not true anymore. As far as I know, they
plan to fix this in 4.4BSD.

However, if your terminal generates a parity bit and your communication
line is set to 7 data bits, then the tty driver should chop it. But then
your terminal can't generate 8-bit characters anyway. And this can never
happen on a workstation running X, since there's no such communication
line involved.

> This area is a total mess - but it's not X's fault - and a real
> solution would be a major change to most of the computer world's
> preconceived ideas.

Computers and programs written in Europe and many other parts of the
world generally assume 8-bit data paths. So change "computer world" to
"Anglo computer industry".

> After all, what use is an extra bit if you want to transmit Chinese
> or Japanese characters?

Not much. So in fact, your programs should not assume 7 or 8 bit
characters. They should use a character data type that's large enough to
hold a 16 bit (or maybe even 32 bit) character. If you think this is a
major waste of space for 7 or 8 bit character sets, make it a
user-defined (programmer-defined) data type that you can define to
anything you like (char, short, long). And never rely on it having a
particular size in your code.
That way, it's fairly easy to adapt it to character sets of a different
size than ASCII.

> So how can you say what is the ``valid part'' of an arbitrary character
> stream?  Surely that is a matter for the two programs at either end,

No. Parity bits and such are part of the communication protocol, not the
data path. So in fact, the tty driver -- not your program -- should check
and strip parity bits. Your program should always be able to rely on
whatever comes from the tty driver being valid data.

Of course, there's still the question of whether one, two or even more
bytes of data make up a data item (a character). I don't know how to
handle this. Maybe someone else knows?

> or for international standards (and the only really accepted standard
> - ASCII - says that there are only 7 bits in a character).

What? My only really accepted standard - ISO - says that there may be 7
or 8 or whatever bits in a character.

I hope this didn't take too much space, but I think the subject is too
important to ignore with phrases like "almost all characters are 7 bits,
so why should I care?" and "why can't everyone use ASCII?".
-- 
Robert Claeson, ERBE DATA AB, P.O. Box 77, S-175 22 Jarfalla, Sweden
Tel: +46 (0)758-202 50  Fax: +46 (0)758-197 20
EUnet:   rclaeson@ERBE.SE               uucp:   {uunet,enea}!erbe.se!rclaeson
ARPAnet: rclaeson%ERBE.SE@uunet.UU.NET  BITNET: rclaeson@ERBE.SE