Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!ames!ucbcad!ucbvax!cbosgd.mis.oh.att.com!mark
From: mark@cbosgd.mis.oh.att.com.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: Telnet 8th bit: a good use for that bit...
Message-ID: <8702160300.AA02734@cbosgd.MIS.OH.ATT.COM>
Date: Sun, 15-Feb-87 22:00:06 EST
Article-I.D.: cbosgd.8702160300.AA02734
Posted: Sun Feb 15 22:00:06 1987
Date-Received: Mon, 16-Feb-87 07:04:07 EST
References: <8702150617.AA00895@sun.Sun.COM>
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 34
Approved: tcp-ip@sri-nic.arpa

>>I think that it would be good to specify that 8-bit values passed
>>on Telnet connections are in ISO Latin I (essentially, extend NETASCII
>>to 8 bits using the ISO character set that contains all the graphics
>>for all the Latin languages).
>
>That would leave all the non-Latin languages, like Japanese, Chinese,
>Korean, etc., out in the cold. It would be a mistake to require that
>8-bit values (i.e, GR characters, with the 8th bit set) passed over
>TELNET connections be in one particular character set. If need be,
>there could be TELNET options to indicate which character set is
>being sent over the wire.

Good point. The Japanese standard (or at least one of them) is in
some sense upward compatible with ASCII and European character sets.
Two-byte sequences with both high-order bits set are Kanji; single
bytes with the high bit set are European. Anything that might be a
control character is always a control character, no matter what else
surrounds it. I don't have the details, and I don't know if this
extends to Korean. I know it won't handle Chinese, because there are
more characters in the Chinese language. However, TELNET option
negotiation is very good at this sort of thing; all we'd have to do
is standardize the character sets (or provide an open-ended option
that can be grown as needed).

I suspect that if we just say that TELNET has to be 8-bit transparent
(except for a couple of things like 377 and CR), then most of the rest
of this won't matter: we could apply a default character set (which
might be ASCII, or European) unless options are negotiated otherwise.

	Mark
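
For readers wondering what the "377 and CR" exceptions amount to, here
is a minimal sketch, not part of the original message. It assumes the
usual TELNET conventions: a data byte of octal 377 (IAC) is sent
doubled, and a bare CR is sent as CR NUL so it cannot be mistaken for
the CR LF end-of-line sequence. The function names are made up for
illustration, and the decoder assumes the stream carries no TELNET
commands or option negotiation. In negotiated binary mode the CR rule
may not apply, so treat that part as an assumption.

```python
IAC = 0xFF  # octal 377, TELNET "Interpret As Command"
CR = 0x0D
LF = 0x0A
NUL = 0x00


def telnet_escape(data: bytes) -> bytes:
    """Escape 8-bit application data for the wire (hypothetical helper)."""
    out = bytearray()
    for i, b in enumerate(data):
        out.append(b)
        if b == IAC:
            # A literal 377 in the data is sent as two 377 bytes.
            out.append(IAC)
        elif b == CR:
            # A CR not followed by LF is sent as CR NUL.
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt != LF:
                out.append(NUL)
    return bytes(out)


def telnet_unescape(wire: bytes) -> bytes:
    """Undo telnet_escape; assumes no TELNET commands in the stream."""
    out = bytearray()
    i = 0
    while i < len(wire):
        b = wire[i]
        nxt = wire[i + 1] if i + 1 < len(wire) else None
        if b == IAC and nxt == IAC:
            out.append(IAC)   # doubled 377 becomes one data byte
            i += 2
        elif b == CR and nxt == NUL:
            out.append(CR)    # CR NUL becomes a bare CR
            i += 2
        else:
            out.append(b)     # everything else passes through unchanged
            i += 1
    return bytes(out)
```

Every other byte value, including those with the 8th bit set, passes
through untouched, which is why the choice of character set can be
left to a default or to option negotiation.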