Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!ames!ucbcad!ucbvax!cbosgd.mis.oh.att.com!mark
From: mark@cbosgd.mis.oh.att.com.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: Telnet 8th bit: a good use for that bit...
Message-ID: <8702160300.AA02734@cbosgd.MIS.OH.ATT.COM>
Date: Sun, 15-Feb-87 22:00:06 EST
Article-I.D.: cbosgd.8702160300.AA02734
Posted: Sun Feb 15 22:00:06 1987
Date-Received: Mon, 16-Feb-87 07:04:07 EST
References: <8702150617.AA00895@sun.Sun.COM>
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 34
Approved: tcp-ip@sri-nic.arpa

>>I think that it would be good to specify that 8-bit values passed
>>on Telnet connections are in ISO Latin I (essentially, extend NETASCII
>>to 8 bits using the ISO character set that contains all the graphics
>>for all the Latin languages).
>
>That would leave all the non-Latin languages, like Japanese, Chinese,
>Korean, etc., out in the cold.  It would be a mistake to require that
>8-bit values (i.e, GR characters, with the 8th bit set) passed over
>TELNET connections be in one particular character set.  If need be,
>there could be TELNET options to indicate which character set is
>being sent over the wire.

Good point.

The Japanese standard (or at least one of them) is in some sense upward
compatible with ASCII and European character sets.  Two-byte sequences
in which both bytes have the high-order bit set are Kanji; single bytes
with the high bit set are European.  Anything that might be a control
character is always a control char, no matter what else surrounds it.
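
To make that rule concrete, here is a rough sketch of a classifier for
such a byte stream.  It follows only the description above, not any
actual standard document.  The function name classify, the category
labels, and the choice of "control" as bytes below 0x20 plus 0x7F are
my assumptions.  Note the built-in ambiguity: two adjacent European
bytes look exactly like one Kanji pair under this rule.

```python
def classify(data: bytes):
    """Split a byte stream into (kind, bytes) tokens per the rule sketched above.

    Assumption: "control" means C0 controls (0x00-0x1F) and DEL (0x7F).
    """
    tokens = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x20 or b == 0x7F:
            # Control characters are always control characters.
            tokens.append(("control", data[i:i + 1]))
            i += 1
        elif b < 0x80:
            # Plain 7-bit ASCII.
            tokens.append(("ascii", data[i:i + 1]))
            i += 1
        elif i + 1 < len(data) and data[i + 1] >= 0x80:
            # Two bytes in a row with the high bit set: treat as Kanji.
            tokens.append(("kanji", data[i:i + 2]))
            i += 2
        else:
            # A lone byte with the high bit set: treat as European.
            tokens.append(("european", data[i:i + 1]))
            i += 1
    return tokens
```

For example, classify(b"A\xb0\xa1\xe9\n") yields one ASCII byte, one
Kanji pair, one European byte, and one control character.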

I don't have the details, and I don't know if this extends to Korean.
I know it won't handle Chinese, because there are more characters in
the Chinese language.

However, TELNET option negotiation is very good at this sort of thing.
All we'd have to do is standardize the character sets (or provide an
open-ended option that can be grown as needed).
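
As a sketch of what such an open-ended option might look like on the
wire: the command bytes (IAC, WILL, SB, SE) are the standard TELNET
ones, but the option code CHARSET_OPT, the function names, and the
idea of sending the character set name as ASCII text in a
subnegotiation are all hypothetical, made up here for illustration.

```python
# Standard TELNET command bytes.
IAC, SB, SE = 255, 250, 240
WILL, WONT, DO, DONT = 251, 252, 253, 254

# Hypothetical option code, chosen only for this sketch.
CHARSET_OPT = 0x7E


def offer_charset_option() -> bytes:
    """Build IAC WILL <option>: the sender offers to use the option."""
    return bytes([IAC, WILL, CHARSET_OPT])


def announce_charset(name: str) -> bytes:
    """Build IAC SB <option> <name> IAC SE naming the character set.

    Assumption: names are plain ASCII, so the payload cannot contain
    an IAC byte and needs no escaping.
    """
    payload = name.encode("ascii")
    return bytes([IAC, SB, CHARSET_OPT]) + payload + bytes([IAC, SE])
```

Because the name travels as text, new character sets could be added
without assigning a new option code for each one.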

I suspect that if we just say that TELNET has to be 8-bit transparent
(except for a couple of things like octal 377, the IAC byte, and CR)
then most of the rest of this won't matter.  We could apply a default
character set (which might be ASCII, or European) unless options are
negotiated otherwise.
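
A minimal sketch of what "8-bit transparent except for 377 and CR"
could mean in practice: a data byte of 377 octal (IAC) is doubled, and
a bare CR is followed by NUL, as the TELNET spec requires.  The
function names are mine, and the sketch assumes every CR in the data
is a literal carriage return rather than part of a newline.  The
decoder handles only these two cases and does not parse other IAC
commands.

```python
IAC = 0xFF  # octal 377
CR = 0x0D
NUL = 0x00


def telnet_escape(data: bytes) -> bytes:
    """Encode arbitrary 8-bit data for the wire."""
    out = bytearray()
    for b in data:
        out.append(b)
        if b == IAC:
            out.append(IAC)  # IAC as data is sent as IAC IAC
        elif b == CR:
            out.append(NUL)  # bare CR is sent as CR NUL
    return bytes(out)


def telnet_unescape(wire: bytes) -> bytes:
    """Reverse telnet_escape; other IAC sequences are passed through untouched."""
    out = bytearray()
    i = 0
    while i < len(wire):
        b = wire[i]
        out.append(b)
        nxt = wire[i + 1] if i + 1 < len(wire) else None
        if (b == IAC and nxt == IAC) or (b == CR and nxt == NUL):
            i += 2  # skip the stuffing byte
        else:
            i += 1
    return bytes(out)
```

Every other byte value, including those with the 8th bit set, passes
through unchanged, so the choice of character set is left entirely to
the two ends.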

	Mark