Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!rutgers!sunybcs!boulder!hao!oddjob!mimsy!umd5!brl-adm!brl-smoke!gwyn
From: gwyn@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: comp.unix.wizards
Subject: Re: byte != 8 bits
Message-ID: <6223@brl-smoke.ARPA>
Date: Sun, 2-Aug-87 00:11:35 EDT
Article-I.D.: brl-smok.6223
Posted: Sun Aug  2 00:11:35 1987
Date-Received: Sun, 2-Aug-87 11:07:41 EDT
References: <218@astra.necisa.oz> <142700010@tiger.UUCP> <2792@phri.UUCP> <857@bsu-cs.UUCP> <911@bsu-cs.UUCP> <3566@sdcsvax.UCSD.EDU>
Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>)
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 21

In article <3566@sdcsvax.UCSD.EDU> elman@amos.ling.ucsd.edu (Jeff Elman) writes:
>I'm a little confused about this argument.  While Kanji are often
>called "characters", they're not characters in the sense most people
>probably understand.  Kanji are ideograms, and Kanji characters (or
>character pairs) correspond to what we think of as words.  Is the proposal
>thus that bytes should be capable of transmitting entire words?  That
>hardly seems reasonable.

The confusion is introduced by trying to take "character" and "word"
too literally.  What is necessary computationally is support for
handling individual basic textual units, whatever they might be.
In English, that includes letters of the alphabet in both upper-
and lower-case as well as digits and punctuation and separator
symbols.  One could include additional formatting controls as well,
and for some specialized disciplines such as mathematics a batch of
funny-looking squiggly things is also needed.

Thus, the desired "character set" contains whatever is necessary
so that a sequence of selections from the set can represent the
language.  In any case, the point was that a BYTE is NOT in general
large enough to encode all requisite basic textual units.
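
The size argument can be checked with a little arithmetic.  What follows
is only an illustrative sketch, not part of the original discussion: the
names bits_needed and fits_in_byte are made up for the example, the 8-bit
byte width is an assumption (a byte need not be 8 bits), and the figure
of 6000 units for a Kanji repertoire is an assumed round number, not a
count taken from any standard.

```python
import string


def bits_needed(repertoire_size):
    """Smallest number of bits giving every textual unit a distinct code."""
    if repertoire_size < 1:
        raise ValueError("repertoire must contain at least one unit")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)).
    return (repertoire_size - 1).bit_length()


def fits_in_byte(repertoire_size, byte_bits=8):
    """True if one byte of byte_bits bits can encode the whole repertoire.

    byte_bits=8 is an assumption; pass another width for other machines.
    """
    return bits_needed(repertoire_size) <= byte_bits


# Basic English units: upper- and lower-case letters, digits,
# punctuation, and a space as the separator symbol.
english_units = set(
    string.ascii_letters + string.digits + string.punctuation + " "
)

# Assumed, illustrative size for a Kanji repertoire (hypothetical figure).
ASSUMED_KANJI_UNITS = 6000

if __name__ == "__main__":
    n = len(english_units)
    print(n, bits_needed(n), fits_in_byte(n))
    print(
        ASSUMED_KANJI_UNITS,
        bits_needed(ASSUMED_KANJI_UNITS),
        fits_in_byte(ASSUMED_KANJI_UNITS),
    )
```

Under these assumptions the English set of 95 units needs 7 bits and
fits in a byte, while a repertoire of 6000 units needs 13 bits and does
not.  Any repertoire of more than 256 units gives the same answer for an
8-bit byte.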