Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!rutgers!sunybcs!boulder!hao!oddjob!mimsy!umd5!brl-adm!brl-smoke!gwyn
From: gwyn@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: comp.unix.wizards
Subject: Re: byte != 8 bits
Message-ID: <6223@brl-smoke.ARPA>
Date: Sun, 2-Aug-87 00:11:35 EDT
Article-I.D.: brl-smok.6223
Posted: Sun Aug 2 00:11:35 1987
Date-Received: Sun, 2-Aug-87 11:07:41 EDT
References: <218@astra.necisa.oz> <142700010@tiger.UUCP> <2792@phri.UUCP> <857@bsu-cs.UUCP> <911@bsu-cs.UUCP> <3566@sdcsvax.UCSD.EDU>
Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB) )
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 21

In article <3566@sdcsvax.UCSD.EDU> elman@amos.ling.ucsd.edu (Jeff Elman) writes:
>I'm a little confused about this argument. While Kanji are often
>called "characters", they're not characters in the sense most people
>probably understand. Kanji are ideograms, and Kanji characters (or
>character pairs) correspond to what we think of as words. Is the proposal
>thus that bytes should be capable of transmitting entire words? That
>hardly seems reasonable.

The confusion is introduced by trying to take "character" and "word"
too literally. What is necessary computationally is support for
handling individual basic textual units, whatever they might be.
In English, that includes letters of the alphabet in both upper- and
lower-case as well as digits and punctuation and separator symbols.
One could include additional formatting controls as well, and for
some specialized disciplines such as mathematics a batch of
funny-looking squiggly things are also needed. Thus, the desired
"character set" contains whatever is necessary so that a sequence
of selections from the set can represent the language.

In any case, the point was that a BYTE is NOT in general large enough
to encode all requisite basic textual units.
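
To put rough numbers on that last point, here is a minimal sketch in C.
It is only an illustration, not anything proposed in this thread: the
names bits_needed, fits_in_byte and ASSUMED_REPERTOIRE are made up, and
the figure of 6000 textual units is an assumed round number standing in
for a large ideographic repertoire. The only thing taken from the
implementation is CHAR_BIT from <limits.h>.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT: number of bits in a byte on this implementation */

/* Hypothetical repertoire size, an assumption chosen only for illustration. */
#define ASSUMED_REPERTOIRE 6000UL

/* Smallest number of bits that gives each of `units` textual units its own code. */
unsigned bits_needed(unsigned long units)
{
    unsigned bits = 0;

    assert(units > 0);
    /* Stop before the shift count reaches the width of unsigned long. */
    while (bits < sizeof(unsigned long) * CHAR_BIT && (1UL << bits) < units)
        bits++;
    return bits;
}

/* Nonzero if one byte has enough distinct values to cover `units` textual units. */
int fits_in_byte(unsigned long units)
{
    return bits_needed(units) <= CHAR_BIT;
}
```

Calling fits_in_byte(95) (roughly the printable ASCII set) should come
out true on any conforming implementation, since CHAR_BIT is at least 8.
bits_needed(ASSUMED_REPERTOIRE) comes out as 13, so whether
fits_in_byte(ASSUMED_REPERTOIRE) holds depends entirely on how wide the
machine's byte happens to be; on a machine with 8-bit bytes it does not.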