Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!rochester!cornell!uw-beaver!mit-eddie!bloom-beacon!mit-hermes!iuvax!bsu-cs!dhesi
From: dhesi@bsu-cs.UUCP (Rahul Dhesi)
Newsgroups: comp.unix.wizards
Subject: byte != 8 bits
Message-ID: <911@bsu-cs.UUCP>
Date: Fri, 31-Jul-87 17:13:11 EDT
Article-I.D.: bsu-cs.911
Posted: Fri Jul 31 17:13:11 1987
Date-Received: Sun, 2-Aug-87 06:26:20 EDT
References: <218@astra.necisa.oz> <142700010@tiger.UUCP> <2792@phri.UUCP> <857@bsu-cs.UUCP>
Reply-To: dhesi@bsu-cs.UUCP (Rahul Dhesi)
Organization: CS Dept, Ball St U, Muncie, Indiana
Lines: 67
Summary: byte == 8 bits

In article <857@bsu-cs.UUCP> I wrote:

>A byte is therefore exactly 8 bits.  No more and no less.

Amidst all the name-calling that followed, the following objection to
my statement was faintly discernible:

     Not all character sets will fit in 8 bits.

This is true, but it does not affect my claim.  A byte *is* exactly
8 bits.

First, 8 bits suffices for *most* of the world's languages.

Second, even if 8 bits is insufficient to hold a given character set
(and this is true for only a few languages), this simply means that
tradition must give way, and "character" and "byte" will not be
synonymous.  (If ANSI is not prepared for this, it's in for a rude
shock, in my opinion.)

Consider computer communications.  The world's networks deal in 8-bit
units.  Political reality being what it is, it was considered unwise to
call these bytes.  They are called octets.  What does one do with a
machine/character set with 9-bit bytes?  Map them to 8-bit bytes and
lose some information, or split them with shifting/masking and transmit
them as 8-bit units anyway.  One then finds things rather awkward.  One
embraces the 8-bit byte as soon as possible.
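
To make the shifting/masking option concrete, here is a rough sketch
in C.  The names pack9 and unpack9 are hypothetical, and the bit
order (most significant bit first, last octet zero-padded) is an
assumption; no real network protocol is being described.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack n 9-bit values (each held in the low 9 bits
 * of a uint16_t) into octets, most significant bit first.  The final
 * octet is zero-padded.  out needs room for (9*n + 7) / 8 octets.
 * Returns the number of octets written. */
size_t pack9(const uint16_t *in, size_t n, uint8_t *out)
{
    uint32_t acc = 0;   /* bit accumulator */
    int nbits = 0;      /* number of valid bits currently in acc */
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        assert(in[i] <= 0x1FF);             /* value must fit in 9 bits */
        acc = (acc << 9) | in[i];
        nbits += 9;
        while (nbits >= 8) {                /* emit every complete octet */
            out[o++] = (uint8_t)(acc >> (nbits - 8));
            nbits -= 8;
        }
        acc &= (1u << nbits) - 1;           /* keep only the leftover bits */
    }
    if (nbits > 0)                          /* flush the partial octet */
        out[o++] = (uint8_t)(acc << (8 - nbits));
    return o;
}

/* Hypothetical inverse: recover n_units 9-bit values from the octet
 * stream produced by pack9.  Returns the number of octets consumed. */
size_t unpack9(const uint8_t *in, size_t n_units, uint16_t *out)
{
    uint32_t acc = 0;
    int nbits = 0;
    size_t i = 0;

    for (size_t u = 0; u < n_units; u++) {
        while (nbits < 9) {                 /* pull in octets as needed */
            acc = (acc << 8) | in[i++];
            nbits += 8;
        }
        out[u] = (uint16_t)((acc >> (nbits - 9)) & 0x1FF);
        nbits -= 9;
        acc &= (1u << nbits) - 1;
    }
    return i;
}
```

Three 9-bit units make 27 bits, so they travel as four octets with
five bits of padding.  The receiver must be told the unit count
separately, since the padding is indistinguishable from data.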

Consider the cost-benefit analysis manufacturers must do.  Those that
want bytes to be other than 8 bits must give up the convenience of
using a lot of off-the-shelf parts.  Custom hardware is expensive.

Consider simple elegance.  With a 9-bit byte, one is either stuck with
wasted bits in a 32-bit machine word, or one must use a 36-bit word and
end up with wasted bits within machine instructions and within data
structures and/or get a nonorthogonal machine architecture.  (Aside:
Why do we see useless machine instructions such as "jump never, label"
and "mov a,a"?  Because orthogonality simplifies machine design.)  The
same goes for any other byte size except 16 bits, in which case we
could just as well take a pair of 8-bit bytes and call them by a new
name.
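
The wasted-bits arithmetic can be checked with a small sketch in C.
The function names are made up for illustration, and the model is
deliberately crude: it only counts how many whole bytes fit in a
word and how many bits are left over.

```c
#include <assert.h>

/* Number of whole bytes of byte_bits that fit in a word of word_bits. */
unsigned bytes_per_word(unsigned word_bits, unsigned byte_bits)
{
    assert(byte_bits > 0);
    return word_bits / byte_bits;
}

/* Bits left unused once the word is filled with whole bytes. */
unsigned wasted_bits(unsigned word_bits, unsigned byte_bits)
{
    assert(byte_bits > 0);
    return word_bits % byte_bits;
}
```

With 9-bit bytes, a 32-bit word holds three bytes and wastes five
bits, while a 36-bit word holds four with nothing left over.  With
8-bit bytes, a 32-bit word holds four and wastes none.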

Consider devices.  The 8-bit byte is a standard unit of information
transfer for tape drives.  And I have a hunch most disk drives and
controllers are designed with 512-bytes-per-sector formatting in mind,
which won't fit neatly with an arbitrary byte/word size.

Consider a lot of things, and the 8-bit byte stares you in the face.

And consider that in most cases, if 8 bits are not enough, neither are
9, or 10, or perhaps even 11.

How, then, does one deal with a character set that won't fit in 8 bits?

Predictions:

o    Such characters will, in the future, occupy two bytes.
o    There will be an increasing trend towards using transliterations
     that will allow unusual character sets to be represented using
     the Roman alphabet.
o    Increasingly, computing will be done in English, even in
     countries where English is not a major language.
o    Special-purpose machines using esoteric sizes of data units will
     continue to exist but will not replace general-purpose computers,
     which will continue to be based on the 8-bit byte.
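
As an illustration of the first prediction, here is a minimal sketch
in C of a character code carried in two 8-bit bytes.  The names
put_wide and get_wide are hypothetical, the high-byte-first order is
an assumption, and no particular national character set or encoding
standard is implied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a 16-bit character code as two bytes,
 * high byte first (the byte order is an assumption). */
void put_wide(uint16_t code, uint8_t *out)
{
    assert(out != 0);
    out[0] = (uint8_t)(code >> 8);      /* high byte */
    out[1] = (uint8_t)(code & 0xFF);    /* low byte  */
}

/* Hypothetical inverse: rebuild the 16-bit code from its two bytes. */
uint16_t get_wide(const uint8_t *in)
{
    assert(in != 0);
    return (uint16_t)((in[0] << 8) | in[1]);
}
```

The byte stays at 8 bits; only the "character" grows, which is the
split between the two terms argued for above.
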
-- 
Rahul Dhesi         UUCP:  {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!dhesi