Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!rochester!cornell!uw-beaver!mit-eddie!bloom-beacon!mit-hermes!iuvax!bsu-cs!dhesi
From: dhesi@bsu-cs.UUCP (Rahul Dhesi)
Newsgroups: comp.unix.wizards
Subject: byte != 8 bits
Message-ID: <911@bsu-cs.UUCP>
Date: Fri, 31-Jul-87 17:13:11 EDT
Article-I.D.: bsu-cs.911
Posted: Fri Jul 31 17:13:11 1987
Date-Received: Sun, 2-Aug-87 06:26:20 EDT
References: <218@astra.necisa.oz> <142700010@tiger.UUCP> <2792@phri.UUCP> <857@bsu-cs.UUCP>
Reply-To: dhesi@bsu-cs.UUCP (Rahul Dhesi)
Organization: CS Dept, Ball St U, Muncie, Indiana
Lines: 67
Summary: byte == 8 bits

In article <857@bsu-cs.UUCP> I wrote:
>A byte is therefore exactly 8 bits.  No more and no less.

Amidst all the name-calling that followed, the following objection to
my statement was faintly discernible:

     Not all character sets will fit in 8 bits.

This is true, but it does not affect my claim.  A byte *is* exactly 8
bits.

First, 8 bits suffices for *most* of the world's languages.

Second, even if 8 bits is insufficient to hold a given character set
(and this is true for only a few languages), this simply means that
tradition must give way, and "character" and "byte" will not be
synonymous.  (If ANSI is not prepared for this, it's in for a rude
shock, in my opinion.)

Consider computer communications.  The world's networks deal in 8-bit
units.  Political reality being what it is, it was considered unwise to
call these bytes.  They are called octets.  What does one do with a
machine/character set with 9-bit bytes?  Map them to 8-bit bytes and
lose some information, or split them with shifting/masking and transmit
them as 8-bit units anyway.  One then finds things rather awkward.  One
embraces the 8-bit byte as soon as possible.

Consider the cost-benefit analysis manufacturers must do.  Those that
want bytes to be other than 8 bits must give up the convenience of
using a lot of off-the-shelf parts.  Custom hardware is expensive.

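To make the two options for 9-bit units mentioned above concrete, here
is a minimal sketch in Python.  The function names (truncate_9bit,
pack_9bit, unpack_9bit) and the most-significant-bit-first packing
order are illustrative assumptions, not any network standard.

```python
def truncate_9bit(u):
    """Lossy option: keep the low 8 bits and drop the ninth."""
    return u & 0xFF


def pack_9bit(units):
    """Lossless option: pack 9-bit values (0..511) into 8-bit octets.

    Bits are emitted most significant first (an assumed convention);
    the final octet is zero-padded if the bit count is not a multiple
    of 8.
    """
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for u in units:
        if not 0 <= u < 512:
            raise ValueError("value does not fit in 9 bits")
        acc = (acc << 9) | u
        nbits += 9
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1          # keep only the leftover bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_9bit(octets, count):
    """Recover `count` 9-bit values from octets made by pack_9bit."""
    acc = 0
    nbits = 0
    units = []
    for b in octets:
        acc = (acc << 8) | b
        nbits += 8
        # At most one 9-bit unit becomes available per incoming octet.
        if nbits >= 9 and len(units) < count:
            nbits -= 9
            units.append((acc >> nbits) & 0x1FF)
            acc &= (1 << nbits) - 1
    return units
```

Eight 9-bit units occupy exactly nine octets (72 bits), so padding
appears only when the unit count is not a multiple of 8, and the
receiver must be told the count separately.  That extra bookkeeping is
part of the awkwardness described above.
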
Consider simple elegance.  With a 9-bit byte, one is either stuck with
wasted bits in a 32-bit machine word, or one must use a 36-bit word and
end up with wasted bits within machine instructions and within data
structures and/or get a nonorthogonal machine architecture.  (Aside:
Why do we see useless machine instructions such as "jump never, label"
and "mov a,a"?  Because orthogonality simplifies machine design.)  The
same goes for any other byte size except 16 bits, in which case we
could just as well take a pair of 8-bit bytes and call them by a new
name.

Consider devices.  The 8-bit byte is a standard unit of information
transfer using tape drives.  And I have a hunch most disk
drives/controllers are designed with 512-bytes-per-sector formatting in
mind, which won't neatly fit with any arbitrary byte/word size.

Consider a lot of things, and the 8-bit byte stares you in the face.

And consider that in most cases, if 8 bits are not enough, neither are
9, or 10, or perhaps even 11.

How, then, does one deal with a character set that won't fit in 8
bits?  Predictions:

o    Such characters will, in the future, occupy two bytes.

o    There will be an increasing trend towards using transliterations
     that will allow unusual character sets to be represented using
     the Roman alphabet.

o    Increasingly, computations will be done using English, even in
     countries where English is not a major language.

o    Special-purpose machines using esoteric sizes of data units will
     continue to exist but will not replace general-purpose computers,
     which will continue to be based on the 8-bit byte.
-- 
Rahul Dhesi         UUCP:  {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!dhesi