Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site cca.UUCP
Path: utzoo!linus!decvax!cca!g-rh
From: g-rh@cca.UUCP (Richard Harter)
Newsgroups: net.lang.c
Subject: Re: Byte order (retitled)
Message-ID: <7127@cca.UUCP>
Date: Thu, 10-Apr-86 02:48:18 EST
Article-I.D.: cca.7127
Posted: Thu Apr 10 02:48:18 1986
Date-Received: Fri, 11-Apr-86 20:42:04 EST
References: <> <7046@cca.UUCP> <>
Reply-To: g-rh@cca.UUCP (Richard Harter)
Organization: Computer Corp. of America, Cambridge
Lines: 129
Summary:

In article <> ggs@ulysses.UUCP (Griff Smith) writes:
>...
>> Well, no, little-endian came about because the engineers at DEC
>> who designed the PDP-11 made an arbitrary decision that was not well
>> thought out.  I will not essay to defend the sanity of DEC engineers,
>> and cannot recommend that any one else do so (:-)).  It was a bad
>> decision.
>.
>...
>> In short, little-endian was a mistake, is a mistake, and will continue
>> to be a mistake.
>>
>> Richard Harter, SMDS Inc.
>
>As an old PDP-11 hacker, I can't agree with the condemnation of the
>DEC engineering decision.  You are looking at it from the perspective
>of a modern software engineer who wouldn't think of type punning and
>other hacks.  To an assembly language programmer, however, the ability
>to use the same address to test the low and high bytes of a device
>status register meant that code would be shorter and faster.  It also
>increased the number of cases where indirect addressing could be used
>with register pointers.  You can't expect the engineers to have
>anticipated that high-level languages would discredit these practices.

Ah, but I too am an old PDP-11 hacker.  (In fact, my first DEC
machine was a PDP-1!)  I've done all those good things you talk
about -- however you could do exactly the same things in a correctly
designed big-endian machine.  The issues at hand have nothing to do
with modern software engineering and high-level languages.  See below.

>
>My own theory about big vs. little end usage is that the mistake was
>made hundreds of years ago when merchants started to adopt the Arabic
>(as adapted from earlier Hindu sources) number system.  Note that
>Arabic is written right to left; note that numbers are written right
>to left.  I think the Arabs knew what they were doing; they set the
>notation so that the natural computational order followed the
>conventional lexical order.  The European merchants missed the point
>and copied the notation verbatum instead of compensating for the
>opposite lexical convention.
>
>In summary, big-endian was a mistake, but there is no use fighting it.
>Any better-informed historical challenges will be cheerfully accepted;
>the best data I could get was from an ex-patriot of Iran.

Since we (you and I and a select few others) are all reasonable
beings, let us all eschew slogans and epithets (especially me) and
reason together.  Perhaps we can find some truth.  Let us see.

Discussions of little-endian vs big-endian are often muddied by two
collateral issues: the merits of coherent addressing and the merits
of byte addressing.

Coherent addressing is a neologism I have invented for this
discussion.  All it really means is that all addressing goes in the
same direction.  Thus bit 0 of byte 0 of a word is bit 0 of the word,
etc.  The diagram below illustrates coherent addressing:

    Byte:   0  1  2  3  4  5  6  7  8 ....
    Int*2:  0     1     2     3     4 ....
    Int*4:  0           1           2 ....

Coherent addressing is clearly desirable.  However, coherent
addressing has nothing to do, per se, with little-endian or
big-endian.  A common point of confusion in these arguments is to
argue for the advantages of coherent addressing in the belief that
one is thereby arguing for one's favorite position.

The PDP-11 uses uniform byte addressing.  The advantage of uniform
byte addressing is that the addresses are independent of the size of
the entity being addressed.
The disadvantage of uniform byte addressing is that it consumes
address bits -- one for shorts, two for longs, and more if we extend
the scheme to larger blocks.  This is not critical in present
architectures; it would be if we dropped to the level of uniform bit
addressing.  Again, the merits of little-endian versus big-endian
have nothing to do with the merits of uniform byte addressing.

Big-endian versus little-endian only arises when we decide which bit
(byte) of a word is bit (byte) 0 -- the most significant or the least
significant.  Either choice will do for coherent addressing.  However,
the choice does affect two areas, arithmetic and comparison.

Let us consider the problems of doing arithmetic on integers of
indefinite size.  In that case the natural method is to represent
integers as polynomials in powers of two and to do the arithmetic
starting with the lsb and working up.  (The point is that the
algorithms do not depend on knowing the location of the msb's of the
operands.)  In short, little-endian is the correct choice for doing
arithmetic on integers of indefinite size.

For comparisons of strings of indefinite size, on the other hand, the
correct choice is big-endian.  The key point is that the natural
method for comparison is to first compare msb's and work down.  It
turns out, if one thinks about it, that the natural model for strings
is the binary fraction rather than the binary polynomial.

The advantages of little-endian probably show up in hardware design
at the microcode level even though the machine instructions for
arithmetic operate on fixed size operands.  If this is the case then
this might have been a design factor when the PDP-11 was first being
designed.  (Cheap small computers were a LOT slower in those days, and
every trick to gain speed and simplicity at the hardware level
counted.)  At the machine code level, however, all arithmetic
instructions are fixed length so little-endian/big-endian is, again,
irrelevant.

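The two cases above (arithmetic from the lsb up, comparison from the
msb down) can be sketched in C.  This is an illustration only, under
stated assumptions: the names le_add and be_compare are made up for
this example, operands are arrays of unsigned bytes, and the
comparison assumes both operands have the same length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add two unsigned integers of arbitrary size,
 * each stored least significant byte first (little-endian).
 * Writes max(na, nb) + 1 bytes to sum and returns that count.
 * The loop starts at index 0 (the lsb) and carries upward; it never
 * needs to locate the msb of either operand before it begins. */
size_t le_add(const uint8_t *a, size_t na,
              const uint8_t *b, size_t nb, uint8_t *sum)
{
    size_t n = na > nb ? na : nb;
    unsigned carry = 0;
    size_t i;

    assert(sum != NULL);
    for (i = 0; i < n; i++) {
        unsigned t = carry;
        if (i < na) t += a[i];   /* shorter operand is zero-extended */
        if (i < nb) t += b[i];
        sum[i] = (uint8_t)(t & 0xFF);
        carry = t >> 8;
    }
    sum[n] = (uint8_t)carry;     /* final carry-out byte, may be 0 */
    return n + 1;
}

/* Hypothetical helper: compare two unsigned integers (or strings) of
 * the same length n, each stored most significant byte first
 * (big-endian).  The first differing byte decides the result, which
 * is what memcmp does, so no arithmetic is needed at all.
 * Returns <0, 0, or >0 in the manner of memcmp. */
int be_compare(const uint8_t *a, const uint8_t *b, size_t n)
{
    return memcmp(a, b, n);
}
```

With a = {0xFF, 0x01} (that is, 0x01FF stored lsb first) and
b = {0x01}, le_add yields the bytes {0x00, 0x02, 0x00}, i.e. 0x0200.
Doing the same sum on msb-first operands would require finding the
low ends of both arrays first; doing the comparison on lsb-first
operands would require scanning from the far end.
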
In general programming, indefinite length arithmetic is a rare
animal.  Comparison of strings, either bits or characters, is
ubiquitous.  And it is here that big-endian is preferable.  In view
of this fact I conclude again that little-endian was a mistake,
albeit less of one than I made it out to be.

Some side issues:

We write left to right because most people are right handed.  Script
style matters here -- if we are drawing characters the direction
doesn't matter.

In choosing the representation of numbers the issues are the same as
they are for byte ordering.  If size is the issue, big-endian has the
advantage; if arithmetic is the issue, little-endian has the
advantage.

As a personal note, when I was young I drilled myself on fast
multiplication.  I could write down two numbers and then write down
their product underneath at handwriting speed.  When I was in
practice I could do products of five digit numbers readily.  The
trick isn't hard -- you just do cross multiplication.  However, you
have to do it little-endian style, i.e. work from the low end up.

Richard Harter, SMDS Inc.