Xref: utzoo comp.arch:7963 comp.misc:4776 comp.lang.misc:2583 comp.protocols.misc:463
Path: utzoo!attcan!uunet!lll-winken!ncis.llnl.gov!helios.ee.lbl.gov!nosc!ucsd!sdcsvax!trantor.harris-atd.com!x102c!bbadger
From: bbadger@x102c.uucp (Badger BA 64810)
Newsgroups: comp.arch,comp.misc,comp.lang.misc,comp.protocols.misc
Subject: Re: "big endian" and "little endian" - first usage for computer
Keywords: dump little-endian strings
Message-ID: <1447@trantor.harris-atd.com>
Date: 21 Jan 89 20:35:48 GMT
References: <170@microsoft.UUCP> <4008@hubcap.UUCP> <482@babbage.acc.virginia.edu> <5658@cbmvax.UUCP> <1433@trantor.harris-atd.com> <5703@cbmvax.UUCP>
Sender: news@trantor.harris-atd.com
Reply-To: bbadger@x102c.UUCP (Badger BA 64810)
Organization: Harris GISD, Melbourne, FL
Lines: 112

In article <5703@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>In article <1433@trantor.harris-atd.com> bbadger@x102c.UUCP (Badger BA 64810) writes:
>>In article <5658@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>>>2) Hardware people like to draw diagrams with 0 at bottom-right, software
>>>people, used to printers and screens that print top to bottom, left to right,
>>>like to put 0 at upper-left.  It also makes dumping memory with strings easier
>>>to read.
>
>>DEC VAX DUMP prints out in a format that makes both integers and strings
>>easy to read.  Namely, it prints out each in their ``natural'' order:
>>Integers in little-endian (right to left), and strings from left to right.
>
>> 4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000
>> 69685420 5A595857 56555453 5251504F OPQRSTUVWXYZ Thi 000010
>> 74736574 20612079 6C6E6F20 73692073 s is only a test 000020
>>     <----- numbers go this way <---*---> strings go this way --->
>>
>>People who expect the first word (000000) to appear first (at left) will be
>>surprised by this, but it's perfectly consistent with the way we write
>>our numbers and strings.
>
>       I don't know about you (or your hardware), but I tend to write from
>left to right, not right to left. :-)  And I don't start writing in the
>middle of the page, and go both left and right from there. :-)
>

Actually, my hardware (VT100 terminal) normally writes left-to-right, but
this doesn't stop me from *reading* right-to-left (and LtR) once an entire
line is on-screen.

>       Sure you can write this way, or even make things scroll up, but
>most terminals/whatever are easier to deal with in a sequential, left to
>right, top to bottom fashion.  It's marginally more annoying to deal with
>in your way.  Also, I get a headache trying to find the word/byte/whatever
>I'm looking for in a listing like that, I have to reverse my thinking. :-)

(Left-to-right and Top-to-bottom are separate issues.)

>
>       Personally, that's a nice kludge to get around the fact that little-
>endian is "naturally" written right to left, bottom to top by most people.
>However, people don't read that way, certainly not text.
>

Aaahh!  That's just it.  People reading VMS DUMP output looking for numbers
*do* read from right-to-left (RtL) (once they get the hang of it :-).
It's not really hard, and it makes sense of all lengths of integers from
1 byte to n.

The reasons for *choosing* big- or little-endian integer representations
have more to do with hardware and software issues than with adherence to
historical human reading conventions.

The point I'm trying to make about DUMP output is that (Western) people
expect to be able to *read* numeric output from left-to-right with the
most-significant digits first.  If you think the first (i.e., leftmost)
byte printed should also have the lowest byte-address, you are really
*specifying* big-endian order.  By dropping this arbitrary restriction,
VMS DUMP can print the bytes out in a contiguous block for that line.
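
For anyone who wants to play with the layout, here is a rough sketch of it
in Python.  This is only my own illustration of the format shown in the
sample dump, not how DUMP itself is implemented; the name dump_line and the
fixed 16-byte line width are assumptions made for the example.

```python
def dump_line(chunk: bytes, address: int) -> str:
    """Lay out up to 16 bytes like one line of the sample dump."""
    # Hex field: reverse the bytes so the lowest address ends up rightmost.
    digits = chunk[::-1].hex().upper()
    # Cut into 8-digit (longword) groups, counting from the right-hand end.
    groups = [digits[max(0, i - 8):i] for i in range(len(digits), 0, -8)]
    hex_field = " ".join(reversed(groups)).rjust(35)
    # Text field: the same bytes in address order, non-printables shown as ".".
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk).ljust(16)
    return f"{hex_field} {text} {address:06X}"


# The first 32 bytes of the sample file, in storage (address) order.
data = b"/\x00ABCDEFGHIJKLMNOPQRSTUVWXYZ Thi"
for offset in range(0, len(data), 16):
    print(dump_line(data[offset:offset + 16], offset))
```

Because the hex field is just the line's bytes in reverse address order, the
rightmost digits of any width read directly as a little-endian integer: the
last four digits of the first line are 002F, which is the same value as
int.from_bytes(data[:2], "little").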

Taking the first line of the dump as an example,

>> 4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000

note that the first two bytes of the file specify a single integer number,
LSB order: 002F ==> byte(0) = 2F, byte(1) = 00.  It's certainly easier to
read written MSB (002F) than in storage order (2F00).  If the next element
of the file were ``really'' an INTEGER*4 variable (please excuse the use of
FORTRAN in mixed company :-), you would catenate the "4443 4241" into
44434241.  But if it turned out to be two INTEGER*2 values you would read
"4241" first, then "4443".

This does result in your eyes moving RtL to increment addressing (as when
counting to a specific offset in a record structure) and then scanning
back from LtR to read an integer.  This is far easier to put up with than
printing hexadecimal output with addresses increasing from left-to-right
on a little-endian machine!

As far as consistency goes, I always liked the fact that on little-endian
architectures, the bit numbering (0..31) makes bit $k$ represent $2^k$
no matter what the word size is.  Whereas on big-endian 32-bit words
bit $k$ equals $2^{31-k}$ and on 16-bit (half) words, the value
is $2^{15-k}$.  That is:

LSB (little-endian):
      3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
      1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
2^7 = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
So 2^7 sets bit number 7.

MSB (big-endian), 32-bit word:
      0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2^7 = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
So 2^7 sets bit number 24.

MSB (big-endian), 16-bit (half) word:
      0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
2^7 = 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
So 2^7 sets bit number 8.

Normally we can sweep these distinctions under a rug of abstraction.
It's only when we start to examine machine code or numeric representations
that we operate at that low a level.

>       I think little-endian is a long-standing joke played by hardware
>engineers of software writers. :-)

Right.  So if we just play along with the joke in DUMP output, we won't
have to tangle up our bits too badly.

Of course, then there's communications software where some data is MSB and
some is LSB, depending on whether you're using the host format or the
network format.  In that case, no matter which way we print our dump lines,
some data will be written with the LSB on the left.

P.S.  You mentioned the bottom/top issue: whether to print the low
addresses at the top (normal first-things-first order) or at the bottom
(like most hardware address space diagrams, or STACK dumps).  Again, the
most convenient order depends on the use that is made of the data and on
what its internal format *is*.  Both forms of output are useful.  The VAX
DUMP doesn't have a "FFFFFFFF at top" option.  Too bad.

Bernard A. Badger Jr.    407/984-6385     | ``Use the Source, Luke!''
Secure UNIX Products                      | That's not a bug!  It's a feature!
Harris GISD, Melbourne, FL  32902         | Buddy, can you paradigm?
Internet: bbadger@x102c.harris-atd.com    | 's/./&&/' Tom sed [sic] expansively.