Xref: utzoo comp.arch:7963 comp.misc:4776 comp.lang.misc:2583 comp.protocols.misc:463
Path: utzoo!attcan!uunet!lll-winken!ncis.llnl.gov!helios.ee.lbl.gov!nosc!ucsd!sdcsvax!trantor.harris-atd.com!x102c!bbadger
From: bbadger@x102c.uucp (Badger BA 64810)
Newsgroups: comp.arch,comp.misc,comp.lang.misc,comp.protocols.misc
Subject: Re: "big endian" and "little endian" - first usage for computer
Keywords: dump little-endian strings
Message-ID: <1447@trantor.harris-atd.com>
Date: 21 Jan 89 20:35:48 GMT
References: <170@microsoft.UUCP> <4008@hubcap.UUCP> <482@babbage.acc.virginia.edu> <5658@cbmvax.UUCP> <1433@trantor.harris-atd.com> <5703@cbmvax.UUCP>
Sender: news@trantor.harris-atd.com
Reply-To: bbadger@x102c.UUCP (Badger BA 64810)
Organization: Harris GISD, Melbourne, FL
Lines: 112

In article <5703@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>In article <1433@trantor.harris-atd.com> bbadger@x102c.UUCP (Badger BA 64810) writes:
>>In article <5658@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>>>2)  Hardware people like to draw diagrams with 0 at bottom-right, software
>>>people, used to printers and screens that print top to bottom, left to right,
>>>like to put 0 at upper-left.  It also makes dumping memory with strings easier
>>>to read.
>
>>DEC VAX DUMP prints out in a format that makes both integers and strings 
>>easy to read.  Namely, it prints out each in their ``natural'' order:
>>Integers in little-endian (right to left), and strings from left to right.
>
>> 4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000
>> 69685420 5A595857 56555453 5251504F OPQRSTUVWXYZ Thi 000010
>> 74736574 20612079 6C6E6F20 73692073 s is only a test 000020
>>     <----- numbers go this way <---*---> strings go this way --->
>>
>>People who expect the first word (000000) to appear first (at left) will be 
>>surprised by this, but it's perfectly consistent with the way we write 
>>our numbers and strings.
>
>	I don't know about you (or your hardware), but I tend to write from
>left to right, not right to left.  :-)  And I don't start writing in the
>middle of the page, and go both left and right from there.  :-)
>
Actually, my hardware (VT100 terminal) normally writes left-to-right, but
this doesn't stop me from *reading* right-to-left (and LtR) once an entire 
line is on-screen.
>	Sure you can write this way, or even make things scroll up, but
>most terminals/whatever are easier to deal with in a sequential, left to
>right, top to bottom fashion.  It's marginally more annoying to deal with
>in your way.  Also, I get a headache trying to find the word/byte/whatever
>I'm looking for in a listing like that, I have to reverse my thinking.  :-)
	(Left-to-right and Top-to-bottom are separate issues.)
>
>	Personally, that's a nice kludge to get around the fact that little-
>endian is "naturally" written right to left, bottom to top by most people.
>However, people don't read that way, certainly not text.
>
Aaahh! That's just it.  People reading VMS DUMP output looking for numbers 
*do* read from right-to-left (RtL) (once they get the hang of it :-).  
It's not really hard, and it makes sense for integers of any length from 
1 byte to n.  The reasons for *choosing* big- or little-endian integer 
representations play more to hardware and software issues than adherence 
to historical human reading conventions.  The point I'm trying to make about 
DUMP output is that (Western) people expect to be able to *read* numeric 
output from left-to-right with the most-significant digits first.  If you 
think the first (i.e., leftmost) byte printed should also have the lowest 
byte-address, you are really *specifying* big-endian order.  By dropping 
this arbitrary restriction, VMS DUMP can print the bytes out in a contiguous 
block for that line.
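
A minimal Python sketch of how such a line can be produced, assuming 16 bytes
per line and 32-bit longwords as in the quoted example. The name `dump_line`
and the zero-padding of a short final line are illustrative choices, not the
actual VMS DUMP implementation.

```python
def dump_line(data: bytes, offset: int) -> str:
    """Format 16 bytes starting at `offset` in the style of the quoted DUMP.

    Hex longwords run right-to-left (lowest address at the right), the ASCII
    column runs left-to-right, and the starting offset is printed last.
    """
    # Hypothetical choice: pad a short final line with zero bytes.
    chunk = data[offset:offset + 16].ljust(16, b"\x00")
    longwords = []
    for i in range(0, 16, 4):
        # Each longword is decoded as a little-endian integer, so its hex
        # digits come out most-significant first, as people write numbers.
        longwords.append(f"{int.from_bytes(chunk[i:i + 4], 'little'):08X}")
    # Reverse so the lowest-addressed longword sits next to the ASCII column.
    hex_part = " ".join(reversed(longwords))
    # Printable ASCII as-is, everything else shown as '.'.
    text = "".join(chr(b) if 0x20 <= b < 0x7F else "."
                   for b in data[offset:offset + 16])
    return f" {hex_part} {text} {offset:06X}"


# Sample bytes reconstructed from the three dump lines quoted above.
SAMPLE = b"/\x00ABCDEFGHIJKLMNOPQRSTUVWXYZ This is only a test"
for off in range(0, len(SAMPLE), 16):
    print(dump_line(SAMPLE, off))
```

Run on those 48 bytes, it reproduces the three quoted lines exactly: the
number half reads RtL by address, while the string half reads LtR.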

Taking the first line of the dump as an example, 
>> 4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000
note that the first two bytes of the file hold a single integer in LSB-first
order:  002F  ==> byte(0) = 2F, byte(1) = 00.  It's certainly easier to 
read when written MSB-first (002F) than in storage order (2F00).   
If the next element of the file were ``really'' an INTEGER*4 variable 
(please excuse the use of FORTRAN in mixed company :-), you would catenate 
the "4443 4241" into 44434241.  But if it turned out to be two INTEGER*2 
values you would read "4241" first, then "4443".  
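
The same reading can be checked mechanically. A short Python sketch using the
standard `struct` module, where the `<` prefix requests little-endian byte
order and `>` big-endian; the byte values come from the first dump line, and
the two field layouts are just the two hypotheses discussed above.

```python
import struct

# First 6 bytes of the sample file, in storage (address) order.
raw = bytes([0x2F, 0x00, 0x41, 0x42, 0x43, 0x44])

# INTEGER*2 at offset 0, little-endian: bytes 2F 00 -> 0x002F.
(first,) = struct.unpack_from("<H", raw, 0)

# The same two bytes misread as big-endian give the storage-order 0x2F00.
(misread,) = struct.unpack_from(">H", raw, 0)

# Hypothesis 1: an INTEGER*4 at offset 2 -> bytes 41 42 43 44 -> 0x44434241.
(as_int4,) = struct.unpack_from("<I", raw, 2)

# Hypothesis 2: two INTEGER*2 values at offset 2 -> 0x4241, then 0x4443.
as_two_int2 = struct.unpack_from("<HH", raw, 2)

print(f"{first:04X} {misread:04X} {as_int4:08X}",
      [f"{v:04X}" for v in as_two_int2])
```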

This does result in your eyes moving RtL to increment addressing -- as when 
counting to a specific offset in a record structure -- and then scanning 
back LtR to read an integer.  This is far easier to put up with than 
printing hexadecimal output with addresses increasing from left-to-right on 
a little-endian machine! 

As far as consistency goes, I always liked the fact that on little-endian 
architectures the bit numbering (0..31) makes bit $k$ represent $2^k$ 
no matter what the word size is, whereas with big-endian numbering bit $k$ 
of a 32-bit word represents $2^{31-k}$ and bit $k$ of a 16-bit (half)word 
represents $2^{15-k}$.
That is:
LSB (little-endian), 32-bit word:
        3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
        1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
2^7 =   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
   So 2^7 sets bit number 7.
MSB (big-endian), 32-bit word:
        0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2^7 =   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
   So 2^7 sets bit number 24.
MSB (big-endian), 16-bit halfword:
        0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
2^7 =   0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
   So 2^7 sets bit number 8.
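
The two numbering conventions each reduce to a one-line formula. A small
Python sketch (the function names are made up for illustration) that
reproduces the three cases in the table: under LSB numbering the answer for
2^7 is 7 at any width, while under MSB numbering it depends on the width.

```python
def lsb_bit_number(value: int) -> int:
    """Bit number of a single set bit under LSB numbering: bit k == 2**k."""
    if value <= 0 or value & (value - 1):
        raise ValueError("expects a single set bit (a power of two)")
    return value.bit_length() - 1  # no word width needed


def msb_bit_number(value: int, width: int) -> int:
    """Bit number under MSB numbering: bit k == 2**(width - 1 - k)."""
    return width - 1 - lsb_bit_number(value)


for width in (32, 16):
    print(width, lsb_bit_number(2**7), msb_bit_number(2**7, width))
```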

Normally we can sweep these distinctions under a rug of abstraction.  It's 
only when we start to examine machine code or numeric representations that 
we operate on that low a level.

>	I think little-endian is a long-standing joke played by hardware
>engineers on software writers.  :-)
Right.  So if we just play along with the joke in DUMP output, we won't have 
to tangle up our bits too badly.  Of course, then there's communications 
software where some data is MSB and some is LSB, depending on whether you're 
using the host format or the network format.  In that case, no matter which 
way we print our dump lines, some data will be written with the LSB on the 
left.
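
To make the host-versus-network point concrete, a Python sketch assuming a
little-endian host such as the VAX; the value 0x0A0B0C0D is arbitrary, and
`hex_bytes` is a hypothetical helper. Network byte order is big-endian (MSB
first), so when both copies are displayed as little-endian longwords, the way
a DUMP-style listing shows them, only the host-order copy reads as written.

```python
import struct

value = 0x0A0B0C0D  # arbitrary 32-bit value chosen for illustration

host_le = struct.pack("<I", value)  # as stored by a little-endian host
network = struct.pack(">I", value)  # network byte order: MSB first


def hex_bytes(b: bytes) -> str:
    """Show bytes in address order, lowest address first."""
    return " ".join(f"{x:02X}" for x in b)


print("host (LE):", hex_bytes(host_le))  # 0D 0C 0B 0A
print("network:  ", hex_bytes(network))  # 0A 0B 0C 0D

# Decoding each buffer as a little-endian longword, as a DUMP line does:
print(f"{struct.unpack('<I', host_le)[0]:08X}")  # 0A0B0C0D
print(f"{struct.unpack('<I', network)[0]:08X}")  # 0D0C0B0A
```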

P.S.  You mentioned the bottom/top issue:  whether to print the low addresses 
at the top (normal first-things-first order) or at the bottom (like most 
hardware address space diagrams, or STACK dumps).  Again the most convenient 
order depends on the use that is made of the data and on what its internal
format *is*.  Both forms of output are useful.  The VAX DUMP doesn't have a
"FFFFFFFF at top" option.  Too bad.  
Bernard A. Badger Jr.	407/984-6385   | ``Use the Source, Luke!''
Secure UNIX Products                   | That's not a bug! It's a feature!
Harris GISD, Melbourne, FL  32902      | Buddy, can you paradigm?
Internet: bbadger@x102c.harris-atd.com | 's/./&&/' Tom sed [sic] expansively.