Path: utzoo!utgpu!water!watmath!clyde!rutgers!mit-eddie!bloom-beacon!think!ames!oliveb!sun!gorodish!guy
From: guy@gorodish.Sun.COM (Guy Harris)
Newsgroups: comp.arch
Subject: Re: RISC data alignment
Message-ID: <39815@sun.uucp>
Date: 24 Jan 88 01:04:00 GMT
References: <2635@calmasd.GE.COM> <3246@psuvax1.psu.edu>
Sender: news@sun.uucp
Lines: 130

> >If this is true, then it would seem to also be true that a C structure
> >could have different lengths, depending on whether it was compiled
> >on a RISC or non-RISC machine.

True, but not necessarily for reasons having to do with RISC vs. non-RISC:

	1) I know of one CISC that requires 4-byte alignment of 4-byte
	   quantities, and 2-byte alignment of 2-byte quantities: the
	   WE32100.

	2) While the VAX does not impose any alignment restrictions, I
	   think most, if not all, VAX implementations run faster if
	   4-byte quantities are aligned on 4-byte boundaries and 2-byte
	   quantities are aligned on 2-byte boundaries.

As such, both the VAX UNIX C compiler and the WE32K C compiler, and
probably the VAX/VMS C compiler, align 4-byte quantities in structures on
4-byte boundaries and 2-byte quantities in structures on 2-byte
boundaries. The structure as a whole is aligned on the boundary required
by its most strictly aligned member.

These are the same rules used by the SPARC C compiler; however, on the
SPARC *8*-byte quantities (e.g., double-precision floating point numbers)
must be aligned on *8*-byte boundaries. This restriction is not imposed
by, e.g., the WE32K or the VAX, so their compilers only align such
quantities on 4-byte boundaries.

However, there are machines with different alignment restrictions, and C
compilers with different alignment rules:

	1) The MC68010 requires 2-byte quantities to be aligned on 2-byte
	   boundaries, but does not require 4-byte quantities to be
	   aligned on 4-byte boundaries.
	   Most of the C compilers for UNIX 68K implementations put
	   4-byte quantities only on 2-byte boundaries, and always align
	   structures on 2-byte boundaries even if no member requires
	   this alignment. These rules are often propagated to the
	   68020, which imposes no alignment restrictions.

	2) The CCI Power 6/32 C compiler, last time I dealt with it,
	   always aligns structures on at least 4-byte boundaries.

> >Further, it would seem that if that C structure were written out to a file,
> >it could only be read properly by a machine of the same type as that which
> >wrote it.
>
> This is exactly correct.

And not only that, it would still be true even if all C implementations
imposed the exact same alignment rules!

VAXes, National Semiconductor 32Ks, and Intel 80*86es address the bytes
within a 2-byte or 4-byte quantity from bottom to top; the least
significant byte is byte 0. These architectures are called
"little-endian".

IBM 360/370s, Motorola 68Ks, AT&T WE32Ks (except for the WE32000),
SPARCs, and CCI Power 6/32s address them from top to bottom; the *most*
significant byte is byte 0. These architectures are called "big-endian".

The WE32000, and, if I remember correctly, the MIPS chips, can select
which byte order to use, although I think all WE32000 implementations use
the "big-endian" byte order.

Tapes, disks, and networks are usually byte-serial. They generally do not
record (in the case of tapes and disks) or transmit (in the case of
networks) 2-byte or 4-byte quantities in parallel. This means that a
sequence of *bytes* will appear the same when it is copied via tape or
disk, or transmitted over a network, from a big-endian to a little-endian
machine. If you put the character string "hi mom" on the tape, disk, or
wire, and send it to a machine with the opposite byte sex, that machine
will see "hi mom" (assuming, of course, that the hardware and/or software
on both ends uses the same character set).
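A minimal C sketch of the byte addressing described above, for anyone who
wants to see which kind of machine they are on. The function names are
made up for illustration, and the sketch assumes a compiler that provides
<stdint.h> and a `uint32_t` type:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if byte 0 of a 4-byte integer is its
 * least significant byte (little-endian host), 0 otherwise.
 */
int byte0_is_least_significant(void)
{
    uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* look at the byte with address 0 */
    return first == 1;
}

/*
 * Hypothetical helper: copies the four bytes of `value` into `out` in
 * address order (byte 0 first), which is the order in which they would
 * normally go out to tape, disk, or the wire.
 */
void bytes_in_address_order(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, 4);
}
```

For the value 0x01020304, `bytes_in_address_order` yields 04 03 02 01 on
a little-endian host and 01 02 03 04 on a big-endian host. A character
array such as "hi mom" has no such dependence, since each character is a
byte of its own.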
However, if you put the number 127 on the tape, disk, or wire as a 4-byte
integer, and send it between two machines with different byte sexes, the
number will appear to be 2130706432 on the other machine. A machine will
generally write a 4-byte integer on tape or disk or send it over the wire
by putting out the byte with address 0, then the byte with address 1,
then 2, then 3. This means that a little-endian machine will put out a
byte with the value 127, and then 3 bytes with the value 0. A big-endian
machine will put out 3 bytes with the value 0 and then a byte with the
value 127. In either case, a receiving machine with the opposite byte sex
will put the 127 in the *most*-significant byte of the integer and put
the zeroes in the lower three bytes.

Furthermore, floating-point formats differ in ways other than their byte
order. Most of the architectures listed above use the IEEE floating-point
format (either directly or in their floating-point coprocessors);
however, neither the IBM 360/370 nor the VAX does, and I don't think the
Power 6/32 does either.

And, on top of that, the sizes of the C data types are not guaranteed to
be the same. "int" is generally 4 bytes on the 360/370, VAX, NS32K,
WE32K, SPARC, and MIPS architectures. It may be 2 or 4 bytes on the 80*86
and Motorola 68K architectures, depending on the implementation. It may
be *8* bytes on a supercomputer. It may be *3* bytes on a 24-bit machine.

On top of this, there's not even a guarantee that a byte is 8 bits, or
that an "int" is 16 or 32 bits; there exist at least two C
implementations on 36-bit machines, one of which even runs UNIX.

In short, the statement made by Scott Schwartz in the summary line:

	you had better use XDR or something similar

is 10,000% true, as is the statement in the original article:

	Further, it would seem that if that C structure were written out
	to a file, it could only be read properly by a machine of the
	same type as that which wrote it.
There are exceptions to this statement: a structure written out on an
Intel 386-based machine *might* be readable directly on an NS32K-based
machine, for instance, although I don't know that their alignment rules
or floating-point formats are the same (both are, I think, IEEE, but I
don't know that the byte order in *floating*-point numbers is the same).
These exceptions are rare, and as indicated I don't even know which of
them really exist.

If you want to write data to a file or put it out on the network so that
some other machine of a different type can read it, *don't* just dump a
raw structure; use the Sun XDR library, or roll your own routines that
put things out in a standard byte order with a standard floating point
format, standard alignment, standard data sizes, etc.

And as for the particular question:

> >Does such incompatibility truly exist? If I create a file on a Sun/4
> >will I be able to read it on a Sun/3?

As Mr. Schwartz has already pointed out, the answer is "yes": the
incompatibility exists. The Sun-3 uses the MC68020 chip, and uses the
alignment rules that most 68K UNIX C implementations use: structures are
always aligned on at least a 2-byte boundary, and most quantities are
only aligned on 2-byte boundaries. The Sun-4 uses the SPARC chip, and
uses the rules listed above for that chip: structures may be aligned on
1-byte boundaries if they contain nothing requiring a stricter alignment,
4-byte quantities are aligned on 4-byte boundaries, and 8-byte quantities
are aligned on 8-byte boundaries.

	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com
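
To illustrate the "roll your own routines" suggestion above, here is a
rough sketch in C. It is not the XDR library's interface; the structure,
the function names, and the choice of most-significant-byte-first order
are all assumptions made for the example. The idea is to write each
member explicitly, byte by byte, so that neither the compiler's padding
nor the host's byte order reaches the file:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; sizeof(struct record) depends on the compiler's
 * alignment rules, so it is never written out directly. */
struct record {
    char     tag;     /* 1 byte */
    uint32_t count;   /* 4 bytes; the compiler may pad before this */
};

#define RECORD_WIRE_SIZE 5   /* 1 byte of tag + 4 bytes of count */

/* Store v most-significant byte first, whatever the host byte order. */
void put_u32_be(unsigned char *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)((v >> 24) & 0xff);
    buf[1] = (unsigned char)((v >> 16) & 0xff);
    buf[2] = (unsigned char)((v >> 8) & 0xff);
    buf[3] = (unsigned char)(v & 0xff);
}

/* Rebuild a value stored by put_u32_be(). */
uint32_t get_u32_be(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Write the members one at a time: no padding, fixed byte order. */
void encode_record(const struct record *r, unsigned char out[RECORD_WIRE_SIZE])
{
    out[0] = (unsigned char)r->tag;
    put_u32_be(out + 1, r->count);
}

/* Read the members back from the fixed external layout. */
void decode_record(const unsigned char in[RECORD_WIRE_SIZE], struct record *r)
{
    r->tag = (char)in[0];
    r->count = get_u32_be(in + 1);
}
```

With this layout, a record holding the tag 'x' and the count 127 always
occupies the same five bytes ('x', 0, 0, 0, 127) regardless of which
machine wrote it. Floating-point members would need a similar explicit
conversion, which this sketch does not attempt.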