Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!decvax!decwrl!sun!guy
From: guy@sun.uucp (Guy Harris)
Newsgroups: net.unix-wizards
Subject: Re: Magic Numbers (and incredible stupidity in "cpio")
Message-ID: <3061@sun.uucp>
Date: Sat, 7-Dec-85 01:03:45 EST
Article-I.D.: sun.3061
Posted: Sat Dec 7 01:03:45 1985
Date-Received: Mon, 9-Dec-85 03:33:12 EST
References: <124@rexago1.UUCP> <416@ihdev.UUCP> <908@ncoast.UUCP>
Distribution: net
Organization: Sun Microsystems, Inc.
Lines: 116

> Executables using ``standard'' binary formats, i.e. a.out (PDP-11, Z8000)
> and b.out (MC68000) use the standard magic numbers 0405, 0407, 0410, 0411.
> Non-standard formats, like Xenix x.out (0x0206) and COFF (flames to
> /dev/null; most systems are [ab].out) use distinctive magic numbers.

Well, VAX UNIX (32V, 4.xBSD, System III, Version 8?) also uses those
magic numbers (with 0413 added for demand paged executables on 4.xBSD),
as do, probably, lots of other 4.xBSD systems (Sun's does). Does "most"
mean "most UNIX implementations" or "most boxes running UNIX"? If the
latter, I think Xenix is running on a lot of systems, possibly most.

Then again, *my* copy of "Xenix(TM) Standard Object File Format (January
1983)" implies that "0x0206" is the "magic number" and is *not*
distinctive; the "x_cpu" field indicates what CPU it's intended for.
(This is sort of like the new Sun UNIX 3.0 object file format, where the
"a_machtype" field indicates whether it's intended for a 68010 or 68020).

COFF seems to invert this, since the "file header" indicates what machine
it's intended for (and tons of other glop) and the "UNIX header" (which is
basically the old a.out header) has the 0405, 0407, 0410, 0411, and 0413
(yes, that's what they use for paged executables, surprise surprise) which
indicates the format of the image but is machine-independent (modulo byte
ordering).
Then again, the "file header" magic number seems to indicate something
about the format of the executable, but see a previous posting of mine for
some dyspepsia caused by the proliferation of multiple file header magic
numbers.

> There are other magic numbers. Old-style archives (ar) have 0177545 as a
> magic number; again, the loader knows about this, since a library is an
> archive. System V archives begin with the magic ``number'' "!<arch>\n".

System V, Release 2 archives, anyway; System V Release 1 had a portable
archive format which was different from the 4.xBSD one, which was the
first one to use the "!<arch>\n" magic "number". I'm told they came to
their senses because Version 8, being 4.1BSD-based, used that format.

> Cpio archives also have magic numbers in them, but at the archive-member
> level.

No, it has a magic number at the beginning - 070707 (either as a "short"
or a string, depending on whether it's an old cruddy "cpio" archive or a
nice new "gee, we've finally caught up with 'tar' when it comes to
portability" "cpio -c" archive. (S3 had "-c", but it had a bug so it
wasn't really portable. S5 fixed this bug. S5 also broke the
byte-swapping garbage: S3 had an option to swap the bytes within 2-byte
quantities. Presumably, this was because running the tape through "dd" to
byte-swap *everything*, and then byte-swapping the data and pathnames
inside "cpio", thus swapping the binary portion of the header once and
everything else twice, is obviously more efficient than just swapping the
binary portion of the header once. ("cpio" already has hacks to deal with
4-byte quantities - namely, file size and modified time - automagically,
by shoving "1L" into a "long" and seeing whether the 0th byte of that
"long" is 0 or not, so PDP-11s and VAXes don't have problems.)
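The "1L" probe just described can be sketched roughly as follows. This is
a minimal illustration, not code from any "cpio" source; the name
`low_byte_first` is hypothetical. On a VAX the 0th byte of the "long" is
1; on a PDP-11, which stores the high-order 16 bits of a "long" first, it
is 0.

```c
/*
 * Hypothetical helper (not taken from "cpio"): store 1L in a long and
 * look at the byte at the lowest address.  Returns 1 if that byte holds
 * the low-order bits (VAX-style layout), 0 otherwise (as on a PDP-11,
 * where the high-order 16-bit half of a long comes first).
 */
static int low_byte_first(void)
{
    long probe = 1L;

    return *(unsigned char *)&probe != 0;
}
```

A reader of old binary headers could call this once at startup and use the
result to decide whether the two 16-bit halves of each 4-byte field need
exchanging.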
It is also obvious that forcing the user to specify a byte-swapping option
is better than just looking at the magic number and seeing whether it's
070707 or a byte-swapped 070707 and deciding whether to swap or not based
on that.

Whoever worked on "cpio" for S5 obviously figured that the purpose of this
byte-swapping crap was to make it possible to move binary data between
machines with different byte orders (as everybody knows, most files with
binary data are continuous streams of 2-byte or 4-byte quantities), not to
provide a gross and kludgy way of byte-swapping the binary portion of a
"cpio" header, so they added an option to swap the 2-byte portions of
4-byte quantities ("stupid FP-11", to quote - if I remember correctly -
the VAX System III linker, that particular piece of DEC hardware being
responsible for some PDP-11 software, including but *NOT* limited to UNIX,
having a different format for 32-bit integers than the VAX's hardware
supports) and an option to swap both bytes and 2-byte quantities. They
also "fixed" it not to swap the bytes of the pathnames. This "fix" means
that running the "cpio" archive through "dd" to swap the bytes, and then
doing a byte swap again in "cpio", results in path names with their bytes
swapped! ("/nuxi", anyone?)

In effect, you are now screwed if you have a "cpio" tape, not made with
"-c", which was produced on a machine with a different byte order. You
can't read it in conveniently. (This has been experimentally verified. I
had to whip up a version of "cpio" which does what "cpio" should have done
in the first place - namely, just byte swap the damn "short"s in the
header - to read a tape made on a System V VAX using the System V "cpio"
on a Sun.))

There are a number of quite intelligent and talented people working on
UNIX development at AT&T Information Systems. It looks like the people in
charge of keeping track of COFF magic numbers, and in charge of "cpio",
are in need of some supervision by the aforementioned people.
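
The "just look at the magic number" approach argued for above can be
sketched as follows. This is an illustration under stated assumptions,
not the modified "cpio" mentioned in the post: the names `swap16`,
`cpio_classify` and `cpio_fix_header` are hypothetical, and the binary
header is treated simply as an array of 16-bit words whose first word is
the magic number.

```c
#include <stddef.h>
#include <stdint.h>

#define CPIO_MAGIC         070707u   /* magic number quoted in the post */
#define CPIO_MAGIC_SWAPPED 0143561u  /* the same two bytes, exchanged */

enum cpio_order { CPIO_NATIVE, CPIO_SWAPPED, CPIO_NOT_CPIO };

/* Exchange the two bytes of one 16-bit word. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Decide from the magic number alone whether the header needs swapping. */
static enum cpio_order cpio_classify(uint16_t magic)
{
    if (magic == CPIO_MAGIC)
        return CPIO_NATIVE;
    if (magic == CPIO_MAGIC_SWAPPED)
        return CPIO_SWAPPED;
    return CPIO_NOT_CPIO;
}

/*
 * Fix up the 16-bit words of a binary header in place.  Only the header
 * words are touched; pathnames and file data are never swapped.
 * Returns 0 on success, -1 if the first word is not a cpio magic number.
 */
static int cpio_fix_header(uint16_t *hdr, size_t nwords)
{
    size_t i;

    if (nwords == 0)
        return -1;

    switch (cpio_classify(hdr[0])) {
    case CPIO_NATIVE:
        return 0;                    /* already in this machine's order */
    case CPIO_SWAPPED:
        for (i = 0; i < nwords; i++)
            hdr[i] = swap16(hdr[i]);
        return 0;
    default:
        return -1;
    }
}
```

A reader would load the binary header into a `uint16_t` array, hand it to
`cpio_fix_header`, and then read the pathname and data untouched. Putting
the two 16-bit halves of the 4-byte fields in the right order for the
reading machine is a separate step that this sketch does not attempt.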

(Fortunately, it looks like the IEEE P1003 committee is looking at a
"tar"-based format, with fixes to support storing information about
directories and special files, for tapes. I'm told that the European UNIX
vendor consortium, X/OPEN, chose a "cpio" format because of the "cpio"
*program*'s byte-swapping "capabilities". Aside from the basic stupidity
(and incorrectness, in the case of the S5 "cpio") of these "capabilities",
they are irrelevant to the choice of tape *format* because:

	1) "tar" doesn't need byte-swapping options because the control
	   information is in printable ASCII string format (any tape
	   controller which is good as anything other than a target for
	   skeet-shooting will write character strings in memory out to
	   the tape in character-string order)

	2) "cpio" has the "-c" option which does the same thing, so it
	   doesn't need those options except for reading old tapes (any
	   reasonable "cpio"-format-based standard would be based on
	   "cpio -c" format, not "cpio" format), and

	3) a *good* program which handles "cpio" format can figure out
	   the byte order it needs for reading pre-"cpio -c" tapes by
	   looking at the magic number anyway!)

(Flame off, until next time a collection of stupidities this gross comes
to light.)

	Guy Harris