Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker.mit.edu!bloom-beacon!eru!hagbard!sunic!mcsun!tuvie!vmars!hp
From: hp@vmars.tuwien.ac.at (Peter Holzer)
Newsgroups: comp.lang.c
Subject: Re: Binary data file compatibility across machines
Message-ID: <2172@tuvie>
Date: 26 Nov 90 16:09:34 GMT
References:
Sender: news@tuvie
Lines: 82

stiber@cs.ucla.edu (Michael D Stiber) writes:

>On different machines, the implementation of C data types is different.
>I forget what fixed types' lengths are, but I know that at least some of
>them may vary. I also know that doubles can have different encoding
>schemes (ie, IEEE vs. DEC). Then, there's little endian machines versus
>big endian ones.

>So, my question is this: Say you want to share data files among
>different machines. You also want to be able to use the same code on
>each machine. Therefore, you want to have either a uniform file format,
>or you want the code to be able to figure out what the file format is,
>and convert it to the native data type representation. Now, one alternative
>would be ASCII files --- this is guaranteed to work (assuming that you
>can get C on an IBM 3090 to write ASCII). However, in my application,
>ASCII would produce files that are way too huge --- I must use a binary
>format. So, is there an already-existing, standard solution to this
>problem of binary data file transfer?
>--
> Michael Stiber
> stiber@cs.ucla.edu
> ...{ucbvax,ihpn4}!ucla-cs!stiber
> UCLA Computer Science Dept.

I do not know of any standard solution (ANSI or ISO or something), but
here is my personal ``standard'':

(Well, most of the time I just use ASCII files. They are not that much
bigger, and I can examine (and change!!) them with standard tools.)

For integer data I choose the format that is used on the machines I am
working on most of the time. Each binary data file then gets a header
describing the data format.
Something like:

    2 Bytes: 'P' 'B' (portable binary)
    1 Byte:  0 = 2compl., 1 = 1compl., 2 = sign/mag,
    1 Byte:  0 = little, 1 = big.
    ??

I didn't need a float format until now, but I would adopt at least two
different formats: IEEE, and a generic format where a float is broken
into a mantissa (long int) and an exponent (short int).

Shorts are assumed to be 2 bytes, longs 4 bytes (the minimums required
by ANSI).

A program which reads these files would first check whether the data
format is the same as the one it uses internally. If it is, it can use
fread/fwrite for the rest of the file; otherwise it has to call special
routines to deal with the various types. Most of the time the file will
be read on the same machine it was written on, so files can usually be
read fast.

A portable routine to read a big-endian sign/magnitude long would then
be:

    #include <stdio.h>

    long read_long_bs (FILE * fp)
    {
        unsigned long ul;
        long l;

        /* No EOF check: the caller must know four bytes are available. */
        ul = getc (fp) & 0xff;
        ul = (ul << 8) | (getc (fp) & 0xff);
        ul = (ul << 8) | (getc (fp) & 0xff);
        ul = (ul << 8) | (getc (fp) & 0xff);
        /* Top bit is the sign; negate as a long, not as an unsigned long. */
        l = (ul & 0x80000000UL) ? - (long) (ul & 0x7fffffffUL) : (long) ul;
        return l;
    }

Oh yes, I am assuming that a character is 8 bits and that the machine
is using the ASCII character set. If that is not the case, the program
must not use more than the lowest eight bits of any character, and
strings must be converted to ASCII first.

--
|    _  | Peter J. Holzer             | Think of it   |
| |_|_) | Technical University Vienna | as evolution  |
| | |   | Dept. for Real-Time Systems | in action!    |
| __/   | hp@vmars.tuwien.ac.at       | Tony Rand     |
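
P.S. A rough sketch of the header check described above, assuming the
four-byte layout given there ('P', 'B', sign representation, byte
order). The names native_sign, native_order, write_header and
read_header are made up for illustration, and the extra test on
sizeof(short) and sizeof(long) is an added assumption about when the
fread/fwrite shortcut is safe. Like the rest of this post it assumes
ASCII for the 'P' 'B' magic bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names; the values follow the header bytes listed above. */
enum { SIGN_TWOS = 0, SIGN_ONES = 1, SIGN_MAG = 2 };
enum { ORDER_LITTLE = 0, ORDER_BIG = 1 };

/* Sign representation of this machine, from the low two bits of -1. */
static int native_sign(void)
{
    switch (-1 & 3) {
    case 3:  return SIGN_TWOS;   /* ...11 */
    case 2:  return SIGN_ONES;   /* ...10 */
    default: return SIGN_MAG;    /* ...01 */
    }
}

/* Byte order of this machine: which end of a long holds the value 1? */
static int native_order(void)
{
    unsigned long probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1 ? ORDER_LITTLE : ORDER_BIG;
}

/* Write the 4-byte header for data in native format. 0 on success. */
static int write_header(FILE *fp)
{
    unsigned char h[4];

    h[0] = 'P';
    h[1] = 'B';
    h[2] = (unsigned char) native_sign();
    h[3] = (unsigned char) native_order();
    return fwrite(h, 1, sizeof h, fp) == sizeof h ? 0 : -1;
}

/*
 * Read and check the header.
 * Returns  1 if the rest of the file can be read with plain fread,
 *          0 if it is a PB file in a foreign format (convert by hand),
 *         -1 if it is not a PB file at all.
 */
static int read_header(FILE *fp)
{
    unsigned char h[4];

    if (fread(h, 1, sizeof h, fp) != sizeof h || h[0] != 'P' || h[1] != 'B')
        return -1;
    /* Assumption: the shortcut also needs 2-byte shorts and 4-byte longs. */
    return h[2] == native_sign() && h[3] == native_order()
        && sizeof(short) == 2 && sizeof(long) == 4;
}
```

A reader would call read_header once after fopen. On a return of 1 it
can fread the rest of the file directly; on 0 it falls back to
byte-wise routines such as read_long_bs.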