Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker.mit.edu!bloom-beacon!eru!hagbard!sunic!mcsun!tuvie!vmars!hp
From: hp@vmars.tuwien.ac.at (Peter Holzer)
Newsgroups: comp.lang.c
Subject: Re: Binary data file compatibility across machines
Message-ID: <2188@tuvie.UUCP>
Date: 30 Nov 90 14:30:50 GMT
References: <STIBER.90Nov23134600@maui.cs.ucla.edu> <2172@tuvie> <1967@mts.ucs.UAlberta.CA>
Sender: news@tuvie.UUCP
Lines: 74

userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) writes:

>In article <2172@tuvie>, hp@vmars.tuwien.ac.at (Peter Holzer) writes:
>>stiber@cs.ucla.edu (Michael D Stiber) writes:
>>
>>
>>>On different machines, the implementation of C data types is different.
><<<deletions>>>
>>
>>For integer data I choose the format that is used on the machines
>>I am working on most of the time. Each binary data file then
>>gets a header describing the data format. Something like
>>
>><magic number>          2 Bytes: 'P' 'B' (portable binary)
>><integer type>          1 Byte: 0 = 2compl., 1 = 1compl.,
>>                                2 = sign/mag,
>><endianness>            1 Byte: 0 = little, 1 = big.
>><float-format>          ??
>>
>Pardon my curiosity, but, if you write such a file on a
>particular type of machine, then read it back on another,
>won't your code have to do some decoding of this header
>information? Say, for example, you write from an ASCII
>machine and read from an EBCDIC one. The "PB" will map to
>some other combination of characters. Will your program
>determine from whatever they happen to be that the source
>machine is ASCII? I always forget whether "endianness"
>refers to the ordering of bytes in words or bits in bytes -
>if the latter, your program will also have to do some
>juggling to properly decode the third and fourth bytes.
>What about machine architectures you don't know about yet?
>What about 12 and 60 bit machines (PDP8, Cyber)?

You left out the last four lines of my posting:

=Oh yes I am assuming that a character is 8 bits and the machine
=is using the ASCII character set. If that is not the case the 
=program must not use more than the lowest eight bits of any
=character and strings must be converted to ASCII first.

The idea of my data representation is to provide easy access to data on
the machines I usually work on. They use the ASCII character set and have
8-bit characters, 16-bit shorts and 32-bit longs (the minimum sizes
guaranteed by the ANSI C standard). So the magic number is always 0x50 0x42.
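
A minimal sketch in C of how the first four header bytes could be written
and checked. All names (pb_make_header, pb_parse_header, the PB_*
constants) are hypothetical and not part of the original posting. The
float-format field is left out because the header description does not
define it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the field values follow the quoted header layout. */
enum { PB_TWOS_COMPL = 0, PB_ONES_COMPL = 1, PB_SIGN_MAG = 2 };
enum { PB_LITTLE = 0, PB_BIG = 1 };

#define PB_HEADER_LEN 4 /* magic (2) + integer type (1) + endianness (1) */

/* Fill the first four header bytes and return how many were written. */
size_t pb_make_header(unsigned char *hdr, int int_type, int endian)
{
    assert(hdr != NULL);
    /* Numeric constants rather than 'P' and 'B', so that a host with a
       non-ASCII character set still emits the same two bytes. */
    hdr[0] = 0x50;
    hdr[1] = 0x42;
    hdr[2] = (unsigned char)int_type;
    hdr[3] = (unsigned char)endian;
    return PB_HEADER_LEN;
}

/* Return 1 and fill *int_type and *endian if the header is recognised,
   otherwise return 0 and leave the outputs untouched. */
int pb_parse_header(const unsigned char *hdr, int *int_type, int *endian)
{
    assert(hdr != NULL && int_type != NULL && endian != NULL);
    /* Mask to 8 bits in case the host has wider characters. */
    if ((hdr[0] & 0xFF) != 0x50 || (hdr[1] & 0xFF) != 0x42)
        return 0;
    if (hdr[2] > PB_SIGN_MAG || hdr[3] > PB_BIG)
        return 0;
    *int_type = hdr[2];
    *endian = hdr[3];
    return 1;
}
```

Writing the magic number as 0x50 0x42 instead of as character constants
is what keeps it identical on an EBCDIC host.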

EBCDIC machines must convert character data (both on read and on write).
If they have 8-bit chars, 16-bit shorts and 32-bit longs, they may read
and write integers in their native format.
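
A sketch of how a reader might decide whether the native format can be
used. It assumes the header codes from the layout quoted above (0/1/2
for the integer type, 0/1 for endianness). The function names are
hypothetical, and treating "host matches header" as the condition for a
native read is an interpretation of the posting, not something it spells
out.

```c
#include <limits.h>

/* 0 = two's complement, 1 = ones' complement, 2 = sign/magnitude.
   The low two bits of -1 are 11, 10 and 01 respectively. */
int pb_host_int_type(void)
{
    return 3 - (-1 & 3);
}

/* 0 = little endian, 1 = big endian, -1 = anything else
   (including hosts whose long is not exactly four chars). */
int pb_host_endianness(void)
{
    unsigned long probe = 0x01020304UL;
    const unsigned char *p = (const unsigned char *)&probe;

    if (sizeof(unsigned long) != 4)
        return -1;
    if (p[0] == 0x04 && p[1] == 0x03 && p[2] == 0x02 && p[3] == 0x01)
        return 0;
    if (p[0] == 0x01 && p[1] == 0x02 && p[2] == 0x03 && p[3] == 0x04)
        return 1;
    return -1;
}

/* Return 1 if integers in a file with the given header fields can be
   read or written directly in the host's own representation. */
int pb_can_use_native(int file_int_type, int file_endian)
{
    if (CHAR_BIT != 8 || sizeof(short) != 2 || sizeof(long) != 4)
        return 0;
    return pb_host_int_type() == file_int_type
        && pb_host_endianness() == file_endian;
}
```

When pb_can_use_native() returns 0 the integers have to be converted
byte by byte instead.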

Machines which have characters with more than 8 bits, shorts with more
than 16 bits or longs with more than 32 bits must convert integer data
both on read and on write. They have to split shorts into two 8-bit
packets and longs into four 8-bit packets and store these packets as
consecutive characters. Thus a machine with 9-bit characters and 36-bit
shorts could only use the lowest 16 bits of its shorts and would need two
9-bit characters to store each of them in a file.
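
The splitting described above can be sketched in C with shifts and
masks, which behave the same regardless of the host's character width or
byte order. The function names are hypothetical. Only unsigned values
are handled; converting negative numbers between two's complement, ones'
complement and sign/magnitude is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Store the low 16 bits of v as two 8-bit packets in buf[0..1]. */
void pb_put16(unsigned char *buf, unsigned int v, int big_endian)
{
    unsigned char lo = (unsigned char)(v & 0xFF);
    unsigned char hi = (unsigned char)((v >> 8) & 0xFF);

    assert(buf != NULL);
    buf[0] = big_endian ? hi : lo;
    buf[1] = big_endian ? lo : hi;
}

/* Rebuild a 16-bit value from two 8-bit packets. */
unsigned int pb_get16(const unsigned char *buf, int big_endian)
{
    unsigned int b0, b1;

    assert(buf != NULL);
    b0 = buf[0] & 0xFF; /* ignore any bits above the lowest eight */
    b1 = buf[1] & 0xFF;
    return big_endian ? ((b0 << 8) | b1) : ((b1 << 8) | b0);
}

/* Store the low 32 bits of v as four 8-bit packets in buf[0..3]. */
void pb_put32(unsigned char *buf, unsigned long v, int big_endian)
{
    int i;

    assert(buf != NULL);
    for (i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        buf[i] = (unsigned char)((v >> shift) & 0xFF);
    }
}

/* Rebuild a 32-bit value from four 8-bit packets. */
unsigned long pb_get32(const unsigned char *buf, int big_endian)
{
    unsigned long v = 0;
    int i;

    assert(buf != NULL);
    for (i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        v |= (unsigned long)(buf[i] & 0xFF) << shift;
    }
    return v;
}
```

On a machine with 9-bit characters each buf[i] still holds only eight
significant bits, so the file contents match what an 8-bit machine would
write.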
> 
>If transportability is important, use ASCII (pardon, character),
>and let some o/s utility do the conversion. If efficiency is
>paramount, use binary and include a disclaimer about moving
>the file to another machine (you can't, after all, move the
>executable that way, can you?). If both are crucial, provide
>a separate conversion program.

Conversion programs are fine if the data is not moved around much. If
you have a file on a file system that is mounted by different machines,
they are not as good. (We don't have that situation: our VAXes,
DECstations and PCs all have the same data representation; only our
real-time system, based on the 68000, does it the other way round.)
--
|    _  | Peter J. Holzer                       | Think of it   |
| |_|_) | Technical University Vienna           | as evolution  |
| | |   | Dept. for Real-Time Systems           | in action!    |
| __/   | hp@vmars.tuwien.ac.at                 |     Tony Rand |