Path: utzoo!attcan!uunet!tank!ncar!mailrus!tut.cis.ohio-state.edu!bloom-beacon!athena.mit.edu!scs
From: scs@athena.mit.edu (Steve Summit)
Newsgroups: comp.lang.c
Subject: Re: Portability across architectures..
Keywords: Portability, common data, files
Message-ID: <7038@bloom-beacon.MIT.EDU>
Date: 13 Sep 88 01:29:30 GMT
References: <103@simsdevl.UUCP>
Sender: daemon@bloom-beacon.MIT.EDU
Reply-To: scs@adam.pika.mit.edu (Steve Summit)
Distribution: all
Lines: 91

In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
> I've run across a need to have data files in various forms of UN*X
> be portable to each other.
> I could write data out to files in ASCII, but this is cumbersome,
> slow and may hamper the products' marketability.

Please strongly consider using ASCII after all.  The advantages are
many; the disadvantages are comparatively minor.

1. ASCII is well-nigh universal; portability is virtually assured.
Even if you ever want to go to an EBCDIC machine, conversion utilities
are bound to be readily available (and conversion may indeed happen
implicitly when transferring a text file to such a machine).

2. It's usually not nearly as inefficient as you'd think.  Ironically,
even sophisticated computer programmers commonly ignore the fact that
computers are just blisteringly fast and can usually complete a
seemingly inefficient ASCII parse in far less time than it takes to
think about it.  (I am aware that there are high-bandwidth,
high-performance systems which cannot afford the luxury of an ASCII
parse, and are well-advised to use binary transfer methods.  I
maintain that surprisingly many real applications do not fall into
this category, and can use ASCII without paying a performance
penalty.)

3. Reading and writing ASCII formats isn't really that cumbersome; in
fact I'd argue that binary formats, when properly designed to account
for word ordering and other difficulties which ASCII formats easily
overcome, are more cumbersome in the long run.

4. Don't overlook debugging.  ASCII formats can be inspected with cat,
piped through grep and sed and other familiar utilities, patched with
ordinary text editors, etc., etc.  The first program you write for
your binary format is usually not the application you were trying to
write, but the disassembler you find you need for debugging; getting
the disassembler working is often a prerequisite for getting the end
application working.

5. ASCII formats can make good, backwards-compatible version number
schemes easy to implement.  Data formats inevitably require revision
to accommodate new features.  Fixed binary formats, especially those
that simply write structures out as bytes, are usually not amenable to
such changes, unless you did a lot of work to make them extensible
(which is another aspect that makes binary formats more, not less,
cumbersome than ASCII).  Introducing a "version 2" format then
requires a host of extra translation utilities, and brings nasty
incompatibility problems when programs try to read files of the wrong
format.  (These compatibility problems can be successfully worked
around, but only if all files contain a version number, a need which
is usually not recognized or implemented until version 1 is in place
and version 2 is being contemplated, by which time it's too late.)

Suppose, on the other hand, that your ASCII format consists of
arbitrary lines of text, with a keyword at the beginning of each line
indicating what kind of data (e.g. what field of a structure) that
line contains.  If programs ignore unrecognizable lines (a good
practice), "version 1" programs can read "version 2" files without
modification, if the version 2 keywords are a superset of version 1's.
Version 1 filters and editors can even modify version 2 files, without
losing version-2-specific information, by saving, and echoing to the
output, unrecognized lines without interpretation.
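
The keyword-per-line scheme just described might be sketched in C
along the following lines.  This is only an illustration under stated
assumptions, not code from the original article: the record layout,
the keywords "name" and "age", and the function names parse_line and
filter are all hypothetical, and lines are assumed to fit in the
256-byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "version 1" record; the fields are made up for
   illustration. */
struct record {
    char name[64];
    long age;
};

/* Return nonzero if the first klen characters of line are exactly key. */
int is_key(const char *line, size_t klen, const char *key)
{
    return strlen(key) == klen && strncmp(line, key, klen) == 0;
}

/* Interpret one "keyword value" line.  Returns 1 if the keyword was
   recognized and stored in *rec, 0 if the line is unknown to this
   version and should be passed through untouched. */
int parse_line(const char *line, struct record *rec)
{
    size_t klen, vlen;
    const char *val;

    assert(line != NULL && rec != NULL);
    klen = strcspn(line, " \t\n");          /* keyword ends at whitespace */
    val = line + klen + strspn(line + klen, " \t");
    vlen = strcspn(val, "\n");              /* value runs to end of line */

    if (is_key(line, klen, "name")) {
        if (vlen >= sizeof rec->name)
            vlen = sizeof rec->name - 1;    /* truncate overlong names */
        memcpy(rec->name, val, vlen);
        rec->name[vlen] = '\0';
        return 1;
    }
    if (is_key(line, klen, "age")) {
        rec->age = strtol(val, NULL, 10);
        return 1;
    }
    return 0;                               /* e.g. a version 2 keyword */
}

/* A version 1 filter: absorb the fields it understands, echo every
   other line verbatim, then write the known fields back out.
   Assumes each input line fits in the buffer. */
void filter(FILE *in, FILE *out, struct record *rec)
{
    char line[256];

    while (fgets(line, sizeof line, in) != NULL)
        if (!parse_line(line, rec))
            fputs(line, out);               /* preserve unknown lines */
    fprintf(out, "name %s\n", rec->name);
    fprintf(out, "age %ld\n", rec->age);
}
```

A caller would typically run filter(stdin, stdout, &rec) after
initializing rec.  Note that this sketch moves the known fields to the
end of the output; that is harmless only because the format places no
meaning on line order, which is itself an assumption.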

(It's true that a binary format employing variable-length records with
a type field in a consistent place would also enjoy these advantages.
Such records are in fact common in network protocols.)

The only real problem I've ever had with ASCII data interchange
formats is that you tend to lose a bit of precision when reading and
writing doubles, but you can minimize this by printfing things with
%.ne, for n sufficiently large.  If the precision inherent in the data
is less than that of a double, you're only "losing" something you
didn't have in the first place.

I'm not sure how using ASCII data formats could "hamper a products'
marketability."  If not an efficiency concern, it's probably some
attempt to keep information hidden in a cryptic binary format rather
than having it in plain text that anyone could read.

                                            Steve Summit
                                            scs@adam.pika.mit.edu
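
A note on the %.ne suggestion above: one way to pick "n sufficiently
large" is sketched below.  It assumes IEEE 754 doubles and a C library
whose printf and strtod round correctly; under those assumptions 17
significant digits (that is, %.16e) are enough for a double to survive
a trip through text unchanged.  The helper names format_double and
parse_double are hypothetical, and DBL_DECIMAL_DIG is a C11 macro that
postdates the article, so a fallback definition is supplied.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef DBL_DECIMAL_DIG
#define DBL_DECIMAL_DIG 17      /* assumption: IEEE 754 binary64 */
#endif

/* Write x into buf in %e notation with enough digits to round-trip.
   %.ne prints n+1 significant digits, hence the "- 1".
   Returns what snprintf returns. */
int format_double(char *buf, size_t size, double x)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.*e", DBL_DECIMAL_DIG - 1, x);
}

/* Read the value back from its text form. */
double parse_double(const char *text)
{
    assert(text != NULL);
    return strtod(text, NULL);
}
```

With a smaller n, such as %.5e, the value read back generally differs
from the one written, which is the loss of precision the article
mentions.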