Xref: utzoo comp.sys.amiga.tech:13729 comp.lang.c:30787 Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!mailrus!ames!haven!mimsy!chris From: chris@mimsy.umd.edu (Chris Torek) Newsgroups: comp.sys.amiga.tech,comp.lang.c Subject: Re: I/O of complex data structures in C Message-ID: <25884@mimsy.umd.edu> Date: 4 Aug 90 14:27:07 GMT References: <140087@sun.Eng.Sun.COM> Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742 Lines: 84 In article <140087@sun.Eng.Sun.COM> jasonf@cetemp.Eng.Sun.COM (Jason Freund) writes: >Followup-to: s (There is no such newsgroup. I put followups back in the groups in which the original appeared.) >Basically, a maze is a complex data structure (a 2D array of and array of >pointers to blah, blah... (it's deep)). So that means I want to use fread() >and fwrite() (right?). Maybe; indeed, even probably: >When you save data in a database, does the program just go: >"fwrite(pointer, sizeof, *pointer, items, stream)" which somehow magically >saves every piece of data (specified in the arguments) in such a way that it >will be able to read in every piece of data back into their correct cells in >the data structure? No. The primary rule of magic is this: `There is no magic'. Fread and fwrite read and write binary data (given a binary stream, as opened via, e.g., fopen(name, "wb")) by writing `nitems' objects, each of whose size is as given, from the location given by the pointer argument. If each object is composed of simple bytes (e.g., ASCII or EBCDIC or ISO Latin 1 text), those bytes will be written directly to the stream. If each object is composed of something more complicated, the bytes making up that object will be written directly to the stream, whether or not that is a sensible thing to do. In particular, if the bytes composing the object are in the form of a pointer, the resulting pointer (when read back via fread) is not guaranteed to be sensible. It will have the same bit pattern that the original pointer had, but that bit pattern may no longer be a valid pointer value---even if the reading is done by the same run of the same program (garbage collecting implementations of C are legal, if rare). If the reading is done by a different run, or---consider the effect of fixing a bug in the game---a different but similar program, the chances are great that the bit pattern will not be useful. So what can you do? There are many approaches. You can assume (as the Unix `rogue' game does) that different runs of the same program will be able to make use of the old values, and instead of writing out just what you need, write out all data. This approach has its pitfalls, as anyone who had a winning game of rogue and saved it for the 17th time will know. You can convert pointers into indicies (provided that the pointers point into contiguous memory regions), and write only the data you need. You can assume that the values of pointers can be used to uniquely identify every object, no matter what its type, and write the data `as is' but convert the pointers when reading back. We used this latter approach to save arbitrary Lisp data in files for later recovery, although in this case the saving routine had to worry about circular data structures and was therefore more complicated--- the output format was a sequence of triples. The id was a unique identifier---probably actually the original pointer value---and the tag and bytes appeared if and only if the id had not yet appeared in the save file. Id 0 was nil. In this case the tag said what kind of Lisp object the bytes represented, and if the object had pointers, e.g., a dotted pair, the were themselves sequences. Thus, a list (a b a) was represented as, e.g., id=1, tag=DTPR, bytes=( car: id=2, tag=ATOM, bytes="a"; cdr: id=3, tag=DTPR, bytes=( car: id=4, tag=ATOM, bytes="b"; cdr: id=5, tag=DTPR, bytes=( car: id=2; cdr: id=0 ) ) ) (I simplified this example; the atoms were actually composed of name, boundp, value-if-bound, property-list sequences.) This sort of format is pretty much ideal for portability (particularly if you record numeric values in printed form). The major disadvantage is that producing and reading such files tends to be slow. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris (New campus phone system, active sometime soon: +1 301 405 2750)