Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ncar!husc6!bloom-beacon!adam.pika.mit.edu!scs
From: scs@adam.pika.mit.edu (Steve Summit)
Newsgroups: comp.lang.c
Subject: Re: binary data files
Message-ID: <11021@bloom-beacon.MIT.EDU>
Date: 2 May 89 06:36:25 GMT
References: <10946@bloom-beacon.MIT.EDU> <12546@ut-emx.UUCP> <8758@csli.Stanford.EDU>
Sender: daemon@bloom-beacon.MIT.EDU
Reply-To: scs@adam.pika.mit.edu (Steve Summit)
Lines: 69

In article <8758@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
>I agree that in many cases it is desirable to use ASCII data files,
>but in some situations binary is better. One such situation is when
>you need to know how many items are in the file before you read it
>(say to allocate storage). If the data is binary you just
>stat the file and divide by the item size.

Actually, this illustrates another thing it's worth shying away
from if you can.  The assumption that you can determine, without
actually reading them, exactly how many characters a file
contains, can get you into trouble, although of course it's a
perfectly valid assumption on Unix systems.  Not so on VMS and
MS-DOS and doubtless other lesser systems -- stat() or the
equivalent may only give you an approximation.

A prime example is Unix tar format: a tar file consists of a file
header, followed by a file, followed by a file header, etc.  The
file header contains the (following) file's size; the size must
be exact because the program reading the tar file must use it to
determine where the file ends and the next header begins.  It's
trivial to write the header correctly on Unix: just stat the
file.  If you're trying to create tar files on other systems (a
reasonable thing to do, since tar is an interchange format) you
typically have to read each file twice: once to count the
characters in it, and a second time to copy it to the tar output
file.
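
The counting pass of that two-pass approach might look something
like the sketch below.  The name count_bytes, the use of binary
mode ("rb"), and the long return type are all assumptions made for
illustration; a text-mode count could differ on systems that
translate line endings, and a real tar writer would have to pick
the mode that matches how it later copies the data.

```c
#include <stdio.h>

/* Count the bytes in a file by reading it through to end-of-file,
 * rather than trusting stat() or an equivalent.
 * Returns the count, or -1 if the file cannot be opened or a read
 * error occurs.  (count_bytes is a hypothetical helper name.) */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: no translation */
    long n = 0;
    int c;

    if (fp == NULL)
        return -1;

    while ((c = getc(fp)) != EOF)
        n++;

    if (ferror(fp)) {               /* EOF was really a read error */
        fclose(fp);
        return -1;
    }

    fclose(fp);
    return n;
}
```

A caller would run count_bytes(name) first, write the result into
the header, and then reopen the file for the copying pass.
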
The moral is that if you're writing a program that might be
ported to a non-Unix system, don't depend on the ability to find
a file's size, "in advance," without explicitly reading it.

Getting back to data files, it's not necessary to know how big
they are while reading them.  Just use code like the following:

	int nels = 0;
	int nallocated = 0;
	struct whatever *p = NULL;

	while(there's another item)
		{
		if(nels >= nallocated)
			{
			nallocated += 10;
			if(p == NULL)
				p = (struct whatever *)malloc(
					nallocated * sizeof(struct whatever));
			else	p = (struct whatever *)realloc((char *)p,
					nallocated * sizeof(struct whatever));

			if(p == NULL)
				complain;
			}

		read item into p[nels];
		nels++;
		}

If realloc can handle a NULL first argument, you can dispense
with the initial test and call to malloc, and always call realloc
(which is why I'm always ranting in favor of this realloc
functionality, which ANSI C incidentally requires).

The on-the-fly reallocation may look inefficient, but "it doesn't
matter much in practice."  (At least for me.  When I'm really
unconcerned with efficiency, I even skip the nallocated += 10
chunking jazz and call realloc for each item read, and that has
never caused problems either.  Your mileage may vary.)

                                            Steve Summit
                                            scs@adam.pika.mit.edu
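
For anyone who wants to compile the loop above, here is one
possible runnable rendering, offered as a sketch only.  It assumes
an ANSI C realloc that accepts a NULL first argument, so the
separate malloc branch is gone.  The record type struct item, the
function name read_items, and the choice of whitespace-separated
integers as the input format are all hypothetical.  It also departs
from the pseudocode in one respect: realloc's result goes into a
temporary, so the existing block can still be freed if the call
fails.

```c
#include <stdio.h>
#include <stdlib.h>

struct item { int value; };   /* hypothetical record type */

/* Read whitespace-separated integers from fp into an array that
 * grows in chunks of 10 as items arrive; the file size is never
 * consulted.  On success returns 0 and stores the array (NULL if
 * no items were read) and the item count through out and nout;
 * the caller frees the array.  Returns -1 if memory runs out. */
int read_items(FILE *fp, struct item **out, size_t *nout)
{
    struct item *p = NULL;
    size_t nels = 0;
    size_t nallocated = 0;
    int v;

    while (fscanf(fp, "%d", &v) == 1) {
        if (nels >= nallocated) {
            struct item *tmp;

            nallocated += 10;
            /* realloc(NULL, n) acts like malloc(n) in ANSI C */
            tmp = realloc(p, nallocated * sizeof *tmp);
            if (tmp == NULL) {
                free(p);        /* old block is still valid here */
                return -1;
            }
            p = tmp;
        }
        p[nels].value = v;
        nels++;
    }

    *out = p;
    *nout = nels;
    return 0;
}
```

Calling realloc for every item instead amounts to replacing
"nallocated += 10" with "nallocated += 1".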