Path: utzoo!utgpu!water!watmath!clyde!att!ihlpb!nevin1
From: nevin1@ihlpb.ATT.COM (Liber)
Newsgroups: comp.lang.misc
Subject: Re: Text or data files?
Message-ID: <8915@ihlpb.ATT.COM>
Date: 12 Oct 88 00:57:41 GMT
References: <3980@enea.se>
Reply-To: nevin1@ihlpb.UUCP (55528-Liber,N.J.)
Organization: AT&T Bell Laboratories - Naperville, Illinois
Lines: 141

In article <3980@enea.se> sommar@enea.se (Erland Sommarskog) writes:

ES> Of course, no matter what type of files we use, we should encapsulate
ES> the disk I/O routines for our data structure. What the rest of the
ES> program should see is just Get(put)_one_record(Data) where Data is
ES> of some type.

Agreed.

ES> And these routines are easier to maintain if they
ES> simply write Data (or Data.all) in its binary format to the disk.

Only if I want to write all of the data contained in my record, in the
same format in which the record holds it. In my experience, this is not
usually the case.

NL> This assumes that all the programs are not only run on the same type of
NL> machine and operating system, but that they are written in the same
NL> language using the same compiler (stuff like packed arrays are not only
NL> *machine* dependent and *operating system* dependent, they are *language*
NL> dependent, *compiler* dependent, and in some cases are even *optimization*
NL> dependent). This is unnecessarily restrictive, and typically not practical
NL> in commercial environments.

ES> This is true if you can't call your common interface routines from
ES> another language. And if you can't, well, you have a maintenance
ES> problem no matter the file format.

What happens when you no longer have the original source code? With a
text file, it is fairly easy to figure out the data format (e.g., the
uuencoding scheme is extremely easy to figure out; I did so when I
couldn't find a uudecode program for my PC).
With a non-compressed binary format, it is a little tougher (you have to
know how integers are represented on the target machine, use some good
test data, etc.), especially if you think it is 'not sane' to hand-edit
a binary file. If your data is compressed at all (like Pascal's packed
arrays), you had better know the compression scheme, or figuring out the
format will be very difficult.

ES> As I also mentioned in my
ES> previous article, a tool like VAX CDD is a help in a multi-language
ES> environment.

Since I haven't seen the CDD, I cannot comment on it.

NL> You wouldn't necessarily want a prepackaged DBMS. There is usually a
NL> lot of overhead associated with DBMS systems, and you have to decide
NL> whether it is worth it.

ES> And there is a lot of development overhead associated with not using
ES> a DBMS. Did I hear NIH?

No, you didn't hear NIH. By deciding whether a DBMS is worth it, I meant
weighing the overhead of using a DBMS against the overhead of developing
without one. Sometimes adding a DBMS *adds* to development overhead (you
have to learn the DBMS, learn its interface to your language, etc.).
This topic, however, is not appropriate for comp.lang.misc. If you wish
to discuss it, move it to comp.databases. Nuff said.

ES> (Side note: On a system like VMS you still have some
ES> use for SEARCH, DIFFERENCES etc for binary files, since they recognize
ES> the file format.)

These tend to be very limited.

NL> Again, a deficiency of the programming language, not of the data format.
NL> In C, people use the standard routines with no problems; they don't
NL> ungracefully crash when an error occurs like Wirth-type languages do.

ES> Possibly C handles this case better than other languages do. All languages
ES> that I've seen protest in some way when they get a non-digit when
ES> trying to read an integer. Not all of them crash though. Simula and
ES> standard Pascal do. Ada and Fortran have exception mechanisms to help
ES> you.
ES> (But I wonder what C does? If I guess, [...]
ES> [very bad guess deleted]

The C function strtol (string to long) takes a string and converts it
into a long int. It skips leading whitespace and scans until it finds a
character that is inconsistent with the base. It returns the converted
number and, through its second argument, a pointer to the character that
terminated the scan. This is much more graceful than the standard Pascal
solution.

ES> Whether a binary file is more compressed than the corresponding
ES> text file, depends on the data. With many numbers it's true,
ES> but with many string fields, you can save disk space with a text
ES> file, since you don't have to store trailing blanks.

Not a valid point. Since we were talking about *fixed-format* files, the
trailing blanks have to be included whether we are using text files or
binary files. Fixed-format binary files are more compact than the
corresponding fixed-format text files (with the possible exception of
all-text files).

ES> The size has little to do with the integrity. The assumption is
ES> that no sane person would start to edit a binary file "by hand",
ES> but you can't overlook this case for a text file.

I guess I'm insane :-), but let me give you a few examples where I had
to edit a binary file by hand.

Ever had a head crash on a disk drive? On a PC, guess where the head
usually resides: on the file that was last accessed. And guess what that
file usually is: the file containing the current directory. On numerous
occasions I had to reconstruct a sector of a directory, and the only way
I know to do this is to go in and edit the sector with a binary editor.

Another example: every once in a while someone deletes a file that they
wanted to keep. On PCs, the file isn't actually removed; the directory
entry is flagged and the space that the file occupied is put back on the
free list. By going in and hand editing the directory file, undeleting
the file is a very simple process.
Also, being able to hand edit a binary file is a useful debugging tool.

ES> If we can assume
ES> that the file is only accessed through the common I/O routines
ES> mentioned earlier, we are assured that format integrity is maintained.

Not true, either. Still another example: a word processor that I was
using would mark the file I was editing so that no one else using the
word processor could edit it. Guess what happened when the system came
down. The file was permanently marked open, and the word processor had
no option for unmarking a file (until the next release, anyway). Without
hand editing it, I would have had to scrap the file. With the binary
editor, however, I was able to change the 1-bit marking flag, and I lost
no work. The format was corrupted by the interruption of my process
(this can happen with text files too, but recovery is much easier with
text files).

-- 
NEVIN J. LIBER  ..!att!ihlpb!nevin1  (312) 979-4751  IH 4F-410
"I catch him with a left hook. He eels over. It was a fluke, but there
he was, lying on the deck, flat as a mackerel - kelpless!"