Path: utzoo!utgpu!attcan!uunet!mcvax!enea!sommar
From: sommar@enea.se (Erland Sommarskog)
Newsgroups: comp.lang.misc
Subject: Re: Text or data files?
Message-ID: <3980@enea.se>
Date: 9 Oct 88 15:43:32 GMT
Organization: ENEA DATA AB, Sweden
Lines: 91

(>> is me.)
Nevin J Liber (nevin1@ihlpb.UUCP) writes:
>Irregardless of whether I use text files or binary files, I would
>rather write my own read/write routines (even if they only call the
>standard ones) than be dependent on my compiler.

Of course, no matter what type of files we use, we should encapsulate
the disk I/O routines for our data structure. What the rest of the
program should see is just Get(put)_one_record(Data) where Data is of
some type. And these routines are easier to maintain if they simply
write Data (or Data.all) in its binary format to the disk.

>This assumes that all the programs are not only run on the same type of
>machine and operating system, but that they are written in the same
>language using the same compiler (stuff like pack arrays are not only
>*machine* dependent and *operating system* dependent, they are *language*
>dependent, *compiler* dependent, and in some cases are even *optimization*
>dependent). This is unnecessarily restrictive, and typically not practical
>in commercial environments.

This is true if you can't call your common interface routines from
another language. And if you can't, well, you have a maintenance
problem no matter the file format. As I also mentioned in my previous
article, a tool like VAX CDD is a help in a multi-language environment.

>> If you have many programs that are to read the same data, you are
>>likely to get a database system, and I don't think they store data
>>in a text-file format...
>
>You wouldn't necessarily want a prepackaged DBMS. There is usually a
>lot of overhead associated with DBMS systems, and you have decide
>whether it is worth it.

And there is a lot of development overhead associated with not using
a DBMS. Did I hear NIH?
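To make the Get(put)_one_record idea above concrete, here is a
minimal C sketch. The `Record` type and the names `init_record`,
`put_one_record` and `get_one_record` are hypothetical, chosen for
illustration only; the post does not specify a language or a record
layout. The rest of a program would call only these routines. Note
that writing the struct verbatim with `fwrite` bakes the compiler's
padding and the machine's byte order into the file, which is the
dependency Nevin objects to; encapsulation at least confines it to
one place.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type; field names are for illustration only. */
typedef struct {
    int    id;
    double amount;
    char   name[20];                       /* fixed-width string field */
} Record;

/* Fill in a record. memset clears every field first, so the name is
   always '\0'-terminated even if the input is too long and truncated. */
void init_record(Record *r, int id, double amount, const char *name)
{
    assert(r != NULL && name != NULL);
    memset(r, 0, sizeof *r);
    r->id = id;
    r->amount = amount;
    strncpy(r->name, name, sizeof r->name - 1);
}

/* Write one record in its raw in-memory (binary) form.
   The file layout therefore depends on compiler padding and byte order.
   Returns 1 on success, 0 on a write error. */
int put_one_record(FILE *fp, const Record *rec)
{
    assert(fp != NULL && rec != NULL);
    return fwrite(rec, sizeof *rec, 1, fp) == 1;
}

/* Read the next record. Returns 1 if a whole record was read,
   0 at end of file or on a short or failed read. */
int get_one_record(FILE *fp, Record *rec)
{
    assert(fp != NULL && rec != NULL);
    return fread(rec, sizeof *rec, 1, fp) == 1;
}
```

Because callers never touch `fread` or `fwrite` directly, switching
the file to a text format later would change only the bodies of the
two I/O routines, not the code that uses them.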

>> The only case when I can see that this argument is valid is when
>>"the other program" is standard a text-oriented utility.
>
>Well, if you're on a Un*x (...) system, this may
>be very desirable. You can use all your familiar tools (like grep,
>sed, etc.) to do many of your manipulations.

Agreed. Just because I said it was the only case doesn't mean that
it's unimportant. (Side note: On a system like VMS you still have
some use for SEARCH, DIFFERENCES etc. for binary files, since they
recognize the file format.)

>>The problem is that you often have little use for these standard
>>routines, unless you can accept that the program crashes because there
>>was a letter where you expected a number.
>
>Again, a deficiency of the programming language, not of the data format.
>In C, people use the standard routines with no problems; they don't
>ungracefully crash when an error occurs like Wirth-type languages do.

Possibly C handles this case better than other languages do. All
languages that I've seen protest in some way when they get a non-digit
while trying to read an integer. Not all of them crash, though.
Simula and standard Pascal do. Ada and Fortran have exception
mechanisms to help you. (But I wonder what C does? If I guess, it
sets some error variable that you can forget to check, returns zero
for the integer, and doesn't move the current position in the file,
so when you're reading the following string field you're starting in
the wrong place. This would be just as bad as simply crashing.)

>>Storing data in text files
>>gives you a bigger problem with data integrity, than with binary
>>files.
>
>Actually, the opposite is true. Since the effective data is more
>compressed in binary formats (if this wasn't true, there would be nothing
>that would distinguish text formats from binary formats), it is more likely
>that a data error will go by unnoticed.

Whether a binary file is more compressed than the corresponding text
file depends on the data.
With many numbers it's true, but with many string
fields you can save disk space with a text file, since you don't have
to store trailing blanks. The size has little to do with integrity.
It is safe to assume that no sane person would start editing a binary
file "by hand", but you can't rule that out for a text file. If we
can assume that the file is only accessed through the common I/O
routines mentioned earlier, we are assured that format integrity is
maintained.

-- 
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.UUCP
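Erland's guess about what C does with a letter where a number is
expected can be checked against the C standard. `fscanf` has no
separate error variable: it returns the number of items assigned (0 on
a matching failure, EOF on an input failure), leaves the target
variable unassigned rather than setting it to zero, and leaves the
offending character unread, so the next read starts at that same
character. The C sketch below shows this. `scanf_int_demo` and
`parse_int_field` are hypothetical names, and the `strtol`-based check
in `parse_int_field` is one common defensive idiom, not something
proposed in the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Try to read an int with fscanf and report what happened.
   Returns fscanf's result: 1 if an int was assigned, 0 on a matching
   failure (e.g. a letter where a digit was expected), EOF on input failure.
   *next receives the next unread character (or EOF); it is pushed back. */
int scanf_int_demo(FILE *fp, int *value, int *next)
{
    assert(fp != NULL && value != NULL && next != NULL);
    int rc = fscanf(fp, "%d", value);   /* *value is NOT written on failure */
    *next = getc(fp);                   /* peek at where the stream now stands */
    if (*next != EOF)
        ungetc(*next, fp);
    return rc;
}

/* A stricter alternative: validate a field that has already been read
   (e.g. a whole line from fgets). Returns 1 and stores the value only if
   the field is an in-range decimal integer, optionally surrounded by
   whitespace; otherwise returns 0 and leaves *out untouched. */
int parse_int_field(const char *s, long *out)
{
    assert(s != NULL && out != NULL);
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);             /* skips leading whitespace */
    if (end == s || errno == ERANGE)
        return 0;                        /* no digits at all, or overflow */
    while (isspace((unsigned char)*end))
        end++;                           /* allow trailing blanks/newline */
    if (*end != '\0')
        return 0;                        /* trailing junk such as "4x" */
    *out = v;
    return 1;
}
```

With input "abc 42", `scanf_int_demo` returns 0, the variable keeps
whatever value it had, and the next character is still 'a'. Reading
the whole line first and then validating it with `parse_int_field`
means a bad field is detected without leaving the file position in
the middle of a record.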