Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!vsi1!altnet!uunet!mcvax!enea!sommar
From: sommar@enea.se (Erland Sommarskog)
Newsgroups: comp.lang.misc
Subject: Re: Text or data files?
Message-ID: <3967@enea.se>
Date: 1 Oct 88 13:57:14 GMT
Organization: ENEA DATA AB, Sweden
Lines: 106

I had the example:
>>   Data_record = RECORD
>>      Date           : PACKED ARRAY(.1..8.) OF char;
>>      Time           : PACKED ARRAY(.1..8.) OF char;
>>      Incident       : Incident_type;   (* Enumerated *)
>>      No_of_warnings : integer;
>>      Alarmed        : boolean;
>>      Username       : PACKED ARRAY(.1..12.) OF char;
>>   END;
>>
>>The simplest way to read and write this is through a FILE OF Data_record,
>>if no other program is to read it.

Marc W. Mengel (mmengel@cuuxb.UUCP) wrote:
>Two major problems with this idea. The first is that most of the time
>other programs will need to read the data sooner or later.

If we have data that are to be read by more than one program, the
programs can import the declaration of the data record from a common
source, and thus they do not need to be rewritten if the format is
changed. With a text file we can achieve the same effect with common
procedures for reading and writing, to be imported together with the
data definition, but that gives a higher degree of dependency. We also
get the problem that for one change we have to edit in three places:
the read routine, the write routine and the data definition, which
introduces a source of error.
   If you have many programs that are to read the same data, you are
likely to get a database system, and I don't think they store data in
a text-file format...
   The only case where I can see that this argument is valid is when
"the other program" is a standard text-oriented utility.
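
To make the common-declaration idea concrete, here is a minimal sketch
in Python (not Pascal, purely for illustration). Everything in it is an
assumption of mine rather than part of the original example: the names
DataRecord, RECORD_FORMAT, write_record and read_record are
hypothetical, the enumeration is stored as a one-byte ordinal, and the
little-endian, unpadded layout is simply one possible choice. The point
is only that the layout is declared once and both the reader and the
writer derive from it.

```python
import io
import struct
from collections import namedtuple

# Hypothetical shared declaration: every program imports this module
# instead of repeating the layout.
DataRecord = namedtuple(
    "DataRecord",
    "date time incident no_of_warnings alarmed username",
)

# Mirrors the Pascal record: 8 chars, 8 chars, enumeration ordinal,
# integer, boolean, 12 chars. "<" pins little-endian byte order and
# turns off padding; that layout is an assumption of this sketch.
RECORD_FORMAT = struct.Struct("<8s8sBi?12s")


def write_record(f, rec):
    """Append one record to a file opened in binary mode."""
    f.write(RECORD_FORMAT.pack(*rec))


def read_record(f):
    """Return the next record, or None at end of file or on a short read."""
    raw = f.read(RECORD_FORMAT.size)
    if len(raw) < RECORD_FORMAT.size:
        return None
    return DataRecord(*RECORD_FORMAT.unpack(raw))


if __name__ == "__main__":
    # Sample values are made up for the demonstration.
    buf = io.BytesIO()
    rec = DataRecord(b"88-10-01", b"13:57:14", 2, 3, True,
                     b"sommar".ljust(12))
    write_record(buf, rec)
    buf.seek(0)
    print(read_record(buf))
```

If the format changes, only RECORD_FORMAT and the field list need
editing; the read and write routines follow automatically. A text
format would need its own formatting and parsing code kept in step
with the declaration by hand.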

>Second, when
>files are written in a binary format like this, the same program cannot
>read the data when run on a different machine with a different byte
>ordering, so after you have built up a list of 2000 incidents, and have
>to move to a new machine, you lose big time.

A valid point. However, text files are not necessarily compatible
either. Imagine that the data record above has a message field, 80
characters long. Assume that the program started its life on VMS and
that one of the messages contains a CR-LF. Now we move to a Unix
system... And I have seen Pascal systems that gladly read 123 from
the line "123ABD", and those that choke, saying "invalid integer".
   Both these problems can be avoided with careful programming, it
should be added.

>You have a data file with packed records in it, and you (the programmer)
>have *no idea* how the data is actually formatted.

Isn't that the point? I always thought that as high a level of
abstraction as possible was a good thing. You don't need to know the
actual disk format until you really have a need to move the file.

>It's true, you have to parse some of the data file (the numbers), but
>even Pascal gives you a means of writing and reading integers of a
>fixed width.

The problem is that you often have little use for these standard
routines, unless you can accept that the program crashes because there
was a letter where you expected a number. Storing data in text files
gives you a bigger problem with data integrity than storing it in
binary files does.

>What's so tough about fixed format text parsing?
>...
>you *can* add records with a text editor,

A plus, but applying a text editor is clearly a violation of data
integrity.

>you can debug your code much more easily,

Since I have less code, binary files win here, as long as I have a
good debugger around.

>you can write programs in other languages

If you work on VMS and have CDD (Common Data Dictionary) around, this
is possible with binary files too.
(With CDD you can write data definitions in a specific data definition
language. Several of DEC's compilers supply a DICTIONARY directive to
import these definitions.)

>Binary data saves you a whole 20 minutes to an hour when writing
>the program,

It saves you more time than that. You don't have to think so much
about integrity checks, you have less trouble changing the format
during development, and maintenance benefits from the reduced code
volume.

I'm not saying that you should never use text files for storing data.
In many cases this may be very desirable. I once wrote a simple text
formatter with an interactive syllabication facility. I stored the
syllabications in a text file, since I realized that the user would
want to be able to remove an erroneous syllabication, and I didn't
want to write a tool for maintaining the file.

What to use is a decision the programmer has to make based on the
requirements on portability (+ for text), performance (+ for binary),
data integrity (+ for binary) and so on. Generally it seems to me that
for a cheap system with low requirements on integrity and
maintainability, a text file is the natural choice. But as the
complexity and the amount of data grow, you are likely to choose
binary files, and eventually you pass the line where you need a
database management system.
-- 
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.UUCP