Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site decwrl.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!ucbvax!decwrl!dec-rhea!dec-mrfort!jcampbell
From: jcampbell@mrfort.DEC (Jon Campbell)
Newsgroups: net.unix-wizards,net.unix
Subject: unix file system
Message-ID: <3287@decwrl.UUCP>
Date: Thu, 25-Jul-85 13:53:22 EDT
Article-I.D.: decwrl.3287
Posted: Thu Jul 25 13:53:22 1985
Date-Received: Sat, 27-Jul-85 02:51:51 EDT
Sender: daemon@decwrl.UUCP
Organization: DEC Engineering Network
Lines: 115
Xref: watmath net.unix-wizards:14017 net.unix:5139

From:    Jon Campbell
         Digital Equipment Corp.
         Marlboro, MA
         617-467-6876
         DECnode:MRFORT::JCAMPBELL
To:      UNIX developers and users
Subject: problems with the UNIX file system

Some of us at Digital think we have found a basic problem with the UNIX file system for FORTRAN: there is no place to put various kinds of information about the contents of a file. More specifically:

1. The FORTRAN language requires "random access" files with a fixed "record size". The obvious UNIX implementation uses a fixed number of bytes (perhaps with a terminating character at the end) for each "record". However, there is no way on UNIX to open such a file and find out the size of each record. It is therefore impossible to write a utility that looks at, modifies, or extracts data from such a file unless the user already knows how the file is laid out.

2. As you probably know, most FORTRAN output data files reserve the first character position of each output line for a "FORTRAN carriage control character". When the file is printed (or, in some circumstances, typed), these control characters are supposed to be translated into the corresponding vertical motion characters (one or more line feeds, a form feed, a vertical tab, etc.), and the terminating character at the end of each "record" is removed.
So FORTRAN output files are "different" from other files, even though you cannot tell by looking at them: they just have "funny numbers" in the first character position of each line. UNIX provides a utility for piping FORTRAN output through a translator, so that the vertical motion characters appear directly in the output file. But often that is not what one wants. Often one wants to leave the file in its original ("FORTRAN data file") state, modify it many weeks later, and only then print it. Again, as in the case above, the user must know that the file was produced by a FORTRAN program and pipe it through a filter program on the way to the printer or terminal.

3. The ANSI Magnetic Tape Label Standard defines a set of file attributes in the file labels, which must be filled in when the tape is written. Among them are record size and carriage control (called "Form Control" in the Standard).

I would like to propose that UNIX users and developers begin thinking about which "file attributes" would be useful to attach to UNIX files, that is, information about a file that generalized programs, which cannot have prior knowledge of each file, could make use of. Keep in mind that these "attributes" would NOT detract in any way from the simplicity of UNIX: no one would have to use them; they would be there only for users who wish to carry information about their files along with the files. Nor would UNIX treat files with attribute information any differently than it treats files now; they would simply carry some additional information that can be discovered when they are opened. No "file management layer" is implied for UNIX by the creation of these "attributes".
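A minimal sketch of the translation described in point 2, assuming the conventional FORTRAN carriage-control codes (blank = single space, '0' = double space, '1' = new page, '+' = overprint) and newline-terminated records. The function name cc_translate is hypothetical and this is not the UNIX utility mentioned above; it only illustrates the per-record logic such a filter applies.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: translate FORTRAN carriage control in a buffer of
 * newline-terminated records into plain vertical motion characters.
 * The first character of each record is consumed as the control code:
 *   ' '  advance one line before printing the record
 *   '0'  advance two lines (leaving one blank line)
 *   '1'  start a new page (form feed)
 *   '+'  no advance: carriage return only, so the record overprints
 * Any other code is treated like ' '. Returns the number of bytes written
 * to out (which is NUL-terminated), or (size_t)-1 if out is too small.
 */
size_t cc_translate(const char *in, char *out, size_t outsz)
{
    size_t n = 0;
    int first = 1;              /* no previous printed line to end yet */

    if (outsz == 0)
        return (size_t)-1;

    while (*in != '\0') {
        const char *eol = strchr(in, '\n');
        size_t len = eol ? (size_t)(eol - in) : strlen(in);
        char cc = len > 0 ? in[0] : ' ';
        const char *text = len > 0 ? in + 1 : in;   /* skip the control char */
        size_t tlen = len > 0 ? len - 1 : 0;
        const char *motion;

        /* Motion is written before the record's text, so after the first
         * record it also terminates the previous printed line. */
        switch (cc) {
        case '0': motion = first ? "\n" : "\n\n"; break;
        case '1': motion = first ? "\f" : "\n\f"; break;
        case '+': motion = first ? "" : "\r"; break;
        default:  motion = first ? "" : "\n"; break;
        }

        size_t mlen = strlen(motion);
        if (n + mlen + tlen + 2 > outsz)   /* keep room for final '\n' + NUL */
            return (size_t)-1;
        memcpy(out + n, motion, mlen);
        n += mlen;
        memcpy(out + n, text, tlen);
        n += tlen;

        first = 0;
        in += len;              /* the record terminator itself is dropped */
        if (*in == '\n')
            in++;
    }
    if (!first)
        out[n++] = '\n';        /* end the last printed line */
    out[n] = '\0';
    return n;
}
```

For example, the records " A", "0B", "1C", "+D" become "A\n\nB\n\fC\rD\n". The translation itself is simple; the hard part, as the post argues, is that a program can only apply it if it somehow knows the file contains FORTRAN carriage control in the first place.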
We would not even have to make an "incompatible change" for printing files with the "FORTRAN data file" attribute: a new command could be introduced to take the place of LPR for users who want the utility to check whether the attribute is set and print the file accordingly; many people would probably continue to use LPR.

Below is a list of the "attributes" I have found useful in my work implementing the FORTRAN runtime library for TOPS-10 and TOPS-20. Many of them are included in the ANSI Magnetic Tape Label Standard:

Carriage control
    FORTRAN - funny numbers in char position 1, translated on printing
    LIST    - take just the contents of the "record" and add a line terminator. This is for files which have no line terminators in them
    NONE    - print the file as it appears (the default)

Character set (for those folks who want to have both EBCDIC and ASCII files)

Record format (refer to the Tape Label Standard)
    Delimited - each record has a 4-character byte count in front of it
    Fixed     - all records have the same length, with no terminators
    Undefined - the default; no implied record format

Record size
    For "fixed" record format, the size of all records; for variable-length records, this is usually interpreted as the maximum record length (zero means "unknown" maximum record length)

File type (for "data management" programs...)
    Sequential (the default)
    Others (user-definable, for various flavors of other types of access, such as [ugh] indexed sequential, database, etc.)

Bytesize (for typesetting applications which use 16- or 32-bit character sets)

I'm sure you'll all think of others that would be useful. Since I have not looked at the UNIX internal file system much, I do not know how difficult it would be to find a place to attach this large (and potentially expanding) set of attributes, or what the FOPEN (or other) interface for setting and getting the attribute values would look like.

Thanks for your time,
Jon Campbell
--------
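A minimal sketch of how a generic utility might use two of the attributes listed above (record format and record size) to read record n of a Fixed-format file without prior knowledge of that file. Everything here is hypothetical: the struct file_attrs encoding and the functions record_offset and read_fixed_record are invented for illustration, UNIX offers no such interface, and where the attributes would actually be stored is left open, as in the post. Only the Fixed format is handled.

```c
#include <stdio.h>

/* Hypothetical in-memory form of the proposed attributes. Zero values
 * are the defaults named in the list: NONE, Undefined, Sequential. */
enum carriage_control { CC_NONE = 0, CC_FORTRAN, CC_LIST };
enum record_format    { RF_UNDEFINED = 0, RF_DELIMITED, RF_FIXED };
enum file_type        { FT_SEQUENTIAL = 0, FT_USER_DEFINED };

struct file_attrs {
    enum carriage_control cc;
    char                  charset[8];  /* e.g. "ASCII" or "EBCDIC" */
    enum record_format    rfm;
    unsigned long         recsize;     /* Fixed: every record; else max; 0 = unknown */
    enum file_type        type;
    unsigned              bytesize;    /* bits per character: 8, 16, 32, ... */
};

/* Byte offset of FORTRAN record number recno (1-based) in a Fixed-format
 * file, or -1 if the attributes do not allow computing it. */
long record_offset(const struct file_attrs *a, unsigned long recno)
{
    if (a->rfm != RF_FIXED || a->recsize == 0 || recno == 0)
        return -1;
    return (long)((recno - 1) * a->recsize);   /* overflow not checked */
}

/* Read record recno into buf, which must hold at least a->recsize bytes.
 * Returns 0 on success, -1 on failure or a short read. */
int read_fixed_record(FILE *f, const struct file_attrs *a,
                      unsigned long recno, char *buf)
{
    long off = record_offset(a, recno);

    if (off < 0 || fseek(f, off, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, a->recsize, f) == a->recsize ? 0 : -1;
}
```

Without the record-format and record-size attributes, record_offset has nothing to compute from, which is the situation point 1 describes.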