Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site topaz.ARPA
Path: utzoo!watmath!clyde!cbosgd!cbdkc1!desoto!packard!topaz!hedrick
From: hedrick@topaz.ARPA (Chuck Hedrick)
Newsgroups: net.unix-wizards,net.unix
Subject: Re: unix file system
Message-ID: <2970@topaz.ARPA>
Date: Sun, 28-Jul-85 01:05:33 EDT
Article-I.D.: topaz.2970
Posted: Sun Jul 28 01:05:33 1985
Date-Received: Mon, 29-Jul-85 06:54:50 EDT
References: <3287@decwrl.UUCP>
Reply-To: hedrick@topaz.UUCP (Chuck Hedrick)
Organization: Rutgers Univ., New Brunswick, N.J.
Lines: 92
Xref: watmath net.unix-wizards:14069 net.unix:5165

Jon:

I am very glad to see that DEC is interested in Fortran on Unix. You
would make many people very happy if you bring to Unix a Fortran
compiler of the quality of the DEC VMS (or TOPS-20) compiler.
However...

I think it is a bad idea to add attributes to the Unix file system.
You indicate that it would not cause any incompatibility. There is a
sense in which this is true. But you would have to change all the
utility programs that copy files, to copy the attributes. You would
have to change the formats of backup tapes and tapes such as tar, to
include the attributes. To the extent that the attributes are used,
you would have to modify language runtime systems and utilities to
take attributes into account when reading files that have them.

One of my staff members has just written a network spooler for VMS.
It is amazing how complex it is to read VMS files in their full
generality, at least from Modula 2. (Perhaps this is a defect in the
runtime system.) This complexity has nothing to do with whether there
is an extra layer of RMS between you and the file system. Indeed that
layer may make things more liveable. It has to do simply with the
complexity of the file system.

I am recommending that our Computer Science Dept use Unix, partly
because I want an O.S. that is simple.
I would like our students to be able to do some system programming.
I would not like to face them with the complexities of an RMS file.
If you add attributes to your Unix, I would regretfully have to rule
it out as a candidate for our department.

However the problem that you pose still remains. I think you want to
distinguish between 2 kinds of files: those that are intended to be
human-readable, and binary files. I believe you should do whatever
violence is necessary to keep human-readable files in a single,
simple format. This is the clear difference between Unix/Tenex on the
one side and IBM/VMS on the other. I believe Unix people have chosen
which side of the fence they want to be on, and you should respect
that decision.

Fortunately, I believe you do not have to do much violence to Fortran
to make this work. The only structure you really have to worry about
in human-readable files is carriage control. I suggest that the
runtime system should turn the carriage control into carriage return,
line feed, form feed, etc.

At first glance, this appears to be a problem. After all, you say,
Fortran programs might write a file using carriage control, and
expect that when the file is read back in, the carriage control is
still there. However as I understand it, Fortran 77 has deemphasized
carriage control. I believe it is now used only in "print" files. It
seems reasonable to believe that a print file is not normally going
to be read back in as data to another Fortran program. Thus I believe
you should do the following:

   - by default, map carriage control into CR, LF, etc. when output
     is to a "print" file. I suggest a convention that by default
     units 0 (stderr) and 6 (stdout) are print files.
   - supply an option to OPEN to override this.
   - for programs that do not use these mechanisms properly (e.g. old
     Fortran 66 programs), the only damage is that the ANSI carriage
     control characters will show up in column 1.
     There can still be a filter to handle this explicitly for those
     exceptions.

I do not like the TOPS-20 idea of defaulting depending upon the
actual output device (/dev/tty and /dev/lpt being print, disk files
nonprint). The program will not then know in advance whether the file
is a print file. That makes it unnecessarily hard to code.

For binary files, I like the idea of a "magic number" that specifies
"This is a structured binary file". In case you are not familiar with
the concept of magic number, all relocatable and executable binaries
have a certain number in their first 32 bits. There is no danger of
confusing these files with text files, since the magic numbers are
small integers. Thus the first 2 or 3 bytes are always 0, which is
unlikely in a text file.

You then need a way to specify the attributes. Experience with
network protocols and other things suggests a text format for this.
If you use bits, you will always run out of bits. There are several
reasonable formats. My favorite (you are going to laugh, I'm sure) is
Lisp format: a parenthesized list with attribute-value pairs, e.g.

     ((RECORD-SIZE 200) (FORMAT VBA))

This is simple to parse using a higher-level language. Xerox used it
for specifying file attributes in PUP FTP, and it is easier to handle
than the alternatives I have seen elsewhere. A more "binary" format
might be pairs of null-terminated strings, ending with an extra null.
But I think the Lisp format is better.

You would probably want a convention that the actual data begins on
the next 32-bit boundary after the end of the attributes, since that
might simplify processing for certain situations. (For paged files,
such as B-trees, you would probably want to skip to the next page
boundary, but that would be an action implied by certain attributes.)

PS: in future messages, could you give a UUCP route? I don't have a
routing to mrfort.DEC offhand.
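
To make the structured-binary-file proposal above a little more
concrete, here is a minimal sketch in Python, not a definitive
implementation. It assumes several things the proposal does not fix:
the magic value (MAGIC is a made-up small integer), big-endian storage
of the magic number so that its leading bytes come out as zero, a flat
list of (NAME VALUE) pairs with no nesting inside a pair, and NUL bytes
as padding up to the 32-bit boundary. All function names are
hypothetical.

```python
import struct

MAGIC = 0x0000012F  # hypothetical magic number; any small integer would do


def format_attrs(attrs):
    """Render a dict as a Lisp-style list, e.g. ((RECORD-SIZE 200) (FORMAT VBA))."""
    return "(" + " ".join(f"({name} {value})" for name, value in attrs.items()) + ")"


def parse_attrs(text):
    """Parse a flat list of (NAME VALUE) pairs; all-digit values become ints."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    if tokens[:1] != ["("] or tokens[-1:] != [")"]:
        raise ValueError("attribute list must be parenthesized")
    inner = tokens[1:-1]
    attrs = {}
    for i in range(0, len(inner), 4):
        group = inner[i:i + 4]
        if len(group) != 4 or group[0] != "(" or group[3] != ")":
            raise ValueError("expected (NAME VALUE) pairs")
        name, value = group[1], group[2]
        attrs[name] = int(value) if value.isdigit() else value
    return attrs


def pack_file(attrs, data):
    """Magic number, attribute text, NUL padding to a 32-bit boundary, then data."""
    header = struct.pack(">I", MAGIC) + format_attrs(attrs).encode("ascii")
    padding = -len(header) % 4
    return header + b"\0" * padding + data


def unpack_file(blob):
    """Check the magic number, read the attribute list, return (attrs, data)."""
    if len(blob) < 4 or struct.unpack(">I", blob[:4])[0] != MAGIC:
        raise ValueError("not a structured binary file")
    depth = 0
    end = None
    # Scan for the parenthesis that closes the outermost attribute list.
    for pos in range(4, len(blob)):
        ch = blob[pos:pos + 1]
        if ch == b"(":
            depth += 1
        elif ch == b")":
            depth -= 1
            if depth == 0:
                end = pos + 1
                break
    if end is None:
        raise ValueError("unterminated attribute list")
    attrs = parse_attrs(blob[4:end].decode("ascii"))
    data_start = end + (-end % 4)  # skip padding to the next 32-bit boundary
    return attrs, blob[data_start:]
```

A reader that does not recognize an attribute can still find the data,
since the start of the data depends only on where the parenthesized
list ends, not on what it contains.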

Charles Hedrick
Rutgers University
uucp: ...{harvard, seismo, ut-sally, sri-iu, ihnp4!packard}!topaz!hedrick
arpa: HEDRICK@RUTGERS