Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!apple!snorkelwacker!bloom-beacon!world!burley
From: burley@world.std.com (James C Burley)
Newsgroups: comp.lang.fortran
Subject: Re: File handling in Fortran 77
Message-ID:
Date: 31 Aug 90 07:19:12 GMT
References: <46016@masscomp.ccur.com> <1990Aug29.173235.9405@ux1.cso.uiuc.edu> <1990Aug30.132335.20164@ux1.cso.uiuc.edu>
Sender: burley@world.std.com (James C Burley)
Organization: The World
Lines: 157
In-Reply-To: mcdonald@aries.scs.uiuc.edu's message of 30 Aug 90 13:23:35 GMT

In article <1990Aug30.132335.20164@ux1.cso.uiuc.edu> mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes:

> I meant that a computer that can't run C is a joke. If it can run C, it must be physically possible, and possible by some sort of OS calls.

If I create a language called "X" with an I/O model expecting files to be composed of a given number of bits, does that make "jokes" out of all existing computers that measure file lengths in granularities down to only 8-bit bytes? Hardly. Meanwhile, yes, Fortran programmers can make direct OS calls to do things not provided directly by the language's I/O specification.

> In other words, you are saying Primes are (were?) so deficient that they could not even tell how long a file was?

Primes had all sorts of problems dealing with C. C is a language designed to closely match the machine-code architecture of a particular kind of system; Primes were designed with another standard in mind (Multics, PL/I). Of course they knew how long a file was -- down to a granularity of 16 bits, not 8 bits as C demands. I'd like to see contemporary systems that do C well compete against Primes running demanding PL/I applications using dynamic linking and such. Prime 50 series machines were built largely around a PL/I model; the OS call interface tended to be Fortran-oriented, but the OS in general was evolving towards a PL/I and Multics model during the late 70s and early 80s.

> I still don't understand why not.
> A file IS a sequence of bits. Why can't the OS simply move those bits (as needed, or all at once, if the address space is big enough) into memory and present them to the program? If you had written the file with some arcane internal organization, you would then SEE that organization, and could deal with it as you pleased.

The idea of C is to allow the programmer to see the exact internal organization of the file. The idea of Fortran is to hide the internal organization of the file from the programmer to improve portability. Fortran has a different design goal than C.

Fortran can run (or run reasonably efficiently) on some machines that standard C (with I/O libraries) cannot. C can run more efficiently than Fortran on many machines under some circumstances; and it is often easier to implement a C compiler system than a Fortran one because you don't have to implement so many "insulating" levels.

A Fortran compiler system is easily implemented entirely within the ANSI standard C environment (I think; I'm doing one now), but the converse is not true. However, in an environment where standard C cannot be provided, Fortran can still be implemented in another language and/or assembler. Aside from the sheer size of a Fortran compiler, it is hard to imagine any computer architecture on which a Fortran system could not be placed (at least a run-time environment, if compiling has to take place on a different machine due to the size of the compiler).

Most of these statements about Fortran can be applied to C if you forget about the standard run-time library, especially I/O, but possibly including things like setjmp/longjmp. Even then, one must contend with character pointers (which Fortran does not have) and recursion, which C implementations require but not all systems can provide (or provide at "base level", i.e. without an omnipresent extra layer of processing to simulate a different machine).
> Incidentally, how do the OS's with the arcane file types get all that arcana installed in the files? Not even in the CISC-yist architectures is it done in hardware, is it? What language IS it done in? Certainly not Fortran --- C? assembly?
>
> The point is that any language is a joke if it is severely limited in how it can use files. The "byte" model (or, even more so, the bit model) is general. The "record model" is a hindrance - if it is the ONLY model a language or OS supports. A "record" model would be OK as a language specific overlay of a basic general model. But to have it stuffed deep down inside the OS as in VMS and IBM mainframes is a disaster.
>
> Doug McDonald

The OSes do whatever they want in assembly, at worst. PRIMOS was originally written in Fortran and assembler; yet it could just as well have been implemented in C, because it never used Fortran I/O statements to do anything, just system calls, which is the C model. (I.e. C has no "built in" I/O at all; it all happens via function calls to the library. Fortran is more like Pascal and Basic in the sense that it has built-in I/O statements.)

There is nothing magic about any of the OS's I/O when you think of it this way. In fact PRIMOS was more true to the UNIX model of files than VAX/VMS (the offspring of PDP-11 systems on which C/UNIX was kind of born), because a PRIMOS file was essentially just a sequence of zero or more 16-bit chunks. Change that to 8-bit chunks and you (almost) have the UNIX file system model.

VMS, on the other hand, offers so many different ways to represent similar file concepts (stream-LF, stream-CR, stream-CRLF, variable record with CR, var w/o CR, just to name a few) that it can drive you crazy. C implementations default to picking the lowest-level format (I think it's stream-LF), which many system utilities don't support well.

No special hardware support is needed. At the lowest level, one is simply writing bits.
Disk subsystems deal primarily in fixed-size blocks (1024 16-bit words, as I recall, on Primes); so their lowest level of granularity for storing bits is their block size. To write anything other than a block of bits at a block-aligned bit address on a disk (i.e. anything other than "write block number x" or "write block numbers x through y"), the OS must first read the block(s), then overlay the data and rewrite them. Most OS's I've seen provide special optimization paths when user software asks to write one or more complete blocks at block boundaries.

I think early drum-based systems had blocks more the size of old-style records (72 or 80 bytes or something), so Fortran evolved from that. If the Fortran programmer wanted to rewrite a given portion of the record, let him write the code to first read the entire record, overlay, then rewrite; the performance implications are clearer that way. This was probably the thinking; plus some devices (like line printers) couldn't deal with partial records at all, and the implementors didn't have the ability (or the understanding) to do buffering in an I/O library (or maybe they just didn't have libraries at the time, and generated the I/O instructions directly at compile time?).

The same goes for C when you want to write an arbitrary number of bits at an arbitrary bit address in a file. C is "deficient" in this sense. (And of course reading is an issue too, but carries less of a performance penalty.)

8-bit bytes are only magic to you because that is what you're used to. Some of us got used to 16-bit, 36-bit, 32-bit, 12-bit, 18-bit, 64-bit, or 24-bit chunks as the granularity for addressing in memory and/or files. So an 8-bit byte is just another value to us. For C programmers, character addressing and reading/writing is important; Fortran programmers only got at all excited about character data around 12 years ago.
Until then, and this is still true to a large extent, Fortran cared only about numbers and, for efficiency, grouping sequences of numbers into records.

Early file systems didn't have much info on files; sometimes none. So a file simply started at a block address on a disk and went on for so many blocks, perhaps controlled by one or more index blocks, or perhaps not if contiguous files were always used. Adding something as innocuous as "here's the logical length of the file" was a big thing; getting the granularity "right" still hasn't happened. So PRIMOS' concept of logical length was 16-bit chunks; but most systems still do it in 8-bit chunks, and that is "wrong" since a new language might want to use 1-bit chunks.

Your questions and comments suggest to me that you are interested in learning about operating systems and file systems (their histories, design, and implementation). It is hard to sum up this field in response to these issues, though I've tried to provide some highlights from my own experience. I suggest you go out and find a book on the implementation of some OS over 10 years old that isn't UNIX; best if it includes information on why certain design tradeoffs were made.

That might give you more insight not only into why old systems weren't built to run C (which should be obvious to anyone), but also into why current systems might well not be adequate for future languages that don't add any really exciting new "features" but simply demand a more general machine model (like 1-bit granularity for memory and/or files). It might also show why systems designed for such new languages might well be worse at running today's C programs, in terms of the complexity needed to achieve a given performance. Further, it might be revealing to think through what kinds of optimizations are available when doing language-level record-blocking and how they might be made available to languages (like C) that don't provide it.
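
One way to think that last question through is with a toy model. What follows is only a rough sketch (in Python, for brevity), not how PRIMOS or any real I/O library did it. The names `Disk`, `write_bytes`, `RecordBlocker` and `compare`, the 80-byte record length, and the 8-block disk are all hypothetical, made up for illustration; only the 2048-byte block (1024 16-bit words) is taken from the Prime figure recalled earlier. The model counts block transfers when records are written one at a time at arbitrary byte offsets, which forces the read, overlay, rewrite sequence, and again when they are first gathered into whole blocks.

```python
BLOCK_SIZE = 2048   # bytes per block: 1024 16-bit words, the Prime figure recalled earlier
RECORD_SIZE = 80    # assumed card-image record length (hypothetical choice)


class Disk:
    """Toy block device: it can only read or write whole blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytearray(BLOCK_SIZE) for _ in range(num_blocks)]
        self.reads = 0
        self.writes = 0

    def read_block(self, n):
        self.reads += 1
        return bytearray(self.blocks[n])

    def write_block(self, n, data):
        assert len(data) == BLOCK_SIZE
        self.writes += 1
        self.blocks[n] = bytearray(data)

    def image(self):
        """Return the whole disk contents as one bytes object."""
        return b"".join(bytes(block) for block in self.blocks)


def write_bytes(disk, offset, data):
    """Write data at an arbitrary byte offset on a block-only device."""
    pos = 0
    while pos < len(data):
        n, start = divmod(offset + pos, BLOCK_SIZE)
        chunk = min(BLOCK_SIZE - start, len(data) - pos)
        if chunk == BLOCK_SIZE:
            # Fast path: a complete block at a block boundary needs no read.
            disk.write_block(n, data[pos:pos + chunk])
        else:
            # Slow path: read the block, overlay the new bytes, rewrite it.
            block = disk.read_block(n)
            block[start:start + chunk] = data[pos:pos + chunk]
            disk.write_block(n, block)
        pos += chunk


class RecordBlocker:
    """Gathers records in memory and hands the disk whole blocks."""

    def __init__(self, disk):
        self.disk = disk
        self.pending = bytearray()
        self.next_block = 0

    def put(self, record):
        self.pending += record
        while len(self.pending) >= BLOCK_SIZE:
            self.disk.write_block(self.next_block, self.pending[:BLOCK_SIZE])
            del self.pending[:BLOCK_SIZE]
            self.next_block += 1

    def close(self):
        # Only a final partial block, if any, pays for read-overlay-rewrite.
        if self.pending:
            write_bytes(self.disk, self.next_block * BLOCK_SIZE, bytes(self.pending))
            self.pending.clear()


def compare(num_records=100):
    """Write the same records unblocked and blocked; return both disks."""
    records = [bytes([i % 256]) * RECORD_SIZE for i in range(num_records)]

    unblocked = Disk(8)
    for i, rec in enumerate(records):
        write_bytes(unblocked, i * RECORD_SIZE, rec)

    blocked = Disk(8)
    blocker = RecordBlocker(blocked)
    for rec in records:
        blocker.put(rec)
    blocker.close()

    return unblocked, blocked


if __name__ == "__main__":
    u, b = compare()
    print("unblocked:", u.reads, "reads,", u.writes, "writes")
    print("blocked:  ", b.reads, "reads,", b.writes, "writes")
```

With the defaults (100 records of 80 bytes), the unblocked path comes to 103 block reads and 103 block writes, while the blocked path comes to 1 read and 4 writes, and both leave identical bytes on the toy disk. The blocker is essentially the buffering a C library can do on the program's behalf; a language with built-in records simply knows the record boundaries without being told.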
Ultimately, the point I wish to make is that, if a given machine isn't well suited to hosting a standard C environment, the only thing you can say with certainty about that system is that it isn't well suited to C. You can't really say it's generally "deficient". After all, it might be better at running Fortran than an equivalent machine that fits C like hand-in-glove!

James Craig Burley, Software Craftsperson    burley@world.std.com