Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!apple!snorkelwacker!bloom-beacon!world!burley
From: burley@world.std.com (James C Burley)
Newsgroups: comp.lang.fortran
Subject: Re: File handling in Fortran 77
Message-ID: <BURLEY.90Aug31001912@world.std.com>
Date: 31 Aug 90 07:19:12 GMT
References: <46016@masscomp.ccur.com> <1990Aug29.173235.9405@ux1.cso.uiuc.edu>
	<BURLEY.90Aug30024743@world.std.com>
	<1990Aug30.132335.20164@ux1.cso.uiuc.edu>
Sender: burley@world.std.com (James C Burley)
Organization: The World
Lines: 157
In-Reply-To: mcdonald@aries.scs.uiuc.edu's message of 30 Aug 90 13:23:35 GMT

In article <1990Aug30.132335.20164@ux1.cso.uiuc.edu> mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes:

   I meant that a computer that can't run C is a joke. If it can run C,
   it must be physically possible, and possible by some sort of OS calls.

If I create a language called "X" with an I/O model expecting files to be
composed of a given number of bits, does that make "jokes" out of all existing
computers that measure file lengths in granularities down to only 8-bit
bytes?  Hardly.

Meanwhile, yes, Fortran programmers can make direct OS calls to do things not
provided directly by the language's I/O specification.

   In other words, you are saying Primes are (were?) so deficient that
   they could not even tell how long a file was? 

Primes had all sorts of problems dealing with C.  C is a language designed
to closely match the machine-code architecture of a particular kind of system;
Primes were designed with another standard in mind (Multics, PL/I).  Of course
they knew how long a file was -- down to a granularity of 16 bits, not 8 bits
as C demands.

I'd like to see contemporary systems that do C well compete against Primes
running demanding PL/I applications using dynamic linking and such.  Prime
50 series machines were built largely around a PL/I model; the OS call
interface tended to be Fortran-oriented, but the OS in general was evolving
towards a PL/I and Multics model during the late 70s and early 80s.

   I still don't understand why not. A file IS a sequence of bits.  Why can't the
   OS simply move those bits (as needed, or all at once, if the address
   space is big enough) into memory and present them to the program?  If 
   you had written the file with some arcane internal organization, you
   would then SEE that organization, and could deal with it as you pleased.

The idea of C is to allow the programmer to see the exact internal organization
of the file.  The idea of Fortran is to hide the internal organization of the
file from the programmer to improve portability.  Fortran has a different
design goal than C.  Fortran can run (or run reasonably efficiently) on some
machines that standard C (with I/O libraries) cannot.  Conversely, C can
run more efficiently than Fortran on many machines under some circumstances,
and it is often easier to implement a C compiler system than a Fortran one
because you don't have to implement so many "insulating" layers.  A Fortran compiler
system is easily implemented entirely within the ANSI standard C environment
(I think; I'm doing one now), but the converse is not true.  However, in an
environment where standard C cannot be provided, Fortran can still be
implemented in another language and/or assembler.  Aside from the sheer size
of a Fortran compiler, it is hard to imagine any computer architecture on which
a Fortran system could not be placed (at least a run-time environment, if
compiling could take place on a different machine due to the size of the
compiler).

Most of these statements about Fortran can be applied to C if you forget about
the standard run-time library, especially I/O, but possibly including things
like setjmp/longjmp.  Even then, one must contend with character pointers
(which Fortran does not have) and recursion, which C implementations require
but not all systems can provide (or provide at "base level", i.e. without an
omnipresent extra layer of processing to simulate a different machine).
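To make the character-pointer point concrete: on a machine that can only address 16-bit words, every C `char *` has to carry an extra "which half" bit, and every byte access turns into a word access plus shifting and masking.  The C sketch below emulates that in software.  The layout (byte address = word index * 2 + half, with the even byte in the high-order half) and the names `load_byte` and `store_byte` are assumptions for illustration, not a description of how any actual Prime C compiler did it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word-addressed memory: only whole 16-bit words can be
   fetched or stored, so a byte address must also say which half of the
   word it designates.  Assumed layout: byte_addr = word_index * 2 + half,
   with the even-numbered byte in the high-order half. */

/* Fetch one byte: a whole-word load, then a shift and/or mask. */
static uint8_t load_byte(const uint16_t *words, size_t byte_addr)
{
    assert(words != NULL);
    uint16_t w = words[byte_addr / 2];
    return (byte_addr % 2 == 0) ? (uint8_t)(w >> 8) : (uint8_t)(w & 0xFFu);
}

/* Store one byte: the hardware cannot write 8 bits alone, so read the
   whole word, replace one half, and write the word back. */
static void store_byte(uint16_t *words, size_t byte_addr, uint8_t value)
{
    assert(words != NULL);
    uint16_t w = words[byte_addr / 2];
    if (byte_addr % 2 == 0)
        w = (uint16_t)((w & 0x00FFu) | ((uint16_t)value << 8));
    else
        w = (uint16_t)((w & 0xFF00u) | value);
    words[byte_addr / 2] = w;
}
```

Note that `store_byte` has to read a whole word in order to change half of it; that is the kind of extra layer a compiler must wrap around every character access when the hardware's granularity does not match C's.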

   Incidentally, how do the OS's with the arcane file types get all that
   arcana installed in the files? Not even the CISC-iest architectures
   do it in hardware, do they? What language IS it done in? Certainly not
   Fortran. C? Assembly? The point is that any language is a joke if it is
   severely limited in how it can use files. The "byte" model (or, even
   more so, the bit model) is general. The "record model" is a hindrance,
   if it is the ONLY model a language or OS supports. A "record" model
   would be OK as a language specific overlay of a basic general model.

   But to have it stuffed deep down inside the OS as in VMS and IBM mainframes
   is a disaster. 

   Doug McDonald

The OSes do whatever they want in assembly, at worst.  PRIMOS was originally
written in Fortran and assembler; yet it could just as well have been
implemented in C, because it never used Fortran I/O statements to do anything,
just system calls, which is the C model.  (I.e. C has no "built in" I/O at all;
it all happens via function calls to the library.  Fortran is more like Pascal
and Basic in the sense that it has built-in I/O statements.)  There is nothing
magic about any of the OS's I/O when you think of it this way.

In fact PRIMOS was more true to the UNIX model of files than VAX/VMS (the
offspring of PDP-11 systems on which C/UNIX was kind of born), because a PRIMOS
file was essentially just a sequence of zero or more 16-bit chunks.  Change
that to 8-bit chunks and you (almost) have the UNIX file system model.  VMS,
on the other hand, offers so many different ways to represent similar file
concepts (stream-LF, stream-CR, stream-CRLF, variable record with CR, var w/o
CR, just to name a few), it can drive you crazy.  C implementations default to
picking the lowest-level format (I think it's stream-LF), which many system
utilities don't support well.
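As a rough illustration of how differently the same lines of text can sit in a file, the C sketch below encodes them once as stream-LF (the bytes plus a line feed) and once as simplified variable-length records (a 2-byte little-endian byte count, the bytes, and a pad byte to keep each record on a 16-bit boundary).  The second layout is modeled loosely on VMS variable-length records but is an assumption for illustration, not the exact RMS on-disk format; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stream-LF: each record's bytes followed by a line feed.
   Returns the number of bytes written; the caller supplies enough room. */
static size_t encode_stream_lf(const char *const *recs, size_t n, unsigned char *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(recs[i]);
        memcpy(out + pos, recs[i], len);
        pos += len;
        out[pos++] = '\n';
    }
    return pos;
}

/* Simplified variable-length records (assumed layout): a 2-byte
   little-endian count, the record bytes, and one pad byte when needed to
   keep the next record on a 16-bit boundary.  No terminator appears in
   the data itself. */
static size_t encode_variable(const char *const *recs, size_t n, unsigned char *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(recs[i]);
        assert(len <= 0xFFFF);
        out[pos++] = (unsigned char)(len & 0xFF);
        out[pos++] = (unsigned char)(len >> 8);
        memcpy(out + pos, recs[i], len);
        pos += len;
        if (len % 2 != 0)
            out[pos++] = 0; /* pad byte */
    }
    return pos;
}
```

A C program that reads the first file with `fgets` gets its lines back directly; one that reads the second byte by byte sees the counts and pad bytes too, which is exactly the sort of detail Fortran's record I/O hides from the programmer.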

No special hardware support is needed.  At the lowest level, one is simply
writing bits.  Disk subsystems deal primarily in fixed-size blocks (1024 16-bit
words, as I recall, on Primes); so their lowest level of granularity for
storing bits is their block size.  To write anything other than a block of
bits at a block-aligned bit address on a disk (i.e. anything other than "write
block number x" or "write block numbers x through y"), the OS must first read
the block(s), then overlay the data and rewrite them.  Most OS's I've seen
provide special optimization paths when user software asks to write one or
more complete blocks at block boundaries.  I think early drum-based systems
had blocks closer to the size of old-style records (72 or 80 bytes or so),
and Fortran's record model evolved from that.  If the Fortran programmer wanted to rewrite
a given portion of the record, let him write the code to first read the entire
record, overlay, then rewrite; the performance implications are clearer that
way.  This was probably the thinking; plus some devices (like line printers)
couldn't deal with partial records at all, and they didn't have the ability
(or the understanding) to do buffering in an I/O library (or maybe they just
didn't have libraries at the time, and generated the I/O instructions directly
at compile time?).
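The read/overlay/rewrite cycle can be sketched in C against a simulated disk that only transfers whole blocks.  The 2048-byte `BLOCK_SIZE` (1024 16-bit words, per the recollection above), the in-memory `disk` array, and the helper names are assumptions for illustration.  A real OS would issue device I/O rather than `memcpy`, but the control flow (a fast path for whole aligned blocks, read-modify-write for everything else) is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 2048   /* assumed: 1024 16-bit words, in bytes */
#define NUM_BLOCKS 8

/* Simulated device that can only read or write whole blocks. */
static unsigned char disk[NUM_BLOCKS][BLOCK_SIZE];
static int block_reads, block_writes;

static void read_block(size_t n, unsigned char *buf)
{
    assert(n < NUM_BLOCKS);
    memcpy(buf, disk[n], BLOCK_SIZE);
    block_reads++;
}

static void write_block(size_t n, const unsigned char *buf)
{
    assert(n < NUM_BLOCKS);
    memcpy(disk[n], buf, BLOCK_SIZE);
    block_writes++;
}

/* Write len bytes at an arbitrary byte offset.  Whole, aligned blocks go
   straight to the device; partial blocks are read, overlaid, and rewritten. */
static void write_bytes(size_t offset, const unsigned char *data, size_t len)
{
    unsigned char buf[BLOCK_SIZE];
    while (len > 0) {
        size_t blk = offset / BLOCK_SIZE;
        size_t in_blk = offset % BLOCK_SIZE;
        size_t chunk = BLOCK_SIZE - in_blk;
        if (chunk > len)
            chunk = len;
        if (in_blk == 0 && chunk == BLOCK_SIZE) {
            write_block(blk, data);            /* fast path: no read needed */
        } else {
            read_block(blk, buf);              /* slow path: read ...       */
            memcpy(buf + in_blk, data, chunk); /* ... overlay ...           */
            write_block(blk, buf);             /* ... and rewrite           */
        }
        offset += chunk;
        data += chunk;
        len -= chunk;
    }
}
```

Writing two aligned blocks costs two device writes and no reads; writing five bytes in the middle of a block costs a read and a write; a write that straddles block boundaries pays for a read only on the partial blocks at either end.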

The same goes for C when you want to write an arbitrary number of bits at
an arbitrary bit address in a file.  C is "deficient" in this sense.  (Reading
is an issue too, of course, but the performance penalty is smaller.)
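Because stdio files are addressed in bytes, a C program that wants to write a bit field at an arbitrary bit offset has to do its own read-modify-write, one level up from the disk driver.  The `write_bits` function below is a hypothetical sketch using only standard `fseek`, `fread`, and `fwrite`; it assumes bits are numbered most-significant first within each byte, that the field does not start beyond end-of-file, and that bytes past end-of-file read as zero.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write the low nbits (1..32) of value starting at bit
   offset bit_off of update-mode stream f.  Bits are numbered MSB-first
   within each byte (an assumption).  Returns 0 on success, -1 on error. */
static int write_bits(FILE *f, unsigned long bit_off, unsigned long value, int nbits)
{
    assert(f != NULL && nbits >= 1 && nbits <= 32);
    unsigned long first = bit_off / 8;
    unsigned long last = (bit_off + (unsigned long)nbits - 1) / 8;
    unsigned char buf[5] = {0};              /* 32 bits can straddle 5 bytes */
    size_t n = (size_t)(last - first + 1);

    /* Read the affected whole bytes. */
    if (fseek(f, (long)first, SEEK_SET) != 0)
        return -1;
    size_t got = fread(buf, 1, n, f);
    if (got < n)
        clearerr(f);                         /* short read at EOF: rest stays zero */

    /* Overlay the new bits one at a time. */
    for (int i = 0; i < nbits; i++) {
        unsigned long pos = bit_off % 8 + (unsigned long)i; /* bit index in buf */
        int bit = (int)((value >> (nbits - 1 - i)) & 1u);
        unsigned char mask = (unsigned char)(0x80u >> (pos % 8));
        if (bit)
            buf[pos / 8] |= mask;
        else
            buf[pos / 8] &= (unsigned char)~mask;
    }

    /* Rewrite the whole bytes; a seek is required between a read and a write. */
    if (fseek(f, (long)first, SEEK_SET) != 0)
        return -1;
    return fwrite(buf, 1, n, f) == n ? 0 : -1;
}
```

Every call reads and rewrites whole bytes even when only one bit changes, which is the performance point made above.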

8-bit bytes are only magic to you because that is what you're used to.  Some
of us got used to 16-bit, 36-bit, 32-bit, 12-bit, 18-bit, 64-bit, or 24-bit
chunks as the granularity for addressing in memory and/or files.  So the 8-bit
byte is just another size to us.  For C programmers, character addressing
and reading/writing is important; Fortran programmers only got at all excited
about character data around 12 years ago, when FORTRAN 77 added the CHARACTER
type.  Until then, and this is still true
to a large extent, Fortran cared only about numbers and, for efficiency,
grouping sequences of numbers into records.

Early file systems didn't have much info on files; sometimes none.  So a file
simply started at a block address on a disk and went on for so many blocks,
perhaps tracked by one or more index blocks, or perhaps not, if contiguous files
were always used.  Adding something as innocuous as "here's the logical
length of the file" was a big thing; getting the granularity "right" still
hasn't happened.  PRIMOS measured logical length in 16-bit chunks, but most
systems still use 8-bit chunks, and that is "wrong" since a new language
might want to use 1-bit chunks.
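The cost of a granularity mismatch is easy to put in numbers: if the file system records logical length in units of G bits, a file of N bits must be stored as ceil(N/G) units, and any padding in the last unit is indistinguishable from data unless the program records the true length itself.  The two C helpers below are a small illustration; their names are hypothetical.

```c
#include <assert.h>

/* Number of length units, each unit_bits wide, needed to hold nbits
   (ceiling division). */
static unsigned long units_needed(unsigned long nbits, unsigned unit_bits)
{
    assert(unit_bits > 0);
    return (nbits + unit_bits - 1) / unit_bits;
}

/* Bits of padding the file system forces onto a file of nbits. */
static unsigned long padding_bits(unsigned long nbits, unsigned unit_bits)
{
    return units_needed(nbits, unit_bits) * unit_bits - nbits;
}
```

So a 5-byte (40-bit) file written by C on a system that counts 16-bit chunks comes back 6 bytes long, and a 13-bit file from a hypothetical 1-bit-granular language comes back as 16 bits on a byte-oriented system.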

Your questions and comments suggest to me that you are interested in learning
about operating systems and file systems (their histories, design, and
implementation).  It is hard to sum up this field in response to these issues,
though I've tried to provide some highlights from my own experience.  I
suggest you go out and find a book on the implementation of some non-UNIX OS
more than 10 years old, ideally one that explains why certain design
tradeoffs were made.  That might give you more insight not only into why old
systems weren't built to run C (which should be obvious to anyone), but into
why current systems might well be inadequate for future languages that add no
really exciting new "features" but simply demand a more general machine model
(like 1-bit granularity for memory and/or files), and why systems designed for
such new languages might well be worse at running today's C programs, in terms
of the complexity needed to achieve a given performance.

Further, it might be revealing to think through what kinds of optimizations
are available when doing language-level record-blocking and how they might be
made available to languages (like C) that don't provide it.
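One such optimization is plain record blocking: the run-time library collects fixed-length records in a buffer and hands the OS a write only when a whole block is full, so no partial block ever has to be read back and overlaid.  A minimal C sketch, assuming 80-byte card-image records, a blocking factor of 10, and a hypothetical `block_writer` callback standing in for the OS "write block" call (`tally_writer` just counts calls for demonstration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RECORD_LEN 80                      /* card-image records (assumed)  */
#define BLOCKING_FACTOR 10                 /* records per block (assumed)   */
#define BLOCK_LEN (RECORD_LEN * BLOCKING_FACTOR)

/* Hypothetical stand-in for the OS "write one block" call. */
typedef void (*block_writer)(const unsigned char *block, size_t len, void *ctx);

struct blocked_file {
    unsigned char buf[BLOCK_LEN];
    size_t nrecs;                          /* records currently buffered */
    block_writer emit_block;
    void *ctx;
};

static void bf_init(struct blocked_file *bf, block_writer w, void *ctx)
{
    bf->nrecs = 0;
    bf->emit_block = w;
    bf->ctx = ctx;
}

/* Append one record, blank-padded to RECORD_LEN.  The device sees a write
   only when the block is full. */
static void bf_write_record(struct blocked_file *bf, const char *text)
{
    size_t len = strlen(text);
    assert(len <= RECORD_LEN);
    unsigned char *rec = bf->buf + bf->nrecs * RECORD_LEN;
    memcpy(rec, text, len);
    memset(rec + len, ' ', RECORD_LEN - len);
    if (++bf->nrecs == BLOCKING_FACTOR) {
        bf->emit_block(bf->buf, BLOCK_LEN, bf->ctx);
        bf->nrecs = 0;
    }
}

/* On close, flush a final, partially filled (short) block. */
static void bf_close(struct blocked_file *bf)
{
    if (bf->nrecs > 0)
        bf->emit_block(bf->buf, bf->nrecs * RECORD_LEN, bf->ctx);
    bf->nrecs = 0;
}

/* Example writer for demonstration: counts device writes and bytes. */
struct write_tally { int writes; size_t bytes; };

static void tally_writer(const unsigned char *block, size_t len, void *ctx)
{
    struct write_tally *t = ctx;
    (void)block;
    t->writes++;
    t->bytes += len;
}
```

Twenty-five records then cost three device writes (two full blocks and a short final block) rather than twenty-five.  C's stdio can buffer bytes in a similar way (see `setvbuf`), but it has no notion of where one record ends and the next begins unless the program supplies it.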

Ultimately, the point I wish to make is that, if a given machine isn't well
suited to hosting a standard C environment, the only thing you can say with
certainty about that system is that it isn't well suited to C.  You can't
really say it's generally "deficient".  After all, it might be better at
running Fortran than an equivalent machine that fits C like hand-in-glove!

James Craig Burley, Software Craftsperson    burley@world.std.com