Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site csd1.UUCP
Path: utzoo!linus!decvax!harpo!floyd!cmcl2!csd1!condict
From: condict@csd1.UUCP (Michael Condict)
Newsgroups: net.lang
Subject: I/O operations in programming languages
Message-ID: <108@csd1.UUCP>
Date: Thu, 25-Aug-83 13:39:37 EDT
Article-I.D.: csd1.108
Posted: Thu Aug 25 13:39:37 1983
Date-Received: Sat, 27-Aug-83 10:21:57 EDT
Organization: New York University
Lines: 80

Recently I was prompted to reconsider a problem of programming language design that I was working on several years ago. Simply put, it concerns the abysmally inelegant manner in which I/O is performed in even the fanciest, most new-fangled languages, such as PROLOG and modern LISP dialects. No matter how applicative or logically-based the language is, the best that designers seem able to come up with for I/O primitives amounts to procedure calls with side effects that get or put the next element of a sequence of values. The user is forced to view input and output in a procedural manner and is forced to view a file as a sequence of objects (in many languages the objects must be scalars, such as bytes or integers), rather than as, say, an array of records each one of which contains a mixture of character string and numeric fields.

Wirth knew the way out of this mess when he designed Pascal way back in the early 70's, I think. Instead of making I/O operations available only through procedures with side effects, he put a FILE type into the language so that files could, to some extent, be treated as variables. This gave files access to the entire world of Pascal data structures, greatly increasing the elegance with which operations on complicated data bases could be performed.

There are a few substantial problems and limitations associated with Pascal I/O, however, not all of which are attributable to its design as stated in Jensen & Wirth.
The most serious of these is in fact a matter of interpretation by compiler writers, caused by Wirth's unfortunate choice of the reserved word FILE (which shows that he may have tacitly agreed with the compiler writers' subsequent interpretation, even though he was not willing to endorse it in the Revised Report). To make a long story short, there is little justification beyond compatibility with other implementations for requiring that the FILE data type be allocated on a mass-storage device with a name and lifetime external to the program and for prohibiting any other data type from being used this way.

It would have been far better, I think, had Wirth chosen the name SEQUENCE instead of FILE, and not just because a legalistic reading of the Revised Report shows that the FILE data type has no particular connection with operating system files (beyond the connection through the PROGRAM header, which is further indication that Wirth would take the implementor's side against me). Several more substantial arguments can be made for treating FILE's as nothing more than sequence, or string, variables, and for allowing arbitrary Pascal variables to be connected to external files.

First there is the often stated complaint that Pascal has only sequential I/O (this because a FILE is a sequence), a problem that is remedied in incompatible ways by making extensions to the sequence operations GET and PUT, to allow, e.g., the specification of an index. If arrays could be connected to external files in the same way that FILE variables are forced to be, Pascal WOULD HAVE random-access I/O. Conversely, many users want the ability to manipulate sequences of objects, such as variable-length character strings, stored in "fast" program variables, without having to put up with all the junk and inefficiency associated with operating system files.
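
The point about arrays connected to external files can be made concrete with a minimal sketch. It is written in Python rather than Pascal and is an illustration of the idea, not the mechanism proposed in this article. It assumes the operating system offers memory-mapped files (reached here through Python's standard `mmap` module). The names `FileBackedArray`, `RECORD`, and `demo`, and the record layout itself, are hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: a 16-byte name field and a 32-bit integer.
RECORD = struct.Struct("<16si")


class FileBackedArray:
    """Fixed-length array of (name, count) records stored in an external file."""

    def __init__(self, path, length):
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._file = open(path, mode)
        # Size the file to hold exactly `length` records
        # (the extension is zero-filled on typical systems).
        self._file.truncate(length * RECORD.size)
        self._map = mmap.mmap(self._file.fileno(), length * RECORD.size)
        self._length = length

    def __len__(self):
        return self._length

    def _offset(self, index):
        if not 0 <= index < self._length:
            raise IndexError("record index out of range")
        return index * RECORD.size

    def __getitem__(self, index):
        name, count = RECORD.unpack_from(self._map, self._offset(index))
        return name.rstrip(b"\0").decode("ascii"), count

    def __setitem__(self, index, record):
        name, count = record
        RECORD.pack_into(self._map, self._offset(index), name.encode("ascii"), count)

    def close(self):
        self._map.flush()
        self._map.close()
        self._file.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "records.dat")
    table = FileBackedArray(path, 4)
    table[2] = ("widget", 7)   # ordinary indexed assignment, no GET or PUT
    table[0] = ("gadget", 3)
    table.close()

    # A second object attached to the same file sees the stored records.
    reopened = FileBackedArray(path, 4)
    result = [reopened[i] for i in range(len(reopened))]
    reopened.close()
    return result
```

In this sketch random access falls out of ordinary subscripting: the only I/O-specific step is the one that associates the array with a file name, and no indexed variant of a sequential read or write is needed.
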
Proposals for a STRING data type of various sorts abound in the Pascal literature, even though the FILE data type (with some restrictions removed) is the data type they want and it is already there. In fact, although it needs such library routines as INDEX and SUBSTR to make it convenient, it is quite a general character-string mechanism, since it includes conversion between numbers and characters in its built-in set of operations (through the use of READ/WRITE).

Let me make it clear that I am no longer interested in convincing Pascal purists that there is anything wrong with their wonderful and exquisitely perfect language. I merely use it to illustrate a proposal for a better solution to the I/O problem (and most would agree that it is a serious problem) in the design of programming languages. We will never achieve portability in the I/O operations of real-world programs until the languages they are written in can support at least a fragment of the complexity (and versatility) of I/O operations that are allowed by modern operating systems. Otherwise each language implementor will continue to insert incompatible hooks into each compiler to allow access to these operations, resulting in a perpetual and unnecessary proliferation of language dialects.

To close a somewhat windy and highly opinionated article, I put forth a question to this readership: Why must there be a separate set of constructs in a programming language for performing I/O, rather than just one construct that associates a variable with an external file? In what way are the data structures of a language like C, Pascal or Ada inadequate to the task of representing and manipulating data on mass-storage devices?

Michael Condict
Courant Inst., New York Univ.
251 Mercer St.
New York, NY 10012
...!cmcl2!csd1!condict