From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: behavior of read/get0 at end_of_file
Keywords: get0 read end_of_file BSI
Message-ID: <824@cresswell.quintus.UUCP>
Date: 27 Mar 88 09:19:58 GMT
References: <2410@zyx.UUCP>
Organization: Quintus Computer Systems, Mountain View, CA

In article <2410@zyx.UUCP>, grzm@zyx.UUCP (Gunnar Blomberg) writes:
> Hmm...  isn't this a lot of fuss about very little?

No.

I have a suggestion for you.  Write a Pascal tokeniser in the
following three programming languages:
    o	C (end-of-file is a special value)
    o	Pascal (end-of-file is tested by eof(input))
    o	PL/I (end-of-file is an exception).
_Then_ come back and tell us it's "very little".
Based on my experience with these three, I'd rank them out of 10 on a
"difficulty" scale as C:  1, Pascal:  3, PL/I:  10.  (Try telling a C
programmer that he would be better off if end-of-file were handled by
a new SIGEOF signal.  If, back when I was writing PL/I, you had offered
me a version of PL/I which handled end-of-file the way Pascal does, I'd
have thanked you with tears in my eyes.)

What happens when you hit the end of a file is not a minor matter.
After all, every file has at least one end!  If we were designing a new
programming language, it would deserve the most careful attention.  The
treatment of end-of-file has a large effect on the structure of programs.

But the Prolog standard is not supposed to be a matter of designing
a new programming language.  I keep saying this, and people seem to
keep failing to see the point:  the criteria for changing an existing
language are MUCH more stringent than the criteria for designing a
new one.  For example, I think that abbreviations in the names of
evaluable predicates are bad, so that argument/3 would be a better
name than arg/3.  So what?  It isn't better ENOUGH to warrant the
change.  I could list a score of such things in Edinburgh Prolog which
are not to my personal taste, and which I believe I have objective
grounds for criticising.  What of that?  None of them is bad
enough to warrant my breaking other people's code.  Now changing the
behaviour of get0/1 and read/1 would break every program I have ever
written that does any input.  (The change from is_endfile(26) in
DEC-10 Prolog and some versions of C Prolog to is_endfile(-1) in
some versions of C Prolog and Quintus Prolog took an average of
about 10 seconds per file to fix with a good editor.)
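
To illustrate why that change was so cheap, here is a minimal sketch of
a character-reading loop in the value-at-end style.  It assumes that
get0/1 returns the end-of-file code rather than failing, and that the
code is recorded in a single is_endfile/1 fact per file.  read_chars/1
and read_chars/2 are hypothetical names, not taken from any library.

```prolog
% The one fact that had to be edited (26 in DEC-10 Prolog, -1 here).
is_endfile(-1).

% read_chars(-Chars): hypothetical helper.  Collects the character
% codes of the current input up to, but not including, end of file.
read_chars(Chars) :-
    get0(C),
    read_chars(C, Chars).

% read_chars(+Code, -Chars): Code is the character already read.
read_chars(C, []) :-
    is_endfile(C),
    !.
read_chars(C, [C|Chars]) :-
    get0(Next),
    read_chars(Next, Chars).
```

Moving from 26 to -1 means editing the one is_endfile/1 fact; the loop
itself is untouched.  Making get0/1 fail at the end instead would mean
restructuring every such loop.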

If someone comes up to you and asks you to improve their programming
language, you have a pretty heavy responsibility to do a good job of
it.  Quintus move very slowly and very cautiously:  once we've put
something in the language, customers are likely to start using it,
and pulling a feature out on the grounds that we don't like it any
more is not really ethical behaviour.  The moral responsibility of
a group of people who take it on themselves to change a language
around without being asked to by the people who will be affected by
such changes is much, much greater.  At the very least, a paramount
concern of such a group should be to provide enough operations and
hooks in the standard that "99%" compatibility packages for some
reasonably representative set of dialects should be KNOWN to be
definable using standard operations.  For example, in my work on
this in 1984, I very carefully worked through Waterloo Prolog (NOT
an Edinburgh-compatible Prolog) to find out what extra hooks would
be needed.

> It seems to me that whatever semantics is chosen, it is simple to get
> the other:

> BSIread(X) :-                 | get_char(X) :-
>    DEC10read(X),              |       get0(C),
>    X \== end_of_file.         |       C =\= -1,
                                |       string_list(X, [C]).

> DEC10read(X) :-               | get0(C) :-
>    BSIread(Y),                |       ( get_char(X) -> string_list(X, [C])
>    !,                         |       ; C = -1
>    X = Y.                     |       ).
> DEC10read(end_of_file).

I can't find a BSI document which describes read/1 other than vaguely,
so I've added the character I/O versions on the right, and it's those
I'll comment on.  (By the way, string_list/2 is a pretty appalling name;
you would expect it to have something to do with lists of strings.)

The latest character I/O document I checked was so phrased as to suggest
that having failed once, get_char/1 would continue to fail.  There was a
note which pointed out that it was still an open question whether
get_char/1 should do this or should report an error if called again
after having once failed.  This presumably carries over to read/1.
So we simply don't yet know whether the first definition is correct or
not, because BSI I/O is not yet fully defined.

    Case 1:  calling get_char/1 after it has already failed results in
	     an error report.  The cross-definitions of get_char/1 and
	     get0/1 would then be correct, IF an end-of-file condition
	     could be indicated only once in a file, which is false.

    Case 2:  get_char/1 keeps on failing quietly.  Then none of the
	     cross-definitions would be correct.

Since the only motivation that anyone has ever told me about for the
fail-at-end approach is the analogy between a file and a list of
characters, case 2 is the "natural" one.  That is, a parallel is
thought to exist between
	next_term([Head|Tail], Head, Tail).
and
	next_term(File, Head) :- read(File, Head).
and if we take this seriously, we would expect read(File, Head) to
keep on failing at the end of a file, just as next_term([], Head, _)
would keep failing.  Now the analogy is very far from being a good
one, so there may be some other motivation I have not been told about
which would make case 1 the "natural" one.
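
The list half of that parallel can be run as it stands.  A minimal
sketch, in which terms_in_list/2 is a hypothetical helper added purely
for illustration:

```prolog
% The list version from the text: take the next term off a list.
next_term([Head|Tail], Head, Tail).

% terms_in_list(+List, -Terms): hypothetical helper.  Walks List with
% next_term/3 until next_term/3 fails, which happens at [].
terms_in_list(List, [Head|Terms]) :-
    next_term(List, Head, Tail),
    !,
    terms_in_list(Tail, Terms).
terms_in_list(_, []).
```

Here next_term([], H, T) fails however many times it is asked, with no
error on the second or later attempt.  A read(File, Head) modelled on
it would have to behave the same way, which is case 2.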

Even in case 1, and even discounting the extremely useful possibility of
a literal 'end_of_file' appearing in the input, it is still not clear
that the cross-definitions for read/1 would be correct.  There are two
difficulties:  what about syntax errors?  and what about end of file?
There are end-of-file problems in read/1 additional to those in get0/1,
due to the fact that a term is an extended object, and the fact that
read/1 may consume arbitrarily many characters without encountering a
term.
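
To make the literal 'end_of_file' point concrete, here is a sketch of
the usual read/1 driver loop.  It assumes DEC-10 style read/1, which
returns the atom end_of_file at the end of the input.  process_terms/0,
process_terms/1 and handle/1 are hypothetical names.

```prolog
% process_terms: hypothetical driver.  Reads terms from the current
% input and handles each one until read/1 returns end_of_file.
process_terms :-
    read(Term),
    process_terms(Term).

process_terms(end_of_file) :- !.
process_terms(Term) :-
    handle(Term),
    read(Next),
    process_terms(Next).

% handle(+Term): placeholder for the real work; here it just prints.
handle(Term) :-
    write(Term),
    nl.
```

A clause 'end_of_file.' placed in the middle of a file stops this loop
at that point, which is what makes the literal useful.  The quoted
BSIread/1, being DEC10read/1 followed by X \== end_of_file, fails at
that same clause, so it cannot tell the literal atom from the real end
of the file.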

> It is after all easy to write DEC10read in terms of BSIread.

Strictly speaking, it is impossible, because the two syntaxes are
different.  Even ignoring that, it isn't clear to me that it is possible.
It *would* be possible to write read/1 in terms of get_char/1 (though it
would be rather more painful than it would be given get0/1).