Path: utzoo!utgpu!water!watmath!clyde!rutgers!cmcl2!nrl-cmf!mailrus!ames!pasteur!ucbvax!decvax!decwrl!sun!quintus!ok
From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: behavior of read/get0 at end_of_file
Keywords: get0 read end_of_file BSI
Message-ID: <824@cresswell.quintus.UUCP>
Date: 27 Mar 88 09:19:58 GMT
References: <2410@zyx.UUCP>
Organization: Quintus Computer Systems, Mountain View, CA
Lines: 120

In article <2410@zyx.UUCP>, grzm@zyx.UUCP (Gunnar Blomberg) writes:
> Hmm... isn't this a lot of fuss about very little?

No. I have a suggestion for you. Write a Pascal tokeniser in the
following three programming languages:
    o C (end-of-file is a special value)
    o Pascal (end-of-file is tested by eof(input))
    o PL/I (end-of-file is an exception).
_Then_ come back and tell us it's "very little". Based on my
experience with these three, I'd rank them out of 10 on a "difficulty"
scale as C: 1, Pascal: 3, PL/I: 10. (Try telling a C programmer that
he would be better off if end-of-file were handled by a new SIGEOF
signal. If, back when I was writing PL/I, you had offered me a version
of PL/I which handled end-of-file the way Pascal does, I'd have thanked
you with tears in my eyes.)

What happens when you hit the end of a file is not a minor matter.
After all, every file has at least one end! If we were designing a new
programming language, it would deserve the most careful attention. The
treatment of end-of-file has a large effect on the structure of
programs.

But the Prolog standard is not supposed to be a matter of designing a
new programming language. I keep saying this, and people seem to keep
failing to see the point: the criteria for changing an existing
language are MUCH more stringent than the criteria for designing a new
one. For example, I think that abbreviations in the names of evaluable
predicates are bad, so that argument/3 would be a better name than
arg/3. So what? It isn't better ENOUGH to warrant the change.
I could list a score of such things in Edinburgh Prolog which are not
to my personal taste, and which I believe I have objective grounds for
criticising. What of that? There is none of them bad enough to warrant
my breaking other people's code. Now changing the behaviour of get0/1
and read/1 would break every program I have ever written that does any
input. (The change from is_endfile(26) in DEC-10 Prolog and some
versions of C Prolog to is_endfile(-1) in some versions of C Prolog and
Quintus Prolog took an average of about 10 seconds per file to fix with
a good editor.)

If someone comes up to you and asks you to improve their programming
language, you have a pretty heavy responsibility to do a good job of
it. Quintus move very slowly and very cautiously: once we've put
something in the language, customers are likely to start using it, and
pulling a feature out on the grounds that we don't like it any more is
not really ethical behaviour. The moral responsibility of a group of
people who take it on themselves to change a language around without
being asked to by the people who will be affected by such changes is
much much greater. At the very least, a paramount concern of such a
group should be to provide enough operations and hooks in the standard
that "99%" compatibility packages for some reasonably representative
set of dialects should be KNOWN to be definable using standard
operations. For example, in my work on this in 1984, I very carefully
worked through Waterloo Prolog (NOT an Edinburgh-compatible Prolog) to
find out what extra hooks would be needed.

> It seems to me that whatever semantics is chosen, it is simple to get
> the other:
>   BSIread(X) :-            |   get_char(X) :-
>       DEC10read(X),        |       get0(C),
>       X \== end_of_file.   |       C =\= -1,
>                            |       string_list(X, [C]).
>
>   DEC10read(X) :-          |   get0(C) :-
>       BSIread(Y),          |       (   get_char(X) -> string_list(X, [C])
>       !,                   |       ;   C = -1
>       X = Y.               |       ).
>   DEC10read(end_of_file).

I can't find a BSI document which describes read/1 anything but
vaguely, so I've added the character I/O versions on the right, and
it's those I'll comment on. (By the way, string_list/2 is a pretty
appalling name; you would expect it to have something to do with lists
of strings.)

The latest character I/O document I checked was so phrased as to
suggest that having failed once, get_char/1 would continue to fail.
There was a note which pointed out that it was still an open question
whether get_char/1 should do this or should report an error if called
again after having once failed. This presumably carries over to read/1.
So we simply don't yet know whether the first definition is correct or
not, because BSI I/O is not yet fully defined.

Case 1: calling get_char/1 after it has already failed results in an
    error report. The cross-definitions of get_char/1 and get0/1 would
    then be correct, IF an end-of-file condition could be indicated
    only once in a file, which is false.

Case 2: get_char/1 keeps on failing quietly. Then none of the
    cross-definitions would be correct.

Since the only motivation that anyone has ever told me about for the
fail-at-end approach is the analogy between a file and a list of
characters, case 2 is the "natural" one. That is, a parallel is thought
to exist between
    next_term([Head|Tail], Head, Tail).
and
    next_term(File, Head) :- read(File, Head).
and if we take this seriously, we would expect read(File, Head) to keep
on failing at the end of a file, just as next_term([], Head, _) would
keep failing. Now the analogy is very far from being a good one, so
there may be some other motivation I have not been told about which
would make case 1 the "natural" one.

Even in case 1, and even discounting the extremely useful possibility
of a literal 'end_of_file' appearing in the input, it is still not
clear that the cross-definitions for read/1 would be correct. There are
two difficulties: what about syntax errors? and what about end of file?
There are end-of-file problems in read/1 additional to those in get0/1,
due to the fact that a term is an extended object, and the fact that
read/1 may consume arbitrarily many characters without encountering a
term.

> It is after all easy to write DEC10read in terms of BSIread.

Strictly speaking, it is impossible, because the two syntaxes are
different. Even ignoring that, it isn't clear to me that it is
possible. It *would* be possible to write read/1 in terms of get_char/1
(though it would be rather more painful than it would be given get0/1).
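
To make the literal 'end_of_file' point concrete, here is a minimal
sketch. It is not taken from any BSI document or from a real Prolog
system: it assumes a file can be modelled as a plain list of terms
threaded through explicitly, and dec10_read/3, bsi_read/3 and
bsi_read_all/2 are hypothetical names invented for the illustration.
Only the body of bsi_read/3 follows the shape of the quoted BSIread/1
cross-definition.

```prolog
% ASSUMPTION: a "file" is modelled as a list of terms; the remaining
% input is passed along explicitly instead of living in a stream.

% dec10_read(+File0, -Term, -File)
% DEC-10 style: at end of file, yield the atom end_of_file
% rather than failing.
dec10_read([], end_of_file, []).
dec10_read([Term|Rest], Term, Rest).

% bsi_read(+File0, -Term, -File)
% Fail-at-end style, built on dec10_read/3 in the same way as the
% quoted BSIread/1 is built on DEC10read/1.
bsi_read(File0, Term, File) :-
    dec10_read(File0, Term, File),
    Term \== end_of_file.

% bsi_read_all(+File0, -Terms)
% Collect every term bsi_read/3 delivers before it first fails.
bsi_read_all(File0, [Term|Terms]) :-
    bsi_read(File0, Term, File),
    !,
    bsi_read_all(File, Terms).
bsi_read_all(_, []).
```

With these definitions the query
    ?- bsi_read_all([a, end_of_file, b], Ts).
gives Ts = [a]: the wrapper cannot tell the literal end_of_file from
the real end of the input, so b is never delivered.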