Path: utzoo!mnetor!uunet!mcvax!unido!ecrcvax!micha
From: micha@ecrcvax.UUCP (Micha Meier)
Newsgroups: comp.lang.prolog
Subject: Re: behavior of read/get0 at end_of_file
Message-ID: <518@ecrcvax.UUCP>
Date: 22 Mar 88 15:56:23 GMT
References: <608> <1197@kulcs.kulcs.uucp> <783@cresswell.quintus.UUCP>
Reply-To: micha@ecrcvax.UUCP (Micha Meier)
Organization: ECRC, Munich 81, West Germany
Lines: 162
Keywords: get0 read end_of_file

In article <783@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Here's how I'd write copy_chars/0 for real:
>
>    copy_chars :-
>        get0(Char),
>        (   is_endfile(Char) -> true
>        ;   put(Char),
>            copy_chars
>        ).
>
>This is only superficially different from C's
>    while ((Char = getchar()) != EOF) putchar(Char);
>
>Which version is "so obviously better and more readable"?
>
>I'm not going to try to answer that question, because it is entirely
>the wrong question.  An example this small can be coped with even if
>it is badly written.

This is true; we have to distinguish various uses of get0/1.
The above example is indeed easier to write when get0/1 fails at
eof, because the is_endfile/1 test is not needed. However, most
often one wants to do more with the character than just test for
eof, and only then do the differences become meaningful.

By the way, get0/1 does *not* exist in BSI; it uses get_char/1
instead, and its argument is a character, i.e. a string of length 1.
This means that the type 'character' is derived from the type
'string' (and not the other way round, as in C). Does anybody out
there know what advantages this can bring? It is independent of
the character <-> integer encoding, but this is only because
explicit conversion predicates have to be called all the time.

>We can convert a deterministic finite-state automaton to Edinburgh
>Prolog with very little effort.  We represent a state of the automaton
>by a predicate with one argument: the next character.  We represent an
>arc of the automaton by a clause.
>For example, the arcs
>
>    s1: a -> s2.
>    s1: b -> s1.
>    s1: $ -> accept.
>
>would be coded like this:
>
>    s1(0'a) :- get0(Next), s2(Next).
>    s1(0'b) :- get0(Next), s1(Next).
>    s1(-1)  :- true.
>

In his tutorial at SLP '87 Richard used another representation
of a finite automaton, which is more appropriate:

    s1 :-
        get0(Char),
        s1(Char).

    s1(0'a) :- s2.
    s1(0'b) :- s1.
    s1(-1)  :- accept.

The difference is that if one wants to perform some action in
some states, this must be done *before* reading the next character,
i.e. right at the beginning of s1/0. Such a representation can be
converted more easily to the BSI's variant of get:

    s1 :-
        % do the corresponding action
        (   get0(Char) ->
            s1(Char)
        ;   accept
        ).

    s1(0'a) :- s2.
    s1(0'b) :- s1.

Note that the eof arc has to be merged into s1/0 in this way,
since if we wrote it like

    s1 :-
        s1_action,
        get0(Char),
        !,
        s1(Char).
    s1 :-
        accept.

then after an eof we would backtrack over s1_action and undo
what we have done.

I must say, neither of the two seems satisfactory to me. Richard's
version is not portable because it uses -1 as the eof character.
We can improve it to

    s1(X) :- eof(X), accept.
    s1(0'a) :- s2.
    s1(0'b) :- s1.

and hope that the compiler will unfold eof/1 inside the indexing
mechanism; otherwise we have choice points even though the code is
deterministic.

The BSI version is much more questionable, though. Having to wrap
a disjunction (and a choice point) around the get0/1 call suggests
that for this application the BSI choice is not the appropriate one.
It is interesting to note, however, that it could work even with
nondeterministic automata, where the BSI's failure was (I thought)
more likely to cause problems.

>> BTW, if you ever want to convert a program with a different interpretation,
>> the solution is easy :
>>
>>    /*QP*/read(X) :- /*bim*/read(X), ! .
>>    /*QP*/read(whatever_is_used_to_indicate_end_of_file) .
>>
>It may be easy, but it isn't a solution.  Suppose we write this:
>
>    buggy_read(Term) :- bim_read(Term), !.
>    buggy_read(end_of_file).
>
>Now, suppose the current input stream contains
>    fred.
>and we call
>    buggy_read(end_of_file).
>IT WILL SUCCEED!  It should have failed.

The Edinburgh get0/1 can easily simulate the BSI's one with

    get0_BSI(Char) :-
        get0_Edinburgh(Char),
        not_eof(Char).

but, as Richard has shown, not vice versa. It is therefore clear
that it is better for a Prolog system to have get0/1 return some
*portable* eof (e.g. the atom end_of_file; for get0/1 there can be
no confusion with source items) instead of some integer.

This, however, just shifts the problem up to read/1: BSI objects
that if it returns e.g. the atom end_of_file, then an occurrence
of this atom in the source file cannot be distinguished from a
real end of file. In this case, a remedy would be to introduce a
term with a local scope (e.g. valid only in the module where read/1
and eof/1 are defined) and to use eof/1 instead of unifying the
argument of read/1 with the end_of_file term. Hence read/1 would
return this term on encountering the end of file, and eof/1 would
check whether its argument is this term.

--Micha
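
To make the get0/1 part of this concrete, here is a minimal sketch of
an Edinburgh-style get0 that returns a portable eof marker, the BSI
behaviour layered on top of it, and copy_chars/0 written against eof/1
rather than against -1. The names edinburgh_get0/1, bsi_get0/1 and
eof/1 are hypothetical, chosen only for illustration; the sketch
assumes a system (SWI-Prolog, for instance) in which get_code/1 yields
-1 at end of file and put_code/1 is available.

```prolog
% Sketch only. edinburgh_get0/1, bsi_get0/1 and eof/1 are hypothetical
% names. Assumes get_code/1 yields -1 at end of file.

% Edinburgh style: always succeeds; end of file is reported as a marker.
edinburgh_get0(Char) :-
    get_code(Code),
    (   Code =:= -1
    ->  Char = end_of_file
    ;   Char = Code
    ).

% eof/1 is the only place that knows what the marker looks like.
eof(Marker) :- Marker == end_of_file.

% BSI style on top of it: fail at end of file instead of returning a
% marker. Reading into a fresh variable first avoids the buggy_read
% problem of unifying the caller's argument too early.
bsi_get0(Char) :-
    edinburgh_get0(Read),
    \+ eof(Read),
    Char = Read.

% copy_chars/0 written against eof/1 rather than against -1.
copy_chars :-
    edinburgh_get0(Char),
    (   eof(Char)
    ->  true
    ;   put_code(Char),
        copy_chars
    ).
```

The marker is safe here only because get_code/1 never returns an atom.
For read/1 the same trick would need the locally scoped term described
above, which this sketch does not attempt.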