Path: utzoo!utgpu!water!watmath!clyde!att-cb!att-ih!pacbell!ptsfa!ames!pasteur!agate!aurora!labrea!decwrl!sun!quintus!ok
From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: BSI syntax
Summary: some more examples (long)
Message-ID: <742@cresswell.quintus.UUCP>
Date: 8 Mar 88 07:29:56 GMT
References: <736@cresswell.quintus.UUCP> <737@cresswell.quintus.UUCP>
Organization: Quintus Computer Systems, Mountain View, CA
Lines: 271

As in article <737@cresswell.quintus.UUCP>, I'll refer to the document

    BSI/IST/5/15 > ISO/IEC JTC1 SC22 WG17 "Draft for comment: Feb 1988"

as "the grimoire". (In case you didn't already know, the word "grimoire" is derived from "grammar", as is "glamour".)

Before getting to the examples, I'd like to quote a sentence from Chris Moss's message. He says of the current BSI syntax that

    It is much closer to "Edinburgh" than some previous proposals, but does try to regularize some of the obvious defects.

Absolutely true, though perhaps I don't interpret the second clause the way he does! It would be very interesting to know why the earlier proposals were discarded. I know why _I_ would have discarded them, but why did the BSI committee discard them?

Now let's look at some examples.

-------------------- The end-of-term token --------------------

One of the more visible changes concerns the dot. In Edinburgh Prolog "." is a perfectly good token, and the end-of-term token is "." followed by a layout character, but in BSI Prolog the end-of-term token is "." alone. What does this buy us?

(1) It breaks most of Lee Naish's programs (because he uses the fact that you can declare '.' as an infix operator, and use it as infix cons), and some of mine (because I tend to type "write(.)", "functor(X, ., N)", "X = .(A,B,C)", and other things that rely on "." being a valid token). The BSI committee may possibly see this as an advantage. I don't.
Ok, sometimes you have to trade costs against benefits; but if you want to break my code, you had better offer me some pretty good benefits!

(2) Here is an example which is not legal in Edinburgh Prolog, but is legal in BSI Prolog:

    p(a).p(b).p(c).

This is a benefit? In fact, this benefit is illusory as well as dubious. Suppose you want to put a PL/I-style comment after a clause. In BSI Prolog, you have to write

    p. /*that-space-was-needed!*/

because

    p./*

is, according to the Lexical Syntax (in particular, rule L7), two tokens, "p", and "./*", just as in Edinburgh Prolog. (There is a similar problem in C++:

    i = j//*oops*/k;

is equivalent to "i = j/k;" in C, but because C++ has // end-of-line comments, it is equivalent to "i = j" in C++. I just wanted to point out that this kind of problem is not specific to Prolog.) In Edinburgh Prolog, you know that you have to have a layout character at the end of every clause, so such problems are automatically avoided. There is a similar problem with

    :- op(10,fx,a).:- op(10,fx,b).
                  ^^^

where .:- is a single token in BSI Prolog, just as it is in Edinburgh Prolog.

(3) Well, maybe the change is for the sake of implementors, to make it easier to write tokenisers? Sorry, but it does the exact opposite.

Edinburgh Prolog lexical syntax can require two characters of LOOKAHEAD. For example, when we see the sequence "2." we have to look at the next character, and if it is not a digit, we have two characters in hand that do not form part of the current token. However, Edinburgh Prolog doesn't require any characters of PUSHBACK. That is, when an Edinburgh Prolog parser reads a term, it reads all and only the characters which comprise the term. The layout character which is part of the end-of-term token is by definition part of the clause, and is and should be consumed by read/1. So if I do

    ?- get0(C). *

the answer I get is C=42, because the space between the dot and the asterisk was part of the end-of-term token, and was properly consumed.

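To make the "no pushback" point concrete, here is a minimal sketch (not taken from any real tokeniser) of how an Edinburgh-style reader might classify a token that begins with a dot. The predicate names, the set of symbol characters, and the definition of layout are all assumptions made for illustration, and it assumes a system with DCGs and memberchk/2, such as SWI-Prolog. It ignores the "digit, dot, digit" case and end of file.

```prolog
% Illustrative sketch only; predicate names are invented for this example.
% Input is a list of character codes, as produced by atom_codes/2.

% Assumption: layout means space and the control characters.
layout_code(C) :- C =< 32.

% Assumption: a typical Edinburgh set of symbol characters.
symbol_code(C) :-
    atom_codes('+-*/\\^<>=~:.?@#&$', Cs),
    memberchk(C, Cs).

% dot(-Kind): classify the token that starts with '.'.
% Kind = end: '.' plus one layout character. Both are consumed,
% so nothing has to be pushed back onto the input.
dot(end) --> [0'.], [C], { layout_code(C) }, !.
% Kind = symbol(Codes): '.' runs on into a symbol-character atom
% such as '.:-', or stands alone as the atom '.'.
dot(symbol([0'.|Cs])) --> [0'.], symbol_codes(Cs).

% Collect the longest run of symbol characters.
symbol_codes([C|Cs]) --> [C], { symbol_code(C) }, !, symbol_codes(Cs).
symbol_codes([]) --> [].
```

With this sketch, the goal `atom_codes('. *', Cs), phrase(dot(K), Cs, Rest)` gives K = end and leaves Rest holding only the code for the asterisk, which is the 42 that get0/1 returns in the example above. Given the codes of '.:- op', the same predicate yields the single symbol token '.:-'.
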
Now when a BSI Prolog parser reads a term, it has to look at the character following the dot, because the query might have been

    ?- get0(C)..put(C).

Having found that the character after the dot is NOT part of what the grimoire calls "a graphic symbol", a BSI Prolog parser had better put that character back in the input stream, otherwise it will have read something which was NOT part of the term!

By the way, did you notice that I just gave an example of a query which cannot be transliterated into BSI Prolog? Suppose I try

    ?- get0(C). *

in a BSI Prolog system. (Actually, get0/1 wasn't a BSI built-in the last time I looked, but let's pretend that it is possible to synthesise it.) The answer will be C=32, because the space after the dot is NOT part of the clause in BSI Prolog, so should not be read by read/1, so SHOULD be read by get0/1. Well, that's not what I want. How about

    ?- get0(C).*

Sorry, "*" is a "graphic character", so ".*" is a single token, so that's not going to work either. (Yes, it is obvious how to make it work, but it's not a simple matter of transliteration.)

Anyway, the point is that it is possible to write a very fast tokeniser for Edinburgh Prolog, without requiring the host I/O system to support pushback, and without having to simulate pushback, but that this is NOT possible for BSI Prolog. The formatted input facility which was being proposed last year required unbounded pushback (I am not kidding), so the BSI may not regard pushback as a problem.

-------------------- No syntax for character lists --------------------

BSI Prolog introduces strings. Interfacing to Lisp or Pop or Basic or some other language which has strings is a perfectly sensible thing to do, so strings have a place in the standard. I don't even mind the fact that BSI syntax is not compatible with Arity Prolog. With someone from ESI on the committee and no-one from Arity it was inevitable that the standard would resemble ESI Prolog rather than Arity Prolog.

But let's look at a sensible example of Edinburgh Prolog code.

    usa_phone(Area,Exchange,Number) -->
        "(", digits(3, Area), ") ",
        digits(3, Exchange), "-",
        digits(4, Number).

This won't work in BSI Prolog, because "(" and so on are strings, not lists. Fair enough: I'm prepared to change "(" to $($ or #(# or whatever. But there isn't anything for me to change them to! If Quintus Prolog didn't have double quote, I could write [0'(], [0'),0' ], and [0'-] -- which comes from DEC-10 Prolog, as I've mentioned before -- but BSI Prolog won't let me do that either. No, I have to write the character codes as integers. As I wrote it, usa_phone works just fine in Quintus Prolog on an IBM 370 (using EBCDIC). With base zero, it works just fine in EBCDIC. But BSI Prolog forces you to write the ASCII codes or whatever.

Yes, I know old versions of Quintus Prolog didn't support base zero, but it should have, to be compatible with DEC-10 Prolog. I think the way ALS handles character codes is unduly clumsy, but if that were to be the standard I could put up with it.

Strings are not an adequate substitute for lists of character codes, and having a simple syntax for strings is no excuse for not having ANY syntax for lists of character codes. It makes me wonder if anyone on the BSI committee uses grammar rules.

It might be argued, though, that there are only so many characters, so as new features (however dubious) are added, old ones must go. This is not true, as it happens. There is a very satisfactory solution. I leave it as an exercise for the BSI committee to work out what that solution is. (Hint: it makes the language more powerful, not less, and substantially simplifies forward conversion.)

One minor gripe about strings and quoted atoms is that DEC-10 Prolog followed the Fortran/SNOBOL/... convention of doubling the quoting character, e.g.

    'don''t'    "quote ""me"" here".

BSI Prolog partially follows the C convention. Adding C-style \escapes is one thing; breaking old code is another.
Why not allow the old convention as well? Quintus Prolog does this, and it isn't hard, not at all.

-------------------- A much-needed gap --------------------

Something many people have complained of is the fact that there is no standard way of reading a term without an end-of-term token. It would be quite straightforward to provide this for Edinburgh syntax, at the price of two characters of pushback. You may be familiar with the "syntax" of Prolog in the DEC-10 Prolog manual. It looks like

    term-read-in --> subterm(1200) end-of-term
    subterm(N) --> term(M), {M =< N}.
    ...
    term(0) --> functor'(' arguments ')'
              | list
              | ...
              | number
    ...

It would be straightforward to provide a built-in predicate

    read0(-Term0)

which would enter the grammar at the term(0) point, and would share the read/1 characteristic of reading all and only the characters of the term of interest. Why are two characters of pushback needed? Because the input might look like

    2.@

where we cannot tell until we've read the @ that the input isn't 2.0. By juggling with definitions, we could get this down to one character of pushback: we might, for example, rule that . in _this_ context, was equivalent to alone.

Could we do this with BSI Prolog? No. Because BSI Prolog is intended to allow arbitrary amounts of layout and comment between a function symbol and its left parenthesis (I say "intended to", because the grimoire doesn't actually allow comments anywhere), it requires unbounded lookahead (hence, in this context, unbounded pushback) to distinguish between f and f (1)

This is not a case of breaking anything that currently works. It IS, however, a case of "filling a much-needed gap".

-------------------- The strange case of -3 --------------------

This isn't in the grimoire. It's "Syntax Issue Op5". The question is this: how are negative numbers to be handled?

The answer in Edinburgh Prolog is a wee bit tricky, but it works very nicely in practice. A number like -3 is read as two tokens.
In a context where a prefix operator would be allowed, -3 is converted by the parser to a single number, and in that case - binds more tightly than any other operator and cannot be disabled. In a context where an infix operator is expected, -3 remains as two tokens. In Edinburgh Prolog, it follows that

    X is-I mod-2-3

is parsed as

    is(X,-(mod(-(I),-2),3))

What does "Syntax Issue Op5" say?

    Adopted 2 all syntactic cases of negative numbers must be converted, but built-in predicates treat - as functor.

I'm sorry, but this is just as puzzling to me as it is to you. What I _think_ it means is that -3 is to be read as a single number, and that integer(-3) and atomic(-3) are to succeed, but that functor(-3, -, 1) is ALSO to succeed. WALLOP! Nearly all of my programs just bit the dust! There used to be this nice little property you could trust:

    for all C, atomic(C) <-> nonvar(C) & functor(C, C, 0)

I don't see any sense in destroying this property. Anyone who wants -(3) can WRITE -(3). I hope I have misunderstood.

What does the grimoire say? Well, it's rather interesting.

    L13 number = ["-"], digits, [".", digits], [exponent] ;
    L18 digits = digit, {digit} ;
    L14 exponent = ("E" | "e"), ["+" | "-"], digits ;

There's a rather nasty problem here, in that it requires unbounded lookahead to recognise one of these things:

    1e0000000000000000000000000000000000000000000000000

is a number, but

    1e0000000000000000000000000000000000000000000000000_

is 1 with the atom 'e0000000000000000000000000000000000000000000000000_' after it. The Turing definition adopts the principle of maximal scan to rule such things out as lexical errors, but the grimoire does NOT adopt the principle of maximal scan and tries to handle such things directly in the grammar.

However, the real snag with this is that it breaks code such as

    three_less(N, M) :- M is N-3 .

because that -3 is now a single token. Maybe it's not what the BSI committee _mean_, but it's what the grimoire _says_.

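The Edinburgh treatment of -3 described in this section can be sketched as a pass over the token list that runs before parsing proper. This is only an illustration under stated assumptions: the token shapes atom/1, var/1, integer/1 and punct/1, and the predicates fold_minus/2, fold/3 and next_context/2, are invented here, and "a context where a prefix operator would be allowed" is approximated as "at the start, after an operator atom, or after an opening bracket or comma". A real reader would use its operator-precedence machinery instead.

```prolog
% Illustrative sketch only; token shapes and predicate names are invented.
% Tokens: atom(A), var(Name), integer(N), punct(P).

% fold_minus(+Tokens, -Folded): merge atom(-), integer(N) into integer(-N)
% wherever a prefix operator would be allowed, and nowhere else.
fold_minus(Tokens, Folded) :- fold(Tokens, prefix, Folded).

fold([], _, []).
fold([atom(-), integer(N)|Ts], prefix, [integer(M)|Fs]) :- !,
    M is -N,
    fold(Ts, infix, Fs).
fold([T|Ts], _, [T|Fs]) :-
    next_context(T, Context),
    fold(Ts, Context, Fs).

% Assumption: after an operator atom, an opening bracket or a comma we are
% at the start of an operand; after anything else an infix operator is expected.
next_context(atom(A), prefix) :- current_op(_, _, A), !.
next_context(punct(P), prefix) :- memberchk(P, ['(', '[', '{', ',', '|']), !.
next_context(_, infix).
```

Applied to the tokens of "X is-I mod-2-3", the pass folds only the minus in front of 2, which agrees with the parse is(X,-(mod(-(I),-2),3)) given above. Applied to the tokens of "M is N-3" it changes nothing, because the minus follows a variable and so stands where an infix operator is expected. That is the behaviour three_less/2 depends on.
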
-------------------- Enough for today --------------------

There is a minor glitch: the grimoire requires the ISO 8859 character with code 223 to be treated as an upper case letter. The DIS 8859 draft I was sent as one of the BSI documents says that the character with that code is the German "sz" character (looks like a Beta), which is a lower-case letter. This may be due to me having an old draft: if someone knows where I can get the latest draft or the actual standard, could you tell me?

I had a lot more that I meant to say in this message, arguing the point that trying to define the syntax of Prolog as if it were Pascal is the wrong _kind_ of definition, but this message is already too long.

Am I being unjust to the grimoire? After all, it _is_ labelled "draft for comment". Bearing in mind that the BSI committee was set up in 1984, I don't think I'm being unjust.