From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: BSI syntax
Summary: some more examples (long)
Message-ID: <742@cresswell.quintus.UUCP>
Date: 8 Mar 88 07:29:56 GMT
References: <736@cresswell.quintus.UUCP> <737@cresswell.quintus.UUCP>
Organization: Quintus Computer Systems, Mountain View, CA
Lines: 271

As in article <737@cresswell.quintus.UUCP>, I'll refer to the document
BSI/IST/5/15 > ISO/IEC JTC1 SC22 WG17 "Draft for comment: Feb 1988"
as "the grimoire".  (In case you didn't already know, the word
"grimoire" is derived from "grammar", as is "glamour".)

Before getting to the examples, I'd like to quote a sentence from
Chris Moss's message.  He says of the current BSI syntax that
	It is much closer to "Edinburgh" than some previous proposals,
	but does try to regularize some of the obvious defects.
Absolutely true, though perhaps I don't interpret the second clause
the way he does!  It would be very interesting to know why the earlier
proposals were discarded.  I know why _I_ would have discarded them,
but why did the BSI committee discard them?

Now let's look at some examples.

-------------------- The end-of-term token --------------------

    One of the more visible changes concerns the end-of-term token.  In
Edinburgh Prolog "." is a perfectly good token in its own right, and the
end-of-term token is ".<layout>"; in BSI Prolog the end-of-term token is
"." alone.  What does this buy us?

(1) It breaks most of Lee Naish's programs (because he uses the fact that
    you can declare '.' as an infix operator, and use it as infix cons),
    and some of mine (because I tend to type "write(.)", "functor(X, ., N)",
    "X = .(A,B,C)", and other things that rely on "." being a valid token).

    The BSI committee may possibly see this as an advantage.  I don't.
    Ok, sometimes you have to trade costs against benefits; but if you
    want to break my code, you had better offer me some pretty good
    benefits!

(2) Here is an example which is not legal in Edinburgh Prolog, but is
    legal in BSI Prolog:
	p(a).p(b).p(c).
    This is a benefit?

    In fact, this benefit is illusory as well as dubious.  Suppose you
    want to put a PL/I-style comment after a clause.  In BSI Prolog,
    you have to write
	p. /*that-space-was-needed!*/
    because
	p./*
    is, according to the Lexical Syntax (in particular, rule L7),
    two tokens, "p", and "./*", just as in Edinburgh Prolog.
    (There is a similar problem in C++:
	i = j//*oops*/k;
    is equivalent to "i = j/k;" in C, but because C++ has // end-of-line
    comments, it is equivalent to "i = j" in C++, with the rest of the
    line (semicolon included) commented out.  I just wanted to point out
    that this kind of problem is not specific to Prolog.)

    In Edinburgh Prolog, you know that you have to have a layout character
    at the end of every clause, so such problems are automatically avoided.

    There is a similar problem with
	:- op(10,fx,a).:- op(10,fx,b).
		      ^^^
    where .:- is a single token in BSI Prolog, just as it is in Edinburgh
    Prolog.

(3) Well, maybe the change is for the sake of implementors, to make it
    easier to write tokenisers?  Sorry, but it does the exact opposite.

    Edinburgh Prolog lexical syntax can require two characters of
    LOOKAHEAD.  For example, when we see the sequence "2." we have to
    look at the next character, and if it is not a digit, we have two
    characters in hand that do not form part of the current token.
    However, Edinburgh Prolog doesn't require any characters of PUSHBACK.
    That is, when an Edinburgh Prolog parser reads a term, it reads all
    and only the characters which comprise the term.  The layout character
    which is part of the end-of-term token is by definition part of the
    clause, and is and should be consumed by read/1.  So if I do
	?- get0(C). *
    the answer I get is C=42, because the space between the dot and the
    asterisk was part of the end-of-term token, and was properly
    consumed.  Now when a BSI Prolog parser reads a term, it has to
    look at the character following the dot, because the query might have
    been
	?- get0(C)..put(C).
    Having found that the character after the dot is NOT part of what the
    grimoire calls "a graphic symbol", a BSI Prolog parser had better put
    that character back in the input stream, otherwise it will have read
    something which was NOT part of the term!

    By the way, did you notice that I just gave an example of a query which
    cannot be transliterated into BSI Prolog?  Suppose I try
	?- get0(C). *
    in a BSI Prolog system.  (Actually, get0/1 wasn't a BSI built-in the
    last time I looked, but let's pretend that it is possible to
    synthesise it.)  The answer will be C=32, because the space after
    the dot is NOT part of the clause in BSI Prolog, so should not be
    read by read/1, so SHOULD be read by get0/1.  Well, that's not what
    I want.  How about
	?- get0(C).*
    Sorry, "*" is a "graphic character", so ".*" is a single token, so
    that's not going to work either.  (Yes, it is obvious how to make it
    work, but it's not a simple matter of transliteration.)

    Anyway, the point is that it is possible to write a very fast tokeniser
    for Edinburgh Prolog, without requiring the host I/O system to support
    pushback, and without having to simulate pushback, but that this is NOT
    possible for BSI Prolog.

    The formatted input facility which was being proposed last year
    required unbounded pushback (I am not kidding), so the BSI may not
    regard pushback as a problem.
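
    The lookahead/pushback distinction in point (3) can be made concrete
with a rough Python sketch.  Everything here is an assumption made for
illustration: the names Stream, read_edinburgh, read_bsi and get0 are
hypothetical, SYMBOL_CHARS only approximates the grimoire's graphic
characters, and the readers model nothing but the end-of-term token.

```python
# Sketch only: models just the end-of-term token, not a full Prolog reader.
# SYMBOL_CHARS is an assumed approximation of the grimoire's graphic chars.
SYMBOL_CHARS = set("+-*/\\^<>=~:.?@#&$")

class Stream:
    """Input stream that records every use of pushback."""
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.pushbacks = 0

    def get(self):
        """Return the next character, or '' at end of input."""
        if self.pos >= len(self.text):
            return ""
        c = self.text[self.pos]
        self.pos += 1
        return c

    def unget(self):
        """Put the last character back into the stream."""
        self.pos -= 1
        self.pushbacks += 1

def read_edinburgh(s):
    """End token is '.' plus one layout character, which is consumed."""
    term = ""
    while True:
        c = s.get()
        if c == "":
            raise EOFError("no end-of-term token")
        if c != ".":
            term += c
            continue
        nxt = s.get()
        if nxt == "" or nxt.isspace():
            return term            # the layout char was part of the end token
        term += c + nxt            # the dot belonged to some other token

def read_bsi(s):
    """End token is a '.' that is not followed by a symbol character."""
    term = ""
    while True:
        c = s.get()
        if c == "":
            raise EOFError("no end-of-term token")
        if c != ".":
            term += c
            continue
        nxt = s.get()
        if nxt in SYMBOL_CHARS:
            term += c + nxt        # e.g. '.*' or './*' is one longer token
            continue
        if nxt != "":
            s.unget()              # not part of the term, so it goes back
        return term

def get0(s):
    """Code of the next character, or -1 at end of input."""
    c = s.get()
    return ord(c) if c else -1

for reader in (read_edinburgh, read_bsi):
    s = Stream("get0(C). *")
    reader(s)
    print(reader.__name__, "next code:", get0(s), "pushbacks:", s.pushbacks)
```

Under these assumptions the Edinburgh reader leaves the asterisk (code
42) as the next character and never calls unget, while the BSI reader
leaves the space (code 32) and needs one pushback.  Feeding read_bsi the
text "get0(C).*" or "p./*" raises EOFError, because the dot is absorbed
into a longer symbol token and no end token is ever seen.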


-------------------- No syntax for character lists --------------------

    BSI Prolog introduces strings.  Interfacing to Lisp or Pop or Basic
or some other language which has strings is a perfectly sensible thing
to do, so strings have a place in the standard.  I don't even mind the
fact that BSI syntax is not compatible with Arity Prolog.  With someone
from ESI on the committee and no-one from Arity it was inevitable that
the standard would resemble ESI Prolog rather than Arity Prolog.  But
let's look at a sensible example of Edinburgh Prolog code.

	usa_phone(Area,Exchange,Number) -->
	    "(", digits(3, Area), ") ",
	    digits(3, Exchange), "-", digits(4, Number).

This won't work in BSI Prolog, because "(" and so on are strings, not lists.
Fair enough:  I'm prepared to change "(" to $($ or #(# or whatever.  But
there isn't anything for me to change them to!  If Quintus Prolog didn't
have double quote, I could write
	[0'(], [0'),0' ], and [0'-]
 -- which comes from DEC-10 Prolog, as I've mentioned before --
but BSI Prolog won't let me do that either.  No, I have to write the
character codes as integers.  As I wrote it, usa_phone works just fine
in Quintus Prolog on an IBM 370 (using EBCDIC).  Written with base zero
instead, it would still work just fine in EBCDIC.  But BSI Prolog forces
you to write the ASCII codes or whatever, so the code is tied to one
character set.  Yes, I know old versions of Quintus Prolog didn't support
base zero, but they should have, to be compatible with DEC-10 Prolog.
I think the way ALS handles character codes is unduly clumsy, but if that
were to be the standard I could put up with it.
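
    The portability point can be illustrated with a small Python sketch.
The idea is that a notation like 0'( names a character and lets the host
system supply its code, whereas a literal integer fixes one character
set.  The helper names char_code and code_list are hypothetical, and
using Python's "cp037" codec to stand in for the EBCDIC of an IBM 370
is an assumption (EBCDIC has several code pages).

```python
def char_code(ch, charset):
    """Code of one character in the given character set (what 0'c denotes)."""
    return ch.encode(charset)[0]

def code_list(text, charset):
    """List of character codes for text, as a "..." literal would give."""
    return [char_code(ch, charset) for ch in text]

# The three literals used by usa_phone: "(", ") " and "-".
for charset in ("ascii", "cp037"):     # cp037: one EBCDIC code page (assumed)
    print(charset,
          code_list("(", charset),
          code_list(") ", charset),
          code_list("-", charset))
```

The same source text yields different integers on the two hosts, which
is exactly why writing the integers out by hand is not portable.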

    Strings are not an adequate substitute for lists of character codes,
and having a simple syntax for strings is no excuse for not having ANY
syntax for lists of character codes.  It makes me wonder if anyone on
the BSI committee uses grammar rules.

    It might be argued, though, that there are only so many characters,
so as new features (however dubious) are added, old ones must go.  This
is not true, as it happens.  There is a very satisfactory solution.  I
leave it as an exercise for the BSI committee to work out what that
solution is.  (Hint:  it makes the language more powerful, not less,
and substantially simplifies forward conversion.)

    One minor gripe about strings and quoted atoms is that DEC-10 Prolog
followed the Fortran/SNOBOL/... convention of doubling the quoting
character, e.g. 'don''t' "quote ""me"" here".  BSI Prolog partially
follows the C convention.  Adding C-style \escapes is one thing; breaking
old code is another.  Why not allow the old convention as well?  Quintus
Prolog does this, and it isn't hard, not at all.


-------------------- A much-needed gap --------------------

    Something many people have complained of is the fact that there is no
standard way of reading a term without an end-of-term token.  It would be
quite straightforward to provide this for Edinburgh syntax, at the price
of two characters of pushback.

    You may be familiar with the "syntax" of Prolog in the DEC-10 Prolog
manual.  It looks like
	term-read-in --> subterm(1200) end-of-term
	subterm(N) --> term(M), {M =< N}.
	...
	term(0) --> functor'(' arguments ')' | list | ... | number
	...
It would be straightforward to provide a built-in predicate
	read0(-Term0)
which would enter the grammar at the term(0) point, and would share the
read/1 characteristic of reading all and only the characters of the term
of interest.  Why are two characters of pushback needed?  Because the
input might look like
	2.@
where we cannot tell until we've read the @ that the input isn't 2.0.
By juggling with definitions, we could get this down to one character
of pushback:  we might, for example, rule that <integer>. in _this_
context, was equivalent to <integer> alone.
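
    A rough Python sketch of where the two characters come from.  It is
a simplification under stated assumptions: only unsigned decimal integers
and simple fractions are handled, and scan_number is a hypothetical name.
The second component of the result counts the characters that had to be
examined but do not belong to the number.

```python
def scan_number(text):
    """Return (token, extra) for the number at the start of text.

    extra is how many characters past the token were examined.
    """
    n = len(text)
    i = 0
    while i < n and text[i].isdigit():
        i += 1
    if i < n and text[i] == ".":
        # A dot only belongs to the number if a digit follows it.
        if i + 1 < n and text[i + 1].isdigit():
            j = i + 1
            while j < n and text[j].isdigit():
                j += 1
            return text[:j], (1 if j < n else 0)
        # Both the dot and the character after it were looked at.
        return text[:i], (2 if i + 1 < n else 1)
    return text[:i], (1 if i < n else 0)

for sample in ("2.@", "2.0 ", "2+"):
    print(repr(sample), scan_number(sample))
```

With read/1 those two extra characters are harmless, because one of them
is the dot and the other is the layout character of the end token; a
reader with no end token to absorb them would have to give them back.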

    Could we do this with BSI Prolog?  No.  Because BSI Prolog is intended
to allow arbitrary amounts of layout and comment between a function symbol
and its left parenthesis (I say "intended to", because the grimoire doesn't
actually allow comments anywhere), it requires unbounded lookahead (hence,
in this context, unbounded pushback) to distinguish between
	f
and	f                                                               (1)

    This is not a case of breaking anything that currently works.  It IS,
however, a case of "filling a much-needed gap".


-------------------- The strange case of -3 --------------------

    This isn't in the grimoire.  It's "Syntax Issue Op5".  The question
is this:  how are negative numbers to be handled?

    The answer in Edinburgh Prolog is a wee bit tricky, but it works
very nicely in practice.  A number like -3 is read as two tokens.
In a context where a prefix operator would be allowed, -3 is converted
by the parser to a single number, and in that case - binds more tightly
than any other operator and cannot be disabled.  In a context where an
infix operator is expected, -3 remains as two tokens.  In Edinburgh
Prolog, it follows that
	X is-I mod-2-3
is parsed as
	is(X,-(mod(-(I),-2),3))

    What does "Syntax Issue Op5" say?
	Adopted 2 all syntactic cases of negative numbers must be
	converted, but built-in predicates treat - as functor.

I'm sorry, but this is just as puzzling to me as it is to you.  What
I _think_ it means is that -3 is to be read as a single number, and
that
	integer(-3)
and	atomic(-3)
are to succeed, but that
	functor(-3, -, 1)
is ALSO to succeed.  WALLOP!  Nearly all of my programs just bit the
dust!  There used to be this nice little property you could trust:

	for all C, atomic(C) <-> nonvar(C) & functor(C, C, 0)

I don't see any sense in destroying this property.  Anyone who wants
-(3) can WRITE -(3).  I hope I have misunderstood.

    What does the grimoire say?  Well, it's rather interesting.

L13	number = ["-"], digits, [".", digits], [exponent] ;

L18	digits = digit, {digit} ;

L14	exponent = ("E" | "e"), ["+" | "-"], digits ;

There's a rather nasty problem here, in that it requires unbounded
lookahead to recognise one of these things:
	1e0000000000000000000000000000000000000000000000000
is a number, but
	1e0000000000000000000000000000000000000000000000000_
is 1 with the atom 'e0000000000000000000000000000000000000000000000000_'
after it.  The Turing definition adopts the principle of maximal scan to
rule such things out as lexical errors, but the grimoire does NOT adopt
the principle of maximal scan and tries to handle such things directly in
the grammar.
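
    To see why the lookahead is unbounded, here is a rough Python sketch
of a scanner that follows the reading described above, where an "e..."
run followed by a letter, digit or underscore is an atom rather than an
exponent.  It is only a sketch: the sign and fraction parts of rule L13
are left out, and scan is a hypothetical name.

```python
def scan(text):
    """Return (token, lookahead) for digits [exponent] at the start of text.

    lookahead counts the characters examined beyond the accepted token.
    """
    n = len(text)
    i = 0
    while i < n and text[i].isdigit():
        i += 1
    end = i                            # end of the plain integer
    if i < n and text[i] in "eE":
        j = i + 1
        if j < n and text[j] in "+-":
            j += 1
        k = j
        while k < n and text[k].isdigit():
            k += 1
        follower = text[k] if k < n else ""
        if k > j and not (follower.isalnum() or follower == "_"):
            end = k                    # a properly delimited exponent
        examined = min(k + 1, n)
    else:
        examined = min(i + 1, n)
    return text[:end], examined - end

for zeros in (1, 10, 49):
    print(zeros, scan("1e" + "0" * zeros + "_"))
```

The lookahead reported for 1e0...0_ grows with the number of zeros, so
no fixed buffer is enough; under maximal scan the same input would simply
be a lexical error.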

    However, the real snag with this is that it breaks code such as
	three_less(N, M) :- M is N-3 .
because that -3 is now a single token.  Maybe it's not what the BSI
committee _mean_, but it's what the grimoire _says_.
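
    The difference between the two treatments of "-" shows up at the
token level, as in this rough Python sketch.  It makes several
assumptions: lex and edinburgh_join are hypothetical names, only names,
unsigned integers and a few punctuation characters are recognised, and
INFIX_WORDS is a hand-picked stand-in for the operator table.  It says
nothing about operator priorities.

```python
import re

INFIX_WORDS = {"is", "mod"}            # assumed alphabetic infix operators

def lex(text, minus_in_number):
    """Tokenise text; optionally let a number start with '-' (rule L13)."""
    number = r"-?\d+" if minus_in_number else r"\d+"
    pattern = re.compile(r"\s*(" + number + r"|[A-Za-z_]\w*|[-+*/(),])")
    text = text.strip()
    out, pos = [], 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at %d" % pos)
        out.append(m.group(1))
        pos = m.end()
    return out

def edinburgh_join(tokens):
    """Fold '-' and an integer into one number only in prefix position."""
    out, i = [], 0
    while i < len(tokens):
        t = tokens[i]
        prev = out[-1] if out else None
        operand_before = (prev is not None and prev not in INFIX_WORDS
                          and (prev[-1].isalnum() or prev[-1] == "_"
                               or prev == ")"))
        if (t == "-" and not operand_before
                and i + 1 < len(tokens) and tokens[i + 1].isdigit()):
            out.append("-" + tokens[i + 1])
            i += 2
        else:
            out.append(t)
            i += 1
    return out

for src in ("N-3", "X is-I mod-2-3"):
    print(src, edinburgh_join(lex(src, False)), lex(src, True))
```

In the Edinburgh-style pass "N-3" stays as three tokens, so the infix
minus survives.  When the lexer itself accepts a leading "-", the same
text becomes a variable followed directly by the number -3, with no
operator between them.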


-------------------- Enough for today --------------------

    There is a minor glitch:  the grimoire requires the ISO 8859 character
with code 223 to be treated as an upper case letter.  The DIS 8859 draft I
was sent as one of the BSI documents says that the character with that code
is the German "sz" character (looks like a Beta), which is a lower-case
letter.  This may be due to my having an old draft:  if someone knows where
I can get the latest draft or the actual standard, could you tell me?

    I had a lot more that I meant to say in this message, arguing the point
that trying to define the syntax of Prolog as if it were Pascal is the
wrong _kind_ of definition, but this message is already too long.

    Am I being unjust to the grimoire?  After all, it _is_ labelled
"draft for comment".  Bearing in mind that the BSI committee was set up
in 1984, I don't think I'm being unjust.