Path: utzoo!mnetor!uunet!husc6!sri-unix!quintus!ok
From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: BSI syntax
Message-ID: <797@cresswell.quintus.UUCP>
Date: 23 Mar 88 05:11:45 GMT
References: <234@gould.doc.ic.ac.uk>
Organization: Quintus Computer Systems, Mountain View, CA
Lines: 490

In article <234@gould.doc.ic.ac.uk>, cdsm@doc.ic.ac.uk (Chris Moss) writes:
> Forwarded for Roger Scowen -- KRG0@gm.rl.ac.uk
>  
> RESPONSE TO COMMENTS FROM RICHARD O'KEEFE ON PROLOG STANDARDIZATION
>  
> GENERAL RESPONSE
>  
> Richard O'Keefe started by saying that he would respond to the
> mailing from Chris Moss. In fact many comments refer
> to a document (Prolog syntax, Draft 4.1) that
> most news readers (and members of the ISO and BSI panels) will
> not have seen.
> This seems somewhat unfair on readers who will be unable to judge 
> whether draft, criticism, or rebuttal is justified.

My postings were in fact a response to Chris Moss's mailing.  They were
not confined to the content of that mailing, true.  It seemed to me that
Chris Moss's mailing implied that the BSI syntax was in a satisfactory
state, and that it wasn't as different from the de facto standard as
people feared.  I set out to show that neither of those statements is
true, and I believe that I succeeded.

Many comments did refer to a document that most news readers won't have
seen.  But then, most news readers won't have seen ***ANY*** of the BSI
documents.  Am I then to say nothing?   As for fairness to readers,
(a) I was quoting from the very latest document I had.
    Surely it would be more unfair to quote from something I believed
    to have been superseded?
(b) The "February 88" and "Feb 88" documents arrived in my mailbox here
    in the same week.  I had no way of telling who had or had not
    received the document I was quoting.  All I knew was that this was
    the latest document available, sent to me by the author.
(c) In order to permit readers to judge for themselves whether my
    criticisms were justified, I quoted extensively from the document.
    I did not ask anyone to take it on faith that this or that was the
    case:  where the grimoire appeared to say something particularly
    silly I exhibited the rules responsible.  This is unfair?

> First some general comments. The objective is to define an
> International Standard for the programming language Prolog.
> This means that standard conforming programs will run correctly
> on standard conforming processors, neither more nor less.
> It will not limit implementers from introducing new features and
> facilities into their Prolog compilers. 
>  
> Neither will it mean programmers cannot use such extensions; only
> that if they do, their programs will not conform to the standard.
>  
This is a little misleading.  The general rule in other languages is
that implementors can add extensions, provided that such extensions
are either illegal or undefined in the standard.  Thus a Pascal compiler
can provide alphabetic labels as an extension.  But an implementor
should not provide an extension which alters the meaning of a program
which the standard would have ruled legal.

Let's apply this to the case of :- read(_). directives in a file which
is being consulted or compiled.  Specifically, let's consider a file
which looks like
	:- read(_).
	p(a).
and has nothing else in it.  Does this define p, or does it not?  The
BSI grammar, in all versions, provides the syntax of entire files:
according to the grimoire this MUST mean exactly one directive followed
by exactly one clause.  Since this is a defined and legal file, it would
be most improper for an implementor to give it any other meaning.
Therefore, reading out of a file being compiled or consulted is NOT
a permitted extension.  (This wouldn't bother Quintus, but reading from
the file being loaded is legal in some other Prologs.)
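Here is a minimal sketch of the other reading. It assumes an Edinburgh-style loader that runs each directive while current input still points at the file being loaded. The names toy_consult/1 and loaded/1 are hypothetical, and clauses are recorded as loaded/1 facts rather than compiled. Under such a loader the two-line file above defines nothing, because the directive's read(_) consumes p(a).

```prolog
% Hypothetical loader for illustration. Clauses are recorded as
% loaded(Clause) facts instead of being compiled.
:- dynamic loaded/1.

% toy_consult(+File): runs directives with current input redirected to
% File, so a directive that calls read/1 consumes the next clause.
toy_consult(File) :-
    open(File, read, Stream),
    current_input(Old),
    set_input(Stream),
    load_terms(Stream),
    set_input(Old),
    close(Stream).

load_terms(Stream) :-
    read(Stream, Term),
    (   Term == end_of_file
    ->  true
    ;   Term = (:- Directive)
    ->  call(Directive),            % may itself read from Stream
        load_terms(Stream)
    ;   assertz(loaded(Term)),
        load_terms(Stream)
    ).
```

If you write the two lines to a file and call toy_consult/1 on it, no loaded(p(a)) fact results. The grammar, by contrast, insists that the file contains a clause for p/1.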

Let's apply this to another case:  functor/3.  It has always been the
case in DEC-10 Prolog that functor(1, 1, 0) succeeds.  In at least one draft of
the BSI built-in predicates document, this has been required to raise
an error.  (BSI Prolog includes an error handling facility; needless
to say it doesn't look like IF/Prolog's or M-Prolog's or ...)  So a
BSI conforming program is entitled to rely on this error being raised,
and an implementor may NOT provide DEC-10 compatibility.
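The DEC-10 behaviour that existing programs rely on can be written as a small probe. This is a sketch, and dec10_functor_convention/0 is a hypothetical name. The convention is that an atomic term is its own functor with arity 0, in both directions of functor/3. On a processor following the draft in question, the third goal would raise an error instead of succeeding.

```prolog
% dec10_functor_convention: hypothetical probe that succeeds when the
% processor treats atomic terms DEC-10 style in functor/3.
dec10_functor_convention :-
    functor(1, N, A),   N == 1,   A == 0,   % number: itself, arity 0
    functor(foo, F, B), F == foo, B == 0,   % atom: itself, arity 0
    functor(1, 1, 0),                       % the case the draft made an error
    functor(T, 1, 0),   T == 1.             % construction direction
```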

The ANSI C committee have found it necessary to explicitly indicate
which identifiers may be used by implementors.  (The list includes
all identifiers starting with "_" or "str" and there are others I
can't remember right at the moment.)  Why is this?  Because the
programmer needs a guarantee that the identifiers he has chosen for
his code won't be in conflict with an implementation.  For example,
(not)/1 is not defined in the BSI stuff, so Scowen says that an
implementation is free to define it.  But if the implementation is
free to do so, then the programmer ISN'T.  Since setof/3 is not in
the BSI Prolog language, a program which defines

	setof(List, Set) :-
		setof(List, [], Set).		

	setof([], Set, Set).
	setof([Head|Tail], Set0, Set) :-
		(   member(Head, Set0) ->
		    setof(Tail, Set0, Set)
		;   /* not member(Head, Set0) */
		    setof(Tail, [Head|Set0], Set)
		).
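One way to see the clash in a present-day system is sketched below. It assumes an ISO-style processor that raises permission_error(modify, static_procedure, ...) when a program tries to add clauses to a built-in. The names user_definable/1 and my_set_of/3 are hypothetical.

```prolog
% user_definable(+Name/+Arity): hypothetical probe. Succeeds if the program
% may add clauses for Name/Arity. Fails if the processor already owns that
% name and refuses the change with an ISO-style permission error.
user_definable(Name/Arity) :-
    functor(Head, Name, Arity),
    catch(( assertz(Head), retract(Head) ),   % add, then remove, a dummy clause
          error(permission_error(modify, static_procedure, _), _),
          fail).
```

On such a processor, user_definable(setof/3) fails. That is exactly the position of the program above: the name it chose already belongs to the implementation.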

is a standard-conforming program.  But a Prolog system which is exactly
BSI except for providing setof/3 as an extension is a conforming processor.
Will such a conforming program run correctly on such a conforming
processor?  You must be joking.  So, taken in their ordinary sense,
the claim that "standard conforming programs will run correctly on
standard conforming processors", while true of some standards, is NOT
true of the BSI work, unless "standard conforming processors" is
construed very strictly as meaning "providing NO additional built-in
predicates".

You will recall that Fortran 77 provides the EXTERNAL and INTRINSIC
statements precisely to cope with this problem, and that ANSI C
provides the reserved-to-implementors list and #undef precisely to
cope with this problem.  BSI Prolog does have some reserved words,
but is ludicrously far from providing a solution to this problem.

> So some features of Edinburgh Prolog will not be in the standard 
> because although they fulfilled a need at one time, they are
> not a sensible longterm solution.

Let's be realistic.  There are languages on the horizon which are much
better approximations to logic programming than Prolog.  (NU Prolog has
been around for a while.)  There are lots of software engineering needs
which old Prolog completely failed to address, such as modules.  (Last
I heard, the consensus of the BSI Modules subcommittee was that they
would probably never agree.)  I think we ought to regard Prolog as a
stopgap; and that the goal of the standard should be to protect EXISTING
investments in Prolog.  Frankly, advocates of BSI Prolog, with its
use of user-supplied atoms as stream names, are in no position to talk
about sensible solutions.

************************************************************************
** It would be most interesting to have an explicit list of the features
** of Edinburgh Prolog which fulfilled a need at one time and are now
** disliked by the committee, and a description of their replacements.
************************************************************************

> >	(4) The basic structure of the BSI approach to syntax has been
> >	    to cut the Gordian Goose.  That is, instead of regarding the
> >	    (actually rather low) diversity of Prolog syntax as an
> >	    opportunity to be solved by making the language more powerful
> >	    (e.g. having a table-driven tokeniser), it has been treated as
> >	    a problem to be solved by inventing a new, more restricted,
> >	    language.
>  
> Well, yes and no. Chris Moss has produced tests that give
> different results on every system tested so far. Perhaps there
> is rather more diversity than Richard O'Keefe realizes.
> One objective has been to define a syntax where many existing 
> systems would not generally disagree on the meaning of 
> standard-conforming programs. 
  
The amount of diversity one perceives depends on which "Prolog" systems
one decides to include in one's sample.  My sample includes only systems
whose implementors _tried_ to be Edinburgh (or at least Clocksin &
Mellish) compatible.  For example, AAIS Prolog is openly and frankly
not an Edinburgh-compatible system.  We may (and should) look to it for
ideas, but we should not include it in a sample of "Edinburgh compatible"
Prologs.  BIM Prolog has its own unique syntax; while we should perhaps
include the '-c' syntax of BIM Prolog in the sample, we should not
include BIM Prolog's native syntax.  If we go by numbers, then Turbo
Prolog should determine the syntax of standard Prolog.  If not by numbers,
by what?  Simple justice suggests that the Prologs to look at are the
Prologs whose authors TRIED to be compatible with one another.  Prudence
suggests the same sample.

But even if the diversity among the Prologs whose authors didn't suffer
from NIH-itis is much greater than I believe, that doesn't answer my
point.  What I said was that the diversity should be regarded "as an
opportunity to be solved by making the language more powerful (e.g.
having a table-driven tokeniser)".  [As an aside, this is no more than
Lisp and PopLog already have.]  It turns out that it is quite easy to
write a tokeniser which can handle all of
	ALS Prolog
	Arity Prolog
	BIM Prolog native syntax
	C Prolog
	DEC-10 Prolog
	PopLog	(nested comments)
	Quintus Prolog
	Stony Brook Prolog
and can almost handle ADA [ADA is no longer a trademark], simply by fiddling
with a table.  AAIS took exactly this approach (though their tokeniser is
not as flexible as mine).  I found it necessary to support several kinds
of quotes in my tokeniser:
	ATMQT		- the quoted thing is an atom (')
	STRQT		- the quoted thing is a string ($)
	LISQT		- the quoted thing is a list (")
	CHRQT		- the quoted thing is a character (`)
If the standard were to adopt this approach, the committee could rule,
if they wished, that the standard table assigns " to STRQT, with nothing
assigned to LISQT.  That needn't prevent me reading my existing code:
I'd be able to change the table while reading my old files.
[The best approach seems to be to associate a read table with a stream;
 naturally this is the approach PopLog takes.]
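Here is a cut-down sketch of the table-driven idea, restricted to quotes and assuming character-code lists as input. The names quote_class/3 and quoted_item/4, the table names, and the string/1 wrapper are all hypothetical. Escapes and doubled quotes are left out to keep it short. Passing a different table changes what " means without any change to the reader.

```prolog
:- use_module(library(lists)).   % append/3

% quote_class(?Table, ?QuoteCode, ?Class): hypothetical read tables.
% The table names and class assignments are illustrative assumptions.
quote_class(dec10,    0'\', atmqt).
quote_class(dec10,    0'",  lisqt).
quote_class(standard, 0'\', atmqt).
quote_class(standard, 0'",  strqt).
quote_class(standard, 0'`,  chrqt).

% quoted_item(+Table, +Codes, -Item, -Rest): Codes begins with a quote
% character. The item runs to the next occurrence of the same quote.
% This sketch does not handle escapes or doubled quotes.
quoted_item(Table, [Q|Codes], Item, Rest) :-
    quote_class(Table, Q, Class),
    append(Body, [Q|Rest], Codes), !,    % shortest Body: first closing quote
    make_item(Class, Body, Item).

make_item(atmqt, Body, Atom) :- atom_codes(Atom, Body).
make_item(lisqt, Body, Body).                 % list of character codes
make_item(strqt, Body, string(Body)).         % hypothetical string wrapper
make_item(chrqt, [Code], Code).               % exactly one character
```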

What I have in mind here is that a file would start with a directive
such as
	:- use_syntax(dec10).
or	:- use_syntax(standard).
or	:- use_syntax(als).
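Here is a minimal sketch of such a directive. For brevity it uses a single global setting, although the text above prefers a table per stream. The names use_syntax/1, current_syntax/1 and known_syntax/1 are hypothetical. Raising a domain_error for an unknown table is a choice made for this sketch, not something any draft specifies.

```prolog
% use_syntax(+Table): hypothetical directive recording which read table
% the reader should use from now on. One global setting is a simplification.
:- dynamic current_syntax/1.
current_syntax(standard).

known_syntax(dec10).
known_syntax(standard).
known_syntax(als).

use_syntax(Table) :-
    (   known_syntax(Table)
    ->  retractall(current_syntax(_)),
        assertz(current_syntax(Table))
    ;   throw(error(domain_error(syntax_table, Table), use_syntax/1))
    ).
```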

Especially if the tokeniser were made available to user code (as it is
in the DEC-10 Prolog library, or built-in in NU Prolog), the result would
be a much more powerful language at very little cost to the implementor.
And conversion from old dialects to the BSI dialect would be enormously
simplified.

Do we need to come up with a "best possible" tokeniser for the standard?
Of course not.

Again, what are we to do about syntactic variations, such as the
treatment of operators?  My answer, in 1984, was that the standard
should not specify read/1 and write/1, but should specify
	standard_read/1
	standard_write/1
and should allow users to redefine read/1 and write/1, but require
that their initial definitions be the standard ones.  consult and compile
should use read/1, not standard_read/1, so that someone who wanted to
read M-Prolog files into standard Prolog could do so by suitably
defining read/1.

Now, if you are a self-appointed standards committee member determined
to impose your vision of what is a "sensible longterm solution" on
every Prolog user whether they like it or not, this sort of approach
won't seem all that attractive.  But if, like me, you think that the
people who matter in all this are the people who have paid money to
USE Prolog, and if, like me, you think that the fact that M-Prolog
is appalling is no reason to make life any harder for people with a
lot of data in M-Prolog format than we have to, you'll think that
letting people do

	read(Term) :- magyar_read(Term).

is obviously the way to go.	(It doesn't much matter how you install
your own code in the hook, the important thing is that there should be
a read-hook where you can install your own reader to be used by compile
and consult.)
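Here is a sketch of such a hook, using the hypothetical names read_hook/2 and program_read/2. A consult or compile loop would call program_read/2. That predicate uses the installed hook if there is one and the standard reader otherwise. In the comment, magyar_read/2 stands for whatever M-Prolog reader the user supplies.

```prolog
% read_hook(+Stream, -Term): hypothetical user-installable hook. It has no
% clauses by default. Installing a reader might look like:
%   :- assertz((read_hook(S, T) :- magyar_read(S, T))).
:- dynamic read_hook/2.

% program_read(+Stream, -Term): the reader a consult/compile loop would
% call. It uses the hook if one succeeds, and read/2 otherwise.
program_read(Stream, Term) :-
    (   read_hook(Stream, Term0)
    ->  Term = Term0
    ;   read(Stream, Term)
    ).
```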

> PROLOG CONTROL STRUCTURES AS SYNTAX
> >	(3) The attempt to describe Prolog control structures as *syntax*
> >	    is fundamentally misdirected.
> This is a matter of opinion. One reason for regarding Prolog control
> structures as *syntax* is so that a person or program reading
> a Prolog program can always recognize its overall structure.

It is not a matter of opinion.  Either I am right about this, or I am
wrong.  There is a very important reason for my belief:  Prolog is
simply not the sort of language for which this kind of thing can WORK.
Consider the difference between

	foo(X, P, Q, L) :- bag(X, (P & Q), L).
				  ^^^^^^^
and
	de_morgan((P & Q), (R | S)) :- de_morgan(P, R), de_morgan(Q, S).
		  ^^^^^^^
The first is code, and the treatment of it in the grimoire is appropriate.
(That is, it will be mapped to whatever "(and ?P ?Q)" would have been
mapped to in the BSI Lisp-like syntax.)
But the second is data, and the treatment of it in the grimoire is
NOT appropriate.  It will be mapped to whatever "(and ?P ?Q)" would
have been mapped to in the BSI Lisp-like syntax, but it SHOULD be
mapped to whatever "[& ?P ?Q]" would be mapped to.

If we consider a slightly different example:

	baz(X, P, L) :- bag(X, P, L).
			       ^
and
	de_morgan(not(P), R) :- de_morgan(P, R).
					  ^
we find the opposite problem: the second is data and will be mapped to
whatever "?P" will be mapped to in the BSI Lisp-like syntax, but the
first is code, and should be mapped to whatever "(and ?P)" would be
mapped to, BUT IT WON'T BE.

The trouble is that the grimoire tries to guess whether something is
code or data by looking at its form, but that's the wrong place to
look:  the place to look is the predicate being called.  And the
trouble is that we can't build that information into the grammar,
because the programmer can define new predicates with code-like arguments.
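The small illustration below shows why the form of a term cannot settle the question. The predicate names are hypothetical. The same conjunction term is data to conj_to_list/2, which only takes it apart, and code to run_goals/1, which calls it. Nothing in the term itself says which it is.

```prolog
:- use_module(library(lists)).   % append/3

% conj_to_list(+Conj, -Goals): treats (A, B) purely as DATA, flattening
% a conjunction term into a list without running anything.
conj_to_list(Conj, Goals) :-
    nonvar(Conj), Conj = (A, B), !,
    conj_to_list(A, GA),
    conj_to_list(B, GB),
    append(GA, GB, Goals).
conj_to_list(G, [G]).

% run_goals(+Goals): treats the very same terms as CODE, calling each one.
run_goals([]).
run_goals([G|Gs]) :-
    call(G),
    run_goals(Gs).
```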

Let me stress this:
	the whole basis of the build-it-all-into-the-syntax approach
	is the assumption that code is code and data are data and
	never the twain shall meet.
This is true of Pascal.  It is true of Fortran.  It is almost true of C.
But it is utterly false of Lisp and Prolog.  A grammar of this type does
not make SENSE for Prolog any more than it makes sense for Lisp.

I hereby wager US$100, payable once to Chris Moss, that if the next draft
of the grimoire attempts to maintain this rigid distinction between code
and data, I will be able to find inconsistencies like the ones above in
it.  I don't think it's Chris Moss's fault:  if anyone can find a way of
working around this basic mistake (not HIS mistake, by the way, this is
the kind of grammar the BSI committee have always wanted), I'm sure that
Chris Moss could.  I make my wager *despite* my belief in Chris Moss's
competence, because I believe that it is _impossible_ for this approach
to work.  (If I do not receive said draft by the end of this year, the
wager will expire.)

> ',' and '&' AS OPERATORS
> > Oddly enough, if one takes the grimoire literally, the user CAN
> > declare ',' and '&' as operators, and can use them in that form.
> > However, ',' and '&' cannot possibly have the same precedence as
> > "," or "&" in BSI Prolog, and it seems clear that (A ',' B) and
> > (A '&' B) must be different terms.  
>  
> It is not intended that it will be possible to declare ',' and '&'
> as operators.
>  
There is nothing in the grimoire to say so, and it is a very odd restriction.
Intentions are beside the point:  all that matters is what the documents
actually say.  It *is* the intention that it should be possible to write
','(A,B) as a term, and it remains the case that ','(A,B) and '&'(A,B)
must be different terms, and if we take the grimoire literally, neither of
them can be the same as (A,B) or (A&B).

[Yes, I know about (P|Q) and (P;Q) in Dec-10 Prolog.  I have always thought
 and said that this was a mistake, and I think it is one of the very few
 areas where a difference between the standard and existing practice might
 be justifiable.
]

> QUOTE OPERATORS USED AS OPERANDS
> >	compare(R, X, Y) :-
> >		( X @> Y -> R = >
> >		; X @< Y -> R = <
> >		;	    R = =
> >		).
>  
> Richard O'Keefe realizes that the above example is intended to be
> syntactically incorrect in the standard. When operators are
> used as operands, there many problems of possible ambiguity.
> A cure is still under discussion, but some problems are
> avoided by the rule that "An operator used as an operand must be
> bracketted".
>  
Well, it would be more accurate to say that I COMPLAIN that it is
intended to be syntactically incorrect in the standard.
There isn't any problem of possible ambiguity here whatsoever.

	) :- (		:- must be infix
	X @> Y		@> must be infix
	Y -> R		-> must be infix
	R = >		= must be infix or suffix, has no suffix reading
	= > ;		> must be atom or prefix, has no prefix reading
	> ; X		; must be infix
    and so on
Now if = and > _both_ had a suffix reading, (R = >) would be ambiguous.
Since neither of them has, there is no ambiguity here at all.
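The reasoning in the table above can be checked mechanically against the current operator table, using the standard current_op/3. This is a simplified sketch, and op_readings/2 and forced_atom/2 are hypothetical names. It covers only one step: the previous operator has no suffix (postfix) reading and this atom has no prefix reading, so the atom must be a plain atom.

```prolog
:- use_module(library(lists)).   % memberchk/2 on systems that need it

% op_readings(+Atom, -Kinds): sorted set of readings (infix, prefix,
% postfix) that the current operator table gives Atom.
op_readings(Atom, Kinds) :-
    findall(Kind, ( current_op(_, Type, Atom), op_kind(Type, Kind) ), Ks),
    sort(Ks, Kinds).

op_kind(xfx, infix).  op_kind(xfy, infix).  op_kind(yfx, infix).
op_kind(fx, prefix).  op_kind(fy, prefix).
op_kind(xf, postfix). op_kind(yf, postfix).

% forced_atom(+Prev, +Atom): in "... Operand Prev Atom ...", Prev has no
% postfix reading, so it is infix and Atom starts its right operand. If
% Atom has no prefix reading either, it can only be a plain atom.
forced_atom(Prev, Atom) :-
    op_readings(Prev, PrevKinds),
    \+ memberchk(postfix, PrevKinds),
    op_readings(Atom, AtomKinds),
    \+ memberchk(prefix, AtomKinds).
```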

The elimination of ambiguity is not a very good argument for breaking
existing UNAMBIGUOUS code!

> NEGATION
> >	not Goal :-		% "not" is not a built-in operator
> >	    (	ground(Goal) -> \+ Goal		% neither is "\+".
> >	    ;	signal_error(instantiation_fault(Goal,0))
> >	    ).
> It is intended that Standard Prolog will not contain 'not' or '\+'.
> Standard Prolog will not require systems to implement true
> logical negation and it would be misleading to include an 
> operator or predicate that implies that they have done so.
> Instead the way is left open for processors to implement a version
> of 'not' as an extension and still remain standard conforming.
> Standard Prolog will contain a built-in predicate 
> that implements 'negation by failure', i.e.
>       fail_if(G) :- call(G), !, fail.
>       fail_if(_).

My main point here was a semantic one.  Most other control structures
are defined in the grammar.  It seems odd that
	( G -> fail ; true )	should be in the grammar, but that
	fail_if(G)		which is identical in effect, should not.
Because one of these forms is in the grammar and the other isn't, they
have different properties.  For example,
	( 1 -> fail ; true )	is syntactically illegal, but
	fail_if(1)		is syntactically legal.
There are other differences as well.

If BSI Prolog contains fail_if/1, then it WILL contain '\+', but with
a different name.  Why not use an existing name for an existing
operation?  Looks to me like nonhicinventusitis.  \+ is a crossed-out
|-, meaning, obviously enough, "not provable".
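To make the "identical in effect" point concrete, here are fail_if/1 exactly as quoted and the if-then-else form side by side. The tests check that both agree with \+/1. The name fail_if_ite/1 is hypothetical. Note that fail_if(1) is read without complaint. It only goes wrong when called, and systems that follow ISO error conventions report it as a type error.

```prolog
% fail_if/1 as defined in the quoted BSI text: negation by failure.
fail_if(G) :- call(G), !, fail.
fail_if(_).

% The same operation written as ( G -> fail ; true ).
fail_if_ite(G) :- ( call(G) -> fail ; true ).
```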

> A program that resolves ambiguity implicitly is not acceptable as
> defining a standard; there must be further definition.
> One reason is that a program specifies too much. Some features need to
> remain 'implementation dependent' because we must not specify
> them, for example: the accuracy and largest values of floating point
> numbers, or the integer value corresponding to a character.
>  
> Another reason is that it is harder to understand and find errors.

It is harder to understand and find errors in a program you can run
than in a never-used-anywhere-else formalism?  Judging by the results,
this is the opposite of the truth.

What is the difference between the public-domain DEC-10 Prolog parser
and the BSI grimoire?  Both are programs, in a formalism based on logic.
Neither is more explicit or less explicit than the other, and both are
of similar size.  So what is the difference?  The difference is that
the public-domain DEC-10 Prolog parser CAN be run, HAS been run, and
has had most of the mistakes knocked out of it by actual experience.
The BSI grimoire is in a new formalism, the definition of which is
provided in ***NO*** BSI document (so that I had to keep guessing what
things meant), and each of the three drafts I have seen was riddled
with errors from end to end.  I haven't told you about all the problems
I found; there are nearly as many problems as rules!

The BSI Prolog group HAVE specified the integer value corresponding to
a character:  they require the ISO 8859 character set.  GREAT!
The DEC-10 public-domain ***parser*** does NOT specify the integer
value corresponding to a character (that's the tokeniser's job).
{The old tokeniser did have ASCII codes built in, but the current
version of the tokeniser uses 0'x syntax for the appropriate
constants to avoid that problem.}
If the BSI committee are so concerned to avoid character code problems,
how come they haven't got anything like 0'x or `x` (in a standard which
doesn't have to cope with existing code that uses ` as an atom, `x` is
a good notation for character code constants)?
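The short sketch below shows what 0'x buys in practice. It assumes a processor that accepts the DEC-10 0'c notation. It handles digits without a single numeric character code in the source. The names digit_value/2 and digits_value/2 are hypothetical.

```prolog
% digit_value(+Code, -Value): value of one decimal digit character code.
% 0'c constants keep raw numbers (such as 48 for "0") out of the source.
digit_value(Code, Value) :-
    integer(Code),
    Code >= 0'0,
    Code =< 0'9,
    Value is Code - 0'0.

% digits_value(+Codes, -N): value of a list of decimal digit codes.
digits_value(Codes, N) :-
    digits_value(Codes, 0, N).

digits_value([], N, N).
digits_value([C|Cs], N0, N) :-
    digit_value(C, D),
    N1 is N0 * 10 + D,
    digits_value(Cs, N1, N).
```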

The public-domain tokeniser doesn't specify anything more about floating
point numbers than what they look like; it relies on being provided with
a number_chars/2 predicate (which we want ANYWAY) to do the actual
conversion.
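Here is a sketch of that division of labour, assuming ISO number_chars/2 and character lists as input. The predicate number_token/3 and its helpers are hypothetical. The tokeniser recognises only the shape of a number, and number_chars/2 does the conversion. Accepting an exponent only after a fraction is a choice made for this sketch.

```prolog
:- use_module(library(lists)).   % append/3

% number_token(+Chars, -Number, -Rest): recognise digits, an optional
% fraction and an optional exponent, then delegate conversion.
number_token(Chars, Number, Rest) :-
    digits(Chars, Int, R0),
    Int \== [],
    fraction(R0, Frac, R1),
    (   Frac == []
    ->  Exp = [], Rest = R1          % no exponent without a fraction
    ;   exponent(R1, Exp, Rest)
    ),
    append(Int, Frac, C0),
    append(C0, Exp, NumChars),
    number_chars(Number, NumChars).  % the actual conversion

digits([C|Cs], [C|Ds], Rest) :- digit(C), !, digits(Cs, Ds, Rest).
digits(Rest, [], Rest).

digit(C) :- atom(C), atom_length(C, 1), C @>= '0', C @=< '9'.

fraction(['.', D|Cs], ['.', D|Ds], Rest) :- digit(D), !, digits(Cs, Ds, Rest).
fraction(Rest, [], Rest).

exponent([E|Cs], [E|Out], Rest) :-
    ( E == e ; E == 'E' ),
    sign(Cs, Sign, Cs1),
    digits(Cs1, Ds, Rest),
    Ds \== [], !,
    append(Sign, Ds, Out).
exponent(Rest, [], Rest).

sign(['-'|Cs], ['-'], Cs) :- !.
sign(['+'|Cs], ['+'], Cs) :- !.
sign(Cs, [], Cs).
```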

Note that the BSI grimoire says NOTHING about what happens if you write
a constant which exceeds the capacity of your implementation.  Is the
program
	p(1.2e3456).
a BSI-conforming program or not?  Well, syntactically it is, but the
lexical rules say nothing about what it MEANS.  For all that the
grimoire or any other BSI document I can recall says to the contrary,
a Prolog implementation which reads this as
	p(0.0).
is conforming.  This kind of thing is a real portability problem; it
exists with respect to integers too.  Is 1000000000000000000 a legal
Prolog term?  According to the grimoire, yes.  What does it mean?
The grimoire doesn't say.

> DISCLAIMER AND CONCLUSION
> Never rely on working papers and draft standards. They are subject to
> changes and review. All documents and working papers, however
> confidently expressed, are also subject to review. There will be no
> standard until the member bodies of ISO have approved it.

But what ELSE is there to comment on?

> Many countries, but not at present USA, have national Prolog panels
> coordinating their views on the emerging standard. I encourage all 
> Prolog implementers and users to participate in this effort in order that
> the eventual standard is one that preserves the best of the past
> and also provides development paths for the future.
>  
> Roger Scowen, 11 March 1988

Sorry, but it's too late.  Prolog implementors and users should have been
invited to contribute before the committee went on a four-year binge of
inventing their own language.  I explicitly suggested some years ago that
the people at WISDOM should be invited to participate, and was told that
that was out of the question.  I have put a lot of effort into writing
responses to the BSI stuff, and for all the feedback I've had I might as
well have been shouting into a vacuum.  The BSI committee having been
resolute in their contempt for existing Prolog users (I have repeatedly
urged that they should explicitly adopt a principle of not breaking
existing code without strong necessity, as the ANSI C committee did, and
the last I heard was that they had explicitly rejected any such idea),
I cannot regard "preserves the best of the past" as anything but a sick
joke.

Look, if you want to preserve the best of the past, why have you
renamed findall/3 to bag/3?  Why have you adopted ESI Prolog-2's
streams rather than Arity/Prolog's streams, despite having been
told about the problems?  Could it be something to do with the
fact that the author of that part of the standard worked for ESI,
not for Arity?  Why have you dropped nl/0 from the standard?  Why
is there no notation for character constants such as PopLog provides?
Why is the error handling facility all new, rather than resembling
either IF/Prolog or M-Prolog?

I have tried, I really have tried, to arouse interest in the BSI work
here in the US.  Do you know what has got in the way?  As soon as I
show people any of the BSI documents (take the 'standardisation issues'
documents as an example) they say "what a pack of turkeys" and assure
me that there is nothing to worry about.  I remain desperately worried
that there will be a BSI/ISO Prolog standard, and that it will be as
bad as the current drafts, and that it will do a great deal of damage.
What *really* worries me is that the people on the BSI committee don't
seem to realise how bad it is.