Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!cs.utexas.edu!think!ames!uhccux!munnari.oz.au!mudla!ok
From: ok@mudla.cs.mu.OZ.AU (Richard O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: More fun with WG17
Message-ID: <2759@munnari.oz.au>
Date: 20 Nov 89 14:21:42 GMT
References: <2609@munnari.oz.au> <696@sce.carleton.ca> <24257@cup.portal.com>
Sender: news@cs.mu.oz.au
Lines: 103

In article <24257@cup.portal.com>, pgl@cup.portal.com (Peter G Ludemann) writes:
> Two questions to Richard in OZ:

> Last time I checked -
> 	- blanks are not significant in ISO but they are
> 	  signficant in "Edinburgh" [for example, to determine
> 	  the meaning of comma in not (a,b) and not(a,b)]

Unfortunately, I've packed the documents away in a couple of boxes to
go back to Auckland, so I can't check the right answer.  However, the
claim "blanks are not significant in" WG17 Prolog is outrageously
false, always has been, and always SHOULD have been.  If you want a
language where blanks are not significant, go use Fortran.  Blanks are
significant in English:
	light housekeeper	\==	lighthouse keeper
	a nice boat		\==	an iceboat
Blanks are significant in Pascal.  Blanks are significant in C.  They
are significant in COBOL, Ada, PL/I, Lisp, Pop, ML, you name it.
(Think about x+++y in C.  It's legal.  And take a dekko at "preprocessor
numbers" in the C standard.)

This is one of the things which has changed quite often in BSI/WG17 Prolog.
N40 does make blanks significant after integers, where they weren't before:
	0x1
used to be the token 0 followed by the token x1.  In document N40, this is
a single hexadecimal number (1).  (Why in the name of sanity did WG17 copy
hexadecimal numbers from C when EXISTING Prolog systems allow any base from
2 (which I quite often use) to 36 (which is cute rather than useful)?)

Similarly, given the declaration
	:- op(600, xfx, e).
the character sequence 3e-22 unambiguously stood for e(3,-22) in Common
Prolog, but in WG17 Prolog it stands for 3.0e-22, so you have to add
spaces:	3 e -22.  (I suspect that WG17 tokenisers won't be smart enough
to treat 3e -22 or 3e- 22 as 3 e -22.)  Again, what WG17 have done is to
borrow syntax from C rather than from existing Prologs.

WG17 _have_ done some things with operators, for example
	[this,is,an,example]
is no longer legal.

By the way, 'not' is not a predefined operator in WG17 Prolog and has
no assigned meaning.  (At least they haven't insisted on breaking NU
Prolog and Quintus Prolog, where not/1 is sound (or, in Quintus's case,
apologetic) negation.)  The WG17 negation-as-failure form is 'fail_if'.

Without N40 to check, I turned to some notes I made on how WG17 would
break LPA Prolog.  (That's Logic Programming Associates.)  My summary
was
	At a rough guess, for me, WG17/LPA syntax differences would
	show up about one per 70 lines, excluding if->then;else.
(It is not just Quintus Prolog that WG17 would break, far from it!)
But one difference I did _not_ list was the "not(" "not (" case.


> 	- the reference grammars (C&M, Quintus, BSI) all
> 	  are ambiguous (at least, I couldn't figure out a
> 	  way to make them LR(k) --- which is a pretty good test
> 	  of ambiguity, I think)

I was chagrined just now to discover that section 2-8 of the Quintus
Prolog Reference Manual is explicitly headed "Formal Syntax".  Of
course it's only an INFORMAL formal syntax.  It is there to help human
readers, and is not runnable.  Things like
	unit-clause --> head  {where head is not otherwise a sentence}
are clearly not part of a "REFERENCE grammar".  (I just noticed that
the grammar in 2-8-2 doesn't mention if->then;else at all.  That will
give you a clue as to how formal it is.)  The grammar in section 2-8-3
is in rather better shape:  the comments about operators and "(" are
actually handled in the tokeniser:  "p(" and "p (" are actually
different sequences of tokens.

Trying to make a grammar LR(k) is a pretty lousy way of testing whether
an extensible operator-precedence language is ambiguous.  We start with
the simple fact that Prolog has 1201 priority levels (0..1200).  LR(k)
grammars can't handle parameters, so you essentially have to write out
1201 copies of the guts of the grammar.  I admire the tenacity of anyone
who did that.  We then observe that the grammars you tend to see are
mostly descended from the one in the DEC-10 Prolog manual, which didn't
_try_ to be precise, so it fails to point out that "p(" and "p (" are
actually two different token sequences which can be parsed unambiguously.
We next note the distinction between a grammar being ambiguous and a
language being ambiguous.  No Prolog term is assigned more than one
parse by the public-domain parser:  the language is unambiguous.  (But
it is not LR(k) for any finite k.)

It's worth reflecting that many programming languages aren't LR(k).
Algol wasn't, Pascal isn't, C isn't, Modula-2 isn't, Ada isn't.  You can
get an LR(k) grammar for those languages using a very simple technique:
you lie.

I'm afraid I don't know which grammar Ludeman means by "the [BSI] reference
grammar".  There were several, and the one in WG17's N40 document is
different again (and except for the excessive borrowing from C, it's a big
improvement on earlier grammars).  I'm quite sure that WG17 *intend* their
grammar to be locally unambiguous:  there are precisely two places where a
Prolog parser has to make a choice": atom/prefix (which they've blocked
by ruling that no operator can be taken as an atom unless it has parens
directly around it) and infix/postfix (which they've blocked by making it
illegal for an atom to be both an infix and a postfix operator).  The
grammar isn't suitable for LR parsing, but why should it be?  It's an
extensible operator-precedence language, and that's a heck of a lot SIMPLER
to parse than an LR language.