Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!samsung!uunet!mcsun!unido!ecrcvax!micha
From: micha@ecrcvax.UUCP (Micha Meier)
Newsgroups: comp.lang.prolog
Subject: Re: Ambiguities in WG17's syntax
Message-ID: <798@ecrcvax.UUCP>
Date: 8 Dec 89 11:12:45 GMT
References: <24702@cup.portal.com>
Reply-To: micha@ecrcvax.UUCP (Micha Meier)
Organization: ECRC, Munich 81, West Germany
Lines: 74

In article <24702@cup.portal.com> pgl@cup.portal.com (Peter G Ludemann) writes:
>In an earlier article, I said that I had tried to make an LR(k)
>grammar for Prolog.  Richard O'Keefe said that no programming
>language is LR(k).  Of course, Algol-60, C, Pascal, etc. are
>not LR(k) or even context-free.  Like all computer languages,
>they are context-sensitive.  But they can easily be made
>context-free by adding a bit of intelligence to the lexical
>analyzer.  And the same can be done for Prolog.

I'm afraid this is not the case. No matter how clever your lexical
analyzer is, the parser has to cope with terms like

	prefix infix infix infix ...

which cannot be resolved in k steps. In the following 'p' means prefix,
'i' infix, upper case letters mean that the corresponding symbol
is parsed as functor, lower case means that it is parsed
as an atom, ! means the main functor of the whole term.

1)	p.
2)	P! i.
3)	p  I! i.
4)	P! i  I  i.		or P  i  I! i.
5)	p  I! i  I  i.		or p  I  i  I! i.
6)	P! i  I  i  I  i.	or P  i  I! i  I  i.	or P  i  I  i  I! i.
	...
As you can see, on even lines the odd symbols represent functors
and the even ones atoms, on odd lines it is the opposite. If there are several
alternatives in one line, they can be resolved using the precedence
and associativity, but until your parser knows how many symbols
in sequence there are, it cannot start to apply these rules.

This is an instance of the general problem that the associativity
and precedence rules allow you to specify which functor
binds more than the other one, but it does not tell you
which symbol is to be taken as functor.

This problem can be resolved e.g. by imposing that an atom which has
been declared as a prefix operator always keeps its precedence,
even if used as atom, like in Quintus. This of course makes
the grammar less clear to normal users but seems to be a good compromise.

Another problem concerns operators which are both infix and postfix.
When you immagine a sequence of such operators, there is 2^(n-1)
of possible interpretations, e.g. for 4:

	a  Ip! ip  Ip  ip
	a  Ip  ip  Ip! ip
	a  Ip! ip  iP  iP
	a  Ip  ip  iP  iP!
	a  iP  Ip! ip  iP
	a  iP  Ip  ip  iP!
	a  iP  iP  Ip! ip
	a  iP  iP  iP  iP!

(Ip means it is taken as infix, iP as postfix).
Here, every symbol, except the first one, can be interpreted either
as atom or as functor, although the precedence and associativity
can help.

Quintus 2.0 and others have the non-documented feature that (a ip ip)
is parsed as ip(a, ip) even if the postfix precedence is lower than
the infix one:
| ?- op(200, yf, a), op(300, xfy, a).

yes
| ?- display(b a a).
a(b,a)
In fact, nobody has yet specified whether in this case the precedence
applies or whether the term is ambiguous.
We have solved this problem in Sepia by forbidding the infix/postfix
combination and this seems to be a reasonable restriction.

--Micha Meier