Path: utzoo!mnetor!uunet!mcvax!ukc!eagle!icdoc!doc.ic.ac.uk!cdsm
From: cdsm@doc.ic.ac.uk (Chris Moss)
Newsgroups: comp.lang.prolog
Subject: Prolog Standardization
Message-ID: <220@gould.doc.ic.ac.uk>
Date: 3 Mar 88 16:36:55 GMT
Sender: news@doc.ic.ac.uk
Reply-To: cdsm@doc.ic.ac.uk (Chris Moss)
Organization: Dept. of Computing, Imperial College, London, UK.
Lines: 136

I realised recently that many people who've commented about the Prolog 
standard have never actually read it. I'm therefore posting a description
of the latest proposal--it only covers syntax, but that is what most
people get worried about first of all! It is much closer to "Edinburgh"
than some previous proposals, but does try to regularize some of the
obvious defects. (The S-Expression syntax isn't covered here; if you want
to know about it, ask!)

--------------------------------------------------------------------------
The following is a brief description of the syntax proposed for the Prolog
standard. It is designed to clarify the syntactic issues by concentrating
on the points of difference between it and other implementations rather
than introducing it for those with no experience of the language. It
should be readable by users of any current dialect of Prolog. It does not
in general attempt to justify the choices: for this refer to the issue
sheets.

Prolog uses a number of rather arbitrary symbols for "if", "and", "or",
etc. By and large the standard follows those used by the most widely
copied implementation: DEC-10 Prolog from Edinburgh. 

e.g.	likes(rabbit, mole).
	likes(rabbit, owl) :- 
		likes(owl, rabbit)
		; obeys(owl, rabbit),
		  helps(owl, rabbit).

i.e. "if" is ":-", "and" is ",", "or" is ";" and the clause terminator is
".". The alternative symbols '&' and '|' may be used for 'and' and 'or'
respectively, and are mapped into the standard symbols when they occur in
a term. Thus the clause above may be written:

	likes(rabbit, mole).
	likes(rabbit, owl) :- 
		likes(owl, rabbit)
		| obeys(owl, rabbit) &
		  helps(owl, rabbit).
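
As a rough illustration of how "and" and "or" nest (a sketch only; the
predicate name body_shape is invented here and is not part of the
proposal), the body of the second clause can be compared with its fully
bracketed functional form. It uses only the ',' and ';' spellings, so it
should succeed on any Edinburgh-style system:

```prolog
% body_shape/0 is a made-up name used only for this illustration.
body_shape :-
    Body = ( likes(owl, rabbit)
           ; obeys(owl, rabbit),
             helps(owl, rabbit) ),
    % "and" binds more tightly than "or", so ';' is the outer functor.
    Body = ';'( likes(owl, rabbit),
                ','( obeys(owl, rabbit), helps(owl, rabbit) ) ).
```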

To indicate variables, two conventions may be used: a variable may start
with a capital (large) letter or with the character "_" (underscore), and
may continue with any letters, digits or underscores. Thus A, Person,
_person, _123 and _1_2 are variables, but *A, t-lit and _one-more are not.
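
A small sketch of the naming rule, assuming an Edinburgh-style reader
with the usual infix declaration for "-" (the predicate name
name_examples is invented for the purpose). It checks that the
underscore forms are variables and shows how two of the non-variables
are actually read:

```prolog
% name_examples/0 is a made-up name used only for this illustration.
name_examples :-
    var(_Person), var(_person), var(_123), var(_1_2),
    % t-lit is the atom t, the infix symbol -, and the atom lit.
    T = t-lit,
    T = -(t, lit),
    % _one-more is a variable, the infix symbol -, and the atom more.
    U = _one-more,
    U = -(V, more),
    var(V).
```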

Lists use exclusively the square bracket convention: [a,b,c] is a proper
list and [a,b,c|X] is a list whose tail after the third member is a
variable. For reasons stated below, a.b cannot be written in the language,
though the cons pair [a|b] unifies with the normal functional term
'.'(a,b) (i.e. the functor of a list is '.').
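
The bracket notation can be unfolded into nested cons pairs, as in the
following sketch (list_examples is an invented name). The '.'(a,b)
spelling is deliberately left out of the code, on the assumption that
not every system keeps that functor name for lists:

```prolog
% list_examples/0 is a made-up name used only for this illustration.
list_examples :-
    % A proper list is a chain of cons pairs ending in [].
    [a,b,c] = [a|[b|[c|[]]]],
    % With |X the rest of the list after c is left open.
    [a,b,c|X] = [a|[b|[c|X]]],
    var(X),
    % Binding X to a list closes it.
    X = [d],
    [a,b,c|X] = [a,b,c,d].
```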

An atomic symbol may always be placed in single quotes (e.g.
'an-atomic-symbol') but the two most common ways of writing it use no
extra marks: an identifier starting with a small letter and consisting
only of letters, digits and the symbol "_"; and a graphic symbol composed
of any number of other symbols excluding those (e.g. ()][,;) used for
special purposes in the syntax. Thus +, //, <=> are all graphic symbols
which can abut directly to identifiers and special symbols such as ",".
For historical reasons, ! is a special symbol, so the sequence :-!. is
treated as three symbols not one, though ! is an atom not a punctuation
mark.
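
The three ways of writing an atom, and the special treatment of !, might
be exercised as follows. This is only a sketch: atom_examples and
first_only are invented names, and it assumes a reader that accepts an
operator symbol such as + as a bare argument, as DEC-10 Prolog does.

```prolog
% atom_examples/0 and first_only/1 are made-up names for illustration.
atom_examples :-
    atom('an-atomic-symbol'),       % quoted atom
    atom(rabbit), atom(x_1),        % identifiers
    atom(+), atom(//), atom(<=>),   % graphic symbols
    atom(!).                        % ! is an atom too

% No spaces in the body: it is read as :- then ! then the terminator.
first_only(a) :-!.
first_only(b).
```

Because of the cut, a call such as first_only(X) should give only the
solution X = a.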

The status of "." is special. It is the clause terminator and thus has a
key role in parsing and error recovery, and also occurs in real numbers.
However it can be used in graphical symbols as long as it does not appear
alone (e.g. in =..). In many implementations, the clause terminator is
identified by the presence of white space (space/new line) after it, so
that (with appropriate declarations) the sequence a(b).c(d). is
interpreted as a single clause with principal functor '.' (which is, as
noted above, the list functor). In standard Prolog the significance of
white space has been reduced as far as possible, so this distinction is
not made. A consequence of this is that the infix dot notation for lists
is not available; it has ambiguity problems anyway when real numbers are
considered (consider 1.2.3.4).
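
The uses of "." that remain can be seen side by side in a short sketch
(dot_examples is an invented name): inside a real number, inside the
graphic symbol =.., and as the clause terminator.

```prolog
% dot_examples/0 is a made-up name used only for this illustration.
dot_examples :-
    % The dot inside 1.5 belongs to the number.
    X is 1.5 + 1.5,
    X =:= 3.0,
    % The dots inside =.. belong to the symbol.
    likes(rabbit, mole) =.. [likes, rabbit, mole].
```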

Operators may be declared in a way that is close to that provided in
DEC-10 Prolog. This does not apply to the level of system operators such
as ',' and ';' which are fixed in the syntax and cannot be changed. The
aim of this is to allow better error detection and reporting than is
possible in a fully general system. Prefix, infix and postfix operators
may be declared but the combination of operators that may be declared has
been limited so that the parsing of any expression is unambiguous and may
be parsed without backtracking. The rules that ensure this are as follows:

1. A right-binding operator binds more strongly than a left-binding
operator with the same declared priority number.

2. A symbol which has an operator declaration in force may not be used as
the operand of an expression though it may appear as an isolated symbol.
e.g. "a + +" is illegal, though the 'normal' form of this expression as
it is interpreted by DEC-10 Prolog, +(a,+), would be allowed.

3. If a prefix operator appears before an open bracket symbol it must be
separated from it by clear space; otherwise it will be interpreted as the
normal functional form. This only affects the interpretation of a few
expressions, and one is not bound in general to place a functor next to
its opening bracket. Thus the common practice of leaving a space between a
function or predicate symbol and its opening bracket is allowable, unless
there is a prefix declaration for the function symbol. In the latter case
the interpretation will be identical if there is only one argument, but
- (a,b) would be interpreted as -(','(a,b)).

4. One cannot have two operator definitions for the same symbol current at
the same time, with the exception of the combination of prefix and infix.
This is to allow such uses as unary minus, which are firmly embedded in
mathematical notation.

The effect of these rules is that it is impossible to construct any
ambiguous expressions and all Prolog implementations will interpret every
expression the same way.
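
Rules 3 and 4 can be tried out with the usual prefix and infix
declarations for "-". The following is a sketch rather than a conformance
test; operator_rule_examples is an invented name, and the behaviour shown
is what the rules above predict.

```prolog
% operator_rule_examples/0 is a made-up name for illustration.
operator_rule_examples :-
    % Rule 3: no space before the bracket gives the functional form.
    T1 = -(a,b),
    functor(T1, -, 2),
    % Rule 3: with a space, - is a prefix operator applied to (a,b).
    T2 = - (a,b),
    functor(T2, -, 1),
    arg(1, T2, ','(a,b)),
    % Rule 3: with a single argument both spellings give the same term.
    T3 = -(a),
    T4 = - (a),
    T3 == T4,
    % Rule 4: - may be prefix and infix at the same time.
    T5 = a - -b,
    T5 = -(a, -(b)).
```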

The notation for operator declarations is, as currently proposed, similar
to DEC-10 with the exception that only priorities in the range 0-999 may
be declared, i.e. 999 is the least binding, or most dominating, and 0 is
the most binding or "highest priority" or "first to be applied". Prefix,
infix and postfix operators are represented by the symbols fx, xfx and xf
respectively for non-associative operators and fy, yfx, xfy and yf for
associative operators. Other notations have been proposed but do not seem
preferable for other than stylistic reasons, and style has generally been
subordinated to common practice in formulating the standard.
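
By way of illustration, here is a sketch of some declarations in the
DEC-10 style op/3 notation. All the operator names are invented, and the
priorities are chosen to fall inside the proposed 0-999 range (they are
also legal in DEC-10 Prolog, where a lower number likewise binds more
tightly).

```prolog
% All operator names below are made up for this illustration.
:- op(700, xfx, is_friend_of).   % infix, non-associative
:- op(500, yfx, then).           % infix, groups to the left
:- op(400, xfy, onto).           % infix, groups to the right
:- op(300, fy,  very).           % prefix, may be repeated

op_examples :-
    (a then b then c) = then(then(a, b), c),
    (a onto b onto c) = onto(a, onto(b, c)),
    (very very nice)  = very(very(nice)),
    % Lower number binds first: onto (400) groups before then (500).
    (a then b onto c) = then(a, onto(b, c)),
    (rabbit is_friend_of mole) = is_friend_of(rabbit, mole).
```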

Strings are represented in a program between double quotes, but a "C"-like
convention has been adopted for special characters. e.g. "\n" represents a
newline symbol and "\"a quotation\"" represents a string which includes
the quotation marks. The semantics of strings are different to DEC-10; a
string is a separate data type which does not unify with atoms or lists
and several built-in predicates are provided to process them efficiently,
including converting them to lists of small integers. The advantage of
this is the freedom allowed to the implementer to provide efficient
storage and manipulation of longer strings (atoms have a limited length).
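
The built-in predicates for strings are not named in this description, so
the sketch below borrows those of one system that has a separate string
type, SWI-Prolog (version 7 or later, default flags assumed). The names
string/1, string_codes/2 and string_length/2 belong to that system and
are only a stand-in for whatever the standard provides; string_examples
is an invented name.

```prolog
% string_examples/0 is a made-up name. string/1, string_codes/2 and
% string_length/2 are SWI-Prolog built-ins, used here only as a stand-in
% for the proposal's own string predicates, which are not named above.
string_examples :-
    S = "a\nb",
    string(S),
    \+ S = 'a\nb',            % a string does not unify with an atom
    \+ S = [0'a, 10, 0'b],    % nor with a list of character codes
    string_codes(S, Codes),   % explicit conversion to small integers
    Codes == [0'a, 10, 0'b],
    % The escaped quotation marks count as part of the string.
    string_length("\"a quotation\"", 13).
```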

There is also an "S-Expression" syntax proposed, which is not dealt with
here. It is designed to be compatible with normal LISP usage while
mapping semantically into the same structures, so that programs may be
easily transferred from one syntax to the other.

----------------------------------------------------------------------------
Chris Moss. March 1988.