Path: utzoo!mnetor!uunet!mcvax!ukc!eagle!icdoc!doc.ic.ac.uk!cdsm
From: cdsm@doc.ic.ac.uk (Chris Moss)
Newsgroups: comp.lang.prolog
Subject: Prolog Standardization
Message-ID: <220@gould.doc.ic.ac.uk>
Date: 3 Mar 88 16:36:55 GMT
Sender: news@doc.ic.ac.uk
Reply-To: cdsm@doc.ic.ac.uk (Chris Moss)
Organization: Dept. of Computing, Imperial College, London, UK.
Lines: 136

I realised recently that many people who've commented about the Prolog standard have never actually read it. I'm therefore posting a description of the latest proposal. It only covers syntax, but that is what most people get worried about first of all! It is much closer to "Edinburgh" than some previous proposals, but does try to regularize some of the obvious defects. (The S-Expression syntax isn't covered here; if you want to know about it, ask!)

--------------------------------------------------------------------------

The following is a brief description of the syntax proposed for the Prolog standard. It is designed to clarify the syntactic issues by concentrating on the points of difference between it and other implementations, rather than introducing the language to those with no experience of it. It should be readable by users of any current dialect of Prolog. It does not in general attempt to justify the choices: for this, refer to the issue sheets.

Prolog uses a number of rather arbitrary symbols for "if", "and", "or", etc. By and large the standard follows those used by the most widely copied implementation: DEC-10 Prolog from Edinburgh. e.g.

    likes(rabbit, mole).
    likes(rabbit, owl) :- likes(owl, rabbit) ;
                          obeys(owl, rabbit), helps(owl, rabbit).

i.e. "if" is ":-", "and" is ",", "or" is ";" and the clause terminator is ".".

The alternative symbols '&' and '|' may be used for 'and' and 'or' respectively, and are mapped onto the same symbols (',' and ';') if used in a term. Thus the clause above may be written:

    likes(rabbit, mole).
    likes(rabbit, owl) :- likes(owl, rabbit) |
                          obeys(owl, rabbit) & helps(owl, rabbit).
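
As a rough illustration (not taken from the proposal itself), the second clause above can also be written in purely functional notation. This sketch assumes that ',' binds more tightly than ';', as in DEC-10 Prolog, and that ':-', ';' and ',' may be written as ordinary functors; the issue sheets would be the authority on both points.

```prolog
% Operator form, as given above:
%   likes(rabbit, owl) :-
%       likes(owl, rabbit) ; obeys(owl, rabbit), helps(owl, rabbit).
%
% Assumed functional form of the same clause: the ',' term is nested
% inside the ';' term, following DEC-10 priorities.
:-(likes(rabbit, owl),
   ;(likes(owl, rabbit),
     ','(obeys(owl, rabbit), helps(owl, rabbit)))).
```

Since '&' and '|' are mapped onto ',' and ';', the alternative spelling of the clause would correspond to the same term under this reading.
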

To indicate variables, two conventions may be used: a variable may start with a capital (large) letter or with the character "_" (underscore), and it may continue with any letters, digits or underscores. Thus A, Person, _person, _123 and _1_2 are variables, but *A, t-lit and _one-more are not.

Lists use exclusively the square bracket convention: [a,b,c] is a proper list and [a,b,c|X] is a list whose tail, following the third member, is a variable. For reasons stated below, a.b cannot be written in the language, though the cons pair [a|b] unifies with the normal functional term '.'(a,b) (i.e. the functor of a list is '.').

An atomic symbol may always be placed in single quotes (e.g. 'an-atomic-symbol') but the two most common ways of writing it use no extra marks: an identifier starting with a small letter and consisting only of letters, digits and the symbol "_"; and a graphic symbol composed of any number of other symbols, excluding those (e.g. ()][,;) used for special purposes in the syntax. Thus +, //, <=> are all graphic symbols which can abut directly onto identifiers and special symbols such as ",". For historical reasons, ! is a special symbol, so the sequence :-!. is treated as three symbols, not one, though ! is an atom, not a punctuation mark.

The status of "." is special. It is the clause terminator and thus has a key role in parsing and error recovery, and it also occurs in real numbers. However, it can be used in graphic symbols as long as it does not appear alone (e.g. in =..). In many implementations, the clause terminator is identified by the presence of white space (space/new line) after it, so that (with appropriate declarations) the sequence a(b).c(d). is interpreted as a single clause with principal functor '.' (which is, as noted above, the list functor). In standard Prolog the significance of the presence or absence of white space has been reduced as far as possible, so this distinction is not made.
A consequence of this is that the infix dot notation for lists is not available; it has ambiguity problems in any case when real numbers are considered (consider 1.2.3.4).

Operators may be declared in a way that is close to that provided in DEC-10 Prolog. This does not apply to the level of system operators such as ',' and ';', which are fixed in the syntax and cannot be changed. The aim of this is to allow better error detection and reporting than is possible in a fully general system.

Prefix, infix and postfix operators may be declared, but the combinations of operators that may be declared have been limited so that every expression is unambiguous and can be parsed without backtracking. The rules that ensure this are as follows:

1. A right-binding operator binds more strongly than a left-binding operator with the same declared priority number.

2. A symbol which has an operator declaration in force may not be used as the operand of an expression, though it may appear as an isolated symbol. e.g. "a + +" is illegal, though the 'normal' form of this expression as it is interpreted by DEC-10 Prolog, +(a,+), would be allowed.

3. If a prefix operator appears before an open bracket symbol, it must be separated from it by clear space; otherwise it will be interpreted as the normal functional form. This only affects the interpretation of a few expressions, and one is not bound in general to place a functor next to its opening bracket. Thus the common practice of leaving a space between a function or predicate symbol and the opening bracket is allowable, unless there is a prefix declaration for the function symbol. In the latter case the interpretation will be identical if there is only one argument, but - (a,b) would be interpreted as -(','(a,b)).

4. One cannot have two operator definitions for the same symbol current at the same time, with the exception of the combination prefix and infix.
This is to allow such uses as unary minus, which are firmly embedded in mathematical notation.

The effect of these rules is that it is impossible to construct any ambiguous expression, and all Prolog implementations will interpret every expression in the same way.

The notation for operator declarations is, as currently proposed, similar to DEC-10, with the exception that only operator priorities in the range 0-999 may be declared. i.e. 999 is the least binding, or most dominating, and 0 is the most binding or "highest priority" or "first to be applied". Prefix, infix and postfix operators are represented by the symbols fx, xfx and xf respectively for non-associative operators, and by fy, yfx, xfy and yf for associative operators. Other notations have been proposed but do not seem preferable for other than stylistic reasons, and style has generally been subordinated to common practice in formulating the standard.

Strings are represented in a program between double quotes, but a "C"-like convention has been adopted for special characters. e.g. "\n" represents a newline symbol and "\"a quotation\"" represents a string which includes the quotation marks. The semantics of strings are different from those of DEC-10: a string is a separate data type which does not unify with atoms or lists, and several built-in predicates are provided to process strings efficiently, including converting them to lists of small integers. The advantage of this is the freedom allowed to the implementer to provide efficient storage and manipulation of longer strings (atoms have a limited length).

There is also an "S Expression" syntax proposed, which is not dealt with here. This is designed to be compatible with normal LISP usage, while mapping semantically onto the same structures, so that programs may be easily transferred from one syntax to the other.

----------------------------------------------------------------------------

Chris Moss. March 1988.