Path: utzoo!utgpu!water!watmath!clyde!rutgers!sri-unix!quintus!ok
From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Prolog Grammar Rules
Keywords: DCGs tutorial
Message-ID: <646@cresswell.quintus.UUCP>
Date: 13 Feb 88 06:31:30 GMT
Organization: Quintus Computer Systems, Mountain View, CA
Lines: 366

One of the things I really like about Prolog is grammar rules.
Since some commercial Prologs do not support grammar rules (such as
Turbo Prolog -- do correct me if I'm wrong, I'm used to that...),
and since many Prolog text-books do not give them the emphasis they
deserve, I thought it might be worth saying a few words here about
them.

    There's a little joke which goes something like this: Prolog was
    invented by Kowalski in 1974 and implemented by Colmerauer in
    1973.  (The dates are wrong, but their *order* is right!)
    What Colmerauer implemented was (*very* roughly speaking) grammar rules.
    They came first!

We can approach grammar rules from two angles:
 -- parsing
 -- building lists.

From the parsing perspective, the key idea is that we can regard a
non-terminal such as 'date' as a relation on sequences.
For example, we can regard a grammar rule such as

	date --> day, "-", month, "-", year.

as *meaning*

	'date' is true of a sequence ABCDE if
	    there exist sequences A, B, C, D, E such that
	    ABCDE = A ^ B ^ C ^ D ^ E and
	    'day' is true of A and
	    B is "-" and
	    'month' is true of C and
	    D is "-" and
	    'year' is true of E.

It turns out to be more convenient to represent a non-terminal as a
relation between pairs of positions in a sequence.  To boot-strap
this interpretation, we need a predicate
	connects(X, S0, S)
which means "S0 and S are positions in the same sequence, S0 is one
element to the left of S, and the element between them is X".  If
we are working with lists,
	connects(X, [X|S], S).
Any rate, given this one basic predicate, we can translate the
grammar rule above to

	date(S0, S) :-		%  S0 = A^B^C^D^E^S
		day(S0, S1),	%  S0 = A^S1,  S1 = "-"^S2
		connects(`-`, S1, S2),
		month(S2, S3),	%  S2 = C^S3,  S3 = "-"^S4
		connects(`-`, S3, S4),
		year(S4, S).	%  S4 = E^S

Now the fact that we can so easily transliterate a grammar rule into
Prolog doesn't mean a lot.  What *is* interesting is that the simple
top-to-bottom left-to-right execution strategy of Prolog gives us a
parser, and not just any old parser, but the familiar old recursive
descent parser which is so thoroughly understood.  (For example, if
the grammar you want to parse is LL(k) for some smallish k, and you
are parsing ground lists, you can add "green" cuts to your Prolog
code to let Prolog know that it is determinate, and the theory of
recursive descent parsing tells us exactly where to put those cuts.)

It is worth stressing that there is nothing about grammar rules which
forces them to be based on lists.  The grammar rule translators in the
Prolog systems which have them are typically based on lists, but you
can easily write your own translator.

It's also worth stressing that although Prolog grammar rules work by
adding two extra arguments at the end, there's nothing sacred about
that either.  Fernando Pereira's "eXtraposition Grammars" (XGs) add
four extra parameters.

Now let's approach grammar rules from the second angle:  building lists.
While it would be a bad idea to use lists for everything in Prolog,
lists are very important, and you often write code that generates lists.

For example, suppose we have a problem where
	units are made of boards
	boards contain components
	some components are resistors
and we want a predicate that will collect all the resistor descriptions
in the description of a given unit.  We might write something like this:

	unit_resistors(unit(_,_,_,Boards,_,_,_), Resistors) :-
		boards_resistors(Boards, Resistors, []).

	boards_resistors([], Rs, Rs).
	boards_resistors([Board|Boards], Rs0, Rs) :-
		board_resistors(Board, Rs0, Rs1),
		boards_resistors(Boards, Rs1, Rs).

	board_resistors(board(_,_,Components,_,_,_), Rs0, Rs) :-
		components_resistors(Components, Rs0, Rs).

	components_resistors([], Rs, Rs).
	components_resistors([Component|Components], [Component|Rs1], Rs) :-
		resistor(Component),
		!,
		components_resistors(Components, Rs1, Rs).
	components_resistors([_|Components], Rs0, Rs) :-
		components_resistors(Components, Rs0, Rs).

I don't know about you, but I get awfully tired of writing stuff like
that over and over again.  Can't something be done?  It can.  We are
describing a sequence.  How do you describe sequences?  With grammar rules!
We get

	boards_resistors([]) --> [].
	boards_resistors([Board|Boards]) -->
		board_resistors(Board),
		board_resistors(Boards).

	board_resistors(board(_,_,Components,_,_,_)) -->
		components_resistors(Components).

	components_resistors([]) --> [].
	components_resistors([Component|Components]) -->
		{ resistor(Component) },
		!,
		[ Component ],
		components_resistors(Components).
	components_resistors([_|Components]) -->
		components_resistors(Components).

Note that the way the results are stitched together into a list is (a)
exactly what we want, (b) boring, (c) invisible.  Best of all, since we
didn't type it in, we couldn't get it wrong.

Once again, it is important to realise that although lists are the
most important application of this idea, the principle applies to
anything.  In fact, we can look at programs like this as transforming
a state (the current position in a list) into another state (the next
position in a list), and we can do any transformation we like.

As an example, suppose that instead of collecting the resistor
descriptions, you just want to count them.  We would change the
top-level call and the second clause of components_resistors//1:

	unit_resistor_count(unit(_,_,_,Boards,_,_,_), Count) :-
		boards_resistors(Boards, 0, Resistors).


	...
	components_resistors([Component|Components]) -->
		{ resistor(Component) },
		!,
		add(1),
		components_resistors(Components).
	...

where
	add(Addend, Augend, Result) :-
		Result is Augend+Addend.


Again, there is nothing magic about this, and nothing sacred about
having a single state parameter.  A good Prolog system will let
you define your own translations.

So that's why grammar rules are interesting.

What, however, do they look like?  Which of the things your Prolog
system offers are features, and which are bugs?

Here is a Prolog program which recognises valid grammar rules.
Add a few extra arguments, chant the magic phrase "partial
execution of a meta-interpreter", and you'll have a grammar rule
translator.  (Well, there's error reporting to worry about too.
That's probably the hardest part.)  I take clause_body/2 as given.
The code has been written to provide you with another test at the
same time:  everything here is valid DEC-10 Prolog syntax and is
accepted by Quintus Prolog, and by the public-domain tokeniser and
parser.  If your Prolog complains, it's broken.

	grammar_rule(-->(Head,Body)) :-		/* Note 1 */
		grammar_rule_head(Head),
		grammar_rule_body(Body, yes).

	grammar_rule_head(','(NonTerminal,PushBack)) :- !,
		nonterminal(NonTerminal),
		proper_list(PushBack).		/* Note 2 */
	grammar_rule_head(NonTerminal) :-	/* Note 3 */
		nonterminal(NonTerminal).

	nonterminal(NonTerminal) :-
		nonvar(NonTerminal),
		functor(NonTerminal, Symbol, _),
		atom(Symbol).

	proper_list(List) :-
		nonvar(List),
		proper_list_1(List).

	proper_list_1([]).
	proper_list_1([_|List]) :-
		proper_list(List).

	grammar_rule_body(Var, _) :-		/* Note 4 */
		var(Var),
		!.
	grammar_rule_body(!, CutsOk) :- !,	/* Note 5 */
		CutsOk = yes.			/* Note 6 */
	grammar_rule_body(','(And,Then), CutsOk) :- !,
		grammar_rule_body(And, CutsOk),
		grammar_rule_body(Then, CutsOk).
	grammar_rule_body(;(IfThen,Else), CutsOk) :-
		nonvar(IfThen),
		IfThen = ->(If,Then),
		!,
		grammar_rule_body(If, no),
		grammar_rule_body(Then, CutsOk),
		grammar_rule_body(Else, CutsOk).
	grammar_rule_body(;(Or,Else), CutsOk) :- !,
		grammar_rule_body(Or, CutsOk),
		grammar_rule_body(Else, CutsOk).
	grammar_rule_body(->(If,Then), CutsOk) :- !,
		grammar_rule_body(If, no),
		grammar_rule_body(Then, CutsOk).
	grammar_rule_body({ }(Goal), CutsOk) :- !,	/* Note 7 */
		clause_body(Goal, CutsOk).
	grammar_rule_body([], _) :- !.			/* Note 8 */
	grammar_rule_body([_|Tail], _) :- !,		/* Note 9 */
		proper_list(Tail).
	grammar_rule_body(NonTerminal, _) :-		/* Note 10 */
		nonterminal(NonTerminal).

Notes.
    1.	Unlike a clause, a grammar rule *must* have the arrow there.
	The analogue of
		p(X, Y) :- true.
	is
		p(X, Y) --> [].
	Don't forget the arrow or the empty sequence of literals.
	This is a mistake I have to watch out for.

    2.	A rule can look like
		head --> body
	or like
		head, [t1,...,tn] --> body.
	The pushback list is meant for handling extraposition in
	natural language parsers, and is definitely for advanced
	players (who disdain it).  The grammar rule translator in
	the first two editions of Clocksin & Mellish got pushback
	wrong, which shows how often the feature is used!

    3.	A NonTerminal is exactly the kind of thing that you could
	write as the head of a clause.  Two arguments will be
	added to it.  So if you wrote
		date(D, M, Y) --> .....
	you'll get
		date(D, M, Y, S0, S) :- ...
	Two arguments are added to every nonterminal in a grammar rule.

	Here's a convention of mine which you might like to adopt.
	(Better yet:  suggest an improvement!)
	We identify a predicate with predicate symbol P and arity N
	by writing the term P/N.  Now a non-terminal with symbol S
	and arity M corresponds to a predicate with predicate symbol
	S and arity N+2, but you have to think twice before you
	realise that the nonterminal
		month(X)
	you see in a program corresponds to the predicate month/3.
	So I've taken to writing S//M, with the interpretation that
	this means the same as S/(M+2), and the convention that it
	is only used when S/(M+2) is in fact defined by grammar rules.
	So I'd refer to boards_resistors//1 in the example above.
	If you don't find this helpful, ignore it.

    4.	There are two "correct" things a grammar rule translator can
	see if it sees a variable when it is expecting a grammar rule
	body.  Just as a variable in a clause body should be treated
	as something which will call the GOAL the variable is bound
	to, so a variable in a grammar rule body should be treated as
	something which will pass the right arguments to the NONTERMINAL
	the variable is bound to.  For example, you might want to write

	:- op(100, xf, *).

	NT* --> [].
	NT* --> NT, NT*.

	You would be rightly upset if the translator quietly did the
	wrong thing with this.  So the two correct things are
	- to translate a variable to something like
		phrase(TheVariable, S1, S2)
	- to report a translation fault when the translation is done,
	  and plant code to report the error at run-time, e.g.
		format(user_error, '~N! NT variable executed~n'), fail
	The first translation is preferred.

    5.	Yes, cuts are allowed in grammar rule bodies.  So are
	and-then, if-then-else, or-else, and if-then.  How can you
	predict which things are allowed?  All the basic control
	structures are allowed.  What about negation?  Well, no.
	In this context, negation can't possibly be sound.
	The control structures, and only the control structures,
	are transparent.

	What should happen if your Prolog system has other control
	structures, such as once/1, forall/2, or Arity's NIH "snips"?
	(If you are converting from Arity Prolog to another dialect,
	[! SnippedStuff !] ==> ( SnippedStuff -> true) should do it.)
	The honest answer is that nobody has thought about it much.
	But a good answer would be that if nonterminals embedded in
	the control structure could extend the list, the control
	structure should be transparent.  Otherwise it should probably
	be reported by the translator.  For example, things like
	forall/2, findall/3, and so on, where all the solutions found
	by the embedded goals are failed over, would not be candidates
	for transparency, but "soft cuts" and "cases" would be.

    6.	The "CutsOk" business is simply pointing out that you
	shouldn't have cuts inside the If part of an If->Then;Else
	or If->Then.

    7.	To include a test in a grammar rule, a test that doesn't
	match any of the input sequence, you write it inside curly
	braces.  Any clause body can appear there: {a ; b} is
	legal and means the same as {(a ; b)}.

    8.	The empty list matches the empty sequence.  It is the
	grammar rule body analogue of 'true'.

    9.	A list of n terms matches n terms in the sequence if the
	unifications go through.  Whether the sequence being matched
	has to be a list or not depends on how the 'connects'
	operation is defined.  If you don't use this construct, and
	don't use pushback, your code should not depend on sequences
	being represented by lists.

   10.	Anything other than a control structure, a list, or a clause
	body inside curly braces, is taken to be a nonterminal.
	Note that there is supposed to be no connexion whatsoever
	between the built-in predicate integer/1 and the non-terminal
	integer//1 (the predicate integer/3).  If I want to define

	integer(X) --> .....

	there is no reason for the system to stop me.  On the other
	hand, it *would* be rather odd.  So a grammar rule translator
	would do well to report non-terminals which look like built-in
	predicates (though it needn't), but on no account should it
	fail to treat even such an odd goal as a non-terminal.

	There is a Prolog system around which will take a rule like

	constant(X) --> integer(X).

	and treat it as the equivalent of

	constant(X) --> {integer(X)}.

	***without warning you***.  Now if you had *meant* that, you
	could have written the curly braces, and the system in question
	is perfectly happy to let you *define* integer//1, it just
	quietly stops you using it.  This is a bug.  Printing a
	warning message about the oddity and producing the correct
	translation would be a definite feature.

I said that omitting the "--> []" in the base case of a non-terminal
is a mistake I have to watch out for.  This is probably the first
thing to look for if you have a grammar which is mysteriously failing.
If you have a cross-referencer, such as the one Quintus provide, you
should watch out for predicates which are defined but not called.

The book by Fernando Pereira and Stuart Shieber may be of interest.