Path: utzoo!utgpu!water!watmath!clyde!rutgers!sri-unix!quintus!ok From: ok@quintus.UUCP (Richard A. O'Keefe) Newsgroups: comp.lang.prolog Subject: Prolog Grammar Rules Keywords: DCGs tutorial Message-ID: <646@cresswell.quintus.UUCP> Date: 13 Feb 88 06:31:30 GMT Organization: Quintus Computer Systems, Mountain View, CA Lines: 366 One of the things I really like about Prolog is grammar rules. Since some commercial Prologs do not support grammar rules (such as Turbo Prolog -- do correct me if I'm wrong, I'm used to that...), and since many Prolog text-books do not give them the emphasis they deserve, I thought it might be worth saying a few words here about them. There's a little joke which goes something like this: Prolog was invented by Kowalski in 1974 and implemented by Colmerauer in 1973. (The dates are wrong, but their *order* is right!) What Colmerauer implemented was (*very* roughly speaking) grammar rules. They came first! We can approach grammar rules from two angles: -- parsing -- building lists. From the parsing perspective, the key idea is that we can regard a non-terminal such as 'date' as a relation on sequences. For example, we can regard a grammar rule such as date --> day, "-", month, "-", year. as *meaning* 'date' is true of a sequence ABCDE if there exist sequences A, B, C, D, E such that ABCDE = A ^ B ^ C ^ D ^ E and 'day' is true of A and B is "-" and 'month' is true of C and D is "-" and 'year' is true of E. It turns out to be more convenient to represent a non-terminal as a relation between pairs of positions in a sequence. To boot-strap this interpretation, we need a predicate connects(X, S0, S) which means "S0 and S are positions in the same sequence, S0 is one element to the left of S, and the element between them is X". If we are working with lists, connects(X, [X|S], S). Any rate, given this one basic predicate, we can translate the grammar rule above to date(S0, S) :- % S0 = A^B^C^D^E^S day(S0, S1), % S0 = A^S1, S1 = "-"^S2 connects(`-`, S1, S2), month(S2, S3), % S2 = C^S3, S3 = "-"^S4 connects(`-`, S3, S4), year(S4, S). % S4 = E^S Now the fact that we can so easily transliterate a grammar rule into Prolog doesn't mean a lot. What *is* interesting is that the simple top-to-bottom left-to-right execution strategy of Prolog gives us a parser, and not just any old parser, but the familiar old recursive descent parser which is so thoroughly understood. (For example, if the grammar you want to parse is LL(k) for some smallish k, and you are parsing ground lists, you can add "green" cuts to your Prolog code to let Prolog know that it is determinate, and the theory of recursive descent parsing tells us exactly where to put those cuts.) It is worth stressing that there is nothing about grammar rules which forces them to be based on lists. The grammar rule translators in the Prolog systems which have them are typically based on lists, but you can easily write your own translator. It's also worth stressing that although Prolog grammar rules work by adding two extra arguments at the end, there's nothing sacred about that either. Fernando Pereira's "eXtraposition Grammars" (XGs) add four extra parameters. Now let's approach grammar rules from the second angle: building lists. While it would be a bad idea to use lists for everything in Prolog, lists are very important, and you often write code that generates lists. For example, suppose we have a problem where units are made of boards boards contain components some components are resistors and we want a predicate that will collect all the resistor descriptions in the description of a given unit. We might write something like this: unit_resistors(unit(_,_,_,Boards,_,_,_), Resistors) :- boards_resistors(Boards, Resistors, []). boards_resistors([], Rs, Rs). boards_resistors([Board|Boards], Rs0, Rs) :- board_resistors(Board, Rs0, Rs1), boards_resistors(Boards, Rs1, Rs). board_resistors(board(_,_,Components,_,_,_), Rs0, Rs) :- components_resistors(Components, Rs0, Rs). components_resistors([], Rs, Rs). components_resistors([Component|Components], [Component|Rs1], Rs) :- resistor(Component), !, components_resistors(Components, Rs1, Rs). components_resistors([_|Components], Rs0, Rs) :- components_resistors(Components, Rs0, Rs). I don't know about you, but I get awfully tired of writing stuff like that over and over again. Can't something be done? It can. We are describing a sequence. How do you describe sequences? With grammar rules! We get boards_resistors([]) --> []. boards_resistors([Board|Boards]) --> board_resistors(Board), board_resistors(Boards). board_resistors(board(_,_,Components,_,_,_)) --> components_resistors(Components). components_resistors([]) --> []. components_resistors([Component|Components]) --> { resistor(Component) }, !, [ Component ], components_resistors(Components). components_resistors([_|Components]) --> components_resistors(Components). Note that the way the results are stitched together into a list is (a) exactly what we want, (b) boring, (c) invisible. Best of all, since we didn't type it in, we couldn't get it wrong. Once again, it is important to realise that although lists are the most important application of this idea, the principle applies to anything. In fact, we can look at programs like this as transforming a state (the current position in a list) into another state (the next position in a list), and we can do any transformation we like. As an example, suppose that instead of collecting the resistor descriptions, you just want to count them. We would change the top-level call and the second clause of components_resistors//1: unit_resistor_count(unit(_,_,_,Boards,_,_,_), Count) :- boards_resistors(Boards, 0, Resistors). ... components_resistors([Component|Components]) --> { resistor(Component) }, !, add(1), components_resistors(Components). ... where add(Addend, Augend, Result) :- Result is Augend+Addend. Again, there is nothing magic about this, and nothing sacred about having a single state parameter. A good Prolog system will let you define your own translations. So that's why grammar rules are interesting. What, however, do they look like? Which of the things your Prolog system offers are features, and which are bugs? Here is a Prolog program which recognises valid grammar rules. Add a few extra arguments, chant the magic phrase "partial execution of a meta-interpreter", and you'll have a grammar rule translator. (Well, there's error reporting to worry about too. That's probably the hardest part.) I take clause_body/2 as given. The code has been written to provide you with another test at the same time: everything here is valid DEC-10 Prolog syntax and is accepted by Quintus Prolog, and by the public-domain tokeniser and parser. If your Prolog complains, it's broken. grammar_rule(-->(Head,Body)) :- /* Note 1 */ grammar_rule_head(Head), grammar_rule_body(Body, yes). grammar_rule_head(','(NonTerminal,PushBack)) :- !, nonterminal(NonTerminal), proper_list(PushBack). /* Note 2 */ grammar_rule_head(NonTerminal) :- /* Note 3 */ nonterminal(NonTerminal). nonterminal(NonTerminal) :- nonvar(NonTerminal), functor(NonTerminal, Symbol, _), atom(Symbol). proper_list(List) :- nonvar(List), proper_list_1(List). proper_list_1([]). proper_list_1([_|List]) :- proper_list(List). grammar_rule_body(Var, _) :- /* Note 4 */ var(Var), !. grammar_rule_body(!, CutsOk) :- !, /* Note 5 */ CutsOk = yes. /* Note 6 */ grammar_rule_body(','(And,Then), CutsOk) :- !, grammar_rule_body(And, CutsOk), grammar_rule_body(Then, CutsOk). grammar_rule_body(;(IfThen,Else), CutsOk) :- nonvar(IfThen), IfThen = ->(If,Then), !, grammar_rule_body(If, no), grammar_rule_body(Then, CutsOk), grammar_rule_body(Else, CutsOk). grammar_rule_body(;(Or,Else), CutsOk) :- !, grammar_rule_body(Or, CutsOk), grammar_rule_body(Else, CutsOk). grammar_rule_body(->(If,Then), CutsOk) :- !, grammar_rule_body(If, no), grammar_rule_body(Then, CutsOk). grammar_rule_body({ }(Goal), CutsOk) :- !, /* Note 7 */ clause_body(Goal, CutsOk). grammar_rule_body([], _) :- !. /* Note 8 */ grammar_rule_body([_|Tail], _) :- !, /* Note 9 */ proper_list(Tail). grammar_rule_body(NonTerminal, _) :- /* Note 10 */ nonterminal(NonTerminal). Notes. 1. Unlike a clause, a grammar rule *must* have the arrow there. The analogue of p(X, Y) :- true. is p(X, Y) --> []. Don't forget the arrow or the empty sequence of literals. This is a mistake I have to watch out for. 2. A rule can look like head --> body or like head, [t1,...,tn] --> body. The pushback list is meant for handling extraposition in natural language parsers, and is definitely for advanced players (who disdain it). The grammar rule translator in the first two editions of Clocksin & Mellish got pushback wrong, which shows how often the feature is used! 3. A NonTerminal is exactly the kind of thing that you could write as the head of a clause. Two arguments will be added to it. So if you wrote date(D, M, Y) --> ..... you'll get date(D, M, Y, S0, S) :- ... Two arguments are added to every nonterminal in a grammar rule. Here's a convention of mine which you might like to adopt. (Better yet: suggest an improvement!) We identify a predicate with predicate symbol P and arity N by writing the term P/N. Now a non-terminal with symbol S and arity M corresponds to a predicate with predicate symbol S and arity N+2, but you have to think twice before you realise that the nonterminal month(X) you see in a program corresponds to the predicate month/3. So I've taken to writing S//M, with the interpretation that this means the same as S/(M+2), and the convention that it is only used when S/(M+2) is in fact defined by grammar rules. So I'd refer to boards_resistors//1 in the example above. If you don't find this helpful, ignore it. 4. There are two "correct" things a grammar rule translator can see if it sees a variable when it is expecting a grammar rule body. Just as a variable in a clause body should be treated as something which will call the GOAL the variable is bound to, so a variable in a grammar rule body should be treated as something which will pass the right arguments to the NONTERMINAL the variable is bound to. For example, you might want to write :- op(100, xf, *). NT* --> []. NT* --> NT, NT*. You would be rightly upset if the translator quietly did the wrong thing with this. So the two correct things are - to translate a variable to something like phrase(TheVariable, S1, S2) - to report a translation fault when the translation is done, and plant code to report the error at run-time, e.g. format(user_error, '~N! NT variable executed~n'), fail The first translation is preferred. 5. Yes, cuts are allowed in grammar rule bodies. So are and-then, if-then-else, or-else, and if-then. How can you predict which things are allowed? All the basic control structures are allowed. What about negation? Well, no. In this context, negation can't possibly be sound. The control structures, and only the control structures, are transparent. What should happen if your Prolog system has other control structures, such as once/1, forall/2, or Arity's NIH "snips"? (If you are converting from Arity Prolog to another dialect, [! SnippedStuff !] ==> ( SnippedStuff -> true) should do it.) The honest answer is that nobody has thought about it much. But a good answer would be that if nonterminals embedded in the control structure could extend the list, the control structure should be transparent. Otherwise it should probably be reported by the translator. For example, things like forall/2, findall/3, and so on, where all the solutions found by the embedded goals are failed over, would not be candidates for transparency, but "soft cuts" and "cases" would be. 6. The "CutsOk" business is simply pointing out that you shouldn't have cuts inside the If part of an If->Then;Else or If->Then. 7. To include a test in a grammar rule, a test that doesn't match any of the input sequence, you write it inside curly braces. Any clause body can appear there: {a ; b} is legal and means the same as {(a ; b)}. 8. The empty list matches the empty sequence. It is the grammar rule body analogue of 'true'. 9. A list of n terms matches n terms in the sequence if the unifications go through. Whether the sequence being matched has to be a list or not depends on how the 'connects' operation is defined. If you don't use this construct, and don't use pushback, your code should not depend on sequences being represented by lists. 10. Anything other than a control structure, a list, or a clause body inside curly braces, is taken to be a nonterminal. Note that there is supposed to be no connexion whatsoever between the built-in predicate integer/1 and the non-terminal integer//1 (the predicate integer/3). If I want to define integer(X) --> ..... there is no reason for the system to stop me. On the other hand, it *would* be rather odd. So a grammar rule translator would do well to report non-terminals which look like built-in predicates (though it needn't), but on no account should it fail to treat even such an odd goal as a non-terminal. There is a Prolog system around which will take a rule like constant(X) --> integer(X). and treat it as the equivalent of constant(X) --> {integer(X)}. ***without warning you***. Now if you had *meant* that, you could have written the curly braces, and the system in question is perfectly happy to let you *define* integer//1, it just quietly stops you using it. This is a bug. Printing a warning message about the oddity and producing the correct translation would be a definite feature. I said that omitting the "--> []" in the base case of a non-terminal is a mistake I have to watch out for. This is probably the first thing to look for if you have a grammar which is mysteriously failing. If you have a cross-referencer, such as the one Quintus provide, you should watch out for predicates which are defined but not called. The book by Fernando Pereira and Stuart Shieber may be of interest.