Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!quintus!ok From: ok@quintus.UUCP (Richard A. O'Keefe) Newsgroups: comp.lang.prolog Subject: Re: Prolog Grammar Rules Message-ID: <667@cresswell.quintus.UUCP> Date: 19 Feb 88 10:47:13 GMT References: <646@cresswell.quintus.UUCP> <489@ashton.UUCP> Organization: Quintus Computer Systems, Mountain View, CA Lines: 112 Keywords: DCGs tutorial Summary: unfolding connects/3 In article <489@ashton.UUCP>, dwiggins@ashtate (Don Dwiggins) writes: > In the translation from DCG rules to clauses, O'Keefe has terminals being > translated to goals of the form "connects(Terminal, Pos1, Pos2)", ... > ... Actually, this predicate can be > "folded into" the translator, so that no calls to connects/3 remain in the > translated clauses (this is the translation presented by Clocksin & Mellish). This is true, but you have to be *extremely* careful, and I'm afraid the version in Clocksin & Mellish isn't. The trouble is cuts. Consider p(1) --> !, [a]. p(_) --> []. What happens if you call | ?- p(1, [], []). The answer, in DEC-10 Prolog, C Prolog, Quintus Prolog, and some others, is that, as you would expect by analogy with ordinary Prolog rules, it fails. If you use the Clocksin & Mellish translator, it succeeds. To use terminology I introduced a while back: the Clocksin & Mellish translator will turn a steadfast predicate into one which is not steadfast. This is the basic problem with macro-expansion in Prolog: you must take care not to push bindings back over side-effective operations. Don Dwiggins also suggests that it would sometimes be useful to put the list/state arguments first so that they can be indexed on. Recall that many Prolog implementations (DEC-10 Prolog included) only index on principal functors, and that in any case indexing doesn't buy you much if you have clauses with variables there as well. Let's look at an example from Michael McCord's parser. topic(Type, Topsubj, hold(Top), _, _, Qaux, [Top|Mods], Mods) --> ( nounphr(Top) ; there(Top) ), topic1(Type, Topsubj, Top, Qaux). topic(Type, Topsubj, nil, C, X, Qaux, [Top|Mods], Mods) --> adverbial(C, X, Top), topic1(Type, Topsubj, Top, Qaux). topic(q, f, nil, _, _, pre(V), [syn([yesno],applyto(Y),yesno(Y),[])|Mods], Mods) --> [ V ], { finiteaux(V) }. The list/state arguments for these rules would be S0, S S0, S [V|S1], S respectively. Indexing really isn't going to pay off much here. This seems to be typical of natural language parsers: here's a nonterminal from a grammar for (a fragment of) Maaori: after_interjection(-, -) --> !. after_interjection(koia, Mod) --> [ Mod ], { positional_particle(Mod) }, !. after_interjection(_, anoo) --> [ anoo ]. Here the list/state arguments would be S0, S { note that S0=S must be postponed to AFTER the cut! } [Mod|S1], S [anoo|S1], S It happens that 'anoo' is not a positional particle, but the indexer doesn't know that... Since early arguments often carry agreement information, there is some hope of indexing off them. The conclusion is that wherever we put the list/state arguments, we can't expect much help from indexing in parsers for natural langauges, so we might as well keep those arguments out of harm's way. Now programming languages are a different story: they tend to be rather over-endowed with keywords. Suppose you want to recognise statements in C, and would like to exploit indexing. You might start with this (where I've suppressed irrelevant detail and a lot of rules): statement --> ['{'], rest_of_block. statement --> [if], rest_of_if. statement --> [return], rest_of_return. statement --> [goto], rest_of_goto. statement --> [id(X),:], statement. statement --> expression, [';']. statement --> [';']. This isn't going to be indexed in most Prolog implementations. Note that the indexer doesn't know what an expression can start with. The list is rather long. A program which computed FIRST sets of definite clause grammars would be handy here...) What can we do? We can read ahead one token. statement --> expression, [;], !. statement --> [First], statement(First). statement('{') --> rest_of_block. statement(if) --> rest_of_if. statement(return) --> rest_of_return. statement(goto) --> rest_of_goto. statement(id(X)) --> [:], statement. statement(';') --> []. Note that if you are parsing expressions, the cleanest way of handling operator precedence involves consulting a table of operators, so once again there really isn't anything to index on. Given that there is a general convention in Prolog that "extra" arguments are added on the right (for example, maplist(p(1), [a,b], [X,Y]) will call p(1,a,X) and then p(1,b,Y)) there doesn't seem to be any compelling reason for grammar rules to be different.