Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!amdcad!ames!elroy!mahendo!jplgodo!wlbr!scgvaxd!ashtate!dwiggins From: dwiggins@ashtate (Don Dwiggins) Newsgroups: comp.lang.prolog Subject: Re: Prolog Grammar Rules Message-ID: <489@ashton.UUCP> Date: 18 Feb 88 01:14:08 GMT References: <646@cresswell.quintus.UUCP> Reply-To: dwiggins@ashton.UUCP (Don Dwiggins) Organization: Ashton-Tate, Torrance, CA Lines: 95 Keywords: DCGs tutorial Summary: More thoughts on DCGs This posting is prompted by Richard O'Keefe's typically excellent exposition on DCGs in his recent message. I'd like to share my own enthusiasm for DCGs and add another perspective. A DCG whose top-level predicate looks like pooh(T) --> ... can be viewed as a complex relation between a sequence of terms and a term T. As with Prolog in general, this relation can be designed to create the term from the list or vice versa (or even just to compare the two); a relatively simple grammar, carefully constructed, can even work both as a parser and generator. Given a program represented as a tree structure, one could generate code (i.e. a sequence of terms representing instructions for some machine) using a DCG; the article by Stan Szpakowicz in the August 1987 Byte magazine gives an example of this. An example of more general uses of DCGs: some time ago, I used a DCG to analyze texts that contained embedded tables, that is chunks like this where the text was lined up in columns. I represented the text as a list of characters, encoding each character as a term of the form "char(ASCII,Line,Column)" and did some pattern recognition (mostly in Prolog predicates called from the DCG) to detect the lines comprising tables. In the translation from DCG rules to clauses, O'Keefe has terminals being translated to goals of the form "connects(Terminal, Pos1, Pos2)", and gives the definition connects([X|L],X,L). to be used when the positions are represented as lists (by the way, this is a somewhat disguised use of difference lists). Actually, this predicate can be "folded into" the translator, so that no calls to connects/3 remain in the translated clauses (this is the translation presented by Clocksin & Mellish). As an exercise, you might like to "partially execute" connects/3 in the clause given by O'Keefe for date/2 (or date//0), giving a clause with the dashes embedded in the position arguments. While DCGs are often presented solely in terms of lists, the representation used (and the definition of connects/3) can be more general. For instance, consider a tabular version of connects such as: connects(1,time+noun,2). connects(1,time+verb,2). connects(2,flies+noun,3). connects(2,flies+verb,3). connects(3,like+prep,4). connects(4,an+det,5). connects(5,arrow+noun,6). Here the positions are simply integers. With a simple English grammar in DCG form, we can get the various parsings of the sentence "time flies like an arrow" with the call ?- sentence(Tree,1,6). (Another exercise: given the call ?- noun_phrase(Tree,1,S). what would Tree and S be?) Note that the table of "connects" facts above represents, not a string, but an acyclic graph. Considering this, we can change the earlier characterization of DCGs to "...complex relation between a GRAPH of terms and a term T". "Parsing" such a graph is finding one or more paths through it that satisfy a recursively specified set of conditions on the nodes. If you have a graph searching or pattern-matching problem for which depth-first search is reasonable (or whatever kind of search is supported by your logic programming system), consider using a DCG for it. Taking it a bit further, you could give an arbitrary definition of connects/3; the graph would not have to be explicitly stored, but could be computed on demand. One use for this would be to parse longer texts than could conveniently be stored in working memory; the characters could be read from a file as needed, keeping an encoded file position pointer in the position arguments. There's only one small problem with this more general use of DCGs; in some Prolog systems, connects is a built-in predicate, defined in terms of list arguments (in Quintus Prolog, it's called "'C'"); this keeps you from redefining it, or even using other forms of position arguments. What's doubly frustrating about this is that, when using lists, the connects predicate isn't actually needed at all! Other systems' built-in translators take this approach, creating smaller and faster clauses, but eliminating the possibility of using DCGs on graphs, etc. One other point: when generating terms from nonterminals, where in the sequence of arguments should the position arguments be placed? O'Keefe puts them last, as do most of the systems I've seen. When parsing, however, things would go faster in general if they were first (at least the initial, "input" position argument). For generating, of course, one would want them elsewhere. My personal DCG translator references two user-supplied predicates that indicate how terminals and nonterminals are to be translated, given the term to be translated and the positions as arguments; this allows customizing the placement of the position arguments and the form of the goals for terminals to the task at hand. (I should admit that it also gets a couple of things wrong that O'Keefe's "recogniser" handles; thanks, I'll fix it!). -- Don Dwiggins {scgvaxd,crash}!ashtate!dwiggins