Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!usc!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!ira.uka.de!rusmv1!rusux1!mark
From: mark@adler.philosophie.uni-stuttgart.de (Mark Johnson)
Newsgroups: comp.lang.prolog
Subject: Re: Efficiency of DCGs and chart parsers
Message-ID: <MARK.91Jan26184457@adler.philosophie.uni-stuttgart.de>
Date: 26 Jan 91 17:44:57 GMT
References: <8014@castle.ed.ac.uk>
	<MARK.91Jan24122000@adler.philosophie.uni-stuttgart.de>
	<1549@gufalet.let.rug.nl>
Sender: zrf80385@rusux1.rus.uni-stuttgart.de
Organization: IMS, University of Stuttgart
Lines: 49
In-reply-to: gosse@gufal02.let.rug.nl's message of 25 Jan 91 16:09:02 GMT

Gosse Bouma askes:

> How do you pass the chart around? as a list of chart items? I can hardly
> believe that going through a list with, say, a few hundred elements is
> more efficient in Prolog than just exploiting the data-base.

No, it shouldn't just be an unstructured list, of course.  
In general you want to index inactive edges on their
left-most string-positions (vertex) and active edges on their
right-most string-positions.  If you want to get n^3
behaviour (with CFGs) then you must index on both left
and right string-positions (if I remember Earley's algorithm
correctly).  Prolog textbooks discuss a variety of data structures 
for implementing such indexing.

(Macho types not fazed by circular structures caused by
e.g. ``empty'' edges dominating the null string can
avoid some of this indexing by representing a vertex 
as a set of edges, perhaps indexed by type and category, 
active edges as pairs consisting of a dotted rule and the
vertex at the left string-position, and inactive edges consisting of 
pairs of a category and the vertex at the right string-position.  
In both cases, an edge's "other" string-position is that of
the vertex of which it is member.
But be warned - I've had one Prolog which supposedly
could handle circular structures go into an infinite loop
when checking if vertices V1 and V2 satisfy V1==V2.  As
I've said elsewhere, Prolog needs either efficient arrays
or circular structures and preferably both, but that's another story!)

> Also, the copying, which is avoided if you assert items, seems rather
> expensive.

Of course, copying (renaming of variables) is one of the
things that happens during an assertion and a following
call.  I believe it is usually faster to copy a term
than it is to assert it into the database, especially if
the language has a copy_term/2 primitive (as does SICStus).
 
> Also, given the fact that in unification grammars, the size of the chart items
> tends to be rather large, using a list of items as additional argument
> seems practically impossible to me.

The chart itself would not be stored as part of a category labelling
an edge, so I'm not sure exactly what you have in mind here.

> Gosse Bouma, Alfa-informatica, RUG, Groningen                  gosse@let.rug.nl

Mark Johnson