Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!usc!ucsd!ames!uhccux!virtue!comp.vuw.ac.nz!munnari.oz.au!goanna!ok
From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: Novice RETRACT question (now SETOF/BAGOF question)
Message-ID: <3505@goanna.cs.rmit.oz.au>
Date: 3 Aug 90 02:16:55 GMT
References: <1947@engage.enet.dec.com>
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 285

> I'm still trying to understand SETOF/BAGOF.

It will help youu a lot if you stop writing the names as if they
were variable names.  The operations are setof/3 and bagof/3.

> Perhaps I've had too many years of structured
> (3rd generation) programming,

What a strange use of the word "structured".  setof/3 and bagof/3 are
just loops that collect a result.  Having a loop that accumulates some
kind of result is just about the commonest thing there is in old-style
languages (apart from assignment statements and mistakes).

Start with findall/3:

	findall(Template, Generator, ListOfInstances) :-
		for each *proof* of Generator,
		    save away the corresponding instance of Template
		when no more proofs of Generator can be found,
		    pick up all the saved instances of Template
		    as a list, the elements occurring in the order
		    in which they were found,
		    and unify ListOfInstances with the result.

That's the operational end.
Or start with setof/3:

	setof(Template, Generator, Set) :-
		Set = {T | T is an instance of Template
			   which corresponds to a solution of Generator }
		and Set is not {}

That's the (nearly) logical end.
bagof/3 is a rather peculiar hybrid of findall/3 and setof/3 which should
only be used when you know that each proof of Generator is going to yield
a DIFFERENT instance of Template.

> One proffered example was from Bart Demoen:
> 	reduce_objects :-
>		bagof(object(I,X,Y),
>			J^( object(I,X,Y), object(J,X,Y), I < J ),
>			ListOfObjectInstances),
>		delete_objects_in_list(ListOfObjectInstances).
>
> 	delete_objects_in_list([]).
> 	delete_objects_in_list([Obj|T]) :-
>		retract(Obj),
>		delete_objects_in_list(T).

Given that you are using C Prolog, you should *NOT* use
delete_objects_in_list/1.  I'm sure it works *perfectly* in BIM Prolog,
but I can tell you that it will be very inefficient in C Prolog.

(1) When you call retract(object(137,12,10)), for example, C Prolog does
    no indexing at all, so C Prolog will do a LINEAR search through the
    whole table of object/3 facts looking for this one.

(2) What's more, because it doesn't do any indexing, C Prolog doesn't
    notice that there is only one fact that looks like that.  (Indeed,
    you have given _us_ no reason to believe that there is only one
    such fact.  You may possibly _intend_ that there should be only
    one such fact, but without seeing your program we can't be sure
    that it doesn't mistakenly create duplicate facts.)  So C Prolog
    leaves a choice point behind.  For EVERY fact that you delete this
    way, C Prolog is going to leave a choice point behind, ready to
    resume searching for other similar facts to delete.  You can fix
    that by doing

	retract_facts([]).
	retract_facts([Fact|Facts]) :-
		retract_first(Fact),
		retract_facts(Facts).

	retract_first(Fact) :-
		retract(Fact),		% *MUST* set up a choice point
		!.			% which *we* know isn't needed.

(3) Even if C Prolog _did_ do perfect indexing, and _did_ notice that
    there was only one matching fact, it would STILL have to leave a
    choice point behind, ready to resume its search and destroy task.
    Why?  Because of the semantics of assert/1 and retract/1 in DEC-10
    Prolog and C Prolog.  In those dialects, it is DEFINED that
	if you have a call to a dynamic predicate
	and that predicate is changed by assert/1 or retract/1
	then the predicate will notice the change AT ONCE
    So, if I do

	q(a).

	p :-
		retract(q(a)),
		write(retracted), nl,
		assert(q(a)),
		fail.

    and call

	?- p.

    DEC-10 Prolog and C Prolog are REQUIRED write "retracted" TWICE.
    The call to retract/1 is going to sit there and wait IN CASE another
    matching clause is SUBSEQUENTLY added.  So in C Prolog, *every*
    successful call to retract/1 *must* leave a choice point behind,
    however many potential matches for its argument happen to remain
    right at the moment.

(4) There's worse.  I posted in this newsgroup a solution similar to
    Bart Demoen's, except that I used erase/1.  There is an extremely
    good reason for that.

    DEC-10 Prolog and C Prolog use an implementation method called
    "structure sharing".  That means that when you pick up an instance
    of a clause (by calling the predicate, or by calling clause/[2,3]
    on it, OR BY CALLING retract/1 ON IT) the data structure which
    represents this instance is a mixture of information held on the
    stacks AND information in the clause itself (a so-called skeleton).
    Because part of the representation of this instance comes from the
    clause itself,

	in a structure-sharing implementation of Prolog,
	the space for a clause cannot be reclaimed as long as
	there is a term in the stacks which refers to it

    and

	retract/1 always creates a reference to the clause it is
	marking as unwanted

    so

	in DEC-10 Prolog and C Prolog, the success of

		retract(Fact)

	means that the clause which unified with Fact is *NAILED DOWN*
	and the space CANNOT be reclaimed until this call is failed back
	over.  This is because finding the clause to be marked involves
	UNIFYING Fact with the clause, and such a unification creates a
	reference to the clause.

    This means that if you use delete_objects_in_list/1 in C Prolog
							===========
    you are GUARANTEED that the space for these clauses CAN'T be
    reclaimed until you fail back over the call to delete_objects_in_list/1.

    So here is how to code that part of it so that it doesn't leave behind
    any useless choicepoints AND doesn't leave the clauses nailed down so
    that they can't be reclaimed:

	retract_facts([]).
	retract_facts([Fact|_]) :-		% the "fail" here means no
		retract(Fact),			% choice point is left AND
		fail.				% the clause be be reclaimed
	retract_facts([_|Facts]) :-
		retract_facts(Facts).

We're left with the top level call:

> 	reduce_objects :-
>		bagof(object(I,X,Y),
>			J^( object(I,X,Y), object(J,X,Y), I < J ),
>			ListOfObjectInstances),
>		retract_facts(ListOfObjectInstances).

Here we observe an interesting point.  Suppose the object/3 table
contains
	object(1, 0, 0).
	object(2, 0, 0).
	object(3, 0, 0).

Let's find ALL the solutions to the query

	object(I, X, Y), object(J, X, Y), I < J

a.	I = 1, X = 0, Y = 0, J = 2
b.	I = 1, X = 0, Y = 0, J = 3
c.	I = 2, X = 0, Y = 0, J = 3

Basically, when you put J^( ... ) around that goal in the call to bagof/3,
you are telling bagof/3 to ignore bindings for that variable.  So the
bindings bagof/3 is going to pay heed to are those for I, X, and Y.  We get

	bindings		ignored		instance
a.	I = 1, X = 0, Y = 0	J = 2	     {	object(1,0,0)
b.	I = 1, X = 0, Y = 0	J = 3	     {	object(1,0,0)
c.	I = 2, X = 0, Y = 0	J = 3	     {	object(2,0,0)
                                             /
bagof/3 collects these answers--------------- and puts them in a list,
so the final answer we get from the call to bagof/3 is that
	I, J, X, Y are left unbound
	ListOfObjectInstances = [object(1,0,0),object(1,0,0),object(2,0,0)]

Notice something about that?  object(1,0,0) occurs twice in it.  (A copy of)
each object is going to occur as many times in the list as there are
more recent objects at the same place; if we have
	object(1, x, y).  object(2, x, y). ... object(N, x, y).
we are going to get (N-1) + (N-2) + ... + 0 = (1/2).N.(N-1) copies of
object(?, x, y) terms in the list.

Now, suppose we call
	?- delete_objects_in_list(object(1,0,0),object(1,0,0),object(2,0,0)]
That's going to do
	retract(object(1,0,0))	-- succeeds and removes fact
	retract(object(1,0,0))	-- FAILS as the fact has already gone!
	retract(object(2,0,0))	-- never reached
So if the data base can look like
	object(1, 0, 0).
	object(2, 0, 0).
	object(3, 0, 0).
(and we have been given no reason to suppose that it *can't*) then the
code using bagof/3 WILL NOT WORK because bagof/3 allows duplicate terms
in the list of answers.

The right thing to do is to use setof/3:

	reduce_objects :-
		setof(object(I,X,Y),
			J^( object(I,X,Y), object(J,X,Y), I < J ),
			SetOfObjectInstances),
		retract_facts(SetOfObjectInstances).

This will find the same solutions as bagof/3, but will then sort the
list to remove duplicates (sorting is the means, removing duplicates
is the purpose), so retract_facts/1 is not going to be given duplicate
facts to remove.  (As it happens, retract_facts/1 doesn't mind duplicate
facts, but that was good luck.)

This illustrates one of the themes of using setof/3 and bagof/3:

	IT IS VERY SELDOM A GOOD IDEA TO USE bagof/3.

Something which still makes me unhappy about the whole thing is the
high cost of finding the object/3 facts to be deleted in the first place.
If there are N object/3 facts, the search as written is like
	for i := 1 to N do
	    unify object(I,X,Y) with <object.3.table>[i]
	    for j := 1 to N do
		unify object(J,X,Y) with <object.3.table>[j]
		if unification succeeded then
		    if I < J then
			save a copy of object(I,X,Y)
which is clearly O(N**2) cost.

Let's try something else.  For each X,Y, let's find the set
	XYIs = {I | object(I,X,Y) is true}
Then every element of that set is to be deleted, except the maximum.

	reduce_objects :-
	    forall(
		setof(J, object(J,X,Y), [I|Is])
	    ,	delete_objects(Is, I, X, Y)
	    ).

	%   Use the next three clauses in a structure-sharing system
	%   or if an object/3 fact may be duplicated

	delete_objects([], _, _, _).	% maximum; leave it
	delete_objects([_|_], I, X, Y) :-
		retract(object(I,X,Y)),
		fail.
	delete_objects([I|Is], _, X, Y) :-
		delete_objects(Is, I, X, Y).

	%   Use the next two clauses in a structure-copying system
	%   when object/3 facts cannot be duplicated.

	delete_objects([], _, _, _).	% maximum; leave it
	delete_objects([I|Is], J, X, Y) :-
		retract(object(J,X,Y)),	% some Prologs index enough,
		!,			% but some don't,
		delete_objects(Is, I, X, Y).

How much does this cost?  Well, it picks up every object/3 clause once,
at a cost of O(N).  What it actually picks up is pairs (X,Y)-I.  It
then sorts these pairs using keysort/2, so that it can group them into
blocks with the same values of X and Y.  The sort costs O(N.lg N) time.
Grouping the sorted solutions into blocks X, Y, Is costs O(N) time, and
then deleting the unwanted clauses costs O(N) time, so the grand total
is O(N.lg N).  Yes, the sort dominates the cost here, but N.lg N is
quite a lot better than N**2.

-- 
Distinguishing between a work written in Hebrew and one written in Aramaic
when we have only a Latin version made from a Greek translation is not easy.
(D.J.Harrington, discussing pseudo-Philo)