Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ames!mailrus!cornell!uw-beaver!teknowledge-vaxc!sri-unix!quintus!ok
From: ok@quintus.uucp (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: Some comments on term comparison
Message-ID: <377@quintus.UUCP>
Date: 9 Sep 88 23:56:39 GMT
References: <25985@ucbvax.BERKELEY.EDU>
Sender: news@quintus.UUCP
Reply-To: ok@quintus.UUCP (Richard A. O'Keefe)
Organization: Quintus Computer Systems, Inc.
Lines: 112

In article <25985@ucbvax.BERKELEY.EDU> vanroy@ji.Berkeley.EDU (Peter Van Roy) writes:
>Consider the following predicate:
>	a(X,X).
>	a(Y,Z).
>	a(W,T).
>	a(a,b).
>The query
>	| ?- setof(a(X,Y), a(X,Y), S).
>gives the result (slightly edited for readability):
>	S = [a(A,B),a(C,D),a(E,E),a(a,b)]
>
>It would be better if one of a(A,B) and a(C,D) were deleted from this result.
>In order to do this, two terms such as a(X,X) and a(Y,Y) which have the
>same pattern of bound variables should be considered equal.
>This requires a new meaning for the equality of two terms.

[Shouldn't that be "same pattern of UNbound variables"?]

The basic problem here is not really a problem with term comparison.

There are two ways of looking at these variables (A,B,C,D,E).

(1) Since the variables are really "copies" of the original X,Y,
    we should have delayed until X and Y were sufficiently instantiated
    to ensure that duplicate elimination never had to compare a variable
    with a variable.  In this view, the mistake is returning any answer
    at all:  if we were to say
	| ?- all_solutions(a(X,Y), a(X,Y), S), X=foo, Y=baz.
    the answer should be S=[a(foo,baz)].

(2) The other view is that the variables which came back are METAVARIABLES,
    and the list S is really at the meta-level:
	is_a_solution(X) <-> (exists E) member(E, S) & instance_of(X, E)
    In this view, the set S *really* means
	[a($1,$1),a($1,$2),a(a,b)]
    But note that in order to reduce redundancy, we should not just drop
    an element E when it is *equivalent to* another element of S, but we
    should also drop E if it is *subsumed by* another element of S.
    That is to say, a($1,$1) should ALSO be dropped, and a non-redundant
    result is [a($1,$2),a(a,b)].

Note that in neither case do we have any reason to retain a(E,E).
Since a(p,p) is an instance of both a($1,$1) and a($1,$2), it is
implicitly present *twice* in [a(A,B),a(E,E),a(a,b)].  If we are
willing to tolerate that, we have lost our excuse for demanding
that a(C,D) be deleted.

Van Roy's proposal (which I heard a couple of days before from Herve'
Touati) is simply to compare terms by comparing "standardised"
alphabetic variants of them.

>Structural comparison is easy to implement.  One possibility is to instantiate
>all variables in each term temporarily to unique atoms in a sequence, do a
>term comparison between the resulting ground terms, and then uninstantiate the
>variables.

Note that you have to standardise the terms *separately*, because they
might share variables.  {Instead of van Roy's "inst_vars/2", you would
do well to use number_vars/3, which binds variables to '$VAR'(Int) terms.}

Note that the desired effect can be obtained for the special case of
setof/3 by doing

	touati_setof(Template, Generator, Set) :-
		quantify_and_number(Generator, Template, Goal),
		bagof(Template, Goal, Bag),
		sort(Bag, Temp),
		varnumbers_list(Temp, Set).

	quantify_and_number(Q^Generator, Template, Q^Goal) :- !,
		quantify_and_number(Generator, Template, Goal).
	quantify_and_number(Goal, Template, N^(Goal,numbervars(Template,0,N))).

	varnumbers_list([], []).
	varnumbers_list([T|Ts], [S|Ss]) :-
		varnumbers(T, S),
		varnumbers_list(Ts, Ss).

where varnumbers/2 is an inverse of numbervars(_, 0, _).  The query
	| ?- touati_setof(a(X,Y), a(X,Y), S).
gave	S = [a(a,b),a(A,A),a(B,C)]
as the answer, with X and Y still unbound.

>Some comments about this method:
>	2. Two terms that are equal will still be equal if they are unified
>	   together.
>	3. Two terms that are equal can become unequal if one of them becomes
>	   more instantiated.  This is never true for ==.

Note that unifying two terms which are alphabetic variants of each other
can change the relative order of two OTHER terms.  
Part of sort/2 looks like this:

sort(2, [X,Y|Rest], Answer, Rest) :- !,
	compare(R, X, Y),
	sort2(R, X, Y, Answer).

sort2(<, X, Y, [X,Y]).
sort2(>, X, Y, [Y,X]).
sort2(=, X, X, [X]).		<-----

The last clause may indifferently be written sort2(=, X, X, [X]),
sort(=, X, _, [X]), or sort(=, _, X, [X]), because with the standard
ordering, unifying two identical terms has no effect.  With the
"alphabetic variant" ordering, unifying term equivalent terms may
change the relative order of OTHER terms, so it MUST NOT be done by
sort/2.  So one of the other alternative versions of the last clause
must be used.  But there are two choices, and they yield distinguishable
results.

Note also that maintaining two binding environments during term
comparison (should there be any variables which have to be compared)
is intrinsically more costly than maintaining none.