Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!samsung!munnari.oz.au!goanna!ok
From: ok@goanna.oz.au (Richard O'keefe)
Newsgroups: comp.lang.prolog
Subject: Re: the great split up
Message-ID: <2858@goanna.oz.au>
Date: 12 Feb 90 10:10:01 GMT
References: <EIVERSON.90Feb11221732@hades.nmsu.edu>
Distribution: comp
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 129

In article <EIVERSON.90Feb11221732@hades.nmsu.edu>, eiverson@nmsu.edu (Eric Iverson) writes:
(Here is a "split" routine, how can it be done better in Quintus Prolog?) 

Unfortunately, calling it a "split" routine says very little.  It is not
at _all_ clear to me what this thing is supposed to do.  Before we start,
a coding tip:  *NEVER* put a semicolon at the end of a line, it looks so
much like a comma that it is confusing.  Treat ";" like "else" in Ada.

As far as I can see, split(Cat, Str, A, B) buzzes around copying elements
from Str to A until it finds an element which is identical (==) to the 1st
element of Cat.  It then hands over to split2, which demands that the rest
of Cat should follow in Str.  split2 is determinate (in fact it has rather
more cuts than it needs, but that is another story).  split/4 was not
determinate, however, so if the first element of Cat is not followed by
the rest of Cat in Str, split/4 will try the second disjunct, thus
looking for another occurrence of the first element of Cat in Str.

It follows that
	if split(Cat, Str, A, B) succeeds
	then there are lists Alpha, Omega such that
	Str = Alpha @ Cat @ Omega
	and A = Alpha @ <something>

How this constitutes "binding Cat onto String" is beyond me; note that
the use of the == test means that Cat and Str should both be bound before
split/4 is called.

After much puzzling, I finally decided that the intent is that
	split(Cat, Str, A, B)
	is to be true when Cat, Str, A, and B are all lists, and
	there exist lists Alpha, Omega such that
	(1) Str = Alpha @ Cat @ Omega
	(2) A = Alpha @ take(floor(N/2), Cat)
	(3) B = drop(floor(N/2), Cat) @ Omega
where @ is concatenation and take, drop are as in APL.

The first observation is that this is a rather strange thing to do.
What is it to be used for?  If we knew that, perhaps some other data
structure entirely might be a better choice.  We might also know then
whether all of the logical results were needed or whether the first
would suffice (for example, it might be known that Cat occurs precisely
once in Str).

The second observation is that the search is intrinsically an
O(|Cat|.|Str|) operation if we use the naive string search algorithm;
since we're using lists it would be difficult to use the Boyer-Moore
algorithm, but it might be possible to adapt the Knuth-Morris-Pratt
one.  Does anyone know of any published results on the complexity of
string search in linked lists?  This may or may not matter, depending
on how big Cat is and how big the alphabet size is.   (Can anyone tell
me where to find Galil's algorithm?  I had a copy but lost it in a move.)

The third observation is that we can precompute the back of A and the
front of B because we _know_ what they must be like: they must be like
Cat.  The trick I use is to work with a second copy of the list and
walk down it two steps at a time for every one step we take down the
original; when we've reached the end of the second copy we're half way
down the first.

%   halve(Counter, List, Front, Back) is true when
%   append(Front, Back, List) and
%   length(Front) = floor(length(Counter)/2)
%   Counter must be a proper list but the others may be uninstantiated.
%   (this can be speeded up by unrolling; see library(basics) for some
%   examples of that technique).

halve([_,_|N], [X|List], [X|Front], Back) :- !,
	halve(N, List, Front, Back).
halve([_], List, [], List).
halve([],  List, [], List).

Then we can encode split/4 directly:

split(Cat, Str, A, B) :-
    /*  precompute the last part of A (determinate)  */
	halve(Cat, Cat, FrontOfCat, BackOfCat),
    /*  precompute the front part of B (determinate)  */
	append(BackOfCat, Omega, B),
    /*  enumerate candidates for Alpha (non-determinate)  */
	append(Alpha, Cat_Omega, Str, FrontOfCat, A),
    /*  check that Cat follows Alpha in Str (determinate)  */
	append(Cat, Omega, Cat_Omega).

where append/5 is, as in library(lists):

append([], R, R, S, S).
append([H|T], L, [H|R], M, [H|S]) :-
	append(T, L, R, M, S).


I have _not_ timed this, because I have no idea what realistic data
would look like, and for all I know it might well be slower.  But I
think it's a wee bit clearer (append/5 has been in the library for
years and halve/4 should have been, it's a known technique).
It's interesting that this version obviously reduces to
	append(A, B, Str)
when Cat is empty, which seems appropriate.

We can optimise this split/4 by making the test happen a little sooner.
(The original posting had done this, at the price of forcing Cat to be
non-empty always.)
Here's how:

We start from the specification
	foo(Target, Source, Alpha, Omega) :-
		append(Alpha, Target_Omega, Source),
		append(Target, Omega, Target_Omega).
special casing several lengths of Target and unfolding gives us

foo([], S, A, C) :- !,
	append(A, C, S).
foo([X], S, A, C) :- !,
	append(A, [X|C], S).
foo([X,Y], S, A, C) :- !,
	append(A, [X,Y|C], S).
foo([X,Y,Z], S, A, C) :- !,
	append(A, [X,Y,Z|C], S).
foo([X,Y,Z|L], S, A, C) :-
	append(A, [X,Y,Z|LC], S),
	append(L, C, LC).

I leave it as an exercise for the reader to apply this optimisation to
split/4.  (For the known length cases we can optimise the first two goals
of split/4 away completely.  How much this helps depends on how common
Cats of length < 4 are.)  

Again, it all depends on what the actual data are like, and it seems more
than likely to me that the best way to do this operation is to use some
other data structure so th at it doesn't _need_ to be done.