Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!samsung!munnari.oz.au!goanna!ok From: ok@goanna.oz.au (Richard O'keefe) Newsgroups: comp.lang.prolog Subject: Re: the great split up Message-ID: <2858@goanna.oz.au> Date: 12 Feb 90 10:10:01 GMT References: Distribution: comp Organization: Comp Sci, RMIT, Melbourne, Australia Lines: 129 In article , eiverson@nmsu.edu (Eric Iverson) writes: (Here is a "split" routine, how can it be done better in Quintus Prolog?) Unfortunately, calling it a "split" routine says very little. It is not at _all_ clear to me what this thing is supposed to do. Before we start, a coding tip: *NEVER* put a semicolon at the end of a line, it looks so much like a comma that it is confusing. Treat ";" like "else" in Ada. As far as I can see, split(Cat, Str, A, B) buzzes around copying elements from Str to A until it finds an element which is identical (==) to the 1st element of Cat. It then hands over to split2, which demands that the rest of Cat should follow in Str. split2 is determinate (in fact it has rather more cuts than it needs, but that is another story). split/4 was not determinate, however, so if the first element of Cat is not followed by the rest of Cat in Str, split/4 will try the second disjunct, thus looking for another occurrence of the first element of Cat in Str. It follows that if split(Cat, Str, A, B) succeeds then there are lists Alpha, Omega such that Str = Alpha @ Cat @ Omega and A = Alpha @ How this constitutes "binding Cat onto String" is beyond me; note that the use of the == test means that Cat and Str should both be bound before split/4 is called. After much puzzling, I finally decided that the intent is that split(Cat, Str, A, B) is to be true when Cat, Str, A, and B are all lists, and there exist lists Alpha, Omega such that (1) Str = Alpha @ Cat @ Omega (2) A = Alpha @ take(floor(N/2), Cat) (3) B = drop(floor(N/2), Cat) @ Omega where @ is concatenation and take, drop are as in APL. The first observation is that this is a rather strange thing to do. What is it to be used for? If we knew that, perhaps some other data structure entirely might be a better choice. We might also know then whether all of the logical results were needed or whether the first would suffice (for example, it might be known that Cat occurs precisely once in Str). The second observation is that the search is intrinsically an O(|Cat|.|Str|) operation if we use the naive string search algorithm; since we're using lists it would be difficult to use the Boyer-Moore algorithm, but it might be possible to adapt the Knuth-Morris-Pratt one. Does anyone know of any published results on the complexity of string search in linked lists? This may or may not matter, depending on how big Cat is and how big the alphabet size is. (Can anyone tell me where to find Galil's algorithm? I had a copy but lost it in a move.) The third observation is that we can precompute the back of A and the front of B because we _know_ what they must be like: they must be like Cat. The trick I use is to work with a second copy of the list and walk down it two steps at a time for every one step we take down the original; when we've reached the end of the second copy we're half way down the first. % halve(Counter, List, Front, Back) is true when % append(Front, Back, List) and % length(Front) = floor(length(Counter)/2) % Counter must be a proper list but the others may be uninstantiated. % (this can be speeded up by unrolling; see library(basics) for some % examples of that technique). halve([_,_|N], [X|List], [X|Front], Back) :- !, halve(N, List, Front, Back). halve([_], List, [], List). halve([], List, [], List). Then we can encode split/4 directly: split(Cat, Str, A, B) :- /* precompute the last part of A (determinate) */ halve(Cat, Cat, FrontOfCat, BackOfCat), /* precompute the front part of B (determinate) */ append(BackOfCat, Omega, B), /* enumerate candidates for Alpha (non-determinate) */ append(Alpha, Cat_Omega, Str, FrontOfCat, A), /* check that Cat follows Alpha in Str (determinate) */ append(Cat, Omega, Cat_Omega). where append/5 is, as in library(lists): append([], R, R, S, S). append([H|T], L, [H|R], M, [H|S]) :- append(T, L, R, M, S). I have _not_ timed this, because I have no idea what realistic data would look like, and for all I know it might well be slower. But I think it's a wee bit clearer (append/5 has been in the library for years and halve/4 should have been, it's a known technique). It's interesting that this version obviously reduces to append(A, B, Str) when Cat is empty, which seems appropriate. We can optimise this split/4 by making the test happen a little sooner. (The original posting had done this, at the price of forcing Cat to be non-empty always.) Here's how: We start from the specification foo(Target, Source, Alpha, Omega) :- append(Alpha, Target_Omega, Source), append(Target, Omega, Target_Omega). special casing several lengths of Target and unfolding gives us foo([], S, A, C) :- !, append(A, C, S). foo([X], S, A, C) :- !, append(A, [X|C], S). foo([X,Y], S, A, C) :- !, append(A, [X,Y|C], S). foo([X,Y,Z], S, A, C) :- !, append(A, [X,Y,Z|C], S). foo([X,Y,Z|L], S, A, C) :- append(A, [X,Y,Z|LC], S), append(L, C, LC). I leave it as an exercise for the reader to apply this optimisation to split/4. (For the known length cases we can optimise the first two goals of split/4 away completely. How much this helps depends on how common Cats of length < 4 are.) Again, it all depends on what the actual data are like, and it seems more than likely to me that the best way to do this operation is to use some other data structure so th at it doesn't _need_ to be done.