Path: utzoo!utgpu!attcan!uunet!seismo!sundc!pitstop!sun!quintus!ok From: ok@quintus.uucp (Richard A. O'Keefe) Newsgroups: comp.lang.prolog Subject: Re: Soundex revisited Message-ID: <334@quintus.UUCP> Date: 1 Sep 88 01:42:48 GMT References: <1178@tuhold> <816@kuling.UUCP> Sender: news@quintus.UUCP Reply-To: ok@quintus.UUCP (Richard A. O'Keefe) Organization: Quintus Computer Systems, Inc. Lines: 61 In article <816@kuling.UUCP> waldau@kuling.UUCP (Mattias Waldau) writes: >Thom Fruewirth and R.O'Keefe have both specified following Soundex >algorithm with Horn clauses. Following the tradition of Keith Clark, >Sten-Ake Tarnlund, Zohar Manna, Richard Waldinger, and Alan Bundy (see >his last lecture at the logic conference in Seattle) I have tried to >specify the same algorithm in first order predicate logic with >equality. > >The idea is that that specification should be easier to understand, >and therefore more likely to be correct. The question to the newsgroup >is whether this is true or not. Excluding comments, blank lines, and the table which maps letters to digits (and which are to be dropped), the last version I presented had 24 lines, 82 "words", 3.4 "words"/line. With the same exclusions, Waldau's version has 32 lines, 201 "words", 6.3 "words"/line. {I really *must* write an editor command to count tokens rather than "words".} Other things being equal (which they aren't) I would be surprised if a text containing two and a half times as many "words" were easier to understand. Note that the version I presented *is* first-order logic except for the use of if->then;else, and eliminating that by adding an appropriate number of inequalities adds a mere 6 "words" to the total. Easily the shortest specification is the one which Knuth provides; as I typed it in it took a mere 11 non-blank lines of English, and that includes the tables. It is interesting to note that Waldau's comment explaining "A % A A (e Before)(e After)(e Element) > {B=Before*Element*After & (A=Before*After | A A*B=C <-> A=[] & B=C | (e H)(e T){A=[H|T] & C=[H|T*B]} The code defines A < B when A is a *subsequence* of B, so that although [a] < [b,c] according to the comment, not so according to the code. If I've understood correctly, the rest of the program wants the definition in the comment. I offer for comparison three other definitions of the relation, using "<<" rather than "<" because I want arithmetic comparison as well. A version in a functional language like ML or Miranda: len([]) = 0. len([_|T]) = 1+len(T). X << Y = len(X) < len(Y). A version in English: X << Y iff X and Y are lists and X has fewer elements than Y. A version in Prolog: [] << [_|_]. [_|X] << [_|Y] :- X << Y.