Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!decwrl!labrea!sri-unix!quintus!ok From: ok@quintus.uucp (Richard A. O'Keefe) Newsgroups: comp.lang.prolog Subject: Re: Soundex revisited Message-ID: <341@quintus.UUCP> Date: 2 Sep 88 03:36:26 GMT References: <1190@tuhold> Sender: news@quintus.UUCP Reply-To: ok@quintus.UUCP (Richard A. O'Keefe) Organization: Quintus Computer Systems, Inc. Lines: 49 In article <1190@tuhold> thom@tuhold (Thom Fruehwirth) writes: >In a recent article ok@quintus.uucp (Richard A. O'Keefe) writes: >> Since the specification of the Soundex function requires that >> duplicate characters be discarded, it is possible to construct >> strings of arbitrary length _all_ of which must be inspected >> even if only 3 codes are produced. (Stick "sss.......sss" in >> front of any word you like.) >But what sense does this make ? The point is that for any N I can construct a list of length N which any implementation of the Soundex function must examine in its entirety: stopping after generating 4 elements of *output* doesn't mean that you can avoid looking at all of the *input*. Stopping early doesn't improve the worst case at all. >I pleased to see that Richard O'Keefe's code follows my initial >suggestions on how to transform Soundex(-like) specifications. Well, no. My code doesn't have all those cuts in it... >Only one little transformation is still missing: That of transforming >zeros/2 away. It's a minor change, but it avoids going the roundabout >way of counting the number of character-codes produced so far: It's a very pretty method, which I have often used. But it doesn't avoid counting. The technique is to exploit the epimorphism from lists to natural numbers (length/2), using [] -> 0 [_|X] -> X'+1 Minor correction: my code didn't count the number of character codes produced so far, but the number *NOT* produced (as does Fruewirth's latest version). The point of my postings about Soundex is that I deny the strong distinction Fruewirth draws bewteen "specification" and "implementation". In particular, I was challenging the idea implicit in his original pair of versions that the "implementation" version was licensed to be ugly. I think the series of postings so far has demonstrated quite clearly that "specification" and "implementation" are *directions* rather than *places*. We're still left with the rather unfortunate point that the clearest specification for Soundex that we've seen so far is Knuth's English text. I was going to include the cleanest functional specification I could come up with, but excluding the code table it is 14 lines, 84 "words", so isn't really an improvement on the Prolog "implementation". If this specification/implementation topic is of interest, perhaps we should find another example.