Path: utzoo!utgpu!water!watmath!clyde!cbosgd!ucbvax!husc6!cca!g-rh From: g-rh@cca.CCA.COM (Richard Harter) Newsgroups: comp.unix.questions Subject: Re: Unix diff(1) algorithm Message-ID: <23307@cca.CCA.COM> Date: 16 Jan 88 07:31:29 GMT References: <11229@brl-adm.ARPA> <1571@aurora.UUCP> <3219@psuvax1.psu.edu> Reply-To: g-rh@CCA.CCA.COM.UUCP (Richard Harter) Organization: Computer Corp. of America, Cambridge, MA Lines: 33 In article <3219@psuvax1.psu.edu> schwartz@gondor.cs.psu.edu (Scott E. Schwartz) writes: >>also investigate the webb miller / eugene myers code presented in > >Webb is actually at Penn State. Gene Myers is at Arizona, except that >he is here this year. > >Recently they came up with an enhancement to their algorythm >("Haslock's refinement") that doubles (in the worst case) the speed >of the original algorythm. Best case (in which all edits are insertions or >are all deletions) runs in linear time. > >To be fair, there are cases where diff is faster. Diff does best when >the longest common subsequence is short, i.e. lots of diffs. Fcomp >(as they call it) does best when the edit script is short, i.e. few diffs. But, but, but... Finding the longest common subsequence is a linear process. Or at least it is sufficiently close to being linear that it makes no difference. I don't know how diff does it or fcomp does it, but the algorithm that I use builds a ordered list of intermediate subsequence heads. Each new item goes onto a list head. The trick to making it quasi-linear is to do an interpolation search when placing a new item into the list of intermediates. [This isn't really linear, but, given the conditions, might as well be.] As I remarked earlier, my experience is that the time to find the longest common subsequence is dominated by the sort time, even if a linear sort if being used. (By about a factor of 4 to 1, as I recall.) Am I missing something? Do diff and/or fcomp bypass the sort? -- In the fields of Hell where the grass grows high Are the graves of dreams allowed to die. Richard Harter, SMDS Inc.