Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!gorodish!guy From: guy@gorodish.Sun.COM (Guy Harris) Newsgroups: comp.unix.questions Subject: Re: Unix diff(1) algorithm Message-ID: <39802@sun.uucp> Date: 23 Jan 88 07:02:10 GMT References: <11229@brl-adm.ARPA> <7068@brl-smoke.ARPA> <153@ateng.UUCP> Sender: news@sun.uucp Lines: 26 > >The trouble with diff(1) is that it assumes that the only changes are > >insertions and deletions. So if you swap two functions, it reports > >two changes. The method which appeared in CACM can spot moves as well > >as insertions and deletions. > > The "hdiff" program does move detection. It's in the comp.sources.unix > archives. That's because, according to the comments, it implements the algorithm described in the CACM paper, which was: "A Technique for Isolating Differences Between Files", Paul Heckel, Interactive Systems Consultants, CACM, Volume 21, Number 4, April 1978, pp. 264-268. The "h" stands for "Heckel". The paper mentions the Hunt/McIlroy paper in passing, saying: The only implementation of (Longest Common Subsequence) for file comparison of which the author is aware takes linear time and space for most practical cases but takes worst-case O(m*n*log n) time and O(m*n) space. As a practical LCS implementation, it uses several clever techniques and is neither concise nor easy to understand. Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com