Newsgroups: comp.theory Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!ira.uka.de!rusmv1!rusmv1!mark From: mark@adler.philosophie.uni-stuttgart.de (Mark Johnson) Subject: Re: Longest repeated substring problem. In-Reply-To: ajay@cs.Buffalo.EDU's message of 15 Apr 91 18:08:55 GMT Message-ID: Sender: news@rusmv1.rus.uni-stuttgart.de (USENET News System) Organization: IMSV, University of Stuttgart, Germany References: <70943@eerie.acsu.Buffalo.EDU> Date: 16 Apr 91 12:33:39 Lines: 81 Algorithm for finding longest common substring (I don't know if this is new or not). Input: An array A (of characters) of length n. Output: Integers p, q, and l such that A[p] .. A[p+l] = A[q] .. A[q+l] and there do not exist p', q' and l' > l such that A[p'] .. A[p'+l'] = A[q'] .. A[q'+l'] Its running time is O(l n lg n). The algorithm rests on the observation that the required substrings of A have the property that they are adjacent in a sorted sequence of all substrings of A. This algorithm uses an idea of Martin Kay's for space- and time-efficiently sorting all of the substrings of a string. The algorithm uses two arrays of integers P and L, both of length n. We use P as an array of indices into the array A, whose value is the suffix of A beginning at P[i]. 1. For 0 <= i < n set P[i] = i. P now contains indices to every suffix in A. 2. Sort the array P, where P[i] < P[j] iff the suffix of A beginning at P[i] precedes the suffix beginning at P[j] is standard lexicographic order. P now contains indices to every suffix in A in lexicographic order. 3. For each 0 <= i < n-1, let L[i] be the length of the common prefix of the suffixes of A beginning at P[i] and P[i+1]. 4. Find a maximum m such that for all 0 <= i < n-1, L[m] >= L[i]. Set p = P[m], q = P[m+1], l = L[m] and stop. We estimate the time complexity as follows. Step 1 takes O(n) time. Step 2 takes O(l n lg n) (since each comparision may need to compare at most l characters) Step 3 takes O(l n) time. Step 4 takes O(n) time. Comments? Mark Johnson -- Mark Johnson Institut fuer maschinelle Sprachverarbeitung - Computerlinguistik Universitaet Stuttgart Keplerstrasse 17 D-7000 Stuttgart 1 Germany. phone: 0711 - 121 3132. On leave until mid July 1991 from: Cognitive and Linguistic Sciences, Box 1978 Brown University Providence, RI 02912 USA email addresses: mark@adler.philosophie.uni-stuttgart.de mj@cs.brown.edu markj@ai.mit.edu johnson@csli.stanford.edu mark@ego.mit.edu