Newsgroups: comp.unix.misc Path: utzoo!utgpu!watserv1!watmath!att!cbnewsh!wcs From: wcs@cbnewsh.att.com (Bill Stewart 201-949-0705 erebus.att.com!wcs) Subject: Re: Fast file scan (long) Organization: AT&T Bell Labs Government Communications Date: Sun, 21 Oct 90 00:25:04 GMT Message-ID: <1990Oct21.002504.24057@cbnewsh.att.com> References: <299@lysator.liu.se> <1990Oct2.192028.29731@sco.COM> <1990Oct17.183310.29684@celebr.uucp> Keywords: some examples Lines: 33 In article <1990Oct17.183310.29684@celebr.uucp>, jbm@celebr.uucp (John B. Milton) writes: > >In article <299@lysator.liu.se> pen@lysator.liu.se (Peter Eriksson) writes: > >>I`d like to know how to scan all files in a directory (and it's sub- > >>directories) for a specific string (without regular expressions) as fast > >>as possible with standard Unix tools and/or some special programs. > And the winner is: > find . -type -f -print | xargs grep SEARCH > Use whichever grep works best (fastest, does what you want, etc.) For this case, you probably want a grep based on the Boyer-Moore pattern-matching algorithm; several were posted to *.sources a few years ago. Essentially, what B-M does is preprocess the search pattern, and start looking in the text where the END of the pattern would be, e.g. Pat.: foobarbaz Ptr: | Text: Now is the time for all good parties to foobarbaz to an end. foobarbaz The 'h' in the text doesn't match the 'z', and since there are no 'h's anywhere in the pattern, you can advance the pointer 9 positions, avoiding a lot of comparisons. The next look gets you: Pat: foobarbaz Ptr: | Text: Now is the time for all good parties to foobarbaz to an end. 'o' doesn't match 'z', but IS in the pattern, so you advance by 6. Pat: foobarbaz Ptr: | Text: Now is the time for all good parties to foobarbaz to an end. And so on, until you find it or reach the end. -- Thanks; Bill # Bill Stewart 908-949-0705 erebus.att.com!wcs AT&T Bell Labs 4M-312 Holmdel NJ Government is like an elephant on drugs: It's very confused, makes lots of noise, can't do anything well, stomps on anyone in its way, and it sure eats a lot.