Newsgroups: comp.sys.amiga.programmer Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!snorkelwacker.mit.edu!ira.uka.de!news From: S_ASCHMIDT@iravcl.ira.uka.de (|S| Angela Schmidt) Subject: Re: Source code for search In-Reply-To: Lee's message of Friday, 7 Jun 1991 18: 06:42 EDT Message-ID: <1991Jun10.123235.7395@ira.uka.de> Sender: news@ira.uka.de (USENET News System) Organization: University of Karlsruhe, FRG (Informatik Rechnerabteilung) References: <492@regina.uregina.ca> <91158.180643UH2@psuvm.psu.edu> Date: Mon, 10 Jun 1991 13:05:18 GMT X-News-Reader: VMS NEWS 1.07 Lines: 58 In <91158.180643UH2@psuvm.psu.edu> Lee writes: > >> Would anyone happen to have source code for a (very) fast search routine? > >> Basically, what I need is something like strstr(), but I need to be able > >> to modify it to make it non-case sensitive. I'm basically looking through > >> a small (512-2048 bytes) buffer for an ASCII string. > >> > > If you mean what I think you mean, you probably want Boyer-Moore search. > There is probably source for it buried deep in a copy of egrep, or > somewhere, but I'm too lazy to dig it out. You could look > in a good data structures book---I think Sedgewick covers it. > > Another likely source would be a compiler book. Look in the chapter that > covers token parsing. There are techniques where you compile your > search string into a FSM, and then run it using the buffer as the tape. I've already implemented Boyer & Moore. It works very fast with long searching-strings. Let me describe what Boyer & Moore does: "A" shall be the string you are searching for, "B" the string you are searching in. 1) Create a field (here called h) with 256 elements, all set to FALSE. 2) for (i=strlen (A)-1; i>=0; i--) h[A[i]]=TRUE; So afterwards, if there appears the character "A" in your searchingstring, the field 65 will be set to TRUE and so you'll get a table, which tells you wether there exists a special character or not. Set k to 0. 3) Now say, "A" is "a" characters long. Let c be B[a-1+k] ;;;;;;; (String A) => a=7 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,... (B) ^ ^ k c=B[a-1+k] (0) (6) Now look if the field hv[c] is FALSE. If it's FALSE, the string must not touch B[a-1+k], so let k=k+a, and continue with (3) else there might be any touch. Test this: for (i=0; i