Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!wuarchive!psuvax1!husc6!m2c!umvlsi!cedar.ecs.umass.edu!srivasta From: srivasta@umvlsi.ecs.umass.edu (Manoj Srivastava) Newsgroups: comp.lang.c Subject: strstr sources: a summary of responses Keywords: ANSI lib, strstr Message-ID: <1158@umvlsi.ecs.umass.edu> Date: 25 Aug 90 20:37:13 GMT Sender: news@umvlsi.ecs.umass.edu Reply-To: srivasta@umvlsi.ecs.umass.edu Organization: University of Massachusetts at Amherst Lines: 161 I want to thank all the people who replied to my request for a srtstr function, I really appreciated the response I got. Most of the answers were some variation of the linear (brute force) sub-string match algorithm O(NM) in the worst case [where N = strlen (text) and M = strlen (substring)]. Here is a gem: --------cut here-------- #include char *strstr(register char const *s, register char const *t) { do { register char const *ss = s; register char const *tt = t; do { if (*tt == '\0') return ((char *)s); } while (*ss++ == *tt++); } while (*s++ != '\0'); return (NULL); } --------cut here-------- There were other examples, but since all of them are a variation of the same theme, I shall not waste net bandwidth by posting them all. In the mean while I had hacked up the following based on the Knuth-Morris-Pratt method [I guess I should have made it clear that I was looking for "safeguards" which I (naively) assume are built into library routines as opposed to code by "mere" programmers ;-) From the lib excerpts which were posted I guess lib programmers are human too.;-) ;-) ]. This is O(N+M) in the worst case. --------cut here-------- #include char * strstr (char * cs, char * ct) { register int i; register int j; int M = strlen (ct); int N = strlen (cs); int * next = (int *) malloc ((M + 1) * sizeof(int)); /* initialize the next array */ next[0] = -1; for (i = 0, j = -1; i < M; i++, j++, next[i] = (ct[i] == ct[j]) ? next[j] : j) while ((j >= 0) && (ct[i] != ct[j])) j = next[j]; /*now for the pattern match */ for (i = 0, j = 0; j < M && i < N - M + j + 1; i++, j++) while ((j >= 0) && (cs[i] != ct[j])) j = next[j]; free (next); if (j == M) return & cs [i - M]; else return NULL; } --------cut here-------- And finally, an implementation of the Boyer-Moore algorithm. I am not sure offhand, but while the worst case complexity remains O(N+M), the avg case behaviour is O(N/M) ??? ------8-<---Cut here---8-<----- /* strstr, Copyright (c) 1990 by John Lacey. -*- C -*- This program is the ANSI C strstr() string routine. It is basically a string search routine. This implementation uses the Boyer-Moore algorithm. Some rough timings found this implementation about twice as fast as the following straight string search. char * strstr( const char *text, const char * pattern ) { int i = 0, j = 0; while ( text[i] && pattern[j] ) { i++; j = ( text[i] == pattern[j] ) ? j + 1 : 0; } return (char *) ( ( pattern[j] ) ? NULL : text + i - j + 1 ); } So, while the algorithm itself is more complicated, that complexity is worth it. */ #include #include #include char * strstr( const char *text, const char * pattern ) { int N = strlen( text ); int M = strlen( pattern ); int i = M - 1; int i0, j, k; int ch, d[SCHAR_MAX + 1]; for ( ch = 0; ch <= SCHAR_MAX; ch++ ) d[ch] = M; for ( j = 0; j < M-1; j++ ) d[pattern[j]] = M - j - 1; do { j = M - 1; k = i; while ( j >= 0 && pattern[j] == text[k] ) { k--; j--; } i0 = i; i += d[text[i]]; } while ( j >= 0 && i < N ); return (char *)(( j < 0 ) ? text + i0 - M + 1 : NULL); } #ifdef TEST main( int argc, char *argv[] ) { char *status = NULL; if ( argc != 3 ) { puts( "Try again, with a text string and a pattern string." ); exit( 1 ); } else { status = strstr( argv[1], argv[2] ); if ( status != NULL ) printf( "Found substring: %s\n", status ); else printf( "Nothing doing, boss. I can't see a thing.\n" ); } } #endif /* TEST */ ------8-<---Cut here---8-<----- Man weeps to think that he will die so soon; woman, that she was born so long ago. -- H. L. Mencken