Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uwm.edu!wuarchive!emory!stiatl!srchtec!johnb
From: johnb@srchtec.UUCP (John Baldwin)
Newsgroups: comp.lang.c
Subject: Re: strstr sources: a summary of responses
Keywords: ANSI lib, strstr
Message-ID: <183@srchtec.UUCP>
Date: 29 Aug 90 16:18:08 GMT
References: <1158@umvlsi.ecs.umass.edu> <11246@alice.UUCP>
Organization: search technology, inc.
Lines: 40

In article <11246@alice.UUCP> andrew@alice.UUCP (Andrew Hume) writes:
>
>people interested in using REAL algorithms for strstr should check
>out the fast substring algorithm in the august CACM

Great article.  But I digress before I even start.  :-)

As a result of the comments being tossed around, I went home last night and
checked the source code for strstr() in the Microsoft C ver 5.10 standard
library.  Not that I'm claiming that "the Microsoft way is the Right Way"
(in fact, I think I'd have to argue *against* it! :) :) :)), but I figured
its production code in a heavily-used production compiler, and they would
want their library to perform well in benchmarks.

They used the brute-force algorithm.

Just to double-check, I looked at several other library functions at random.
All seemed to be very good choices as far as algorithms go, so it seems like
they *didn't* just use whatever was familiar, but instead put a good bit
of thought into what they were doing.  To be sure, strstr() was written in
assembler, and they used a machine instruction to scan the "search-in" string
for the next character of the "search-for" string.

Probably, for the most-general case, letting the hardware do the looking,
and avoiding the setup costs of the arrays necessary for Boyer-Moore or
Knuth-et-al, ends up being faster.

In fact, I can think of another good illustration of this effect: a colleague
was trying to figure out what sorting and storage methods to use for certain
information in his part of a very large program.  We talked about whether
the information should be maintained in sorted order, or sorted just before
use, sorted-on-entry, et cetera ad infinitum.  Then it occurred to me to ask
about the scale of the problem.  No more than a dozen items would ever be
"active" at any one time.  The fastest mechanism turned out to be the most
simple: store things FIFO (unordered) and do a linear search!
-- 
John T. Baldwin                      |  johnb%srchtec.uucp@mathcs.emory.edu
Search Technology, Inc.              | 
                                     | "... I had an infinite loop,
My opinions; not my employers'.      |  but it was only for a little while..."