Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uwm.edu!wuarchive!emory!stiatl!srchtec!johnb From: johnb@srchtec.UUCP (John Baldwin) Newsgroups: comp.lang.c Subject: Re: strstr sources: a summary of responses Keywords: ANSI lib, strstr Message-ID: <183@srchtec.UUCP> Date: 29 Aug 90 16:18:08 GMT References: <1158@umvlsi.ecs.umass.edu> <11246@alice.UUCP> Organization: search technology, inc. Lines: 40 In article <11246@alice.UUCP> andrew@alice.UUCP (Andrew Hume) writes: > >people interested in using REAL algorithms for strstr should check >out the fast substring algorithm in the august CACM Great article. But I digress before I even start. :-) As a result of the comments being tossed around, I went home last night and checked the source code for strstr() in the Microsoft C ver 5.10 standard library. Not that I'm claiming that "the Microsoft way is the Right Way" (in fact, I think I'd have to argue *against* it! :) :) :)), but I figured its production code in a heavily-used production compiler, and they would want their library to perform well in benchmarks. They used the brute-force algorithm. Just to double-check, I looked at several other library functions at random. All seemed to be very good choices as far as algorithms go, so it seems like they *didn't* just use whatever was familiar, but instead put a good bit of thought into what they were doing. To be sure, strstr() was written in assembler, and they used a machine instruction to scan the "search-in" string for the next character of the "search-for" string. Probably, for the most-general case, letting the hardware do the looking, and avoiding the setup costs of the arrays necessary for Boyer-Moore or Knuth-et-al, ends up being faster. In fact, I can think of another good illustration of this effect: a colleague was trying to figure out what sorting and storage methods to use for certain information in his part of a very large program. We talked about whether the information should be maintained in sorted order, or sorted just before use, sorted-on-entry, et cetera ad infinitum. Then it occurred to me to ask about the scale of the problem. No more than a dozen items would ever be "active" at any one time. The fastest mechanism turned out to be the most simple: store things FIFO (unordered) and do a linear search! -- John T. Baldwin | johnb%srchtec.uucp@mathcs.emory.edu Search Technology, Inc. | | "... I had an infinite loop, My opinions; not my employers'. | but it was only for a little while..."