Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!oliveb!pyramid!prls!philabs!ttidca!alter From: alter@ttidca.TTI.COM (Steve Alter) Newsgroups: comp.sources.d Subject: The fastest "gmatch" Summary: FSF wins Message-ID: <3607@ttidca.TTI.COM> Date: 21 Dec 88 09:54:27 GMT Organization: Citicorp/TTI, Santa Monica Lines: 51 A couple of months ago I posted a request for public-domain (or at least legally copyable) versions of the "gmatch" or "glob" routine and got a very satisfactory number of responses. So many, in fact, that I decided to compare them all and pick the fastest for use in a program of my own, which will be posted. I've completed my tests of all those gmatch-lookalikes. As it turns out, only one routine had really crippling bugs in it; the rest only differed in handling of marginal or degenerate cases such as when the pattern and/or the subject-string are null. They also differed on the character used to indicate the negation of a character-class, but that's no big deal to change. As far as the time-tests are concerned, I really put them through their paces. I wrote a program that feeds a specified pattern and subject-string to each routine and repeating for a specified number of iterations. I then wrote a shell-script that runs this test program using 69 different pattern/string pairs, and 100000 (one hundred thousand) iterations of each. I then ran this shell script through 50 complete passes (that's a total of five million iterations) on each of two different varieties of Vax processors: 11/785 (4.3 BSD) and a VaxStation 3200 (Ultrix 1.2). The 3200 ran the whole set in two days whereas the 11/785 needed three WEEKS (it actually stayed up the whole time!) Disclaimer: I don't claim that my 69 test-pairs represent any kind of "normal" situation; I just tried to catch lots of the syntax combinations. Why five million iterations? To completely flatten out any discrepancies caused by fluctuations in system-load and obtain a statistically significant average. I was aiming for a very high signal-to-noise ratio. It worked because in each test, the standard deviation turned out to be a very tiny fraction of the mean (all less than 2% on the VaxStation, most around 0.7%) so I believe that I got very accurate overall timings. So who is the fastest? WE HAVE A WINNER! THE WINNER IS... (the envelope, please)... The Free Software Foundation with the "glob_match" function! Now I don't feel right in posting this routine because it's not really public-domain (we don't see source to gcc on the net, do we?) but I'll send it to anyone who wants it, as allowed by the FSF license. If anybody wants a copy of the test environment (I can't imagine why, but who knows?) then I'll send that out too. Great thanks to David MacKenzie for sending me the gnuglob package in the first place. -- Steve Alter {csun,philabs,psivax,pyramid,quad1,rdlvax,retix}!ttidca!alter Citicorp/TTI, Santa Monica CA (213) 452-9191 x2541