Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!husc6!uwvax!crystal!solomon From: solomon@crystal.UUCP Newsgroups: comp.sources.d Subject: Re: "look" utility? Message-ID: <271@crys.WISC.EDU> Date: Sat, 7-Mar-87 08:07:14 EST Article-I.D.: crys.271 Posted: Sat Mar 7 08:07:14 1987 Date-Received: Sun, 8-Mar-87 15:04:17 EST References: <1916@cbdkc1.UUCP> <837@maynard.BSW.COM> <1145@ihlpf.ATT.COM> <846@maynard.BSW.COM> Organization: U of Wisconsin CS Dept Lines: 88 Summary: Some hard data on look, grep, and friends. Look is much faster than any of the greps, since it assumes the file is sorted and uses binary search, for a run time proportional to the log of the number of lines rather than the number of lines (16 vs 24K in the case of /usr/dict/words). What's more surprising is the relative speeds of the various greps. I ran a few tests (4.3BSD on an unloaded VAX 780) and reproduce the data below. Fgrep is always the slowest (it really should be called 'dgrep' for 'dumb-grep'). Egrep is always the fastest. A few years ago, I had a visit from Al Aho (author of egrep). He was quite proud of egrep (as well he should be). He had me type on my terminal time wc /usr/dict/words time egrep -c '^' /usr/dict/words Egrep was the clear winner. He said he considered writing the inner loop in assembler, but decided he'd gone far enough. In most cases, the program is I/O bound anyhow. If only egrep understood all the options of the others (-w and -i from grep, and -x from fgrep), we could dump the others. (The comment in the man page about exponential behavior is a red herring. The exponential behavior is a theoretical worst case that never really comes up in practice, and besides, we're talking about *size* which is exponential in the length of the *pattern*, which is usually quite short. Here are my test results. Warning: These figures are for comparison only; actual milage may vary. first word + time grep -l a /usr/dict/words 4.8 real 4.1 user 0.3 sys + time fgrep -l a /usr/dict/words 0.2 real 0.0 user 0.1 sys + time egrep -l a /usr/dict/words 0.2 real 0.0 user 0.1 sys + time grep -l ^a /usr/dict/words 4.7 real 3.7 user 0.4 sys + time egrep -l ^a /usr/dict/words 0.2 real 0.0 user 0.1 sys + time look a 1.1 real 0.4 user 0.2 sys middle word + time grep -l jade /usr/dict/words 4.1 real 3.4 user 0.3 sys + time fgrep -l jade /usr/dict/words 3.0 real 2.4 user 0.3 sys + time egrep -l jade /usr/dict/words 2.7 real 1.3 user 0.3 sys + time grep -l ^jade /usr/dict/words 4.4 real 3.7 user 0.3 sys + time egrep -l ^jade /usr/dict/words 1.8 real 1.2 user 0.3 sys + time look jade 0.4 real 0.0 user 0.1 sys last word + time grep -l zygote /usr/dict/words 4.0 real 3.4 user 0.3 sys + time fgrep -l zygote /usr/dict/words 5.7 real 4.8 user 0.4 sys + time egrep -l zygote /usr/dict/words 3.6 real 2.5 user 0.5 sys + time grep -l ^zygote /usr/dict/words 5.1 real 3.7 user 0.3 sys + time egrep -l ^zygote /usr/dict/words 4.1 real 2.5 user 0.4 sys + time look zygote 0.2 real 0.0 user 0.1 sys missing word + time grep -l 2 /usr/dict/words 4.1 real 3.3 user 0.4 sys + time fgrep -l 2 /usr/dict/words 5.6 real 4.8 user 0.4 sys + time egrep -l 2 /usr/dict/words 3.7 real 2.5 user 0.5 sys + time grep -l ^2 /usr/dict/words 4.7 real 3.7 user 0.4 sys + time egrep -l ^2 /usr/dict/words 3.3 real 2.5 user 0.4 sys + time look 2 0.5 real 0.0 user 0.2 sys -- Marvin Solomon Computer Sciences Department University of Wisconsin, Madison WI solomon@gjetost.wisc.edu or seismo!uwvax!solomon