Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!brutus.cs.uiuc.edu!psuvax1!news
From: flee@shire.cs.psu.edu (Felix Lee)
Newsgroups: comp.lang.perl
Subject: Re: speed: V2 verses V3
Message-ID: <1989Dec18.112735.4443@psuvax1.cs.psu.edu>
Date: 18 Dec 89 11:27:35 GMT
References: <1808@uvaarpa.virginia.edu> <6609@jpl-devvax.JPL.NASA.GOV> <4047@convex.UUCP> <1989Dec18.032836.16434@psuvax1.cs.psu.edu> <4055@convex.UUCP>
Sender: news@psuvax1.cs.psu.edu (Usenet)
Organization: Penn State University Computer Science
Lines: 48

Tom Christiansen wrote:
> Anybody want to post a better benchmark? I'm having trouble finding
> something that'll actually run for a long time.

You guys aren't really seriously into text processing, are you? :-)
Here are timings for a perl script that counts word frequencies.

% time perl-2 wf.pl /etc/termcap >/dev/null
13.3u + 0.9s = 0:15 (95%); (0k+864k)/92k (0+0)io (0f+80r)pg+0sw
% !!
13.4u + 0.7s = 0:14 (98%); (0k+872k)/92k (0+0)io (0f+79r)pg+0sw
% !!
13.3u + 0.8s = 0:14 (100%); (0k+872k)/92k (0+0)io (0f+79r)pg+0sw
% time perl-3 wf.pl /etc/termcap >/dev/null
18.6u + 1.0s = 0:20 (95%); (0k+944k)/84k (0+0)io (0f+73r)pg+0sw
% !!
18.7u + 0.9s = 0:20 (95%); (0k+944k)/84k (0+0)io (0f+72r)pg+0sw
% !!
18.7u + 0.9s = 0:20 (94%); (0k+944k)/84k (0+0)io (0f+73r)pg+0sw

This is on a Sun-4. /etc/termcap is 146k: about 32000 words in total,
about 2000 distinct words, with an average word length of 3 chars.

If you want worse behavior, try /usr/dict/words: about 24000 words,
every one unique, average length 7 chars. I get 103.0u (user CPU
seconds) for perl-2 and 158.2u for perl-3.

If you eliminate the simple arithmetic in the script, perl-3 does a
little better, but it is still slower than perl-2.

Here's the script.

#!/usr/bin/perl
# Count word frequency.
while (<>) {
    foreach $k (split(/[^a-zA-Z]+/)) {
        $k =~ tr/A-Z/a-z/, ++$freq{$k} if ($k);
    }
}
foreach $k (sort downfreq keys(freq)) {
    printf "%5d %s\n", $freq{$k}, $k;
}
sub downfreq {
    ($freq{$b} - $freq{$a}) || ($a gt $b);
}
-- 
Felix Lee	flee@shire.cs.psu.edu	*!psuvax1!flee
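For anyone trying the same count on a later interpreter, here is a minimal sketch. It assumes Perl 5, which postdates both perl-2 and perl-3, so it is not what was timed above. The sub names count_words and by_freq are hypothetical, chosen for illustration. The sketch also tightens the tie-break. In downfreq, ($a gt $b) yields 1 or 0 but never a negative value, so two equally frequent words compare as "equal" in one order and "greater" in the other. The <=> and cmp operators return -1/0/1, which gives sort a consistent ordering.

```perl
#!/usr/bin/perl
# Sketch only: a Perl 5 take on wf.pl (assumes Perl 5; the names
# count_words and by_freq are hypothetical, not from the original post).
use strict;
use warnings;

# Count words in a list of lines, case-folded to lowercase.
# As in wf.pl, a "word" is a run of ASCII letters. Returns a hash
# reference mapping word => count.
sub count_words {
    my (@lines) = @_;
    my %freq;
    for my $line (@lines) {
        for my $w (split /[^a-zA-Z]+/, $line) {
            # split keeps a leading empty field when a line starts
            # with a non-letter; skip it, as "if ($k)" does in wf.pl.
            next unless length $w;
            $freq{lc $w}++;
        }
    }
    return \%freq;
}

# Return the words sorted by descending count, ties broken
# alphabetically. <=> and cmp return -1, 0 or 1, so the comparator
# is consistent in both argument orders.
sub by_freq {
    my ($freq) = @_;
    return sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq;
}
```

To reproduce the original output from standard input or file arguments, something like `my $freq = count_words(<>); printf "%5d %s\n", $freq->{$_}, $_ for by_freq($freq);` should do it.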