Xref: utzoo alt.sources:1360 comp.lang.perl:100
Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!hp4nl!mhres!jv
From: jv@mh.nl (Johan Vromans)
Newsgroups: alt.sources,comp.lang.perl
Subject: Re: Perl version of from (Was: Re: from.sed (v1.2))
Message-ID:
Date: 3 Jan 90 21:59:19 GMT
References: <1989Dec20.222732.5633@trigraph.uucp>
Sender: jv@mhres.mh.nl
Followup-To: comp.lang.perl
Organization: Multihouse Gouda, the Netherlands
Lines: 91
In-reply-to: jv@mh.nl's message of 22 Dec 89 05:11:43 GMT

Note: I have redirected follow-ups to comp.lang.perl.

In article jv@mh.nl (Johan Vromans) writes:
| In article <1989Dec20.222732.5633@trigraph.uucp> john@trigraph.uucp (John Chew) writes:
| Here's a new version of from.sed, my sed script that does the job
| of from(1) better and faster. It now truncates long subjects,
| correctly handles messages without subjects and From lines with %
| or @foo: routing.
|
| Yes, I tried writing this in Perl. I'm not an expert Perl programmer,
| but I couldn't get it to run faster than about 70% slower than sed.

To which I replied:

| I've been using a perl version of 'from' for a long time, so I throw it
| in. [...]
| It runs about as fast as the sed version. Typical times for a large
| mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl.

Script fragment:

    while ( $line = <> ) {
      chop ($line);

      # scan until "From_" header found
      next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

John J. Chew pointed out to me that tightening the search for "From "
would speed up the program by 30%. He suggested:

    while ( <> ) {
      next unless /^From /;
      chop;    # chops $_, since $line is no longer assigned
      next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

Well, I tried it, and (NOT to my surprise) I found that the major
speedup comes from leaving out the assignment to the variable $line
and postponing the chop.

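For readers who want to try the two loop shapes side by side, here is a
minimal sketch in present-day Perl 5 (an assumption; it is not the
posted script). The sample mailbox lines and the sub names
scan_with_copy and scan_in_place are made up for illustration. Both
subs should report the same senders and dates; the second one skips the
per-line assignment and chop and rejects non-"From " lines with the
cheap test first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample mailbox lines (hypothetical, not taken from the post).
my @mailbox = map { "$_\n" } (
    'From jv@mh.nl Wed Jan 13 21:59:19 1990',
    'Subject: Re: Perl version of from',
    '',
    'From here on this is body text',
    'From john@trigraph.uucp Wed Dec 20 22:27:32 1989',
    'Subject: from.sed (v1.2)',
);

# The pattern used in the post: sender in $1, "Mon DD HH:MM" in $2.
my $from_re = qr/^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

# First loop shape: copy every line into $line and chop it before matching.
sub scan_with_copy {
    my @hits;
    foreach my $raw (@_) {
        my $line = $raw;    # per-line assignment
        chop($line);        # per-line chop
        next unless $line =~ $from_re;
        push @hits, "$1 $2";
    }
    return @hits;
}

# Second loop shape: match $_ directly, cheap prefix test first,
# no copy and no chop for lines that are rejected.
sub scan_in_place {
    my @hits;
    foreach (@_) {
        next unless /^From /;
        next unless /$from_re/;
        push @hits, "$1 $2";
    }
    return @hits;
}

my @slow = scan_with_copy(@mailbox);
my @fast = scan_in_place(@mailbox);
print "$_\n" for @fast;
print "both loops agree\n" if "@slow" eq "@fast";

# Optional timing with the core Benchmark module: run with --bench.
if (@ARGV && $ARGV[0] eq '--bench') {
    require Benchmark;
    Benchmark::cmpthese(-1, {
        copy     => sub { my @h = scan_with_copy(@mailbox) },
        in_place => sub { my @h = scan_in_place(@mailbox) },
    });
}
```

Run with --bench on a real mailbox to see whether the difference shows
up on your machine; the numbers will depend on the Perl version and
need not resemble the 1990 timings.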
I couldn't imagine (knowing how Larry likes optimisation) that

    next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

would take more time to fail than

    next unless /^From /;

With the speedups, the perl script beats the sed script on both large
and small mailboxes:

    ~ > wc -lc INBOX
        163    6927 INBOX

    ~ > dotime 5 perl src/perl.pl INBOX
              Avg   Pass 1        2        3        4        5
            -----  -------    -----    -----    -----    -----
    real      0.2      0.4      0.2      0.2      0.2      0.2
    user      0.0      0.0      0.0      0.0      0.0      0.0
    sys       0.1      0.1      0.1      0.1      0.1      0.1

    ~ > dotime 5 sed -f from.sed INBOX
              Avg   Pass 1        2        3        4        5
            -----  -------    -----    -----    -----    -----
    real      0.5      0.7      0.4      0.5      0.4      0.4
    user      0.1      0.1      0.1      0.1      0.1      0.1
    sys       0.2      0.2      0.2      0.3      0.2      0.2

    ~ > wc -lc maildir/pax
      46585 1240000 maildir/pax

    ~ > dotime 5 perl src/from.pl maildir/pax
              Avg   Pass 1        2        3        4        5
            -----  -------    -----    -----    -----    -----
    real     21.9     21.9     20.3     21.1     25.7     20.7
    user     14.0     14.4     14.3     14.1     13.7     13.6
    sys       5.9      5.8      4.9      5.7      7.4      5.9

    ~ > dotime 5 sed -f from.sed maildir/pax
              Avg   Pass 1        2        3        4        5
            -----  -------    -----    -----    -----    -----
    real     23.1     23.4     22.7     22.9     23.1     23.5
    user     14.8     14.8     14.9     14.8     14.3     15.2
    sys       7.4      7.4      7.1      7.3      7.8      7.2

I have posted the "dotime" program to alt.sources, for whoever thinks
she/he can use it.

Have fun!

	Johan
--
Johan Vromans                                  jv@mh.nl via internet backbones
Multihouse Automatisering bv                   uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------