Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.jpl.nasa.gov (Larry Wall)
Newsgroups: comp.editors
Subject: Re: perl script request
Message-ID: <1991May11.012824.5139@jpl-devvax.jpl.nasa.gov>
Date: 11 May 91 01:28:24 GMT
References: <2083@qusunc.queensu.CA>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 65

In article <2083@qusunc.queensu.CA> prastowo@qucis.queensu.CA (Bambang Nurcahyo Prastowo) writes:
: Could any perl guru help me rewrite the following sh script in perl ?
: The script is a mail sorter that cleans incoming mails (simplifies
: headers and removes duplicated blank lines) and save them in mailboxes
: named after the first words of email addresses appear in "From:...".

Sure.  It'd go something like this:
------------------------------------------------------
#!/usr/bin/perl

$MAILDIR = "/grad/prastowo/MAIL";

$/ = '';			# paragraph mode
$* = 1;				# enable multiline matching for ^

# parse header into associative array

%hdr = ('FROM', split(/^([-\w]+):[ \t]*/, <>));

# open appropriate file

$sd = $hdr{'From'};
$sd =~ s/^.*<\s*(.*)>.*$/$1/;	# handle <> form
$sd =~ s/[\s@%.!].*//;		# delete trailing stuff
$sd =~ tr/a-z/A-Z/;		# canonicalize to upper case

open(STDOUT, ">>$MAILDIR/$sd") || die "Can't open $sd: $!\n";

# write the header (pick your order)

print		   $hdr{'FROM'}		if $hdr{'FROM'};
print 'From: ',    $hdr{'From'}		if $hdr{'From'};
print 'Date: ',    $hdr{'Date'}		if $hdr{'Date'};
print 'Subject: ', $hdr{'Subject'}	if $hdr{'Subject'};
print "\n";

# now do each remaining paragraph

while (<>) {
    $* = 0;
    s/^\n+//;			# delete extra blank lines
    $* = 1;
    s/^(Date|From|Subject):/>$1:/g;
    print;
}
print "_____________________________________\n\n";

------------------------------------------------------

I translated it fairly literally, including the fact that addresses like

	foo!bar@fiddle

save to file FOO rather than BAR.  I chose to process the message in
paragraph mode because it makes it easy to find extra newlines (they
show up at the front of the paragraph).  The only tricky think is that
you have to turn on and off whether ^ matches the beginning of each line
in the string or just the beginning of the string.

This could fairly easily be restructured to handle multiple messages
on the input stream by putting the header code into the final loop
and executing it whenever the current paragraph looks like a header.

Larry Wall
lwall@netlabs.com