Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!olivea!uunet!wuarchive!bcm!convex!news
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.lang.perl
Subject: Re: Email parsing in perl?
Message-ID: <1991Feb16.185313.1789@convex.com>
Date: 16 Feb 91 18:53:13 GMT
References: <16798@venera.isi.edu>
Sender: news@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Distribution: comp
Organization: CONVEX Software Development, Richardson, TX
Lines: 41
Nntp-Posting-Host: pixel.convex.com

From the keyboard of jas@ISI.EDU (Jeff Sullivan):
:Does anyone have some code that parses email messages, extracting all
:of the useful info in them by field? (e.g., To: From: Reply-TO:,
:Subject: cc:, and the rest as body)?
:
:I'm sure someone's done this; don't want to reinvent the wheel.

If you want something pretty spiffy, see Chip Salzenberg's deliver
package.  If you just want to roll your own for some other purpose,
Larry and Randal have a nice example on p. 183 of their Camel Book:

    $* = 1;
    $header =~ s/\n\s+/ /g;      # Merge continuation lines.
    %head = ('FRONTSTUFF', split(/^([-\w]+):/, $header));

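This puts the so-called UNIX From_ line in $head{'FRONTSTUFF'} and
each field's text under its own name.  It does not handle repeated
headers the way you might want, though: a later field of the same name
simply overwrites the earlier one.  Off the top of my head, you should
be able to munge it into something like this: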

    $/ = ''; # paramode
    $* = 1;
    $_ = <>; # read header

    @hdrs = split( /^([-\w]+):\s*/ );
    shift @hdrs;  # don't need leading stuff

    while ( ($name, $text) = splice(@hdrs,0,2) ) {
	$text =~ s/\n/ /g;  # maybe don't want multiple lines
	$Header{$name} .= ", " if $Header{$name};  # join repeated fields
	$Header{$name} .= $text;
    }

    for $header (sort keys %Header) {
	print "<$header>: $Header{$header}\n";
    }
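
For anyone on a newer perl, where the /m modifier replaces $*, here is
a rough sketch of the same split-and-splice idea wrapped in a sub.  The
name parse_header and the sample message are made up for illustration,
and using [ \t] rather than \s (so an empty field does not swallow the
next line) is my own assumption about what you would want, not anything
from the Camel Book.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one header block (everything up to the
# first blank line) into a hash, joining repeated fields with ", ".
sub parse_header {
    my ($block) = @_;

    $block =~ s/\n[ \t]+/ /g;    # merge continuation lines

    # /m makes ^ match at every line start, which is what $* = 1 did.
    my @hdrs = split /^([-\w]+):[ \t]*/m, $block;
    shift @hdrs;                 # drop the From_ line or leading empty field

    my %header;
    while (my ($name, $text) = splice(@hdrs, 0, 2)) {
        $text = '' unless defined $text;    # last field had no value
        $text =~ s/\s+\z//;                 # trim the trailing newline(s)
        $header{$name} .= ', ' if exists $header{$name};
        $header{$name} .= $text;
    }
    return %header;
}

# Sample message header, invented for the example.
my $msg = join "\n",
    'From jas Sat Feb 16 10:00:00 1991',
    'To: tchrist@convex.com',
    'Cc: one@example.com',
    'Cc: two@example.com',
    'Subject: Email parsing',
    "\tin perl?",
    '', '';

my %h = parse_header($msg);
print "<$_>: $h{$_}\n" for sort keys %h;
```

Field names are kept case-sensitive here, just as in the loop above, so
"cc" and "Cc" land in different slots; lowercase $name first if you
would rather fold them together.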


--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
 "All things are possible, but not all expedient."  (in life, UNIX, and perl)