Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!olivea!uunet!wuarchive!bcm!convex!news
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.lang.perl
Subject: Re: Email parsing in perl?
Message-ID: <1991Feb16.185313.1789@convex.com>
Date: 16 Feb 91 18:53:13 GMT
References: <16798@venera.isi.edu>
Sender: news@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Distribution: comp
Organization: CONVEX Software Development, Richardson, TX
Lines: 41
Nntp-Posting-Host: pixel.convex.com

From the keyboard of jas@ISI.EDU (Jeff Sullivan):
:Does anyone have some code that parses email messages, extracting all
:of the useful info in them by field? (e.g., To: From: Reply-TO:,
:Subject: cc:, and the rest as body)?
:
:I'm sure someone's done this; don't want to reinvent the wheel.

If you want something pretty spiffy, see Chip Salzenberg's deliver package.
If you just want to roll your own for some other purpose, Larry and Randal
have a nice example on p 183 of their Camel Book:

    $* = 1;
    $header =~ s/\n\s+/ /g;     # Merge continuation lines.
    %head = ('FRONTSTUFF', split(/^([-\w]+):/, $header));

This puts the so-called UNIX From_ line in $head{'FRONTSTUFF'}, and each
remaining field in $head{fieldname}. It does not handle a header that
appears more than once the way you might want it to: a later occurrence
simply overwrites the earlier one.

Off the top of my head, you should be able to munge this into something
usable:

    $/ = '';            # paramode
    $* = 1;
    $_ = <>;            # read header
    @hdrs = split( /^([-\w]+):\s*/ );
    shift @hdrs;        # don't need leading stuff
    while ( ($name, $text) = splice(@hdrs,0,2) ) {
        $text =~ s/\n/ /g;      # maybe don't want multilines
        $Header{$name} .= ", " if $Header{$name};
        $Header{$name} .= $text;
    }

    for $header (sort keys %Header) {
        print "<$header>: $Header{$header}\n";
    }

--tom
--
Tom Christiansen                tchrist@convex.com      convex!tchrist
 "All things are possible, but not all expedient."  (in life, UNIX, and perl)
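
For anyone trying the loop above on a current Perl 5, here is a minimal sketch of the same idea. It assumes an interpreter where $* is no longer available, so the /m modifier stands in for "$* = 1". The helper name parse_header and the sample message are made up for illustration and are not from the post or the Camel Book. Repeated fields are joined with ", " as in the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one header block into a hash of field => text.
sub parse_header {
    my ($block) = @_;
    $block =~ s/\n[ \t]+/ /g;                       # merge continuation lines
    my @parts = split /^([-\w]+):[ \t]*/m, $block;  # /m stands in for $* = 1
    shift @parts;                                   # drop leading stuff (From_ line)
    my %header;
    while (my ($name, $text) = splice(@parts, 0, 2)) {
        $text = '' unless defined $text;            # field with no text at all
        $text =~ s/\s+\z//;                         # trim the trailing newline
        $header{$name} .= ', ' if exists $header{$name};
        $header{$name} .= $text;
    }
    return %header;
}

# Made-up sample header with a From_ line, a folded Subject and two To fields.
my $sample = "From jas Sat Feb 16 18:00:00 1991\n"
           . "To: a\@example.com\n"
           . "Subject: Email parsing\n in perl?\n"
           . "To: b\@example.com\n";

my %h = parse_header($sample);
for my $field (sort keys %h) {
    print "<$field>: $h{$field}\n";
}
```

To read from a real message instead of the sample string, set $/ = '' (paragraph mode) as in the loop above and pass the first <> record to parse_header.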