Path: utzoo!mnetor!tmsoft!torsqnt!hybrid!scifi!bywater!uunet!convex!usenet
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.unix.questions
Subject: Re: Pattern matching with awk
Message-ID: <1991Mar04.051048.5864@convex.com>
Date: 4 Mar 91 05:10:48 GMT
References: <9103040310.AA16551@cs.wmich.edu>
Sender: usenet@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 48
Nntp-Posting-Host: pixel.convex.com

From the keyboard of lin@CS.WMICH.EDU (Lite Lin):
:  This is a simple question, but I don't see it in "Frequently Asked
:Questions", so...
:  I'm trying to identify all the email addresses in email messages, i.e.,
:patterns with the format user@node.  Now I can use grep/sed/awk to find
:those lines containing user@node, but I can't figure out from the manual
:how or whether I can have access to the matching pattern (it can be
:anywhere in the line, and it doesn't have to be surrounded by spaces,
:i.e., it's not necessarily a separate "field" in awk).  If there is no
:way to do that in awk, I guess I'll do it with lex (yytext holds the
:matching pattern).

Well, I wouldn't try to do it in awk, but that doesn't mean we have
to jump all the way to a C program!

    perl -ne 's/([-.\w]+@[-.\w]+)/print "$1\n"/ge;'

That does a fairly good job, but there are a lot of duplicates, so
let's not print any we've already seen:

    perl -ne 's/([-.\w]+@[-.\w]+)/print "$1\n" unless $seen{$1}++/ge;'

A more sordid approach might be:

    #!/usr/bin/perl
    while (<>) {
        # /e runs the replacement as code, so each match is tallied in %seen
        s/([-%:.\w]+@[-@%:.\w]+)/$seen{$1}++/ge;
    }
    print join("\n", sort keys %seen), "\n";

But you've got a basic problem in that you can't distinguish
message-ids from real addresses.  A message_id@host looks a lot like
(in some cases indistinguishably so) a user_id@host.
Here's a half-hearted attempt to weed out a few strays:

    #!/usr/bin/perl
    while (<>) {
        s/([a-zA-Z][-%:.\w]+@[-@%:.\w]+)/$seen{$1}++/ge;
    }
    print join("\n", grep(!/^(AA)?\d/, sort keys %seen)), "\n";

--tom

ps: dunno what all this ``node'' talk is.  My manual talks about
    nodes in the filesystem section, hosts in the networking section.
    Or do you mail directly to i-nodes? :-)
--
"UNIX was not designed to stop you from doing stupid things, because
 that would also stop you from doing clever things." -- Doug Gwyn

 Tom Christiansen    tchrist@convex.com    convex!tchrist