Path: utzoo!utgpu!watserv1!watmath!mks.com!tslwat!louk
From: louk@tslwat.UUCP (Lou Kates)
Newsgroups: comp.unix.questions
Subject: Re: Pattern matching with awk
Message-ID: <345@tslwat.UUCP>
Date: 6 Mar 91 02:55:10 GMT
References: <9103040310.AA16551@cs.wmich.edu> <1991Mar04.051048.5864@convex.com>
Reply-To: louk@tslwat.UUCP (Lou Kates)
Organization: Teleride Sage, Ltd., Waterloo
Lines: 29

In article <1991Mar04.051048.5864@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>From the keyboard of lin@CS.WMICH.EDU (Lite Lin):
>:  I'm trying to identify all the email addresses in email messages, i.e.,
>:patterns with the format user@node.  Now I can use grep/sed/awk to find
>:those lines containing user@node, but I can't figure out from the manual
>:how or whether I can have access to the matching pattern (it can be
>:anywhere in the line, and it doesn't have to be surrounded by spaces,
>:i.e., it's not necessarily a separate "field" in awk).  If there is no
>:way to do that in awk, I guess I'll do it with lex (yytext holds the
>:matching pattern).
>
>Well, I wouldn't try to do it in awk, but that doesn't mean we have to 
>jump all the way to a C program!  
>
>    perl -ne 's/([-.\w]+@[-.\w]+)/print "$1\n"/ge;'

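The following awk program looks for expressions of the form
word@word, where a word contains only letters, digits and dots.
Fields are separated by any run of characters other than letters,
digits, dots and @. Because the separator is defined as everything
that cannot appear in an address, each address becomes a field of
its own even when it is not surrounded by spaces. You can change
the regular expressions to vary the effect: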

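BEGIN {
	FS   = "[^.a-zA-Z0-9@]+"        # separator: any run of non-address characters
	word = "[.a-zA-Z0-9]+"          # a word: letters, digits and dots
	addr = "^" word "@" word "$"    # a whole field of the form word@word
}
# Print every field on the line that is an address.
{ for (i = 1; i <= NF; i++) if ($i ~ addr) print $i }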
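To try the program without saving it to a file first, it can be wrapped in a shell function. This is a sketch that assumes a POSIX sh and any POSIX awk, since those accept a regular expression as FS. The function name extract_addrs is made up for illustration. The awk text inside is the program above, unchanged.

```shell
#!/bin/sh
# extract_addrs is a hypothetical name: it runs the awk program above
# on the files given as arguments, or on standard input if there are none.
extract_addrs() {
	awk '
	BEGIN {
		FS   = "[^.a-zA-Z0-9@]+"        # separator: any run of non-address characters
		word = "[.a-zA-Z0-9]+"          # a word: letters, digits and dots
		addr = "^" word "@" word "$"    # a whole field of the form word@word
	}
	{ for (i = 1; i <= NF; i++) if ($i ~ addr) print $i }
	' "$@"
}

# Two addresses embedded in punctuation on a single line.
printf 'Mail <lin@cs.wmich.edu> or louk@tslwat, not "user at node".\n' | extract_addrs
```

This should print lin@cs.wmich.edu and louk@tslwat, one per line. A consequence of the character classes is that an address such as foo-bar@x.com is split at the hyphen, so only bar@x.com is printed. Adding - and _ to both bracket expressions (as the perl version does with [-.\w]) avoids that.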

Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu