Path: utzoo!utgpu!watserv1!watmath!mks.com!tslwat!louk
From: louk@tslwat.UUCP (Lou Kates)
Newsgroups: comp.unix.questions
Subject: Re: Pattern matching with awk
Message-ID: <345@tslwat.UUCP>
Date: 6 Mar 91 02:55:10 GMT
References: <9103040310.AA16551@cs.wmich.edu> <1991Mar04.051048.5864@convex.com>
Reply-To: louk@tslwat.UUCP (Lou Kates)
Organization: Teleride Sage, Ltd., Waterloo
Lines: 29

In article <1991Mar04.051048.5864@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>From the keyboard of lin@CS.WMICH.EDU (Lite Lin):
>: I'm trying to identify all the email addresses in email messages, i.e.,
>:patterns with the format user@node. Now I can use grep/sed/awk to find
>:those lines containing user@node, but I can't figure out from the manual
>:how or whether I can have access to the matching pattern (it can be
>:anywhere in the line, and it doesn't have to be surrounded by spaces,
>:i.e., it's not necessarily a separate "field" in awk). If there is no
>:way to do that in awk, I guess I'll do it with lex (yytext holds the
>:matching pattern).
>
>Well, I wouldn't try to do it in awk, but that doesn't mean we have to
>jump all the way to a C program!
>
>	perl -ne 's/([-.\w]+@[-.\w]+)/print "$1\n"/ge;'

The following awk program looks for expressions of the form word@word,
where a word contains only letters, digits and dots. The field separator
is any run of characters other than letters, digits, dots and @, so each
candidate address ends up in a field of its own. You can change the
regular expressions to vary what counts as an address:

    BEGIN {
        FS = "[^.a-zA-Z0-9@]+";
        word = "[.a-zA-Z0-9]+";
        addr = "^" word "@" word "$"
    }
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ addr)
                print $i
    }

Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
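The original question asked how to get at the matched text itself. An alternative approach, assuming a POSIX ("new") awk, is the built-in match(), which sets RSTART and RLENGTH to the start and length of the leftmost-longest match. The loop below prints each address and then rescans the rest of the line, so several addresses per line are found. It reuses the character classes of the FS-based program; the shell function name extract_addrs is hypothetical.

```shell
# Hypothetical helper: print every user@node string found anywhere in
# the input, one per line. Assumes a POSIX awk, whose match() sets
# RSTART (start index) and RLENGTH (length) of the leftmost-longest match.
extract_addrs() {
    awk '
    {
        line = $0
        # Print the leftmost address, then keep scanning the text after it.
        while (match(line, /[.a-zA-Z0-9]+@[.a-zA-Z0-9]+/)) {
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}
```

For example, `echo 'Mail lin@CS.WMICH.EDU or louk@tslwat.UUCP today' | extract_addrs` prints the two addresses on separate lines. One behavioural difference: for input containing a@b@c this version prints a@b, whereas the field-splitting program rejects the whole field.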