Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!hellgate.utah.edu!caen!zaphod.mps.ohio-state.edu!rpi!sci.ccny.cuny.edu!dsndata!tssi!nolan
From: nolan@tssi.UUCP (Michael Nolan)
Newsgroups: comp.unix.questions
Subject: Re: Pattern matching with awk
Message-ID: <1994@tssi.UUCP>
Date: 4 Mar 91 17:56:54 GMT
References: <9103040310.AA16551@cs.wmich.edu>
Reply-To: tssi!nolan
Organization: Tailored Software Svcs., Lincoln, Neb.
Lines: 39

lin@CS.WMICH.EDU (Lite Lin) writes:
>  This is a simple question, but I don't see it in "Frequently Asked
>Questions", so...
>  I'm trying to identify all the email addresses in email messages, i.e.,
>patterns with the format user@node.  Now I can use grep/sed/awk to find
>those lines containing user@node, but I can't figure out from the manual
>how or whether I can have access to the matching pattern (it can be
>anywhere in the line, and it doesn't have to be surrounded by spaces,
>i.e., it's not necessarily a separate "field" in awk).

If you have nawk or gawk, use the match(string, pattern) function, which
sets two variables:

	RSTART  - the first position in the string matched by the pattern.
	RLENGTH - the length of the string matching the pattern.

The matched text itself is then substr(string, RSTART, RLENGTH).

A pattern to match any single mail address might be rather ugly, though.
If you assume all the following:

1.  Upper case and lower case letters and digits are permitted
2.  Dash, underscore, and period are permitted
3.  There is only one @ [I'm not sure this assumption is valid, though!]
4.  There may be several ! or % in the 'user' portion
5.  No commas or spaces

Then that gives a pattern something like this:

	[a-zA-Z0-9.\-_%!]+@[a-zA-Z0-9.\-_]+

I've escaped the dash; I suppose it might be necessary to escape other
characters as well.

Have I left anything out that might occur in strange but otherwise valid
mail addresses?
------------------------------------------------------------------------------
Michael Nolan                              "Software means never having
Tailored Software Services, Inc.            to say you're finished."
Lincoln, Nebraska (402) 423-1490           --J. D. Hildebrand in UNIX REVIEW
UUCP:      tssi!nolan   (or try sparky!dsndata!tssi!nolan)
Internet:  nolan@helios.unl.edu (if you can't get the other address to work)
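
To make the match/RSTART/RLENGTH suggestion above concrete, here is a
minimal sketch, assuming an awk that provides match() (as nawk and gawk
do). The function name extract_addrs and the sample input line are
hypothetical, made up for illustration. The pattern follows the five
assumptions listed above, except that the dash is written last in each
bracket expression instead of being escaped; that is a substitution on
my part, chosen because a trailing dash is taken literally without
relying on how a given awk treats a backslash inside brackets.

```shell
# Hypothetical helper (not from the original post): print every substring of
# standard input that matches the address pattern, one per line.
extract_addrs() {
    awk '
    {
        line = $0
        # match() returns 0 when nothing matches; otherwise it sets
        # RSTART (1-based start) and RLENGTH (length) for the leftmost match.
        while (match(line, /[a-zA-Z0-9._%!-]+@[a-zA-Z0-9._-]+/)) {
            print substr(line, RSTART, RLENGTH)
            # Continue with the text after this match to find later addresses.
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

# Sample input (made up): prints the two addresses, one per line.
printf 'From lin@CS.WMICH.EDU (Lite Lin), cc: tssi!nolan@helios.unl.edu, thanks\n' | extract_addrs
```

The loop is there because match() only reports the leftmost match, so
each pass cuts the line down to what follows the previous match. Since
period is allowed in the node part, an address at the very end of a
sentence would pick up the trailing period; trimming that is left out
of this sketch.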