Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!ncar!gatech!purdue!haven.umd.edu!uvaarpa!mmdf From: marc@athena.mit.edu (Marc Horowitz) Newsgroups: comp.lang.perl Subject: Re: regular expressions Message-ID: <1991Jun6.053324.19841@uvaarpa.Virginia.EDU> Date: 6 Jun 91 05:33:24 GMT Sender: mmdf@uvaarpa.Virginia.EDU (Uvaarpa Mail System) Reply-To: Marc Horowitz Organization: The Internet Lines: 43 |> I would like to talk with and and all regular expression gurus. It |> seems that the regular expressions that are supported by perl (and |> unix) do not accept any and all regular languages. If this is not |> true, I would love to learn how to do some of the things that I am |> trying to do. According to "Elements of the Theory of Computation," by Lewis and Papadimitriou (the textbook for 6.045 at MIT), regular languages can be completely described by the following things: 1) the empty string 2) any character in the alphabet 3) the concatenation of two regular expressions 4) the union of two regular expressions 5) the Kleene star of a regular expression (the * in x* is a Kleene star. There, now you probably learned something :-) Regular expressions are always defined over a finite alphabet. Under unix, this alphabet is \000-\377. perl regexps (and all other unix regexps I know) are all just ways of stating the above rules for the convenience of the programmer. Constant strings are concatenations of characters. Specials like \s and \w are unions of charaters. [abc0-9] is a union of 13 characters. (re)? is the union of the empty string and re. (re){m,n} can be written as (re) concatenated m times, concatenated with (re)? concatenated (n-m) times. Every other regexp expression in unix can similarly be written as some combination of the five operations listed above. L&D also points out that regular expressions are closed under intersection and complementation. So, your example of a regular expression matching all strings not containing "bar" is possible, but it could be really hairy. So, the regexp you proposed would be the concatenation of not bar, foo, and not bar. This will be even hairier. Tom Wicklund has posted such a regular expression. It is easy to show that a regular expression exists; it is often not easy to actually construct that regular expression. In short, "if (/foo/ && !/bar/)" is the easiest (and fastest, and clearest) way to say what you want to say. Marc