Path: utzoo!attcan!uunet!lll-winken!ames!mailrus!ulowell!apollo!arnold
From: arnold@apollo.COM (Ken Arnold)
Newsgroups: comp.unix.wizards
Subject: POSIX Regular Expression Funnyness
Keywords: Regular Expression,POSIX
Message-ID: <4118f7b1.ae48@apollo.COM>
Date: 26 Jan 89 16:20:00 GMT
Reply-To: arnold@apollo.COM (Ken Arnold)
Organization: Apollo Computer, Chelmsford, MA
Lines: 46
References:

The POSIX proposal [] has a rework of regular expressions. In particular, the character set expressions (things like "[a-z]") have had a few new things added, but the way they have been added seems passing strange. I was wondering if I was alone in thinking the following suboptimal.

They have added a new set of bracket expressions which stand for predefined sets of characters. For example, "[:alpha:]" is all alphabetic characters, "[.ch.]" is the character string ch treated as a single character (which is useful for sorting in many languages), and "[=a=]" refers to all variants of a, i.e., a, a with a circumflex, a with an umlaut, etc.

Well, this sounds fine and dandy. Being able to express C variable names as "[[:alpha:]_][[:alnum:]_]*" is reasonably descriptive. Being able to say "I don't care if the 'o' has any diacritical marks" is also fine.

The problem is that, for some reason, if you want to simply match any alphabetic character, you *cannot* say "[:alpha:]". Or, to be more precise, that expression means exactly what it does now. If you say

    grep "+[:alnum:]+" file ...

you will print any line which has a "+" followed by one of :, a, l, n, u, or m, followed by another "+". If you want to match what it *looks* like that expression would match, you have to say

    grep "+[[:alnum:]]+" file ...

In other words, these new bracket expressions only have their new meaning inside outer brackets.

Why? If you allowed "top level" [::] expressions (or [..] or [==] expressions), the only existing expressions you would break are ones that contain *two* colons (or dots or equals signs), one on either side. Since the second colon is currently pointless redundancy, I can't believe this is a serious problem.

What seems like a serious problem to me is that the required nesting makes the new expressions more difficult to use. Further, misusing them in this obvious way leads to silent misbehavior whose cause is hard to track down.

Is it just me, or is this wrong?

Ken