Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!nrl-cmf!ames!amdcad!sun!pitstop!sundc!seismo!uunet!mcvax!ukc!stl!stc!root44!gwc
From: gwc@root.co.uk (Geoff Clare)
Newsgroups: comp.unix.wizards
Subject: Re: POSIX Regular Expression Funnyness
Summary: It's worse
Keywords: Regular Expression,POSIX
Message-ID: <683@root44.co.uk>
Date: 1 Feb 89 11:02:22 GMT
References: <4118f7b1.ae48@apollo.COM>
Reply-To: gwc@root.co.uk (Geoff Clare)
Organization: UniSoft Ltd, London, England
Lines: 44

In article <4118f7b1.ae48@apollo.COM> arnold@apollo.COM (Ken Arnold) writes:
>The POSIX proposal [] has a rework of regular expressions.
>(stuff deleted)
>
>They have added a new set of bracket expressions which stand for
>pre-defined sets of characters. For example, "[:alpha:]" is all
>alphabetic characters, "[.ch.]" is the character string ch treated as a
>single character (which is useful for sorting in many languages), and
>"[=a=]" refers to all variants of a, i.e., a, a with a circumflex, a
>with an umlaut, etc.
>
>(stuff deleted)... these new bracket expressions only have their new
>meaning inside outer brackets.
>
>Why? The only existing expressions you would break if you allowed "top
>level" [::] expressions (or [..] or [==] expressions) would be
>expressions which currently existed that contained *two* colons (or
>dots or equals), on either side. Since this is currently pointless
>redundancy, I can't believe this is a serious problem.

There are more serious problems with the new expressions than just the
obscure syntax. A short while ago I had to design some verification
tests for these new regular expressions as part of the X/Open
verification suite (the latest X/Open standard incorporates POSIX).
I found some ambiguity in the area of 2 to 1 character mappings.
For example, if ch collates between c and d, which of the following
REs should match the string "xchy"?

	x[a-[.ch.]]y
	x[a-[.ch.]]hy

The simple answer would be to create some rule about 2 to 1 character
mappings to eliminate the ambiguity. However, whichever rule is chosen,
there will be many cases where the actual behaviour is non-intuitive,
resulting in users not getting the results they expect.

We have informed X/Open of the problem, and are waiting to see what
they come up with.

Geoff.
-- 
Geoff Clare    UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk   ...!mcvax!ukc!root44!gwc   +44-1-606-7799  FAX: +44-1-726-2750
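
The ambiguity described above can be made concrete with a small model.
What follows is only a sketch in Python, not an implementation of the
POSIX or X/Open matching rules. The collation table and the helper names
(COLLATION, split_elements, matches) are hypothetical, the two REs are
written out by hand as lists rather than parsed, and the two "rules" are
simply the two obvious candidates: either "ch" in the input is always
taken as one collating element, or the input is taken one character at
a time.

```python
# Hypothetical collation order in which "ch" sorts between "c" and "d".
COLLATION = ["a", "b", "c", "ch", "d", "e", "h", "x", "y"]
RANK = {elem: i for i, elem in enumerate(COLLATION)}


def split_elements(text, multi_char):
    """Split text into collating elements.

    With multi_char=True, "ch" is consumed greedily as one element;
    otherwise every character is its own element.
    """
    out, i = [], 0
    while i < len(text):
        if multi_char and text.startswith("ch", i):
            out.append("ch")
            i += 2
        else:
            out.append(text[i])
            i += 1
    return out


def matches(pattern, text, multi_char):
    """Match a hand-built pattern against the whole of text.

    Each pattern item is either a literal element (a string) or a
    (low, high) tuple standing for a bracket range such as [a-[.ch.]].
    """
    elems = split_elements(text, multi_char)
    if len(elems) != len(pattern):
        return False
    for item, elem in zip(pattern, elems):
        if isinstance(item, tuple):
            low, high = item
            if not RANK[low] <= RANK[elem] <= RANK[high]:
                return False
        elif item != elem:
            return False
    return True


RANGE = ("a", "ch")             # stands for [a-[.ch.]]
RE_1 = ["x", RANGE, "y"]        # x[a-[.ch.]]y
RE_2 = ["x", RANGE, "h", "y"]   # x[a-[.ch.]]hy

for name, multi in (("ch as one element", True), ("one char at a time", False)):
    print(name, matches(RE_1, "xchy", multi), matches(RE_2, "xchy", multi))
```

In this model, treating "ch" as one element makes only the first RE
match "xchy". Taking the input one character at a time makes only the
second RE match, because "c" on its own also falls in the range from a
to ch. A real matcher might instead try both readings, in which case
both REs would match.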