Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!usenix!std-unix
From: karl@IMA.ISC.COM (Karl Heuer)
Newsgroups: comp.std.unix
Subject: ambiguous match with multiple-character collating elements
Keywords: international regexp/gmatch
Message-ID: <487@usenix.ORG>
Date: 5 Sep 90 22:20:45 GMT
Sender: std-unix@usenix.ORG
Reply-To: karl@IMA.ISC.COM (Karl Heuer)
Organization: Interactive Systems, Cambridge, MA 02138-5302
Lines: 15
Approved: jsq@usenix.org (Moderator, John Quarterman)
X-Submissions: std-unix@uunet.uu.net

From: karl@IMA.ISC.COM (Karl Heuer)

In an environment where the digraph "ch" collates as a single element,
what happens if an attempt is made to match the subject string "chi"
with the pattern "[c[.ch.]]i" or "[c[.ch.]]hi"?

Is the implementation required to report a successful match in both
cases?  If so, it would seem necessary to use a nondeterministic finite
automaton or equivalent, thus making simple regexp matching and filename
globbing as complex as egrep pattern matching.

If you have an answer that's based on something other than your own
intuition, please specify which (draft) standard you're referencing.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint

Volume-Number: Volume 21, Number 82
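
The ambiguity in the question can be illustrated with a small sketch.
It is a toy model only, not any standard's algorithm or any real regexp
library: a pattern is assumed to be a list of alternatives, each
alternative a tuple of collating elements (plain strings), and the
match is assumed to be anchored at both ends, as in filename globbing.
The names match_here, BRACKET, P1 and P2 are hypothetical.  The sketch
compares two deterministic "commit to one choice" strategies with a
backtracking one; it does not say which behavior a standard requires.

```python
def match_here(pattern, text, pos=0, strategy="backtrack"):
    """Return True if pattern matches text[pos:] through to the end.

    pattern  -- list of tuples; each tuple lists the collating elements
                accepted at that position (toy model, an assumption).
    strategy -- "longest" or "shortest" commits to a single candidate;
                "backtrack" tries every candidate in turn.
    """
    if not pattern:
        return pos == len(text)
    head, rest = pattern[0], pattern[1:]
    # Every element of this bracket that occurs in the text at pos.
    candidates = [e for e in head if text.startswith(e, pos)]
    if strategy == "longest":
        candidates = sorted(candidates, key=len, reverse=True)[:1]
    elif strategy == "shortest":
        candidates = sorted(candidates, key=len)[:1]
    # With one candidate left there is no choice to revisit; with
    # several, any() falls through to the next one on failure.
    return any(match_here(rest, text, pos + len(e), strategy)
               for e in candidates)


# Toy encoding of "[c[.ch.]]": accepts "c" or the digraph "ch".
BRACKET = ("c", "ch")
P1 = [BRACKET, ("i",)]          # models "[c[.ch.]]i"
P2 = [BRACKET, ("h",), ("i",)]  # models "[c[.ch.]]hi"

for name in ("longest", "shortest", "backtrack"):
    print(name, match_here(P1, "chi", strategy=name),
          match_here(P2, "chi", strategy=name))
```

Under these assumptions, "longest" matches only the first pattern,
"shortest" matches only the second, and only "backtrack" reports a
match for both, which is the extra cost the question is concerned
about.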