Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!usc!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.lang.perl
Subject: Re: User Definable Character Classes
Message-ID: <7420@jpl-devvax.JPL.NASA.GOV>
Date: 14 Mar 90 23:14:50 GMT
References: <15253@bfmny0.UU.NET>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 28

In article <15253@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
: A feature I'd like to see added: a few user definable character classes.
: Call them \X, \Y, \Z.  These could be used in regexp's without the
: additional overhead or confusion of using $vars.
:
: Example: I define \X to be [\w-$_] -- with whatever syntax.
:
: Now I can have complex substitutions
:
:     s/<([.\X]+!)+(\X+\.[.\X]+!)/<$2/;
:
: with good performance.  It seems straightforward to modify regcomp.c
: to use \X thru \Z if defined.

I'd probably make that \x, \y and \z, with \X, \Y and \Z being the
negations.

I'm looking carefully at ways to integrate better argument parsing with
regular expressions.  Right now it's not possible to, say, swap the second
and third arguments of a function call, at least not with complete
generality.  You need to have some means of tokenizing, and rejecting
commas and right parens that are inside parens or quotes or comments, or,
depending on the language, after backslashes or dollar signs.  I've got
some ideas, but I'm open to suggestions.  I don't think mere syntax tables
à la Emacs are good enough.

Larry
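Both ideas in the thread are proposals, not existing features, so what follows is only a minimal sketch in Python of how each might behave. Every name in it (USER_CLASSES, expand_user_classes, split_args, swap_second_third) is hypothetical. The first half follows Larry's convention: a pattern is preprocessed so that lowercase \x expands to a user-defined bracket class and uppercase \X to its negation before the regex engine ever sees it (which also means \x here shadows Python's own hex escape). The second half is a small hand-written scanner that splits an argument list only at top-level commas, assuming C-like syntax: '...' and "..." strings with backslash escapes, and /* ... */ comments.

```python
import re

# Hypothetical user-defined class bodies, keyed by lowercase letter.
# \x matches a member of class "x"; \X matches anything that is not.
# Body for "x": word characters plus '$' and '-' (after Tom Neff's example).
USER_CLASSES = {"x": r"\w$-"}


def expand_user_classes(pattern, classes=USER_CLASSES):
    """Rewrite \\x-style escapes into ordinary bracket classes."""
    out = []
    in_class = False  # currently inside a [...] bracket expression?
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            body = classes.get(nxt.lower())
            if body is None:
                out.append(pattern[i:i + 2])  # ordinary escape, copy as-is
            elif nxt.islower():
                # Inside [...] splice the body in; outside, wrap it.
                out.append(body if in_class else "[" + body + "]")
            elif in_class:
                raise ValueError("negated class \\%s inside [...]" % nxt)
            else:
                out.append("[^" + body + "]")
            i += 2
        elif ch == "[" and not in_class:
            # Copy the opener, including a leading '^' and a literal ']'.
            j = i + 1
            if j < len(pattern) and pattern[j] == "^":
                j += 1
            if j < len(pattern) and pattern[j] == "]":
                j += 1
            out.append(pattern[i:j])
            in_class = True
            i = j
        else:
            if ch == "]" and in_class:
                in_class = False
            out.append(ch)
            i += 1
    return "".join(out)


def split_args(text):
    """Split argument text at commas that are not nested or quoted."""
    args, depth, start, i = [], 0, 0, 0
    while i < len(text):
        ch = text[i]
        if ch in "\"'":
            # Skip to the matching quote; a backslash hides the next char.
            i += 1
            while i < len(text) and text[i] != ch:
                i += 2 if text[i] == "\\" else 1
        elif text.startswith("/*", i):
            # Skip a C comment; an unterminated one runs to the end.
            end = text.find("*/", i + 2)
            i = len(text) - 1 if end < 0 else end + 1
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append(text[start:i])
            start = i + 1
        i += 1
    args.append(text[start:])
    return args


def swap_second_third(call):
    """Swap arguments 2 and 3 of a string holding exactly one name(...)."""
    open_paren = call.index("(")
    if not call.endswith(")"):
        raise ValueError("expected a single call ending in ')'")
    args = split_args(call[open_paren + 1:-1])
    if len(args) >= 3:
        args[1], args[2] = args[2], args[1]
    return call[:open_paren + 1] + ",".join(args) + ")"
```

For example, Tom's substitution written with lowercase \x becomes re.sub(expand_user_classes(r"<([.\x]+!)+(\x+\.[.\x]+!)"), r"<\2", s), which turns "<uunet!bfmny0!foo.bar.com!tneff>" into "<foo.bar.com!tneff>". A negated class cannot be spliced into another bracket expression by plain text substitution, so the sketch rejects \X inside [...] rather than guess. The scanner also ignores the dollar-sign and other language-specific cases mentioned above, which is exactly where a fixed table of rules starts to fall short.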