Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!usc!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.lang.perl
Subject: Re: User Definable Character Classes
Message-ID: <7420@jpl-devvax.JPL.NASA.GOV>
Date: 14 Mar 90 23:14:50 GMT
References: <15253@bfmny0.UU.NET>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 28

In article <15253@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
: A feature I'd like to see added: a few user definable character classes.
: Call them \X, \Y, \Z.  These could be used in regexps without the
: additional overhead or confusion of using $vars.
: 
: Example: I define \X to be [\w-$_] -- with whatever syntax.
: 
: Now I can have complex substitutions
: 
: 		s/<([.\X]+!)+(\X+\.[.\X]+!)/<$2/;
: 	
: with good performance.  It seems straightforward to modify regcomp.c
: to use \X thru \Z if defined.


I'd probably make those \x, \y and \z, with \X, \Y and \Z as their negations, the way \W, \S and \D negate \w, \s and \d.
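For comparison, here is a rough sketch of how such a class can be approximated in current Perl 5 by interpolating a variable, which is the approach the original poster wanted to avoid. The variable name `$x` and the helpers `shorten_path` and `strip_non_class` are hypothetical, and the class contents (word characters, `$` and a literal `-`) are only one reading of the poster's `[\w-$_]`. The variable holds the *contents* of a bracket expression rather than a `qr//` object, so it can be spliced into a larger class such as `[.$x]` and negated as `[^$x]`, mirroring the lowercase/uppercase pairing above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a user-defined class \x: the contents of a
# bracket expression (word chars, '$', and a literal '-' placed last),
# kept as a plain string so it can be spliced into larger classes.
my $x = '\w$-';

# The poster's s/<([.\X]+!)+(\X+\.[.\X]+!)/<$2/ with \X spelled [$x]:
# drops the leading hops of a bang path, keeping the first dotted host.
sub shorten_path {
    my ($addr) = @_;
    $addr =~ s/<([.$x]+!)+([$x]+\.[.$x]+!)/<$2/;
    return $addr;
}

# The negated class (the proposed \X) is [^...] around the same contents.
sub strip_non_class {
    my ($s) = @_;
    $s =~ s/[^$x]//g;
    return $s;
}

print shorten_path('<foo!bar!baz.com!user>'), "\n";   # prints <baz.com!user>
print strip_non_class('a b!c'), "\n";                 # prints abc
```

The cost is exactly what the poster describes: every use site has to remember whether it wants `[$x]`, `[.$x]` or `[^$x]`, which a built-in escape would spell for you.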

I'm looking carefully at ways to integrate better argument parsing with
regular expressions.  Right now it's not possible to, say, swap the
second and third arguments of a function call, at least not in full
generality.  You need some means of tokenizing, so that you can reject
commas and right parens that appear inside parens, quotes or comments,
or, depending on the language, after backslashes or dollar signs.
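As a rough illustration of the kind of tokenizing described above (not a proposal for how Perl itself should do it), the sketch below walks an argument list with `\G` and `m//gc` in Perl 5. It assumes C-like syntax: `'...'` and `"..."` strings with backslash escapes, `/* ... */` comments, nested `()`, `[]` and `{}`, plus shell-style backslash escapes outside strings. The helper names `parse_args` and `swap_args_2_3` are hypothetical, and the sketch does not handle dollar-sign cases such as Perl's `$)` variable, `//` comments, or any other language's rules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scan the argument list that starts at offset $start (just past the
# opening paren) and split it on top-level commas.  Returns a ref to the
# list of argument strings and the offset of the call's closing paren.
sub parse_args {
    my ($text, $start) = @_;
    my @args;
    my $cur   = '';
    my $depth = 0;
    pos($text) = $start;
    while (pos($text) < length $text) {
        if ($text =~ /\G("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')/gcs) {
            $cur .= $1;                    # string literal: copy it whole
        } elsif ($text =~ m{\G(/\*.*?\*/)}gcs) {
            $cur .= $1;                    # /* comment */: copy it whole
        } elsif ($text =~ /\G(\\.)/gcs) {
            $cur .= $1;                    # backslash-escaped char (shell style)
        } elsif ($text =~ /\G([(\[{])/gc) {
            $depth++;                      # entering a nested group
            $cur .= $1;
        } elsif ($depth == 0 && $text =~ /\G\)/gc) {
            push @args, $cur;              # the call's own closing paren
            return (\@args, pos($text) - 1);
        } elsif ($text =~ /\G([)\]}])/gc) {
            $depth--;                      # leaving a nested group
            $cur .= $1;
        } elsif ($depth == 0 && $text =~ /\G,/gc) {
            push @args, $cur;              # top-level comma ends an argument
            $cur = '';
        } elsif ($text =~ /\G(.)/gcs) {
            $cur .= $1;                    # any other character
        }
    }
    die "unbalanced argument list\n";
}

# Swap the 2nd and 3rd arguments of the first call to $func in $src.
sub swap_args_2_3 {
    my ($src, $func) = @_;
    $src =~ /\b\Q$func\E\s*\(/g or return $src;   # no call found
    my $after_open = pos($src);
    my ($args, $close) = parse_args($src, $after_open);
    return $src if @$args < 3;
    @$args[1, 2] = @$args[2, 1];       # leading whitespace moves with each arg
    return substr($src, 0, $after_open) . join(',', @$args)
         . substr($src, $close);
}

print swap_args_2_3('x = foo(a, "x, y)", bar(1, 2) /* , */, d);', 'foo'), "\n";
# prints: x = foo(a, bar(1, 2) /* , */, "x, y)", d);
```

Note how much of the code is about things that are *not* separators; a plain regexp split on `,` or `)` would cut inside the string, the nested call and the comment.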

I've got some ideas, but I'm open to suggestions.  I don't think mere
syntax tables à la Emacs are good enough.

Larry