Path: utzoo!attcan!uunet!salt.acc.com!ucsd!ogicse!hakanson
From: hakanson@ogicse.ogi.edu (Marion Hakanson)
Newsgroups: comp.lang.perl
Subject: Re: User Definable Character Classes
Message-ID: <8084@ogicse.ogi.edu>
Date: 17 Mar 90 18:58:05 GMT
References: <15253@bfmny0.UU.NET> <7420@jpl-devvax.JPL.NASA.GOV> <260176EA.C58@tct.uucp>
Organization: Oregon Graduate Institute (formerly OGC), Beaverton, OR
Lines: 52

In article <260176EA.C58@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>What we could use here is a full-blown lexical analysis engine
>integrated into the Perl language.
>
>For example, the RCS file format is an ASCII stream with "@"
>delimiters, where "@@" means a literal "@".  I've often wondered how
>to write Perl to interpret such a file without calling getc thousands
>of times.

Take it from someone who's been down this road.  You DON'T want to
call getc lots of times.  I converted my perl "dnslex" to C, and the
C version ran probably 300 times faster.  But putting the lexer in
a separate program works quite well, especially with Perl's nice way
of opening pipes.

However, the @/@@ problem isn't so tough, as long as you don't have to
worry about backslash-escapes (where the backslash itself could be
escaped).  Even that can be done in the absence of other quoting
mechanisms, but it is not pretty (see below).

Here's a routine I wrote to split on a comma (instead of an @), with
the comma doubled for a literal comma.  Maybe it will give an idea....

# The arg may be of the form 'part1,part2', where ',' is
# the first un-doubled comma (later commas are not processed).
sub commasplit {
  local ($_) = @_;
  local ($first,$secnd);

  $first = ''; $secnd = '';
  commasplit:
  while ( /,/ ) {
    $first .= $`;       # before the comma
    $_ = $';            # and after it
    if ( s/^,// ) {
      # turn double into a single & continue
      $first .= ',';
    } else {
      # make the split
      $secnd = $_;
      $_ = '';          # remainder goes above
      last commasplit;
    }
  }
  $first .= $_;         # in case no single comma was found
  ($first,$secnd);
}

-- 
Marion Hakanson         Domain: hakanson@cse.ogi.edu
                        UUCP  : {hp-pcd,tektronix}!ogicse!hakanson
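
P.S.  For the "@" case itself, the same idea can probably be squeezed
into a single pattern match.  What follows is only a rough sketch in
present-day Perl 5 syntax, not the routine above: the name atsplit is
made up for illustration, and it assumes there are no backslash-escapes
to worry about.  It is meant to return the same pair that commasplit
would, with "@" standing in for the comma.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original routine): split on the
# first un-doubled "@", turning each "@@" in the first part into "@".
sub atsplit {
    my ($str) = @_;

    # The first part is any run of non-@ characters and doubled @@ pairs.
    # The delimiter is an @ that is not followed by another @.
    if ($str =~ /\A((?:[^\@]|\@\@)*)\@(?!\@)(.*)\z/s) {
        my ($first, $second) = ($1, $2);
        $first =~ s/\@\@/\@/g;      # un-double the literal @ signs
        return ($first, $second);   # later @ signs are left untouched
    }

    # No single @ was found, so everything belongs to the first part.
    $str =~ s/\@\@/\@/g;
    return ($str, '');
}

my ($head, $tail) = atsplit('a@@b@c@d');
print "$head\n$tail\n";
```

The negative lookahead is what keeps the match from stopping in the
middle of a doubled pair; without it, a string such as 'a@@b' would be
split at its first "@" instead of being treated as having no delimiter.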