Xref: utzoo comp.unix.questions:22878 comp.lang.perl:1446
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!rpi!uupsi!sunic!dkuug!freja.diku.dk!skinfaxe.diku.dk!thorinn
From: thorinn@skinfaxe.diku.dk (Lars Henrik Mathiesen)
Newsgroups: comp.unix.questions,comp.lang.perl
Subject: Re: Regular Expression tool
Message-ID: <1990Jun12.185041.19059@diku.dk>
Date: 12 Jun 90 18:50:41 GMT
References: <1990Jun8.174056.15313@icc.com> <8353@jpl-devvax.JPL.NASA.GOV>
Sender: news@diku.dk (The Netnews System)
Organization: Department Of Computer Science, University Of Copenhagen
Lines: 53

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>In article <1990Jun8.174056.15313@icc.com> wdm@icc.com (Bill Mulert) writes:
>: Consider the following statements containing regular expressions:
>: ...
>: Fortunately, we have cdecl to help create and decode the C declarations.
>: 
>: I wish there were something similar for regular expressions.

>It's not likely to be too practical, for a couple of reasons.

>...

>Second, your big problem is not so much the regular expressions themselves
>as it is all the quoting you have to put around them because of the paucity of
>quoting mechanisms.

What we really need is a shell script explainer. It would know Bourne
shell syntax; when you run a script through it, any single-command
that uses more than one level of quoting would be explained in
excruciating detail. It would also know enough about expr, sed, egrep
etc. to recognize regular expressions, which would be converted to a
standard form (perl's, maybe). (Perl, of course, is self-explanatory,
and much too hard to parse.)
Example of possible output:

echo "`expr \"$1\" : \"^[^=]*=\(.*\)\"`"
#is taken as: echo "@1"
#where @1 is: `@2`
#where @2 is: expr "$1" : "@3"
#where @3 is: ^[^=]*=(.*)		Literal: "^[^=]*=\\(.*\\)"

df_usr=`df | sed -n '/^\/usr[   ]/s/[^)]*):[    ]*\([^  ]*\).*/\1/p'`
#is taken as: df_usr=`@1`
#where @1 is: df | sed -n '@2'
#where @2 is: /@3/s/@4/@5/p
#where @3 is: ^/usr\s			Literal: "^\\/usr[ \t]"
#where @4 is: [^)]*\):\s*(\S*).*	Literal: "[^)]*):[ \t]*\\([^ \t]*\\).*"
#where @5 is: $1			Literal: "\\1"
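
The regexp conversion shown in the two examples could be sketched
roughly as follows. This is only an illustration, written in Python
rather than perl; the names bre_to_perl, replacement_to_perl and
WHITESPACE_CLASSES are made up for the sketch. It assumes sed/expr-style
basic regexps, and it folds [ \t] into \s as the examples do, which is
an approximation since \s matches more than space and tab.

```python
import re

# Bracket expressions folded into perl shorthands, as in the examples.
# Assumption: treating [ \t] as \s is acceptable for explanation purposes.
WHITESPACE_CLASSES = {
    "[ \t]": r"\s", "[\t ]": r"\s",
    "[^ \t]": r"\S", "[^\t ]": r"\S",
}

def bre_to_perl(bre):
    """Rewrite a basic regexp (sed, expr) in perl-style notation."""
    out = []
    i, n = 0, len(bre)
    while i < n:
        c = bre[i]
        if c == "\\" and i + 1 < n:
            nxt = bre[i + 1]
            if nxt in "(){}/":
                out.append(nxt)        # \( \) \{ \} are operators in a BRE; \/ is a delimiter escape
            else:
                out.append(c + nxt)    # any other escape is copied unchanged
            i += 2
        elif c == "[":
            # Copy a bracket expression whole; a leading ^ and a first ] are literal.
            j = i + 1
            if j < n and bre[j] == "^":
                j += 1
            if j < n and bre[j] == "]":
                j += 1
            while j < n and bre[j] != "]":
                j += 1
            bracket = bre[i:j + 1]
            out.append(WHITESPACE_CLASSES.get(bracket, bracket))
            i = j + 1
        elif c in "(){}+?|":
            out.append("\\" + c)       # literal in a BRE, special in perl
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)

def replacement_to_perl(repl):
    """Rewrite sed back-references (\\1) as perl's ($1)."""
    return re.sub(r"\\([0-9])", r"$\1", repl)
```

Fed the sed pattern from the df example, bre_to_perl gives the @4 line
above. Bracket expressions containing POSIX classes such as [[:space:]]
are not handled; the sketch stops at the first ']'.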

The Literal: strings (which I have written as C strings) should be
present whenever an argument to a command contains tabs or control
characters, or when it is converted as a regular expression.
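
A minimal sketch of that rule, in Python rather than perl. The function
names c_literal and needs_literal are hypothetical, and the choice of
three-digit octal escapes for other control characters is my
assumption, not something the examples fix.

```python
# Escapes that have a conventional short form in C string literals.
C_ESCAPES = {"\\": "\\\\", '"': '\\"', "\t": "\\t", "\n": "\\n", "\r": "\\r"}

def is_control(ch):
    """True for ASCII control characters, tab included."""
    return ord(ch) < 0x20 or ord(ch) == 0x7F

def c_literal(s):
    """Render s as a C string literal, making tabs and control characters visible."""
    parts = []
    for ch in s:
        if ch in C_ESCAPES:
            parts.append(C_ESCAPES[ch])
        elif is_control(ch):
            # Three octal digits, so a following digit cannot extend the escape.
            parts.append("\\%03o" % ord(ch))
        else:
            parts.append(ch)
    return '"' + "".join(parts) + '"'

def needs_literal(arg, is_regexp):
    """Show a Literal: string for converted regexps and for arguments with control characters."""
    return is_regexp or any(is_control(ch) for ch in arg)
```

For the expr pattern in the first example, c_literal produces the
string shown after Literal: on the @3 line.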

The thing doesn't really have to parse the shell language: just cut at
newline, ';', ';;', '|', '||', ... (when unescaped), repeatedly strip
'if', 'for', '{', ... from the beginning of each piece, and what is
left are the single-commands. The ``parsers'' for the regexp commands
only have to find the regexps, so they can probably be just as simple.
The whole thing could probably be implemented in perl fairly easily.
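
To make the cut-and-strip idea concrete, here is a rough sketch in
Python (not the perl suggested above). The names split_commands,
strip_keywords and KEYWORDS are invented for the illustration. It
assumes that quotes and backquotes are never nested inside one another
in ways that confuse a single quote state, it ignores comments and
here-documents, and it leaves 'for' and 'case' headers alone because
what follows them is not a command.

```python
# Words stripped from the front of each piece; an assumed, incomplete list.
KEYWORDS = {"if", "then", "else", "elif", "while", "until", "do",
            "fi", "done", "{", "}"}

def strip_keywords(cmd):
    """Repeatedly remove leading shell keywords from one piece."""
    cmd = cmd.strip()
    while cmd:
        parts = cmd.split(None, 1)
        if parts[0] not in KEYWORDS:
            break
        cmd = parts[1].strip() if len(parts) > 1 else ""
    return cmd

def split_commands(script):
    """Cut a script at unquoted, unescaped separators and return the single-commands."""
    pieces, current = [], []
    quote = None                      # the open quote character (' " or `), if any
    i, n = 0, len(script)
    while i < n:
        ch = script[i]
        if ch == "\\" and quote != "'" and i + 1 < n:
            current.append(script[i:i + 2])   # escaped character: copy the pair
            i += 2
            continue
        if quote:
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in "'\"`":
            quote = ch
            current.append(ch)
        elif ch == "&" and current and current[-1] in ("<", ">"):
            current.append(ch)                # part of a redirection such as 2>&1
        elif ch in ";|&\n":
            pieces.append("".join(current))
            current = []
            if ch != "\n" and i + 1 < n and script[i + 1] == ch:
                i += 1                        # ';;', '||' and '&&' count as one cut
        else:
            current.append(ch)
        i += 1
    pieces.append("".join(current))
    stripped = (strip_keywords(p) for p in pieces)
    return [c for c in stripped if c]
```

A pipeline inside backquotes stays in one piece here, which matches the
df_usr example: the assignment is explained first, and the command
inside the backquotes would be split on a later pass.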

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn@diku.dk