Xref: utzoo comp.unix.questions:22878 comp.lang.perl:1446
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!rpi!uupsi!sunic!dkuug!freja.diku.dk!skinfaxe.diku.dk!thorinn
From: thorinn@skinfaxe.diku.dk (Lars Henrik Mathiesen)
Newsgroups: comp.unix.questions,comp.lang.perl
Subject: Re: Regular Expression tool
Message-ID: <1990Jun12.185041.19059@diku.dk>
Date: 12 Jun 90 18:50:41 GMT
References: <1990Jun8.174056.15313@icc.com> <8353@jpl-devvax.JPL.NASA.GOV>
Sender: news@diku.dk (The Netnews System)
Organization: Department Of Computer Science, University Of Copenhagen
Lines: 53

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>In article <1990Jun8.174056.15313@icc.com> wdm@icc.com (Bill Mulert) writes:
>: Consider the following statements containing regular expressions:
>: ...
>: Fortunately, we have cdecl to help create and decode the C declarations.
>:
>: I wish there were something similar for regular expressions.

>It's not likely to be too practical, for a couple of reasons.
>...
>Second, your big problem is not so much the regular expressions themselves
>as it is all the quoting you have to put around them because of the paucity of
>quoting mechanisms.

What we really need is a shell script explainer. It would know Bourne
shell syntax; when you run a script through it, any shell
single-command which uses more than one level of quoting will be
explained in excruciating detail. It would also know enough about
expr, sed, egrep etc. to recognize regular expressions, and they would
be converted to a standard form (perl's, maybe). (Perl, of course, is
self-explanatory (and much too hard to parse)).
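The conversion to a standard form could look roughly like the sketch below. It is written in Python rather than perl, and it is only an illustration: the function name bre_to_perl, the helper _bracket_end and the two character tables are made up for this example. It assumes the input is a POSIX basic regular expression (the kind expr and sed take), and it ignores corner cases such as a literal leading '*' or a '^' in the middle of a pattern. Following the examples in this post, a bracket expression holding just space and tab is written as \s, which is only approximately the same thing, since \s also matches other whitespace.

```python
import re

# Escapes that must stay escaped in Perl-style syntax.
KEEP_ESCAPED = set(".*[]^$\\+?|")
# Characters that are literal in a BRE but operators in Perl-style syntax.
LITERAL_IN_BRE = set("(){}+?|")


def _bracket_end(bre, i):
    """Index of the ']' closing the bracket expression that starts at
    bre[i], or len(bre) if it is unterminated."""
    n = len(bre)
    j = i + 1
    if j < n and bre[j] == "^":
        j += 1
    if j < n and bre[j] == "]":      # a leading ']' is an ordinary member
        j += 1
    while j < n and bre[j] != "]":
        if bre[j] == "[" and j + 1 < n and bre[j + 1] in ":.=":
            # Skip over [:alpha:], [.x.] and [=x=] inside the brackets.
            close = bre.find(bre[j + 1] + "]", j + 2)
            j = n if close == -1 else close + 2
        else:
            j += 1
    return j


def bre_to_perl(bre):
    """Rewrite a basic regular expression in Perl-style syntax (sketch)."""
    out = []
    i, n = 0, len(bre)
    while i < n:
        c = bre[i]
        if c == "[":
            j = _bracket_end(bre, i)
            body = bre[i:j + 1]
            inner = body[1:-1] if j < n else ""
            negated = inner.startswith("^")
            members = inner[1:] if negated else inner
            if set(members) == {" ", "\t"}:
                # Approximation taken from the post: [ \t] becomes \s.
                out.append(r"\S" if negated else r"\s")
            else:
                # Backslash is literal inside BRE brackets, an escape in Perl.
                out.append(body.replace("\\", "\\\\"))
            i = j + 1
        elif c == "\\" and i + 1 < n:
            d = bre[i + 1]
            if d in "(){}":
                out.append(d)            # \( \) \{ \} lose the backslash
            elif d.isalnum() or d in KEEP_ESCAPED:
                out.append(c + d)        # \1, \., \* ... stay as they are
            else:
                out.append(d)            # e.g. sed's \/ delimiter escape
            i += 2
        elif c in LITERAL_IN_BRE:
            out.append("\\" + c)         # literal in BRE, operator in Perl
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    converted = bre_to_perl(r"^[^=]*=\(.*\)")
    print(converted)                                      # ^[^=]*=(.*)
    print(re.match(converted, "PATH=/bin").group(1))      # /bin
```

The substitution side of a sed command (\1 becoming $1) is a separate and much smaller rewrite, so it is left out here.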
Example of possible output:

echo "`expr \"$1\" : \"^[^=]*=\(.*\)\"`"
#is taken as:   echo "@1"
#where @1 is:   `@2`
#where @2 is:   expr "$1" : "@3"
#where @3 is:   ^[^=]*=(.*)
       Literal: "^[^=]*=\\(.*\\)"

df_usr=`df | sed -n '/^\/usr[ 	]/s/[^)]*):[ 	]*\([^ 	]*\).*/\1/p'`
#is taken as:   df_usr=`@1`
#where @1 is:   df | sed -n '@2'
#where @2 is:   /@3/s/@4/@5/p
#where @3 is:   ^/usr\s
       Literal: "^\\/usr[ \t]"
#where @4 is:   [^)]*\):\s*(\S*).*
       Literal: "[^)]*):[ \t]*\\([^ \t]*\\).*"
#where @5 is:   $1
       Literal: "\\1"

The Literal: strings (which I have written as C strings) should be
present whenever an argument to a command contains tabs or control
characters, or when it is converted as a regular expression.

The thing doesn't really have to parse shell language: Just cut at
newline, ';', ';;', '|', '||', ... (when unescaped), repeatedly strip
'if', 'for', '{', ... from the beginning of strings, and the
single-commands are left. The ``parsers'' for the regexp commands just
have to find the regexps; they can probably be just as simple.

It could probably be implemented in perl fairly easily.

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn@diku.dk
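The cut-and-strip step described above (cut at unescaped separators, then strip leading keywords) might be sketched as follows. This is Python rather than perl and is meant as a rough illustration only: the names split_single_commands, LEADING_WORDS and CLOSERS are invented here, and the keyword lists are a guess at what the "..." in the post would cover. It assumes old Bourne-style quoting (single quotes, double quotes, backquotes, backslash) and does not handle comments, here-documents, case patterns, or quotes nested inside backquotes.

```python
# Hypothetical keyword tables; the post only names 'if', 'for' and '{'.
LEADING_WORDS = {"if", "then", "else", "elif", "while", "until",
                 "do", "for", "{", "("}
CLOSERS = {"fi", "done", "esac", "}", ")"}


def split_single_commands(script):
    """Cut a script at unquoted, unescaped separators and strip leading
    keywords, returning the remaining single-commands as a list."""
    pieces, cur = [], []
    quote = None                 # the quote character we are inside, if any
    i, n = 0, len(script)
    while i < n:
        c = script[i]
        if c == "\\" and quote != "'" and i + 1 < n:
            cur.append(script[i:i + 2])      # keep an escaped pair together
            i += 2
            continue
        if quote:
            if c == quote:
                quote = None
            cur.append(c)
        elif c in ("'", '"', "`"):
            quote = c
            cur.append(c)
        elif c == "&" and cur and cur[-1] in ("<", ">"):
            cur.append(c)                    # part of a redirection like 2>&1
        elif c in ";|&\n":
            pieces.append("".join(cur))      # ';;', '||', '&&' just leave
            cur = []                         # an empty piece behind
        else:
            cur.append(c)
        i += 1
    pieces.append("".join(cur))

    commands = []
    for piece in pieces:
        cmd = piece.strip()
        while cmd:
            parts = cmd.split(None, 1)
            if parts[0] not in LEADING_WORDS:
                break
            cmd = parts[1] if len(parts) > 1 else ""
        if cmd and cmd not in CLOSERS:
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    script = (
        'if test -f "$1"; then\n'
        "  df_usr=`df | sed -n '/^\\/usr/p'`\n"
        "fi\n"
    )
    for command in split_single_commands(script):
        print(command)
```

Each string that comes back would then go to the quoting explainer, which only has anything to say when it finds more than one level of quoting.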