Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!apple!agate!ucbvax!pasteur!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.editors
Subject: Re: vi and emacs
Message-ID: <11036@dog.ee.lbl.gov>
Date: 17 Mar 91 21:32:59 GMT
References: <1991Mar06.150359.13516@chinet.chi.il.us> <1991Mar7.214419.17515@m.cs.uiuc.edu> <1991Mar07.232206.8438@convex.com> <1991Mar8.164415.14087@alchemy.chem.utoronto.ca>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 53
X-Local-Date: Sun, 17 Mar 91 13:32:59 PST

In article <1991Mar8.164415.14087@alchemy.chem.utoronto.ca>
mroussel@alchemy.chem.utoronto.ca (Marc Roussel) writes:
>Anil's point was that many Unix utilities use different flavours of
>regular expressions. I for one find that to be a nuisance too. Why
>should grep use different syntax for its regular expressions than ex?

This is fixed in 10th Edition Unix. It is just an accident of history.

There *is* a reason for some difference, actually: Editors must have a
way of marking matches for substitutions, while pure matchers (grep) do
not need this.

Things might be in much better shape if Ken Thompson had originally
designed `ed' with a single special character that introduced all
regular expression metasequences; then existing programs (and people!)
that use the various RE matchers would already quote that character and
it would not be such a problem to change them all to have some new
feature. For instance, if `_' were the magic character, then:

	grep stdio.h *.c

would do what you meant; you would have to ask for

	grep '#_ _*include_ _*<_._*>'

to match `#, followed by any amount of space, followed by include,
followed by any amount of space, followed by <, followed by any amount
of anything except newline, followed by >'. Here `_ ' means `space or
tab' and is just shorthand for _[ _].

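To make the two grep examples concrete, here is a rough Python sketch
that rewrites the hypothetical `_' syntax into the syntax Python's `re'
module already understands. Everything in it is an assumption made for
illustration: the names META and translate are invented, only the three
metasequences used so far (`_ ', `_*' and `_.') are handled, and no
real grep accepts this notation.

```python
import re

# Hypothetical mapping: the character after `_` -> equivalent `re` syntax.
# Only the metasequences used in the example are covered.
META = {
    " ": r"[ \t]",  # `_ ` : space or tab
    "*": "*",       # `_*` : zero or more of the preceding item
    ".": ".",       # `_.` : any character except newline
}

def translate(pattern):
    """Rewrite an underscore-prefixed pattern as a Python `re` pattern."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "_":
            # `_` must be followed by one of the known metacharacters.
            if i + 1 >= len(pattern) or pattern[i + 1] not in META:
                raise ValueError("unsupported metasequence at offset %d" % i)
            out.append(META[pattern[i + 1]])
            i += 2
        else:
            # Every other character is literal, including `.` and `*`.
            out.append(re.escape(ch))
            i += 1
    return "".join(out)

# The two patterns from the post.
plain_re = re.compile(translate("stdio.h"))
include_re = re.compile(translate("#_ _*include_ _*<_._*>"))
```

The point of the design shows up in the first pattern: since `.' is not
magic unless prefixed, `stdio.h' matches only the literal file name and
needs no quoting at all.
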
`Remembering' for replacements could be done with _(foo_) and
back-references with _1 through _9. You could also get rid of the
positional requirements for - and ^ within character classes (I got rid
of the one for ] already, above, by using _]) by using _- and _^:

	_[^a_-zA_-Z-_]

would mean `class: caret, or a through z, or A through Z, or hyphen',
which matches things like `word' or `hyphenated-word' and includes
words spelled `r^ole'. To match everything but that, throw _^ in
somewhere.

(Now someone will argue for something other than `_'... :-) Backslash
would be good, but the shell uses it; that is why I picked _ here.)

(Actually, if I were designing an RE syntax, I think I would make
Kleene `*' matching use a prefix, not a postfix, so that something like
foo\* bar would match `foobar' and `foo bar'. I think of * as `zero or
more', so this means "foo, zero or more spaces, bar". \+X for 1 or more
X, and \X for `between m and n inclusive' X's, would also be useful.)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov