Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!uwm.edu!src.honeywell.com!msi.umn.edu!cs.umn.edu!uc!shamash!midway!ellis.uchicago.edu!goer
From: goer@ellis.uchicago.edu (Richard L. Goerwitz)
Newsgroups: comp.lang.misc
Subject: Re: Icon (was Re: Survey Results : Perl vs Icon vs ... ))
Message-ID: <1991Apr2.061448.8287@midway.uchicago.edu>
Date: 2 Apr 91 06:14:48 GMT
References: <BEVAN.91Mar29162211@panda.cs.man.ac.uk> <1991Apr1.043321.11251@midway.uchicago.edu> <BEVAN.91Apr1125048@panda.cs.man.ac.uk>
Sender: bcareful@midway.uchicago.edu
Distribution: comp
Organization: University of Chicago
Lines: 71

Stephen J Bevan writes:
>>> E.2.4 Add a regular expression data type.  Modify the functions find
>>>       and match to operate appropriately when their first argument is a
>>>       regular expression.
>>
>>I'd modify this to say, add findre() and matchre() to the list of
>>builtin functions.  Most C libraries have regexp routines that can be
>>drafted to serve in these capacities.
>
>Well, after spending a day adding regular expressions to ELK, I
>wouldn't be so sure about the regexp facilities of C libraries.  For
>example, the regexp library with SunOS 4.1 only has ed/grep-style
>regular expressions, not egrep ones.

The findre() prototype I wrote for Icon is slow, but it does handle
egrep-style patterns.  In fact, to test it I generally run it through
Henry Spencer's egrep test suite.

Clinton Jeffery wrote to me a day or two ago pointing out that one
big reason why Icon has no regular expressions is that no one can agree
on what they should look like and how they should be used.  I can only
speak from practical experience.  I need both a find() and a match()
function that can take regular expressions as their string arguments.
Strictly speaking, of course, the find() variant isn't necessary, since
we can just tab(&pos to *&subject+1), attempting a match at each position.
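
To make the "find is just match tried at every position" point
concrete, here is a rough model of it.  This is Python, not Icon, and
the names match_re and find_re are invented for the illustration; they
are not existing Icon builtins.  Positions are 0-based Python indices
rather than Icon's 1-based ones.

```python
import re

def match_re(pattern, subject, pos):
    """Return the index just past a match anchored at pos, or None."""
    m = re.compile(pattern).match(subject, pos)
    return m.end() if m else None

def find_re(pattern, subject, pos=0):
    """Yield every index >= pos at which an anchored match succeeds.

    This mirrors stepping through tab(&pos to *&subject+1) and trying
    a match at each stop, so it needs nothing beyond match_re.
    """
    for i in range(pos, len(subject) + 1):
        if match_re(pattern, subject, i) is not None:
            yield i

print(list(find_re(r"ab+", "xabbyab")))
```

The generator stands in for Icon's goal-directed evaluation: each
yielded index is one result a find-style function would produce.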

If we're talking egrep-style expressions, then things like anyRE() and
uptoRE() would be wholly unnecessary.

I have a couple of things I'm still wondering about.  If there were a
findRE() function, how would the ^ be interpreted when it refers to
the beginning of a line?  Would it refer to &subject[&pos] or to
&subject[1]?  I would see the former as more useful.  If someone
really wants to match a line beginning, then pos(1) is quite
sufficient.  The other thing I wonder about is how, if we have a
findRE() function, we would tab past the substring matched by the
regular expression.  It would certainly be possible to use findRE()
and matchRE() in concert (as in tab((findRE(pattern),
matchRE(pattern)))).  This seems quite inelegant, though.  I'd rather
see a keyword such as &endpoint used to retrieve the next position
after the match.
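
Both questions can be modelled in a few lines.  Again this is Python
rather than Icon, and find_re_span and its anchor_at_pos flag are
made up for the sketch.  The returned pair plays the role of the
position found plus a hypothetical &endpoint.

```python
import re

def find_re_span(pattern, subject, pos=0, anchor_at_pos=True):
    """Return (start, end) of the first match at or after pos, or None.

    anchor_at_pos=True  : ^ can match at pos (the &subject[&pos] reading).
    anchor_at_pos=False : ^ can match only at the true start of subject
                          (the &subject[1] reading).
    """
    rx = re.compile(pattern)
    if anchor_at_pos:
        # Searching a slice makes pos look like the start of the string.
        m = rx.search(subject[pos:])
        return (m.start() + pos, m.end() + pos) if m else None
    # Python's own pos argument leaves ^ tied to the real beginning.
    m = rx.search(subject, pos)
    return (m.start(), m.end()) if m else None

print(find_re_span(r"^b+", "aabbb", 2))
print(find_re_span(r"^b+", "aabbb", 2, anchor_at_pos=False))
```

Handing back the end index together with the start means the caller
can move past the match without running the pattern a second time,
which is the job &endpoint would do.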

>...[H]ow easy would it be to add extensions like, for
>example, dbm support?  (note. that was a rhetorical question)

I'm not sure what it is about the dbm routines that you need that
you can't get from Icon.  Just recently I posted a set of routines
called "gettext," which implement some dbm-type functionality,
though in a somewhat crude fashion.  I'll happily mail out the
latest version to anyone who wants to look it over.  Full-blown
dbm-type emulation without all the nutty restrictions (e.g. one
database open at a time) would not be all that hard to do in Icon
(using much less space than the dbm routines do).

I just wrote a small package called "retrieve," which offers the
basic tools not only for dbm-type key/value accesses, but for
regexp pattern matching and boolean search specifications with
ranges as well.  It really wasn't too hard to do, and I'll be
happy to pass these on to anyone who wants them.  I still
have a few things to do to the "retrieve" package, mostly adding
docs and demos (it's part of a larger project of mine).  If there
is a demand, I'll post it when I'm done.

Indexing and retrieval is really one of Icon's fortes.  Adding
native regular expression pattern matching facilities would only
put an extra layer of icing on the cake.

-Richard (goer@sophist.uchicago.edu)
-- 

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer