Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!uwm.edu!src.honeywell.com!msi.umn.edu!cs.umn.edu!uc!shamash!midway!ellis.uchicago.edu!goer
From: goer@ellis.uchicago.edu (Richard L. Goerwitz)
Newsgroups: comp.lang.misc
Subject: Re: Icon (was Re: Survey Results : Perl vs Icon vs ... ))
Message-ID: <1991Apr2.061448.8287@midway.uchicago.edu>
Date: 2 Apr 91 06:14:48 GMT
References: <1991Apr1.043321.11251@midway.uchicago.edu>
Sender: bcareful@midway.uchicago.edu
Distribution: comp
Organization: University of Chicago
Lines: 71

Stephen J Bevan writes:
>>> E.2.4 Add a regular expression data type. Modify the functions find
>>> and match to operate appropriately when their first argument is a
>>> regular expression.
>>
>>I'd modify this to say, add findre() and matchre() to the list of
>>builtin functions. Most C libraries have regexp routines that can be
>>drafted to serve in these capacities.
>
>Well after spending a day adding regular expressions to ELK, I
>wouldn't be so sure about the regexp facilities of C libraries. For
>example the regexp library with SunOS 4.1 only has ed/grep style
>regular expressions, not egrep ones.

The findre() prototype I wrote for Icon (slow) handles egrep-style
patterns. In fact, to test it I generally run it through Henry
Spencer's egrep test suite.

Clinton Jeffery wrote to me a day or two ago pointing out that one
big reason why Icon has no regular expressions is that no one can
agree on what they should look like and how they should be used.

I can only speak from practical experience. I need both a find() and
a match() function that can take regular expressions as their string
arguments. Of course, strictly speaking the find() one isn't
necessary, since we can just step through the subject with
tab(&pos to *&subject+1), attempting a match at each position. If
we're talking egrep-style expressions, then things like anyRE() and
uptoRE() would be wholly unnecessary.
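To make the find-from-match idea concrete, here is a minimal sketch in Python rather than Icon, with Python's re module standing in for the regexp routines under discussion. The names match_re() and find_re() are hypothetical, not Icon builtins, and positions are 0-based rather than Icon's 1-based. find_re() just tries an anchored match at every position from pos to the end of the subject, which is roughly what tab(&pos to *&subject+1) does by backtracking.

```python
import re

def match_re(pattern, subject, pos):
    """Icon-style match (hypothetical): succeed only if pattern matches
    starting exactly at pos. Returns the position just past the match,
    or None on failure."""
    m = re.compile(pattern).match(subject, pos)
    return m.end() if m else None

def find_re(pattern, subject, pos=0):
    """Generate every start position p >= pos where match_re succeeds,
    trying an anchored match at each position in turn (slow, but simple)."""
    for p in range(pos, len(subject) + 1):
        if match_re(pattern, subject, p) is not None:
            yield p
```

For example, list(find_re("a+b", "xaab ab")) yields [1, 2, 5]. Every candidate position costs a separate match attempt, so a native findRE() would be worth having for speed even if it adds no expressive power.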
I have several things I'm still wondering about. If there were a
findRE() function, how would ^ (beginning of line) be interpreted?
Would it refer to &subject[&pos] or to &subject[1]? I would see the
former as more useful. If someone really wants to match a line
beginning, then pos(1) is quite sufficient.

One other thing I wonder about is how, if we had a findRE() function,
we would tab past the substring matched by the regular expression.
It would certainly be possible to use findRE() and matchRE() in
concert (as in tab((findRE(pattern), matchRE(pattern)))). This seems
quite inelegant, though. I'd rather see a keyword such as &endpoint
used to retrieve the next position after the match.

>...[H]ow easy would it be to add extensions like, for
>example, dbm support? (note: that was a rhetorical question)

I'm not sure what it is about the dbm routines that you need that
you can't get from Icon. Just recently I posted a set of routines
called "gettext," which implement some dbm-type functionality, though
in a somewhat crude fashion. I'll happily mail out the latest version
to anyone who wants to look it over.

Full-blown dbm-type emulation without all the nutty restrictions
(e.g. one database open at a time) would not be all that hard to do
in Icon (using much less space than the dbm routines do). I just
wrote a small package called "retrieve," which offers the basic tools
not only for dbm-type key/value accesses, but for regexp pattern
matching and boolean search specifications with ranges as well. It
really wasn't too hard to do, and I'll be happy to pass these on to
anyone who wants them. I still have a few things to do to the
"retrieve" package, mostly adding docs and demos (it's part of a
larger project of mine). If there is demand, I'll post it when I'm
done.

Indexing and retrieval is really one of Icon's fortes. Adding native
regular expression pattern matching facilities would only put an
extra layer of icing on the cake.
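The two findRE() questions raised earlier (what ^ should mean, and how to tab past a match) can be sketched in Python as well. This is an illustration under stated assumptions, not a proposal for Icon's actual semantics: find_re() and scan_all() are hypothetical names, positions are 0-based, and instead of a keyword like &endpoint the end position is simply returned alongside the start. Slicing the subject at pos makes ^ anchor at pos, the reading preferred above. Python's own Pattern.match(string, pos) takes the other view: there ^ matches only at the real start of the string, not at pos.

```python
import re

def find_re(pattern, subject, pos=0):
    """Find the leftmost match of pattern at or after pos (hypothetical).

    Returns (start, end), where end is the position just past the match
    (the role a keyword like &endpoint would play), or None on failure.
    The subject is sliced at pos, so '^' anchors at pos rather than at
    the start of the whole subject.
    """
    m = re.search(pattern, subject[pos:])
    if m is None:
        return None
    return pos + m.start(), pos + m.end()

def scan_all(pattern, subject):
    """Collect successive matches, using each end position to tab past the match."""
    found = []
    pos = 0
    while pos <= len(subject):
        hit = find_re(pattern, subject, pos)
        if hit is None:
            break
        start, end = hit
        found.append(subject[start:end])
        # Tab past the match; step one extra position after an empty
        # match so the loop always makes progress.
        pos = end if end > start else end + 1
    return found
```

Here find_re("^ab", "xxab", 2) gives (2, 4), whereas re.compile("^ab").match("xxab", 2) returns None. One trade-off of slicing: lookbehind and \b cannot see text before pos, so context to the left of &pos is invisible to the pattern.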
-Richard (goer@sophist.uchicago.edu)
-- 
-Richard L. Goerwitz              goer%sophist@uchicago.bitnet
goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer