Newsgroups: comp.archives
Path: utzoo!utgpu!news-server.csri.toronto.edu!ox.com!msen.com!emv
From: udi@cs.arizona.edu (Udi Manber)
Subject: [unix-programmer...] agrep - a new tool for text searching with errors
Message-ID: <1991Jun17.082742.12859@ox.com>
Followup-To: poster
Keywords: approximate string matching, regular expressions
Sender: emv@msen.com (Edward Vielmetti, MSEN)
Reply-To: udi@cs.arizona.edu (Udi Manber)
Organization: (none)
X-Original-Date: 17 Jun 91 07:10:19 GMT
Date: Mon, 17 Jun 1991 08:27:42 GMT
Approved: emv@msen.com (Edward Vielmetti, MSEN)
X-Original-Newsgroups: comp.unix.programmer,comp.text,comp.unix.wizards
Lines: 71

Archive-name: text/grep/agrep/1991-06-17
Archive-directory: cs.arizona.edu:/agrep/ [192.12.69.5]
Original-posting-by: udi@cs.arizona.edu (Udi Manber)
Original-subject: agrep - a new tool for text searching with errors
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)

We are proud to announce the release of version 1.0 of agrep - a new tool
for fast text searching with errors.
agrep is similar to egrep (or grep or fgrep), but it is much more general.
It is based on an entirely different algorithm.

The three most significant features of agrep that are not supported by
the grep family are

1) the ability to search for approximate patterns;
   for example, "agrep -2 homogenos foo" will find homogeneous as well
   as any other word that can be obtained from homogenos with at most
   2 substitutions, insertions, or deletions.

2) agrep is record oriented rather than just line oriented; a record is
   by default a line, but it can be user defined;
   for example, "agrep -d '^From ' 'pizza' mbox"
   outputs all mail messages that contain the keyword "pizza".
   Another example: "agrep -d '$$' pattern foo" will output all
   paragraphs (separated by an empty line) that contain pattern.

3) multiple patterns with AND (or OR) logic queries.
   For example, "agrep -d '^From ' 'burger,pizza' mbox"
   outputs all mail messages containing at least one of the
   two keywords (, stands for OR).
   "agrep -d '^From ' 'good;pizza' mbox" outputs all mail messages
   containing both keywords (; stands for AND).

Putting these options together, one can ask queries like

   agrep -d '$$' -2 '<CACM>;TheAuthor;Curriculum;<198[5-9]>' bib

which outputs all paragraphs referencing articles in CACM between
1985 and 1989 by TheAuthor dealing with curriculum.
Two errors are allowed (e.g., one in TheAuthor and one in Curriculum,
or two in one of them), but they cannot be in either CACM or the year
(the <> brackets forbid errors in the pattern between them).

Other features include searching for regular expressions (with or
without errors), unlimited wild cards, limiting the errors to only
insertions or only substitutions or any combination, allowing each
deletion, for example, to be counted as, say, 2 substitutions or
3 insertions, restricting parts of the query to be exact and parts to
be approximate, and many more.

agrep is available by anonymous ftp from cs.arizona.edu (IP 192.12.69.5)
as agrep/agrep.tar.Z (or in uncompressed form as agrep/agrep.tar).
The tar file contains the source code (in C), man pages (agrep.1),
and a postscript file (agrep.ps) of a technical report (TR #91-11)
describing the design and implementation of agrep.

This is the first version of agrep. There may be some bugs, especially
with complicated patterns and combinations of options.
Please mail bug reports (or any other comments) to sw@cs.arizona.edu
or to udi@cs.arizona.edu.

We would appreciate it if users would notify us (at the addresses above)
of any extensions, improvements, or interesting uses of this software.

June 16, 1991.

--
comp.archives file verification
cs.arizona.edu
total 442
-rw-r--r--  1 23         125467 Jun 11 17:34 agrep.tar.Z
-rw-r--r--  1 23         303104 Jun 11 17:33 agrep.tar
-rw-r--r--  1 23           1568 Jun 11 17:32 README
found agrep ok
cs.arizona.edu:/agrep/
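
The error-count semantics of feature 1 ("at most 2 substitutions,
insertions, or deletions") can be illustrated with a short Python sketch.
This is not the algorithm agrep uses (the technical report covers that);
it is the textbook dynamic-programming method for approximate substring
matching, shown only to make the meaning of the error count concrete.
The names min_errors and approx_grep are hypothetical, and the sketch
handles plain strings only, with no regular expressions, record
delimiters, or <> exact-match brackets.

```python
def min_errors(pattern: str, text: str) -> int:
    """Return the fewest edits needed to turn `pattern` into some substring of `text`."""
    # prev[i] = cheapest way to match pattern[:i] against a substring of the
    # text that ends at the current position. Before any text is read, the
    # only option is to delete all i pattern characters.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        # The empty pattern prefix always costs 0: a match may start anywhere.
        cur = [0]
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # exact match or one substitution
                prev[i] + 1,              # extra character in the text (insertion)
                cur[i - 1] + 1,           # pattern character missing (deletion)
            ))
        prev = cur
        best = min(best, prev[-1])
    return best


def approx_grep(pattern: str, lines: list[str], k: int) -> list[str]:
    """Return the lines containing a substring within k edits of `pattern`."""
    return [line for line in lines if min_errors(pattern, line) <= k]


if __name__ == "__main__":
    sample = ["a homogeneous mixture", "plain text"]
    for line in approx_grep("homogenos", sample, 2):
        print(line)
```

With an error limit of 2 the sketch keeps the line containing
"homogeneous" (two insertions away from "homogenos") and drops the other;
with a limit of 1 it keeps neither. Its running time is proportional to
the pattern length times the text length.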