Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!cs.utexas.edu!swrinde!emory!audfax!arnold
From: arnold@audiofax.com (Arnold Robbins)
Newsgroups: comp.unix.questions
Subject: Re: Proofreading documents with awk
Message-ID: <173@audfax.audiofax.com>
Date: 20 Dec 89 20:23:19 GMT
References: <25@meme.stanford.edu> <6612@jpl-devvax.JPL.NASA.GOV>
Reply-To: arnold@audfax.audiofax.com (Arnold Robbins)
Organization: AudioFAX Inc., Atlanta
Lines: 38

:In article <25@meme.stanford.edu> heit@psych.Stanford.EDU (Evan Heit) writes:
:: I am looking for someone who has written a program in awk that will
:: will allow me to proofread my papers by by looking for word repetitions.

In article <6612@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>How about filtering through
>
>    tr -cs "A-Za-z" "\012" | uniq -d
>
>(Sys V'ers will have to make that [A-Z][a-z]).
>
>I sincerely doubt that any awk (or perl) solution will do as well.

Well, yes and no.  The following should work in GNU Awk and possibly the
V.4 nawk.  It is untested though.  Its advantage is that it provides
line number and file name information.

    #! /path/to/gawk -f

    {
        gsub(/[^A-Za-z0-9 \t]/, "");    # delete non-alphanumerics
        $0 = tolower($0)                # go to all one case

        # first word repeats the last word of the previous non-empty line
        # (the NF test keeps blank lines from matching an empty "last")
        if (NF > 0 && $1 == last)
            printf "Duplicate '%s' line %d, file %s\n",
                last, FNR, FILENAME

        # adjacent repeated words within this line
        for (i = 2; i <= NF; i++)
            if ($(i-1) == $i)
                printf "Duplicate '%s' line %d, file %s\n",
                    $i, FNR, FILENAME

        if (NF > 0)
            last = $NF                  # remember last word for the next line
    }

As Jeff Lee points out, this IS slower than the tr | uniq solution.
--
Arnold Robbins -- Senior Research Scientist - AudioFAX | Laundry increases
2000 Powers Ferry Road, #220 / Marietta, GA. 30067     | exponentially in the
INTERNET: arnold@audiofax.com Phone: +1 404 933 7600   | number of children.
UUCP:     emory!audfax!arnold Fax:   +1 404 933 7606   |   -- Miriam Hartholz
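
For comparison, here is a rough sketch that runs both approaches on the same three lines of made-up text. The sample text, the variable names, and the shortened message format are assumptions for illustration and are not part of either article. The awk program is a cut-down variant of the script above with an NF test added so that blank lines are skipped, and it leaves out FILENAME because the input arrives on standard input.

```shell
# Hypothetical sample: one repeat inside a line, one inside the next,
# and one that spans a line break ("that" / "that").
sample='This is is a test
of the the proofreader that
that spans lines'

# The tr | uniq pipeline from the quoted article: one word per line,
# then print each word that occurs on adjacent lines.
words_only=$(printf '%s\n' "$sample" | tr -cs 'A-Za-z' '\012' | uniq -d)

# Cut-down variant of the awk script: same logic, shorter message.
located=$(printf '%s\n' "$sample" | awk '
{
    gsub(/[^A-Za-z0-9 \t]/, "")      # delete non-alphanumerics
    $0 = tolower($0)                 # fold to one case
    if (NF > 0 && $1 == last)        # repeat across a line break
        printf "duplicate %s at line %d\n", last, FNR
    for (i = 2; i <= NF; i++)        # repeats within the line
        if ($(i-1) == $i)
            printf "duplicate %s at line %d\n", $i, FNR
    if (NF > 0)
        last = $NF                   # remember the last word seen
}')

printf '%s\n' "$words_only"
printf '%s\n' "$located"
```

The pipeline only names the repeated words (is, the, that), while the awk variant also says which line each one is on. A repeat that spans a line break is reported at the line holding the second copy.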