Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/17/84; site mhuxd.UUCP
Path: utzoo!decvax!bellcore!petrus!sabre!zeta!epsilon!gamma!ulysses!mhuxr!mhuxd!wolit
From: wolit@mhuxd.UUCP (Jan Wolitzky)
Newsgroups: net.unix
Subject: Re: Unique Word Counter Needed
Message-ID: <3699@mhuxd.UUCP>
Date: Tue, 10-Dec-85 22:40:17 EST
Article-I.D.: mhuxd.3699
Posted: Tue Dec 10 22:40:17 1985
Date-Received: Thu, 12-Dec-85 00:34:42 EST
References: <232@ihlpf.UUCP>
Distribution: na
Organization: AT&T Bell Laboratories, Murray Hill
Lines: 19

> I need a way to count unique words in a document.
> Does any one have suggestions on a simple way to do this?

Try:

	deroff -w filename | dd conv=lcase 2>/dev/null | sort -u | wc -l

"deroff -w" breaks the file up into single words, one per line.
"dd" converts everything to lower case (so "word" and "Word" count as
the same thing).  ("dd" is verbose, so I redirect stderr.)
"sort -u" keeps just one copy of each line.
"wc -l" counts the lines.

If you're going to run this frequently, stick it in a file, make it
executable, replace "filename" with "$*" so you can pass it file names
as arguments, and you're off.
-- 
Jan Wolitzky, AT&T Bell Labs, Murray Hill, NJ; 201 582-2998; mhuxd!wolit
(Affiliation given for identification purposes only)
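
For anyone who wants the "stick it in a file" step spelled out, here is a
minimal sketch of the pipeline above wrapped in a shell function.  The name
"uniqwords" is made up for illustration.  Two things differ from the post and
are assumptions on my part: the function uses "$@" rather than $* so that file
names containing spaces survive, and on systems where deroff is not installed
it falls back to a rough tr-based word splitter, which is only a stand-in and
does not strip troff requests the way deroff does.

```shell
#!/bin/sh
# uniqwords (hypothetical name): count distinct words, ignoring case,
# in the files named as arguments (or on stdin if none are given).
uniqwords() {
    if command -v deroff >/dev/null 2>&1; then
        # The pipeline from the post: one word per line.
        deroff -w "$@"
    else
        # Assumption: deroff is missing, so approximate it by turning every
        # run of non-letters into a newline and dropping empty lines.
        cat "$@" | tr -cs 'A-Za-z' '\n' | sed '/^$/d'
    fi |
        dd conv=lcase 2>/dev/null |  # fold to lower case; hide dd's statistics
        sort -u |                    # keep one copy of each word
        wc -l                        # count the remaining lines
}
```

To use it as a stand-alone command, save it in a file, add a final line reading
uniqwords "$@", and make the file executable with chmod +x.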