Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site psivax.UUCP
Path: utzoo!decvax!ittatc!dcdwest!sdcsvax!sdcrdcf!psivax!friesen
From: friesen@psivax.UUCP (Stanley Friesen)
Newsgroups: net.unix
Subject: Re: Unique Word Counter Needed
Message-ID: <894@psivax.UUCP>
Date: Thu, 12-Dec-85 13:47:10 EST
Article-I.D.: psivax.894
Posted: Thu Dec 12 13:47:10 1985
Date-Received: Fri, 13-Dec-85 20:54:36 EST
References: <232@ihlpf.UUCP> <3699@mhuxd.UUCP>
Reply-To: friesen@psivax.UUCP (Stanley Friesen)
Distribution: na
Organization: Pacesetter Systems Inc., Sylmar, CA
Lines: 25

In article <3699@mhuxd.UUCP> wolit@mhuxd.UUCP (Jan Wolitzky) writes:
>> I need a way to count unique words in a document.
>> Does any one have suggestions on a simple way to do this?
>
>Try:
>
>deroff -w filename | dd conv=lcase 2>/dev/null | sort -u | wc -l
>
	This looks quite inefficient: tr does the case conversion
much more efficiently than dd, and it can also split the file into
one-word lines. So try:

tr 'A-Z\011 ' 'a-z\012\012' < filename | sort -u | wc -l

or

deroff -w filename | tr 'A-Z' 'a-z' | sort -u | wc -l

depending on whether you wish to remove nroff macros or not.
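Both pipelines treat "dog" and "dog!" as different words, and the
plain tr version adds one empty "word" to the count whenever the text
has consecutive spaces, tabs, or blank lines. A minimal sketch of a
stricter filter follows, assuming a POSIX-style tr that accepts the
-c and -s options and 'a-z' ranges (BSD and GNU tr do), and assuming
a "word" is simply a run of letters. The function name
count_unique_words is hypothetical. It reads standard input, so it
can sit after deroff -w or take a file directly:

```shell
# count_unique_words: hypothetical helper, not from the original post.
# Assumes a "word" is a run of the letters a-z after case folding.
count_unique_words() {
    tr 'A-Z' 'a-z' |        # fold upper case to lower case
    tr -cs 'a-z' '\012' |   # turn each run of non-letters into one newline
    sed '/^$/d' |           # drop the empty line a leading non-letter leaves
    sort -u |               # keep one copy of each distinct word
    wc -l                   # count the distinct words
}
```

Usage: count_unique_words < filename, or
deroff -w filename | count_unique_words. The trade-off is that
apostrophes and hyphens also count as separators, so "don't" is
counted as "don" and "t".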
-- 

				Sarima (Stanley Friesen)

UUCP: {ttidca|ihnp4|sdcrdcf|quad1|nrcvax|bellcore|logico}!psivax!friesen
ARPA: ttidca!psivax!friesen@rand-unix.arpa