Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site psivax.UUCP
Path: utzoo!decvax!ittatc!dcdwest!sdcsvax!sdcrdcf!psivax!friesen
From: friesen@psivax.UUCP (Stanley Friesen)
Newsgroups: net.unix
Subject: Re: Unique Word Counter Needed
Message-ID: <894@psivax.UUCP>
Date: Thu, 12-Dec-85 13:47:10 EST
Article-I.D.: psivax.894
Posted: Thu Dec 12 13:47:10 1985
Date-Received: Fri, 13-Dec-85 20:54:36 EST
References: <232@ihlpf.UUCP> <3699@mhuxd.UUCP>
Reply-To: friesen@psivax.UUCP (Stanley Friesen)
Distribution: na
Organization: Pacesetter Systems Inc., Sylmar, CA
Lines: 25

In article <3699@mhuxd.UUCP> wolit@mhuxd.UUCP (Jan Wolitzky) writes:
>> I need a way to count unique words in a document.
>> Does any one have suggestions on a simple way to do this?
>
>Try:
>
>deroff -w filename | dd conv=lcase 2>/dev/null | sort -u | wc -l
>
    This looks quite inefficient; tr will do the case conversion
much more efficiently than dd, and it can also split the file into
one-word lines.  So try:

    tr 'A-Z\011 ' 'a-z\012\012' < filename | sort -u | wc -l
or
    deroff -w filename | tr 'A-Z' 'a-z' | sort -u | wc -l

depending on whether you wish to remove nroff macros or not.
-- 

                Sarima (Stanley Friesen)

UUCP: {ttidca|ihnp4|sdcrdcf|quad1|nrcvax|bellcore|logico}!psivax!friesen
ARPA: ttidca!psivax!friesen@rand-unix.arpa
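
A note on the plain tr version: it splits only on tabs and blanks, so
punctuation stays attached to words ("cat," and "cat" count separately),
and a run of blanks leaves an empty line that sort -u counts as one more
"word".  The following is only a sketch of one way around that, assuming
a tr that supports -c and -s and treating a word as a run of ASCII
letters.  The function name count_unique_words is made up for
illustration and is not part of the original suggestion.

```shell
#!/bin/sh
# count_unique_words FILE
# Hypothetical helper: prints the number of distinct words in FILE,
# where a "word" is assumed to be a run of ASCII letters.
count_unique_words() {
    tr -cs 'A-Za-z' '\n' < "$1" |   # each run of non-letters becomes one newline
        tr 'A-Z' 'a-z' |            # fold upper case to lower case
        grep -v '^$' |              # drop the empty line left by leading non-letters
        sort -u |                   # keep one copy of each word
        wc -l |                     # count the remaining lines
        tr -d ' '                   # strip the padding some wc versions emit
}

# Example: "The" and "the" fold together, punctuation is ignored.
printf 'The cat, the  Hat.\n' > /tmp/sample.$$
count_unique_words /tmp/sample.$$   # prints 3
rm -f /tmp/sample.$$
```

On the same sample line the plain tr pipeline reports 4, since it sees
"cat,", "hat.", "the" and one empty line.  Digits and apostrophes are
treated as separators here, which may or may not be what you want.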