Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!news.cs.indiana.edu!att!att!ulysses!allegra!fox
From: fox@allegra.att.com (David Fox)
Newsgroups: comp.compression
Subject: Re: word frequency in English
Message-ID:
Date: 15 Apr 91 17:44:17 GMT
References: <13834@adobe.UUCP>
Sender: netnews@ulysses.att.com
Organization: AT&T Bell Laboratories
Lines: 23
In-reply-to: mjward@adobe.COM's message of 9 Apr 91 22:53:12 GMT

In article <13834@adobe.UUCP> mjward@adobe.COM (Michael J. Ward) writes:

   Where can I find a list/dictionary/datafile of English words sorted
   by relative frequency in various classes of usage?  For example, is
   "a" the most commonly used English word, followed by "the"?  How
   about "that" compared to "sesquipedalianism"?  Who's doing binary
   lookup tables based on word frequency?

   --Mike Ward

Just find a bunch of text and run it through this shell script:

   #!/bin/sh
   # Count word frequencies in the named files (or standard input).
   cat "$@" |               # tr reads only standard input, so cat supplies the files
   tr -c 'a-zA-Z-' ' ' |    # turn everything but letters and hyphens into spaces
   tr 'A-Z' 'a-z' |         # fold to lower case
   tr -s ' ' '\012' |       # put one word on each line
   sort |
   uniq -c |                # count each distinct word
   sort -rn                 # highest count first, compared numerically

Isn't Unix wonderful?  A very quick test has convinced me that "hotel"
is the most common English word.  It could be a problem with my sample
data, though.

-david
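
The question was about relative frequency, and the pipeline above only
gives raw counts.  Here is a rough sketch of one way to get percentages
as well, assuming an awk is available.  The function name "wordfreq"
and its "how many words to show" argument are made up for illustration;
neither is part of the script above.

```shell
#!/bin/sh
# wordfreq N [file ...]
# Hypothetical helper: print "count percent word" for the N most
# frequent words in the named files (or standard input).
wordfreq() {
    n=${1:-10}                  # number of words to show, default 10
    [ "$#" -gt 0 ] && shift     # the remaining arguments are file names
    cat "$@" |
    tr -c 'a-zA-Z-' ' ' |       # everything but letters and hyphens to spaces
    tr 'A-Z' 'a-z' |            # fold to lower case
    tr -s ' ' '\012' |          # one word per line
    grep . |                    # drop the empty line left by leading blanks
    sort |
    uniq -c |
    sort -k1,1nr -k2 |          # by count, ties broken alphabetically
    awk -v n="$n" '
        { count[NR] = $1; word[NR] = $2; total += $1 }
        END {
            # Percentages are of all words seen, not only the top n.
            for (i = 1; i <= NR && i <= n; i++)
                printf "%d %.1f%% %s\n", count[i], 100 * count[i] / total, word[i]
        }'
}
```

Something like "wordfreq 20 chapter*.txt" would then list the twenty
most common words with their share of the total.  The percentages
depend entirely on the sample text, so the "hotel" problem still
applies.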