Newsgroups: comp.compression
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!ox.com!ox.com!emv
From: emv@ox.com (Ed Vielmetti)
Subject: Re: word frequency in English
In-Reply-To: mjward@adobe.COM's message of 9 Apr 91 22:53:12 GMT
Message-ID:
Sender: usenet@ox.com (Usenet News Administrator)
Organization: OTA Limited Partnership, Ann Arbor MI.
References: <13834@adobe.UUCP>
Date: Fri, 12 Apr 1991 22:27:47 GMT

In article <13834@adobe.UUCP> mjward@adobe.COM (Michael J. Ward) writes:

   Where can I find a list/dictionary/datafile of English words sorted
   by relative frequency in various classes of usage? For example, is
   "a" the most commonly used English word, followed by "the"? How
   about "that" compared to "sesquipedalianism"? Who's doing binary
   lookup tables based on word frequency?

   --Mike Ward

The frequency of English words depends a lot on the body of text that
you're looking at. As a first pass, it's relatively easy to scan
through a representative usenet newsgroup and count word frequencies
with something like "wordcount", a perl program on p.39 of the perl
book (or on uunet.uu.net:/nutshell/perl/).

You've just thrown off the count for "sesquipedalianism", though ...

--
 Msen   Edward Vielmetti
/|---   moderator, comp.archives
        emv@msen.com

"With all of the attention and publicity focused on gigabit networks,
not much notice has been given to small and largely unfunded research
efforts which are studying innovative approaches for dealing with
technical issues within the constraints of economic science."
RFC 1216
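
For anyone who wants to try the first-pass count suggested in the reply, here is a minimal sketch in Python. It is not the "wordcount" program from the perl book, only an illustration of the same idea: split a body of text into words, fold case, and tally. The function name word_frequencies, the word pattern, and the sample sentence are all assumptions made for this example; for a real count you would feed it the text of a newsgroup spool or whatever corpus you care about.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count.

    A "word" here is assumed to be a run of ASCII letters, optionally
    followed by one apostrophe-suffix (so "you're" stays one word).
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


# Hypothetical sample text; substitute the corpus you want to measure.
sample = "The cat saw the dog, and the dog saw a cat."
freqs = word_frequencies(sample)

# Print words from most to least frequent.
for word, count in freqs.most_common():
    print(count, word)
```

As the reply says, the ranking you get depends heavily on the text you feed in, so counts from one newsgroup should not be read as counts for English in general.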