Path: utzoo!attcan!uunet!husc6!psuvax1!rutgers!columbia!close.columbia.edu!jms
From: jms@close.columbia.edu (Jonathan M. Smith)
Newsgroups: comp.text
Subject: Text Statistics, Letter Frequencies
Message-ID: <5713@columbia.edu>
Date: 17 Jun 88 12:53:31 GMT
Sender: nobody@columbia.edu
Reply-To: jms@close.columbia.edu (Jonathan M. Smith)
Organization: Columbia University Department of Computer Science
Lines: 18

Sorry I lost the article that this responds to. A fine source
of letter frequencies is the paper:

%A L. E. McMahon
%A L. L. Cherry
%A R. Morris
%T Statistical Text Processing
%J The Bell System Technical Journal
%D July-August 1978
%P 2137-2154

In addition, it provides methodology for gathering your own, as well
as a short reference list that may help you. I have some [1-4]-gram
statistics gathered from man pages (/usr/man) and C programs
(/usr/src/cmd, Sys V R 2) that I'll provide if you're interested.

-Jonathan
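
For anyone who wants to gather [1-4]-gram counts from their own text, here is a minimal sketch. It is not the methodology from the paper cited above, nor the tool used for the man-page and C-source statistics. It assumes "n-gram" means a run of n consecutive characters, that case is folded to lower case, and that whitespace and punctuation count as characters. The names ngram_counts and ngram_table are made up for illustration.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive characters in text.

    Assumption: n-grams are character-level and overlap, so "banana"
    yields the 2-grams ba, an, na, an, na.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_table(text, max_n=4):
    """Build one Counter per n for n = 1..max_n (hypothetical helper)."""
    folded = text.lower()  # assumption: case does not matter
    return {n: ngram_counts(folded, n) for n in range(1, max_n + 1)}


if __name__ == "__main__":
    sample = "The cat sat on the mat"
    for n, counts in ngram_table(sample).items():
        # Show the three most frequent n-grams for each n.
        print(n, counts.most_common(3))
```

To run this over a directory tree such as /usr/man, read each file and add the per-file Counters together; Counter objects support + and update() for that.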