Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cmcl2!phri!marob!cowan
From: cowan@marob.MASA.COM (John Cowan)
Newsgroups: comp.unix.questions
Subject: Re: What does "spell" do wrong?
Message-ID: <665@marob.MASA.COM>
Date: 17 May 89 18:31:16 GMT
References: <7084@saturn.ucsc.edu> <1007@kuling.UUCP>
Reply-To: cowan@marob.masa.com (John Cowan)
Organization: ESCC New York City
Lines: 26

In article <1007@kuling.UUCP> irf@kuling.UUCP (Bo Thide') writes:
>In article <7084@saturn.ucsc.edu> jaap@chromo.UUCP (Jacob Wilbrink) writes:
>>I've been wondering what the program "spell" does, since it
>>seems to make very many errors.  Some examples of words it thinks
>>are spelled correctly are
>>
>>utomsrr
>>mgdesou
>>aneorxx
>
>All these words are caught as misspelled by the HP-UX version of spell(1).
>

My version of 'spell' catches them also.  However, in defense of the
program, it is not designed to be 100% reliable.

'Spell' uses a hashing scheme.  Each word is stripped of prefixes and
suffixes, and the resulting base form is hashed and looked up in a bit
table.  If the bit is 0, the word is certainly misspelled; if the bit
is 1, the word is assumed correct.  There are 30,000 1-bits in a 2^27
bit table, so the probability that a misspelled word happens to hash to
a 1-bit and is wrongly accepted is about 1/4000.

According to Doug McIlroy, the author of 'spell', a typical document
contains 20 misspelled words or fewer.  Therefore, under 1% of documents
(roughly 20/4000, or 0.5%) contain a misspelled word that is not
reported.

Source:  Jon Bentley, >Programming Pearls<, ISBN 0-201-10331-1.
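
For anyone who wants to check the arithmetic, here is a minimal sketch of
the bit-table lookup described above.  It is not the 'spell' source.  It
assumes a 27-bit hash (a 2^27-bit table, which is what makes the 1/4000
figure work out), uses MD5 purely as a stand-in hash function, and skips
the prefix/suffix stripping entirely.  All function names are made up for
illustration.

```python
import hashlib

TABLE_BITS = 27  # assumption: 27-bit hash values, so the table has 2**27 bits


def word_hash(word, bits=TABLE_BITS):
    """Map a word to an integer in [0, 2**bits). MD5 is only a stand-in."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


def build_table(words, bits=TABLE_BITS):
    """Return a bytearray of 2**bits bits with one bit set per dictionary word."""
    table = bytearray((1 << bits) // 8)  # 16 MiB when bits == 27
    for w in words:
        h = word_hash(w, bits)
        table[h >> 3] |= 1 << (h & 7)
    return table


def looks_correct(word, table, bits=TABLE_BITS):
    """Bit 0: certainly not in the dictionary. Bit 1: assumed correct."""
    h = word_hash(word, bits)
    return bool(table[h >> 3] & (1 << (h & 7)))


def false_accept_rate(n_words, bits=TABLE_BITS):
    """Chance that a random non-word lands on a 1-bit (upper bound)."""
    return n_words / (1 << bits)


if __name__ == "__main__":
    rate = false_accept_rate(30000)
    print("about 1 in %d" % round(1 / rate))
    print("expected misses in a document with 20 errors: %.4f" % (20 * rate))
```

A word in the dictionary always tests as correct.  A misspelled word is
rejected unless its hash collides with a dictionary word's bit, which is
where the small error rate comes from.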