Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!cmcl2!kaplan
From: kaplan@cmcl2.UUCP (Laurence S. Kaplan)
Newsgroups: comp.text,comp.bugs.4bsd
Subject: RE: bug in spell
Message-ID: <13897@cmcl2.UUCP>
Date: Wed, 18-Mar-87 14:36:17 EST
Article-I.D.: cmcl2.13897
Posted: Wed Mar 18 14:36:17 1987
Date-Received: Fri, 20-Mar-87 01:18:06 EST
Organization: New York University
Lines: 53
Keywords: hlist stoplist aniversery
Xref: mnetor comp.text:564 comp.bugs.4bsd:234

I finally got around to reading through the responses to my posting about the bug in spell. To refresh your memory: spell accepted the word "aniversery" even though I was using an hlist that I knew did not contain that word. The best (and only) explanation of what must have happened follows:

***** start of response *****
From: mcnc.org!ecsvax!dukeac!bet@seismo.UUCP

Perhaps (I'm not sure) you've found an instance of a potential failure mode that spell has always had. I had never heard of an actual example before. Here's the situation:

spell(1) wants to have a very large number of possible words in its dictionary, wants to run in a reasonably small amount of memory, and wants to be FAST. So they set up a table in memory (50K bytes) and treat it as a bit array (400K bits). Then, for each word, they compute N independent hash functions with values in the range [0, 400K). For each word, they turn on the N bits at the locations identified by those N hash values. N is chosen, depending on the size of the hash table and the number of words, so that approximately half of the bits are turned on in the final table (which maximizes the information content of the table). If I recall correctly, this gives N = 11 in the standard UNIX spell.

This means that a random string that is not in the word list will still find all N of its bits turned on, and so be accepted, with odds of about (1/2)**11 == 1/2048. That is quite small, but it seems you might have found such an example.
If I recall the implementation of the full spell(1) command correctly, it is a shell script that calls the spelling check program twice. The first pass uses a stop file, which is a hash table of words to report as misspelled regardless of whether they are in the dictionary. The second pass runs the remaining words through the regular dictionary. The stop list mechanism was put in place, as I recall, to trap words that would otherwise be (mistakenly) accepted because of misbehavior in the prefix/suffix stripper (the example I recall is "thier == thy - y + ier"). However, it seems to me you should be able to add your example word to the hstop word list, rebuild the hstop table, and fix the problem that way.

-Bennett
--
Bennett Todd, Duke User Services, Durham, NC 27706-7756; +1 919 684 3695
UUCP: ...{philabs,akgua,decvax,ihnp4}!mcnc!ecsvax!dukeac!bet
BITNET: DBTODD@TUCC
***** end of response *****

While I did not get around to trying this fix, it does sound good. Good luck to everyone in not having this happen with other misspelled words!

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Laurence S. Kaplan                          |
NYU Ultracomputer Research Project         |||
715 Broadway Rm. 1005                     |||||
New York, NY 10003                        |||||
(212) 460-7327                         --- //\ ---
arpa: kaplan@cmcl2                    ----/   \ ---
uucp: {ihnp4,seismo}!cmcl2!kaplan    ----       ----
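The hashed bit table Bennett describes is what is usually called a Bloom filter. Below is a minimal Python sketch of it, together with a stop-table pass like the one he recalls. It shows how a misspelling can be accepted when all of its bits happen to be on, and why listing the word in a stop table still gets it reported.

This is not the spell(1) source. The names `BloomFilter`, `check` and `false_positive_rate` are hypothetical. Deriving the N hash values by salting SHA-256 with an index is an assumption standing in for whatever hash functions spell actually uses. The table size (400K bits) and N = 11 are taken from the response above.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """A bit array of m bits with k hash functions per word (illustrative sketch)."""

    def __init__(self, m_bits: int, k: int) -> None:
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)  # all bits start off

    def _positions(self, word: str) -> Iterator[int]:
        # Assumption: the k hash values are derived by salting SHA-256 with
        # the index i; spell(1) uses its own hash functions, not these.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, word: str) -> None:
        # Turn on the k bits selected by the word's hash values.
        for p in self._positions(word):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, word: str) -> bool:
        # True if all k bits are on. Added words always pass; any other
        # string passes only by accident (a false positive).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(word))

    def fill_fraction(self) -> float:
        # Fraction of the m bits that are currently on.
        return sum(bin(b).count("1") for b in self.bits) / self.m


def false_positive_rate(fill: float, k: int) -> float:
    # A string not in the list is accepted when all k of its bits happen to
    # be on; treating the k positions as independent, that is fill**k.
    return fill ** k


def check(words: Iterable[str], stop: BloomFilter, dictionary: BloomFilter) -> List[str]:
    # Hypothetical two-pass check: a word found in the stop table is reported
    # as misspelled even if the main dictionary would accept it.
    return [w for w in words if w in stop or w not in dictionary]


if __name__ == "__main__":
    dictionary = BloomFilter(400_000, 11)  # 50K bytes = 400K bits, N = 11
    for w in ["anniversary", "their", "spell"]:
        dictionary.add(w)
    # Simulate the reported failure by forcing the misspelling's bits on.
    dictionary.add("aniversery")
    stop = BloomFilter(8_000, 11)
    stop.add("aniversery")
    print(check(["their", "aniversery", "anniversary"], stop, dictionary))
    print(false_positive_rate(0.5, 11))
```

With the table about half full and N = 11, `false_positive_rate(0.5, 11)` reproduces the 1/2048 figure from the response. The stop table is itself a hashed bit array, so in principle it can also wrongly flag a correctly spelled word. The chance is tiny while it holds only a few entries.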