Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!mimsy!brillig!beth
From: beth@brillig (Beth Katz)
Newsgroups: comp.unix.questions
Subject: Re: Unix Dictionaries
Message-ID: <5285@mimsy.UUCP>
Date: Wed, 4-Feb-87 14:51:35 EST
Article-I.D.: mimsy.5285
Posted: Wed Feb 4 14:51:35 1987
Date-Received: Sat, 7-Feb-87 06:43:50 EST
References: <2828@brl-adm.ARPA> <1987Jan22.110150.29415@sq.uucp>
Sender: news@mimsy.UUCP
Reply-To: beth@brillig.UUCP (Beth Katz)
Organization: U of Maryland, Dept. of Computer Science,
Lines: 17

I am not a Unix expert, but I have looked at 'spell' and at how it
comes to accept garbage. I haven't read the papers mentioned
previously.

One reason 'spell' accepts so much garbage is that it uses a hashed
list of acceptable words. On many systems I have seen, this list is
50000 bytes. Given all the garbage that can be generated by random
combinations of letters, you run out of space in that table very
quickly: a table that small cannot keep every random string distinct
from the real words, so some garbage hashes to the same place as a
valid word and is accepted.

'spell' was designed to catch misspelled words rather than to filter
out absolute garbage. The stop lists catch words that could be created
through transformations but that are misspelled nonetheless. You can
do some extra transformations to clean up the lists if you've fed
'spell' real garbage, but in most situations it doesn't matter all
that much.

Beth Katz
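
For illustration, here is a minimal sketch of how a hashed word list can accept garbage. It is not the 'spell' source and does not reproduce its actual hashing scheme. The word list, the deliberately tiny table size, the use of MD5 as the hash, and all function names are assumptions made up for this example. Each acceptable word sets one bit in a table; any string whose hash lands on a set bit is accepted, whether or not it is a word.

```python
import hashlib
import random
import string

# Hypothetical word list and table size, chosen small so collisions are visible.
WORDS = ["spell", "unix", "dictionary", "garbage", "hash", "table", "word", "list"]
TABLE_BITS = 16


def slot(word, table_bits):
    """Map a word to one bit position using a stable hash (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(word.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % table_bits


def build_table(words, table_bits):
    """Set one bit per acceptable word; the words themselves are not stored."""
    table = bytearray((table_bits + 7) // 8)
    for w in words:
        i = slot(w, table_bits)
        table[i // 8] |= 1 << (i % 8)
    return table


def accepts(table, table_bits, word):
    """A string is accepted if its bit is set, even if it was never in the list."""
    i = slot(word, table_bits)
    return bool(table[i // 8] & (1 << (i % 8)))


def random_garbage(rng, length=6):
    """Produce a random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


table = build_table(WORDS, TABLE_BITS)
rng = random.Random(0)
garbage = [g for g in (random_garbage(rng) for _ in range(1000)) if g not in WORDS]
accepted = sum(accepts(table, TABLE_BITS, g) for g in garbage)
print(f"{accepted} of {len(garbage)} random strings were accepted as words")
```

Every real word is always accepted, since its own bit is set. The number of garbage strings accepted depends on how full the table is, so a larger table relative to the word list lets less garbage through.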