Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!mimsy!brillig!beth
From: beth@brillig (Beth Katz)
Newsgroups: comp.unix.questions
Subject: Re: Unix Dictionaries
Message-ID: <5285@mimsy.UUCP>
Date: Wed, 4-Feb-87 14:51:35 EST
Article-I.D.: mimsy.5285
Posted: Wed Feb  4 14:51:35 1987
Date-Received: Sat, 7-Feb-87 06:43:50 EST
References: <2828@brl-adm.ARPA> <1987Jan22.110150.29415@sq.uucp>
Sender: news@mimsy.UUCP
Reply-To: beth@brillig.UUCP (Beth Katz)
Organization: U of Maryland, Dept. of Computer Science,
Lines: 17

I am not a Unix expert, but I have looked at 'spell' and how it
accepts garbage.  I haven't read the papers mentioned previously.

One reason 'spell' accepts so much garbage is that it uses
a hashed list of acceptable words.  On many systems I have seen,
this list is 50000 bytes.  Given all the garbage that can be
generated by random combinations of letters, a table that small
cannot keep every string apart, so some garbage hashes to the
same place as a real word and is accepted.  'spell' was designed
to catch misspelled words rather than to filter out absolute
garbage.  The stop lists catch words that could be created
through transformations but that are misspelled nonetheless.
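
To make the hashing point concrete, here is a rough sketch in
Python.  It is not how 'spell' actually hashes its list; it only
assumes a single hash (CRC-32) into a bit table, and the names
build_table, accepts, garbage_rate and the WORDS list are made up
for the illustration.  It shows that the smaller the table, the
more random letter strings land on a bit that a real word set.

```python
import random
import zlib

# Tiny stand-in word list (hypothetical; a real list is far larger).
WORDS = ["spell", "word", "list", "table", "hash", "garbage", "stop",
         "unix", "paper", "space", "catch", "clean", "extra", "bytes",
         "quick", "dictionary"]

def build_table(words, table_bits):
    """Return a bit table with one bit set for each word's hash."""
    table = bytearray((table_bits + 7) // 8)
    for w in words:
        h = zlib.crc32(w.encode("ascii")) % table_bits
        table[h // 8] |= 1 << (h % 8)
    return table

def accepts(table, table_bits, word):
    """True if the word's hash bit is set; the word itself is never stored."""
    h = zlib.crc32(word.encode("ascii")) % table_bits
    return bool(table[h // 8] & (1 << (h % 8)))

def garbage_rate(words, table_bits, trials=2000, seed=1987):
    """Fraction of random 6-letter strings that the table accepts."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    table = build_table(words, table_bits)
    letters = "abcdefghijklmnopqrstuvwxyz"
    hits = 0
    for _ in range(trials):
        junk = "".join(rng.choice(letters) for _ in range(6))
        if accepts(table, table_bits, junk):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # Same words, same junk strings, three table sizes.
    for bits in (64, 1024, 1 << 20):
        print(bits, garbage_rate(WORDS, bits))
```

Every real word is always accepted, since its own bit is set.  The
garbage that gets through is whatever happens to share a bit with
a real word, which is why the size of the table matters so much.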

You can do some extra transformations to clean up the lists if
you've fed 'spell' real garbage, but in most situations it
doesn't matter all that much.

				Beth Katz