Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!seismo!ll-xn!ames!oliveb!jerry
From: jerry@oliveb.UUCP
Newsgroups: comp.unix.questions,comp.text
Subject: Re: Problem with spell
Message-ID: <680@oliveb.UUCP>
Date: Fri, 20-Mar-87 13:55:47 EST
Article-I.D.: oliveb.680
Posted: Fri Mar 20 13:55:47 1987
Date-Received: Sun, 22-Mar-87 21:22:53 EST
References: <482@bcsaic.UUCP> <338@hscfvax.UUCP> <14626@sun.uucp>
Reply-To: jerry@oliveb.UUCP (Jerry F Aguirre)
Organization: Olivetti ATC; Cupertino, Ca
Lines: 27
Xref: utgpu comp.unix.questions:1449 comp.text:560

In article <14626@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>It's worth noting that other approaches to spelling checkers have
>been successful, even on small machines.  A company called Proximity
>Technology (formerly Proximity Devices; I presume they're still
>around) built a spelling checker that just looked words up in a
>dictionary.  The first version made one pass over the document to
>gather a sorted list of words with duplicates eliminated.  The second pass
>went through that list and eliminated words not found in the
>dictionary; the dictionary was compressed using several techniques

It is not necessary to prescan the document and eliminate duplicates.  I
ran into the same problems with a spelling checker I wrote.  What helped
the most was adding a hashed LRU (least recently used) table to the lookup.

My program kept the last 512 "words" in memory.  Each string was stored
along with a flag indicating whether it had been found in the dictionary.
This gave the greatest performance improvement of any of my changes,
including the addition of an index based on the first few letters.
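
For anyone who wants to try the idea, here is a minimal sketch of such a
table in Python.  It is not the original program: the class name
WordCache, the lookup callback, and the hit/miss counters are all
hypothetical, and an OrderedDict stands in for whatever hash and LRU
bookkeeping the original used.  Only the 512-entry size and the
per-string flag come from the description above.

```python
from collections import OrderedDict


class WordCache:
    """Remember the dictionary verdict for the most recently seen strings."""

    def __init__(self, lookup, capacity=512):
        self.lookup = lookup          # assumed slow lookup: str -> bool
        self.capacity = capacity      # 512 matches the size described above
        self.entries = OrderedDict()  # string -> flag, least recent first
        self.hits = 0
        self.misses = 0

    def is_word(self, word):
        if word in self.entries:
            self.hits += 1
            self.entries.move_to_end(word)    # now the most recently used
            return self.entries[word]
        self.misses += 1
        flag = self.lookup(word)              # the only slow dictionary access
        self.entries[word] = flag             # misspellings are cached too
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return flag
```

Note that the flag is stored for misspelled strings as well, so a
repeated typo costs only one dictionary lookup while it stays in the
table.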

I did some analysis of "typical" documents and found that the hash was
more than 90% effective in eliminating duplicate lookups.  Since it also
avoids the first read of the file and the setup delay while that pass
preprocesses, this is definitely a better solution.
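
A rough way to make this kind of measurement on your own documents is
sketched below, again in Python.  The function name hit_rate and the
rule for splitting text into words are assumptions for illustration, not
the method used for the figure above; the result will depend on the
document and on the table size.

```python
import re
from collections import OrderedDict


def hit_rate(text, capacity=512):
    """Fraction of word occurrences already present in an LRU table."""
    recent = OrderedDict()   # word -> True, least recently used first
    hits = total = 0
    # Assumed tokenizer: runs of letters and apostrophes, case folded.
    for word in re.findall(r"[A-Za-z']+", text.lower()):
        total += 1
        if word in recent:
            hits += 1
            recent.move_to_end(word)        # refresh its position
        else:
            recent[word] = True
            if len(recent) > capacity:
                recent.popitem(last=False)  # drop the least recently used
    return hits / total if total else 0.0
```

Calling hit_rate(open("draft.txt").read()) on a file of your own (the
file name here is only a placeholder) gives the fraction of lookups the
table would have saved.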

					Jerry Aguirre
					Systems Administration
					Olivetti ATC