Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!seismo!ll-xn!ames!oliveb!jerry
From: jerry@oliveb.UUCP
Newsgroups: comp.unix.questions,comp.text
Subject: Re: Problem with spell
Message-ID: <680@oliveb.UUCP>
Date: Fri, 20-Mar-87 13:55:47 EST
Article-I.D.: oliveb.680
Posted: Fri Mar 20 13:55:47 1987
Date-Received: Sun, 22-Mar-87 21:22:53 EST
References: <482@bcsaic.UUCP> <338@hscfvax.UUCP> <14626@sun.uucp>
Reply-To: jerry@oliveb.UUCP (Jerry F Aguirre)
Organization: Olivetti ATC; Cupertino, Ca
Lines: 27
Xref: utgpu comp.unix.questions:1449 comp.text:560

In article <14626@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>It's worth noting that other approaches to spelling checkers have
>been successful, even on small machines. A company called Proximity
>Technology (formerly Proximity Devices; I presume they're still
>around) built a spelling checker that just looked words up in a
>dictionary. The first version made one pass over the document to
>gather a sorted list of words with duplicates eliminated. The second pass
>went through that list and eliminated words not found in the
>dictionary; the dictionary was compressed using several techniques

It is not necessary to prescan the document and eliminate duplicates.
I ran into the same problems with a spelling checker I wrote. The
biggest improvement came from adding a hashed LRU table to the lookup.
My program kept the last 512 "words" in memory. Each string was stored
along with a flag indicating whether it was a word. This gave a larger
performance improvement than any of my other changes, including the
addition of an index based on the first few letters.

I did some analysis of "typical" documents and found that the hash was
>90% effective in eliminating duplicate lookups. Since it also avoids
the first read of the file and the setup delay while the prescan
preprocesses, this is definitely a better solution.

				Jerry Aguirre
				Systems Administration
				Olivetti ATC