Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!seismo!ll-xn!ames!oliveb!sun!gorodish!guy From: guy%gorodish@Sun.COM (Guy Harris) Newsgroups: comp.unix.questions,comp.text Subject: Re: Problem with spell Message-ID: <14626@sun.uucp> Date: Fri, 6-Mar-87 16:43:53 EST Article-I.D.: sun.14626 Posted: Fri Mar 6 16:43:53 1987 Date-Received: Sun, 8-Mar-87 12:08:54 EST References: <482@bcsaic.UUCP> <338@hscfvax.UUCP> Sender: news@sun.uucp Lines: 43 Xref: mnetor comp.unix.questions:1348 comp.text:518 > There is really no solution, short of a total (and perhaps needed) rewrite > of spell, a program that originated in the dark ages on a PDP without > separate I & D space. It's worth noting that other approaches to spelling checkers have been successful, even on small machines. A company called Proximity Technology (formerly Proximity Devices; I presume they're still around) build a spelling checker that just looked words up in a dictionary. The first version made one pass over the document to gather a sorted list of words with duplicates eliminated. The second pass went through that list and eliminated words not found in the dictionary; the dictionary was compressed using several techniques (common prefix elimination, Huffman-coding, etc.) so an ~80,000 word list took only (if I remember correctly) about 170KB. The third pass went through the document and printed the lines containing words in the remaining list of misspelled words. The bulk of the time was spent in the decompression; the word list was indexed by the first three characters of the word, so lookup was reasonably quick. At CCI we incorporated this code into an interactive spelling checker that was part of our word processor; you could tell it to check from the current cursor position to the end of the document, or mark a block of text and tell it to check that block, and the third pass would stop at each potentially-misspelled word and offer you the chance to correct it. The word list could also be purchased with hyphenation points included; our word processor could be told to insert discretionary hyphens at these points. Other people, including vendors of word processors based on micros (definitely 16-bit micros, perhaps 8-bit ones), also included this code in their products. Proximity then rewrote it completely to use one pass; it just took every word in the document and looked it up in the word list. They did this in order to make it work better when used interactively; the old version took a while to complete passes 1 and 2, so you had to sit there for a while before it started showing potential misspellings. They also used different compressing techniques that weren't so CPU-intensive; when incorporated into CCI's word processor, the new one started showing the errors fairly quickly. I believe this version traded space for time, by entering a match/nomatch indication into an in-memory cache, so I don't know whether it would have worked well on a small-address-space or small-physical-memory machine.