Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!iuvax!cica!gatech!hubcap!hutch%citron.cs.clemson.edu From: hutch%citron.cs.clemson.edu@hubcap.clemson.edu (David Hutchens) Newsgroups: comp.lang.c Subject: Re: Storing substrings: some real C code in comp.lang.c! Message-ID: <6871@hubcap.clemson.edu> Date: 25 Oct 89 15:55:49 GMT References: <1989Oct25.091519.10718@twwells.com> Sender: news@hubcap.clemson.edu Reply-To: hutch%citron.cs.clemson.edu@hubcap.clemson.edu Lines: 29 From article <1989Oct25.091519.10718@twwells.com>, by bill@twwells.com (T. William Wells): > I have been beating my head against the wall, looking for speed, > *speed*, and MORE SPEED in this program I'm working on. At only > 800 WPM (on my 16MHz '386) it is not exactly a speed demon. > Especially since it has to be run on large lists of words, none of > which are smaller than 80,000 words. > > Bill { uunet | novavax | ankh | sunvice } !twwells!bill > bill@twwells.com First, forget the tree, and the hash. What you need is a Finite State Automaton. The basic idea for this can be found in the May 1988 issue of CACM in an article on an implementation of SCRABBLE(tm). I've implemented a version of this and I can build the FSA in under a minute (for about 50000 words) on a sun-3 and under 5 minutes on a 80286. If possible, you should build the FSA for the words in a separate processing step since it does take a bit of memory. You'll need about a Meg. The FSA, by its very nature allows you to get at prefix strings very easily (that is quite useful in SCRABBLE). David Hutchens hutch@hubcap.clemson.edu PS: Since you are doing something confidential, I assume it is for *business*. I'll be glad to discuss selling you the right to incorporate my program.