Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!mcgill-vision!snorkelwacker!usc!zaphod.mps.ohio-state.edu!wuarchive!decwrl!elroy.jpl.nasa.gov!lll-winken!cert!netnews.upenn.edu!eniac.seas.upenn.edu!jeffe
From: jeffe@eniac.seas.upenn.edu (George Jefferson )
Newsgroups: comp.sys.mac.apps
Subject: Re: How do so many words fit in a dictionary/thesaurus?
Message-ID: <29477@netnews.upenn.edu>
Date: 12 Sep 90 21:50:30 GMT
References: <3374@stl.stc.co.uk>
Sender: news@netnews.upenn.edu
Reply-To: jeffe@eniac.seas.upenn.edu (George Jefferson )
Organization: University of Pennsylvania
Lines: 15

In article <3374@stl.stc.co.uk> pac@stl.stc.co.uk () writes:
>Can anyone give me a reference to the compression techniques used by the
>dictionary/thesaurus applications and DAs. Taking Microlytics claim about
>Word Finder (220,000 words) it works out at about 1.6 bytes per word.

I suspect that the '220,000' figure counts plurals and other word
variants as separate 'words'. A tremendous 'compression' is achieved by
recognising the (reasonably) simple rules for adding appropriate
suffixes and prefixes. Each base word need only carry around a couple
of bytes to indicate which modifiers are allowed.

-- just a hunch
george
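
To make the hunch concrete, here is a minimal Python sketch of the idea: each base word is stored once, together with a small bit field saying which affix rules may be applied to it. Everything in it is an assumption for illustration only. The flag names, the five rules, the three sample words, and the helpers `expand`, `all_forms` and `is_word` are all hypothetical, the rules ignore spelling changes (e.g. "carry" -> "carried"), and nothing here is known to match what Word Finder or any other product actually does.

```python
# Hypothetical affix flags: one bit per allowed modifier.
PLURAL_S = 1 << 0   # word + "s"
PAST_ED = 1 << 1    # word + "ed"
GERUND = 1 << 2     # word + "ing"
PREFIX_UN = 1 << 3  # "un" + word
PREFIX_RE = 1 << 4  # "re" + word

# Each rule is (flag bit, prefix to add, suffix to add).
RULES = [
    (PLURAL_S, "", "s"),
    (PAST_ED, "", "ed"),
    (GERUND, "", "ing"),
    (PREFIX_UN, "un", ""),
    (PREFIX_RE, "re", ""),
]

# Toy lexicon: base word -> bit field of allowed modifiers.
# With at most 16 rules, the bit field fits in two bytes per base word.
BASE_WORDS = {
    "walk": PLURAL_S | PAST_ED | GERUND,
    "lock": PLURAL_S | PAST_ED | GERUND | PREFIX_UN | PREFIX_RE,
    "cat": PLURAL_S,
}


def expand(word, flags):
    """Return the base word followed by every variant its flags allow."""
    forms = [word]
    for bit, prefix, suffix in RULES:
        if flags & bit:
            forms.append(prefix + word + suffix)
    return forms


def all_forms(lexicon):
    """Return the set of all 'words' the lexicon can claim to contain."""
    forms = set()
    for word, flags in lexicon.items():
        forms.update(expand(word, flags))
    return forms


def is_word(candidate, lexicon):
    """Check a candidate by undoing one affix rule and testing the flag."""
    if candidate in lexicon:
        return True
    for bit, prefix, suffix in RULES:
        if candidate.startswith(prefix) and candidate.endswith(suffix):
            base = candidate[len(prefix):len(candidate) - len(suffix)]
            if lexicon.get(base, 0) & bit:
                return True
    return False


if __name__ == "__main__":
    print(sorted(all_forms(BASE_WORDS)))
    print(is_word("unlock", BASE_WORDS), is_word("uncat", BASE_WORDS))
```

In this toy lexicon three stored base words account for twelve countable 'words', which is the kind of inflation of the advertised count that the post suspects. For scale, the quoted figures of 220,000 words at about 1.6 bytes each come to roughly 352,000 bytes in total.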