Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!elroy.jpl.nasa.gov!decwrl!frnkmth!bill From: bill@franklin.com (bill) Newsgroups: comp.compression Subject: Re: What is a trie ?? (was Re: fast string compr.?) Message-ID: <27May91.045904.2387@franklin.com> Date: 27 May 91 04:59:04 GMT References: <1991May23.235530.11443@looking.on.ca> <24May91.206106.345@franklin.com> Distribution: comp Organization: Franklin Electronic Publishers Lines: 42 In article d88-jwa@byse.nada.kth.se (Jon W{tte) writes: : In article <24May91.206106.345@franklin.com> bill@franklin.com (bill) writes: : : From: bill@franklin.com (bill) : : BTW, the *correct* answer to the question of the "normal" way of : storing the lexicon is this: store the lexicon as a trie. This is : : What is a trie ? I've seen it mentioned over and again in this : group, but not being native english I haven't heard the word : elsewhere. I'm probably familiar with the data structure when : you explain it to me, but until then, I remain in the dark. I can't say what the "official" definition of trie is; Knuth, I think, defines it somewhere, but the definition I use is this: Think of an N-ary tree in which the arcs of the tree are labeled with letters. A node of the tree may be labeled as "terminal". A trie contains a word if there is a path which is marked with the letters of the word from the root to a node marked terminal. My copy of _Data Structures and Algorithms_ (Aho, Hopcroft, Ullman) has a superficial section on them, which uses a definition not too dissimilar to mine. (BTW, I despise this neologism but we're stuck with it. A trie is a tree!) There are many ways of representing tries. One is as a conventional tree. Another is "hash tries", which you can see examples of in the TeX package (in the hyphenation pattern generation code), and in one of the ACM Programming Pearls columns, the one where Bently has Knuth showing off Web. One favored trick with representing tries (and trees) is to merge identical subtrees. One finds identical subtrees of the tree and constructs a DAG (Directed Acyclic Graph) by making arcs to those subtrees point instead to a single instance of the subtree. With a little cleverness, it is possible to do this kind of compression very quickly and in a small amount of memory over and above that which is needed to store the trie itself. Doing this results in dramatic space savings in the trie. I believe that Graham Toal's stuff contains code for this.