Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!atha!aunro!ukma!rutgers!mcnc!ecsgate!burgin From: burgin@ecsvax.uncecs.edu (Robert Burgin) Newsgroups: comp.lang.c Subject: What Data Structure? Keywords: Hash Table Trie Dictionary Message-ID: <1991Jun24.110300.3740@ecsvax.uncecs.edu> Date: 24 Jun 91 11:03:00 GMT Organization: UNC Educational Computing Service Lines: 22 I am writing a program that includes a dictionary look-up of 24,500 English words. Each entry in the dictionary will contain the word itself and a string representing valid syntactic tags for that word (noun, verb, etc.). Would a hash table be best for the implementation? Or a trie? Why? The trie would seem to be ideal for short words but bad for longer words. Since the most frequent words in English are rather short, perhaps that should be a consideration in favor of the trie. On the other hand, tries are notoriously memory-intensive. I worry about collisions in the hash table and wonder what the ideal number of buckets would be for this case. Not to mention having to scare up a proper hash function for strings of variable length. Any thoughts would be appreciated. --rb