Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!atha!aunro!ukma!rutgers!mcnc!ecsgate!burgin
From: burgin@ecsvax.uncecs.edu (Robert Burgin)
Newsgroups: comp.lang.c
Subject: What Data Structure?
Keywords: Hash Table Trie Dictionary
Message-ID: <1991Jun24.110300.3740@ecsvax.uncecs.edu>
Date: 24 Jun 91 11:03:00 GMT
Organization: UNC Educational Computing Service
Lines: 22

I am writing a program that includes a dictionary look-up
of 24,500 English words.  Each entry in the dictionary will
contain the word itself and a string representing valid
syntactic tags for that word (noun, verb, etc.).

Would a hash table be best for the implementation?  Or
a trie?  Why?

The trie would seem to be ideal for short words but bad
for longer words.  Since the most frequent words in English
are rather short, perhaps that should be a consideration
in favor of the trie.  On the other hand, tries are
notoriously memory-intensive.

I worry about collisions in the hash table and wonder
what the ideal number of buckets would be for this case.
Not to mention having to scare up a proper hash function
for strings of variable length.

Any thoughts would be appreciated.

--rb