Path: utzoo!attcan!uunet!midway!ncar!asuvax!cs.utexas.edu!swrinde!ucsd!ucbvax!MITCH.ENG.SUN.COM!wmb From: wmb@MITCH.ENG.SUN.COM (Mitch Bradley) Newsgroups: comp.lang.forth Subject: HASHING THE DICTIONARY Message-ID: <9010101326.AA20013@ucbvax.Berkeley.EDU> Date: 8 Oct 90 15:13:51 GMT Sender: daemon@ucbvax.BERKELEY.EDU Reply-To: Mitch Bradley Organization: The Internet Lines: 30 The F83 hash function selects one of 4 threads by looking at the lower 2 bits of the first character in the name (not the length byte, the first character). The current experimental version of my system uses a 256-way hash function to index into a "cache" of recently-found entries from the forth vocabulary. Other vocabularies do not participate. The hash function concatenates the low 3 bits of the name length with the low 5 bits of the first character in the name. I haven't studied the properties of this scheme in detail. I sort of threw it together in a hurry one evening. Some advantages: * easy to layer on top of an existing system * doesn't consume much memory * hash cache memory can be discarded after the application is compiled A fully-hashed dictionary like Ray Duncan uses is probably "better" in some sense. Binary trees have been tried. I recall a paper by Hans Nieuwenhuiuzen (I think) describing a tree-structured dictionary. As I recall, the disadvantages were: Some problem with using multiple vocabularies (although I can't think what it might be) Loss of chronological ordering; difficulty with redefining words. These difficulties don't seem insurmountable. Mitch