Path: utzoo!attcan!uunet!midway!ncar!asuvax!cs.utexas.edu!swrinde!ucsd!ucbvax!MITCH.ENG.SUN.COM!wmb
From: wmb@MITCH.ENG.SUN.COM (Mitch Bradley)
Newsgroups: comp.lang.forth
Subject: HASHING THE DICTIONARY
Message-ID: <9010101326.AA20013@ucbvax.Berkeley.EDU>
Date: 8 Oct 90 15:13:51 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: Mitch Bradley <wmb%MITCH.ENG.SUN.COM@CUNYVM.CUNY.EDU>
Organization: The Internet
Lines: 30

The F83 hash function selects one of 4 threads by looking at the lower 2
bits of the first character in the name (not the length byte, the first
character).

The current experimental version of my system uses a 256-way hash function
to index into a "cache" of recently-found entries from the forth vocabulary.
Other vocabularies do not participate.  The hash function concatenates
the low 3 bits of the name length with the low 5 bits of the first character
in the name.

I haven't studied the properties of this scheme in detail.  I sort of
threw it together in a hurry one evening.  Some advantages:
        * easy to layer on top of an existing system
        * doesn't consume much memory
        * hash cache memory can be discarded after the application is compiled

A fully-hashed dictionary like Ray Duncan uses is probably "better"
in some sense.

Binary trees have been tried.  I recall a paper by Hans Nieuwenhuiuzen
(I think) describing a tree-structured dictionary.  As I recall, the
disadvantages were:
    Some problem with using multiple vocabularies (although I can't
    think what it might be)

    Loss of chronological ordering; difficulty with redefining words.

These difficulties don't seem insurmountable.

Mitch