Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!linus!decvax!yale-com!leichter From: leichter@yale-com.UUCP (Jerry Leichter) Newsgroups: net.unix-wizards Subject: Re: Discussion of "dbm" data base system Message-ID: <2724@yale-com.UUCP> Date: Mon, 16-Jan-84 15:41:40 EST Article-I.D.: yale-com.2724 Posted: Mon Jan 16 15:41:40 1984 Date-Received: Wed, 18-Jan-84 01:20:13 EST References: sri-arpa.15363 Lines: 31 The algorithm used in the "dbm" system was described in TODS (Transactions on Database Systems) about 4 years ago in an article by a couple of people at IBM (whose name will not come back to me now) title "Extendible Hashing". It's a beautiful algorithm, and sparked a series of other articles on hashing schemes that can grow and shrink their hash tables dynamically. The TODS article contains some theoretical and simulation results that indicate that asymtotically extendible hashing and B-trees require about the same amount of space, on average. Extendible hashing has the big advantage that it takes at most 2 disk accesses to get to any data item, independent of the size of the data-base; B-trees have logarithmically-growing access times, and typically require 3 or 4 accesses for real-life, large databases. The problem of blocks filling with records with identical hash keys is an exact analogue of the B-tree problems with blocks filling with identical keyed records. The solution is the same in both cases: You "logically extend" the leaf block with a continuation pointer. This costs you at least one extra data access, so you hope it is rare. It can be shown that for large numbers of bits in the hash, the chance of this happening is very small; of course, the code MUST be there; dbm is just a poor implementation. I don't know the origin of the dbm "magic hash function", but if the author followed the advice of the authors of the "Extendible Hashing" paper, he would have read a paper on "Universal Classes of Hash Functions", by Carter and Wegman, which shows how to construct excellent hash functions "almost all the time". If you want further information, like the exact references, write. -- Jerry decvax!yale-comix!leichter leichter@yale