Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!decvax!harpo!seismo!hao!hplabs!sri-unix!v.wales@ucla-locus
From: v.wales%ucla-locus@sri-unix.UUCP
Newsgroups: net.unix-wizards
Subject: Discussion of "dbm" data base system
Message-ID: <15363@sri-arpa.UUCP>
Date: Mon, 9-Jan-84 15:12:23 EST
Article-I.D.: sri-arpa.15363
Posted: Mon Jan  9 15:12:23 1984
Date-Received: Sun, 15-Jan-84 01:25:39 EST
Lines: 141

From:            Rich Wales <v.wales@ucla-locus>

As part of a local effort here at UCLA to find a good set of random-
access file routines for UNIX, I recently did an analysis of the "dbm"
data base package distributed as part of 4.1BSD (as well as several
other UNIX versions, I believe).  Although the "dbm" algorithm is very
attractive in many respects, it suffers from some very serious flaws
which -- in my opinion -- make it unsuitable for medium- or large-sized
data bases.

Below is a description of the "dbm" algorithm and my conclusions.  I am
sending this out because I assume it will be of interest of lots of peo-
ple.  Also, I would appreciate any comments on my analysis or conclu-
sions.  Specifically:

(1) Does anyone know who originally developed this algorithm, and where
    (if anywhere) it has been published?  (I have heard a rumor that it
    was published in CACM some time ago, but I haven't yet had time to
    go looking for it.)

(2) Are there any hash-coding experts out there who can explain the
    theoretical basis of "dbm"'s hashing scheme?  (There is not a soli-
    tary shred of commentary in the entire source file, and the hashing
    routine seems to depend on what appear for all the world to be a
    set of totally random numbers.)

(3) Has anyone else tried to debug or "clean up" the "dbm" package?  If
    so, I would be interested in hearing about your fixes and the extent
    (if any) to which they might make "dbm" more usable.

(4) Does anyone have a better random-access file package -- a B-tree
    package, for instance -- that they would be willing to share?

-- Rich Wales <v.wales@UCLA-LOCUS>

========================================================================

		ANALYSIS OF THE "dbm" DATA BASE PACKAGE

"dbm" is a simple data-base system which allows storage and retrieval of
essentially arbitrary-length key/data pairs.  Both keys and data may be
arbitrary byte strings (not necessarily ASCII).  In a small- to medium-
sized data base, a given record can be located (or determined not to
exist at all) in about one disk read.

A "dbm" data base consists of two files -- a "data" file (whose name
ends in ".pag") and a "bit-map" file (whose name ends in ".dir").

(1) The data file consists of 1024-byte pages, each of which may contain
    zero or more records.  There is a size restriction:  any given rec-
    ord must fit entirely in a single page.  Further (as discussed in
    more detail below), it turns out that all the records whose keys
    hash to the same value must fit together in a single page as well.

(2) The bit-map file is a two-dimensional triangular array of bits.  For
    a given first subscript "m" (which can be any non-negative integer),
    the second subscript "n" can range from 0 to (2**m)-1.  This two-
    dimensional array is linearized into a one-dimensional array for
    storage purposes via the function B(m,n) = (2**m)+n-1.  (Bits which
    would be stored beyond the actual end of the bit-map file are as-
    sumed to be zero.)  The interpretation of the bits in the bit-map
    file is described later.

"dbm" uses a hashing scheme to place and locate records.  The hashing
algorithm used maps the key into a 32-bit integer.  At the present time,
I have not figured out the theoretical basis of the hashing algorithm,
and unfortunately the "dbm" source itself is totally undocumented!

The low-order "m" bits of the hash code (for a variable value of "m")
are used as the page number in the data file.  Assuming that the bit-map
file has been set up correctly (see below), the algorithm is as follows
(written in a C-like pseudo-code):

	    h = hash_code (key);
	    m = 0;
	    n = h & ((2**m)-1);
	    while (bit_map[m,n] == 1)
	    {   m = m+1;
		n = h & ((2**m)-1);
	    }
	    /* "n" is the desired page number.  "m" is saved for later
	     * reference in case page "n" fills up and must be split.
	     */

Initially, the entire bit-map file contains zeros; hence, the first
record or records will all go into page 0 of the data file.  Eventually,
though, page 0 will fill up.  When this happens, bit 0 of the bit-map
file (i.e., bit_map[0,0]) is set to 1, and all records in page 0 whose
keys hash to an odd number are moved to page 1.  The effect of turning
on bit 0 of the bit-map file is that subsequent records will be placed
(or searched for) in either page 0 or page 1, depending on the low-order
bit of the hash code.

As more and more records are placed in the data base, more and more
pages of the data file are split in a manner similar to the above exam-
ple.  In general, if page "n" needs to be split, the bit "bit_map[m,n]"
is set to 1, and all records in page "n" for which the m'th bit of the
hash code is a 1 (i.e., hash_code(key) & (2**m) != 0) are moved to the
page numbered n+(2**m).

An informal interpretation of the bit map is that, if the low-order "m"
bits of the hash code are "n", and bit_map[m,n] is 1, the algorithm must
examine at least the next bit of the hash code (by incrementing "m").

Records may be deleted from the data base.  Deletion of records does not
affect the bit-map file in any way, but the space formerly occupied by a
deleted record can be reused by subsequent additions.

The "dbm" scheme sounds very elegant at first glance, but closer exami-
nation reveals two critically serious difficulties:

(1) In certain cases where hash codes differ only in the high-order bit
    or bits, the page-splitting process could easily create an unaccept-
    ably huge (though sparse) file.  Since most UNIX utilities (includ-
    ing "dump" and "tar") do not understand sparse files, this creates
    a severe problem when trying to make backup copies of a "dbm" data
    base or the file system containing it.

(2) Since the hash code determines the one and only page in which "dbm"
    will look for a record with a given key, all records whose keys hash
    to the same value must fit in a single page of the data file.  Since
    the user has no way of predicting whether this condition will arise,
    and no recourse if it does, I must consider this an unacceptable re-
    striction in the "dbm" design.
    
    To make matters even worse, the "dbm" system as originally distrib-
    uted in 4.1BSD does not check for violations of this restriction.
    Rather, it will go into an endless loop trying to split data-file
    pages in order to make room for a new record.  If the hash code in
    question is very large, the data base may be suddenly transformed
    into a giant sparse file in the course of this vain effort.

In addition to the above fundamental flaws, the package also has a few
other bugs.  One of the most serious (but probably easily fixable) is
that you can't manipulate multiple data bases in a single run.  Not
only does the "dbminit" routine not close the previous set of files be-
fore opening a new set, but the routines that read pages from the ".dir"
and ".pag" files use a caching scheme with "static" internal variables
that "dbminit" cannot reinitialize.

========================================================================