Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!linus!decvax!harpo!seismo!hao!hplabs!sri-unix!v.wales@ucla-locus From: v.wales%ucla-locus@sri-unix.UUCP Newsgroups: net.unix-wizards Subject: Discussion of "dbm" data base system Message-ID: <15363@sri-arpa.UUCP> Date: Mon, 9-Jan-84 15:12:23 EST Article-I.D.: sri-arpa.15363 Posted: Mon Jan 9 15:12:23 1984 Date-Received: Sun, 15-Jan-84 01:25:39 EST Lines: 141 From: Rich Wales As part of a local effort here at UCLA to find a good set of random- access file routines for UNIX, I recently did an analysis of the "dbm" data base package distributed as part of 4.1BSD (as well as several other UNIX versions, I believe). Although the "dbm" algorithm is very attractive in many respects, it suffers from some very serious flaws which -- in my opinion -- make it unsuitable for medium- or large-sized data bases. Below is a description of the "dbm" algorithm and my conclusions. I am sending this out because I assume it will be of interest of lots of peo- ple. Also, I would appreciate any comments on my analysis or conclu- sions. Specifically: (1) Does anyone know who originally developed this algorithm, and where (if anywhere) it has been published? (I have heard a rumor that it was published in CACM some time ago, but I haven't yet had time to go looking for it.) (2) Are there any hash-coding experts out there who can explain the theoretical basis of "dbm"'s hashing scheme? (There is not a soli- tary shred of commentary in the entire source file, and the hashing routine seems to depend on what appear for all the world to be a set of totally random numbers.) (3) Has anyone else tried to debug or "clean up" the "dbm" package? If so, I would be interested in hearing about your fixes and the extent (if any) to which they might make "dbm" more usable. (4) Does anyone have a better random-access file package -- a B-tree package, for instance -- that they would be willing to share? -- Rich Wales ======================================================================== ANALYSIS OF THE "dbm" DATA BASE PACKAGE "dbm" is a simple data-base system which allows storage and retrieval of essentially arbitrary-length key/data pairs. Both keys and data may be arbitrary byte strings (not necessarily ASCII). In a small- to medium- sized data base, a given record can be located (or determined not to exist at all) in about one disk read. A "dbm" data base consists of two files -- a "data" file (whose name ends in ".pag") and a "bit-map" file (whose name ends in ".dir"). (1) The data file consists of 1024-byte pages, each of which may contain zero or more records. There is a size restriction: any given rec- ord must fit entirely in a single page. Further (as discussed in more detail below), it turns out that all the records whose keys hash to the same value must fit together in a single page as well. (2) The bit-map file is a two-dimensional triangular array of bits. For a given first subscript "m" (which can be any non-negative integer), the second subscript "n" can range from 0 to (2**m)-1. This two- dimensional array is linearized into a one-dimensional array for storage purposes via the function B(m,n) = (2**m)+n-1. (Bits which would be stored beyond the actual end of the bit-map file are as- sumed to be zero.) The interpretation of the bits in the bit-map file is described later. "dbm" uses a hashing scheme to place and locate records. The hashing algorithm used maps the key into a 32-bit integer. At the present time, I have not figured out the theoretical basis of the hashing algorithm, and unfortunately the "dbm" source itself is totally undocumented! The low-order "m" bits of the hash code (for a variable value of "m") are used as the page number in the data file. Assuming that the bit-map file has been set up correctly (see below), the algorithm is as follows (written in a C-like pseudo-code): h = hash_code (key); m = 0; n = h & ((2**m)-1); while (bit_map[m,n] == 1) { m = m+1; n = h & ((2**m)-1); } /* "n" is the desired page number. "m" is saved for later * reference in case page "n" fills up and must be split. */ Initially, the entire bit-map file contains zeros; hence, the first record or records will all go into page 0 of the data file. Eventually, though, page 0 will fill up. When this happens, bit 0 of the bit-map file (i.e., bit_map[0,0]) is set to 1, and all records in page 0 whose keys hash to an odd number are moved to page 1. The effect of turning on bit 0 of the bit-map file is that subsequent records will be placed (or searched for) in either page 0 or page 1, depending on the low-order bit of the hash code. As more and more records are placed in the data base, more and more pages of the data file are split in a manner similar to the above exam- ple. In general, if page "n" needs to be split, the bit "bit_map[m,n]" is set to 1, and all records in page "n" for which the m'th bit of the hash code is a 1 (i.e., hash_code(key) & (2**m) != 0) are moved to the page numbered n+(2**m). An informal interpretation of the bit map is that, if the low-order "m" bits of the hash code are "n", and bit_map[m,n] is 1, the algorithm must examine at least the next bit of the hash code (by incrementing "m"). Records may be deleted from the data base. Deletion of records does not affect the bit-map file in any way, but the space formerly occupied by a deleted record can be reused by subsequent additions. The "dbm" scheme sounds very elegant at first glance, but closer exami- nation reveals two critically serious difficulties: (1) In certain cases where hash codes differ only in the high-order bit or bits, the page-splitting process could easily create an unaccept- ably huge (though sparse) file. Since most UNIX utilities (includ- ing "dump" and "tar") do not understand sparse files, this creates a severe problem when trying to make backup copies of a "dbm" data base or the file system containing it. (2) Since the hash code determines the one and only page in which "dbm" will look for a record with a given key, all records whose keys hash to the same value must fit in a single page of the data file. Since the user has no way of predicting whether this condition will arise, and no recourse if it does, I must consider this an unacceptable re- striction in the "dbm" design. To make matters even worse, the "dbm" system as originally distrib- uted in 4.1BSD does not check for violations of this restriction. Rather, it will go into an endless loop trying to split data-file pages in order to make room for a new record. If the hash code in question is very large, the data base may be suddenly transformed into a giant sparse file in the course of this vain effort. In addition to the above fundamental flaws, the package also has a few other bugs. One of the most serious (but probably easily fixable) is that you can't manipulate multiple data bases in a single run. Not only does the "dbminit" routine not close the previous set of files be- fore opening a new set, but the routines that read pages from the ".dir" and ".pag" files use a caching scheme with "static" internal variables that "dbminit" cannot reinitialize. ========================================================================