Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!jarthur!elroy.jpl.nasa.gov!hacgate!ashtate!dbase!awd
From: awd@dbase.A-T.COM (Alastair Dallas)
Newsgroups: comp.databases
Subject: Re: clipper internals
Summary: Smaller keys
Keywords: clipper dbase index
Message-ID: <441@dbase.A-T.COM>
Date: 26 Feb 90 17:49:41 GMT
References: <13221@cbnewse.ATT.COM>
Distribution: na
Organization: Ashton Tate Development Center Glendale, Calif.
Lines: 26


Congratulations!  You've managed to hit on precisely what my management's
lawyers mean when they speak of "proprietary information."  Sorry for being
flip, but this is in reply to mail that asked question after question
pertaining to the exact nature of Clipper (and by extension dBASE) operations
and there's just no way I can be forthcoming.

I can say that the main speed cost in any PC database system is reading the
disk.  Nothing else (string compare vs. numeric compare, for example) comes
close to affecting bottom-line speed as profoundly as being able to avoid
"hitting the disk" even once.  Therefore, by keeping your index keys
small, you allow the system to pack more of them into a fixed-length block
(dBASE IV supports adjustable block sizes), which ultimately reduces the
number of disk reads (especially for SKIP operations).  If you want to get
really tricky, write code that hashes your key values into a 4-byte long
and index on a UDF that uses this value to build a 4-byte Character string.
That'll let you SKIP 40 times or so without reading another index node.
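To make the arithmetic concrete, here is a rough sketch in Python.  The block size and per-entry overhead are made-up numbers, since the real Clipper/dBASE on-disk formats are the proprietary part.  CRC-32 is only a stand-in for "some hash" (the post doesn't name one), and the function names are hypothetical.  The sketch shows how entries per block grow as keys shrink, and one way to squeeze an arbitrary key into 4 bytes.

```python
import struct
import zlib

# Hypothetical figures for illustration only; NOT the real Clipper/dBASE layout.
BLOCK_SIZE = 1024     # bytes per index block (assumed)
ENTRY_OVERHEAD = 8    # bytes per entry for record number / child pointer (assumed)


def keys_per_block(key_len: int, block_size: int = BLOCK_SIZE,
                   overhead: int = ENTRY_OVERHEAD) -> int:
    """How many fixed-length entries fit in one block (ignores per-block headers)."""
    return block_size // (key_len + overhead)


def hash_key(value: str) -> bytes:
    """Reduce an arbitrary key to a 4-byte value, using CRC-32 as a stand-in hash."""
    h = zlib.crc32(value.encode("latin-1")) & 0xFFFFFFFF  # unsigned 32-bit "long"
    return struct.pack(">I", h)  # big-endian, so byte order matches numeric order


for key_len in (40, 10, 4):
    print(key_len, keys_per_block(key_len))
# 40 21
# 10 56
# 4 85
```

The exact counts depend on the real block format and on how full blocks are kept, so treat these only as a trend.  Keep in mind that a hashed key no longer sorts in the original key order, and two different values can hash to the same 4 bytes, so an index like this suits exact-match lookups where the program checks the real field after the seek.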

The other thing I _can_ say is that you might look at Knuth's "The Art of
Computer Programming," Vol. 3: Sorting and Searching.  It describes the
operation of Clipper's and dBASE's indexing in sufficient abstraction so
as not to perturb the lawyers.
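The post doesn't say what structure Clipper or dBASE actually use.  Assuming a multiway-tree (B-tree-style) index of the general kind Knuth's Vol. 3 discusses, the sketch below shows why fan-out matters: each extra level is roughly one more block read on a cold lookup.  The helper name is hypothetical, the fan-outs of 21 and 85 are arbitrary, and the model idealizes every node as full.

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Idealized tree depth: smallest L with fanout**L >= n_keys (all nodes full)."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout  # each new level multiplies reachable entries by the fan-out
        levels += 1
    return levels


print(levels_needed(100_000, 21))  # 4
print(levels_needed(100_000, 85))  # 3
```

Because depth grows with the logarithm of the key count in base fan-out, a bigger fan-out (smaller keys) can shave a whole level, and therefore a disk read, off every lookup.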

Hope it helps.

/alastair/