Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!watmath!clyde!caip!topaz!ll-xn!nike!ucbcad!ucbvax!decvax!ima!johnl From: johnl@ima.UUCP (John R. Levine) Newsgroups: net.database Subject: Re: Btree Question... Message-ID: <164@ima.UUCP> Date: Thu, 17-Jul-86 11:06:33 EDT Article-I.D.: ima.164 Posted: Thu Jul 17 11:06:33 1986 Date-Received: Fri, 18-Jul-86 05:19:14 EDT References: <151@itcatl.UUCP> <5303@ut-sally.UUCP> Reply-To: johnl@ima.UUCP (John R. Levine) Organization: Javelin Software Corporation Lines: 40 Summary: don't cheat when you delete In article <5303@ut-sally.UUCP> ark@sally.UUCP (Arthur M. Keller) writes: >In article <151@itcatl.UUCP> parris@itcatl.UUCP (Parris Hughes) writes: >>... I am having problems understanding the B-Tree deletion alogrithm. > >I suggest that the answer is only of `academic' interest. Because of >fanout considerations (increasing fanout decreases tree height), all >internal (non-leaf) nodes contain keys only for the purposes of >navigation. All actual database keys appear only in the leaves of the >B-tree (this is technically called a B+-tree (the + is a superscript, >the - is a hyphen)). This means that internal node keys need not be >actual database keys, and hence are not affected by deletion. Also, >right end key compression is appropriate to save space (and can even >be a factor when deciding to split a node!). Not necessarily. For one thing, you shouldn't assume that all useful B-trees are really indices into data stored external to the B-tree. If your records are fairly small, and if there is one key that is the predominant access path, it makes perfect sense to put the actual records into the B-tree. If you need secondary indices, they would be B-trees which map the secondary key into the primary key which you could then look up. This scheme sacrifices secondary index lookup time for primary index lookup time, which may be what you want. The other point is that if you don't delete stale records from the B-tree, the nice logarithmic access time property of B-trees no longer holds. I find this particularly insidious when the keys in question are things like dates of pending orders in a file, so that the file tends to shrink at the back and grow at the front. Delete your stale records, you'll be glad you did. Both left and right hand key compression are useful ways to cram more entries into a tree node, and there's no argument that in the usual case that it's more expensive to fetch a node from the disk than to process it once it's read in, they're a big win. On the other hand, I have this blindingly fast Eagle disk drive lashed up to a very sluggish IBM PC, and the tradeoffs are not always what you'd assume. -- John R. Levine, Javelin Software Corp., Cambridge MA +1 617 494 1400 { ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.EDU The opinions expressed herein are solely those of a 12-year-old hacker who has broken into my account and not those of any person or organization.