Path: utzoo!utgpu!water!watmath!clyde!rutgers!sri-spam!sri-unix!quintus!ok
From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.databases
Subject: Re: Pro-Pre-Relational
Keywords: network,hierarchical,relational,comparison
Message-ID: <514@cresswell.quintus.UUCP>
Date: 8 Jan 88 09:15:20 GMT
References: <2557@sfsup.UUCP> <68@coot.AUSTIN.LOCKHEED.COM>
Organization: Quintus Computer Systems, Mountain View, CA
Lines: 100

I strongly urge everyone who is interested in this debate, and who
hasn't already got a copy, to try to obtain access to a copy of
    "Relational Database: Selected Writings",
    C. J. Date,
    Addison-Wesley, 1986
    US$ 35 (hardback)
The relevant chapter at the moment is Chapter 6:
"Some Relational Myths Exploded".

In article <68@coot.AUSTIN.LOCKHEED.COM>, chris@AUSTIN.LOCKHEED.COM (Chris Wood) writes:
> There are/were a number of "good things" about network/hierarchical models:
> 2. CODASYL/Network DBMSs had a little feature called "Place Near" that
> allowed related record occurrences of more than one record type to be
> physically clustered, thus optimizing performance of retrievals.
>
> How can relational implementations do this unless they know about such
> relationships?

They can't, any more than the CODASYL ones could. Both the DBTG stuff
and the relational model distinguish between the abstract design and
the physical layout. Putting something near something else is a
physical question. This is Date's
    "MYTH NUMBER 6: The data must be hierarchically clustered for
    good performance."
As Date says "what is generally overlooked when such claims are made
is that such interleaving biases the database TOWARD some applications
but AGAINST others." (Page 89.) It also ties in with
    "MYTH NUMBER 7: Hierarchic clustering requires pointers."
which is exploded on page 90, on which he says that ORACLE actually
lets you do it. The sketch is
    CREATE CLUSTER DEPTEMP DEPT# ...
    ALTER CLUSTER DEPTEMP ADD TABLE DEPT WHERE DEPT# = DEPTEMP.DEPT#
    ALTER CLUSTER DEPTEMP ADD TABLE EMP WHERE DEPT# = DEPTEMP.DEPT#
Note that this does not at all change what the relations look like or
what you can do with them, only where tuples are stored.

> 3. Many hierarchical and Network DBMSs allow repeating fields and even
> repeating groups of fields. Relational "purist" violently object to this
> on the grounds that it is not "normalized".
>
> However, consider the following scenario:
> I have a General Ledger application with 4000 account entries. I need to
> keep track of 12 months worth of data for each account. In the relational

Actually, they will refer to "First Normal Form". The relational
*model* says that data items should be atomic *with respect to the
application*. What this means is actually pretty vague. Many
*implementations* say that a data item should be some sort of number,
string, or timestamp. Basically, yes, if there is structure that your
application is interested in, it should be explicit in the relations.
In this case, since there are 12 months in every year, why not have
one relation with 12 separate attributes over the same domain? If some
of the values are known and others aren't, that's exactly what NULL
values are for.

> speed of retrieval obviously. On average, it should take about 10 times
> as long to examine 40000 records as 4000 records. Of course the records

Not if the 4,000 records are 10 times bigger than the 40,000. What
counts is the number of disc accesses. Suppose that you have 4,000
records each 64 bytes long and 36,000 records each 12 bytes long,
whereas if you had been able to pack them together you'd have had
4,000 records each 112 bytes long. Then the repeating-field version
would have taken 448,000 bytes, and the "flat" version would take
688,000 bytes, a ratio of about 1.54. If you are examining all the
information, the slow-down would probably be about the same, a factor
of 1.5, NOT a factor of 10.
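
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The record counts and sizes are the ones quoted above; the constant and function names are made up for illustration, and it assumes that scan time is roughly proportional to the number of bytes read, ignoring page headers, indexes, and whatever else a particular system adds.

```python
# Figures quoted in the post; the names are hypothetical.
PARENT_RECORDS = 4_000    # account records in the "flat" design
PARENT_BYTES = 64         # bytes per account record
CHILD_RECORDS = 36_000    # separate monthly records in the "flat" design
CHILD_BYTES = 12          # bytes per monthly record
PACKED_BYTES = 112        # bytes per record with the repeating field packed in


def flat_bytes() -> int:
    """Total bytes when the monthly data lives in its own records."""
    return PARENT_RECORDS * PARENT_BYTES + CHILD_RECORDS * CHILD_BYTES


def packed_bytes() -> int:
    """Total bytes when each account record carries its repeating field."""
    return PARENT_RECORDS * PACKED_BYTES


def size_ratio() -> float:
    """Flat size relative to packed size (assumed proxy for scan slow-down)."""
    return flat_bytes() / packed_bytes()


if __name__ == "__main__":
    print(f"flat   = {flat_bytes():,} bytes")
    print(f"packed = {packed_bytes():,} bytes")
    print(f"ratio  = {size_ratio():.2f}")
```

The ratio of total bytes, not the ratio of record counts (40,000 to 4,000), is what this estimate compares.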
What the factor actually is depends on the layout your particular data
base system picks, of course.

But what is so special about repeating fields? That's only
warmed-over COBOL. Why not let me use any data structure my
programming language will support? Let's see, arbitrary sized trees,
logical variables, arbitrary precision integers, ... You mean COBOL
doesn't do that? Oh. Why not let me store the triangular arrays,
N-dimensional tables with margins, experiment designs, and so on I use
in GENSTAT? You mean COBOL doesn't do that? Oh. Look, you have to
draw the line *somewhere*.

> ----------------------------MAIN POINT--------------------------------
>
> Not everything in hierarchical/network technology is bad. We should learn
> from both our successes as well as our mistakes. I think that some of the
> good features of these "old" technologies should be salvaged, and
> incorporated into relational implementations.
>
Absolutely right. (Er, what were those two successes, again?)