Path: utzoo!utgpu!water!watmath!clyde!rutgers!sri-spam!sri-unix!quintus!ok
From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.databases
Subject: Re: Pro-Pre-Relational
Keywords: network,hierarchical,relational,comparison
Message-ID: <514@cresswell.quintus.UUCP>
Date: 8 Jan 88 09:15:20 GMT
References: <2557@sfsup.UUCP> <68@coot.AUSTIN.LOCKHEED.COM>
Organization: Quintus Computer Systems, Mountain View, CA
Lines: 100

I strongly urge everyone who is interested in this debate,
and who hasn't already got a copy, to try to obtain access to a
copy of
	"Relational Database : Selected Writings",
	C. J. Date,
	Addison-Wesley, 1986
	US$ 35 (hardback)
The relevant chapter at the moment is Chapter 6:
	"Some Relational Myths Exploded".

In article <68@coot.AUSTIN.LOCKHEED.COM>,
chris@AUSTIN.LOCKHEED.COM (Chris Wood) writes:
> There are/were a number of "good things" about network/hierarchical models:
> 2. CODASYL/Network DBMSs had a little feature called "Place Near" that
> allowed related record occurrences of more than one record type to be
> physically clustered, thus optimizing performance of retrievals.
> 
> How can relational implementations do this unless they know about such
> relationships?
They can't, any more than the CODASYL ones could.
Both the DBTG stuff and the relational model distinguish between
the abstract design and the physical layout.  Putting something near
something else is a physical question.

This is Date's
    "MYTH NUMBER 6: The data must be hierarchically clustered
		    for good performance."
As Date says "what is generally overlooked when such claims are made
is that such interleaving biases the database TOWARD some applications
but AGAINST others."  (Page 89.)

It also ties in with
    "MYTH NUMBER 7: Hierarchic clustering requires pointers."
which is exploded on page 90, where he points out that ORACLE actually
lets you do hierarchic clustering.  The sketch is

	CREATE CLUSTER DEPTEMP
		DEPT# ...
	ALTER CLUSTER DEPTEMP
		ADD TABLE DEPT
		WHERE DEPT# = DEPTEMP.DEPT#
	ALTER CLUSTER DEPTEMP
		ADD TABLE EMP
		WHERE DEPT# = DEPTEMP.DEPT#

Note that this does not at all change what the relations look like
or what you can do with them, only where tuples are stored.
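
To illustrate that last point, here is a minimal Python sketch.  It is
not ORACLE's mechanism and not from Date's book: the sample tuples and
the function names (separate_layout, clustered_layout, relation, join)
are all hypothetical.  It stores the same DEPT and EMP tuples in two
different physical orders and checks that a join gives the same answer
either way.

```python
# Hypothetical sample data: (dept#, name) and (employee, dept#).
DEPT = [(10, "Sales"), (20, "Research")]
EMP = [("Adams", 10), ("Baker", 20), ("Clark", 10)]

def separate_layout():
    """All DEPT tuples first, then all EMP tuples."""
    return [("DEPT", d) for d in DEPT] + [("EMP", e) for e in EMP]

def clustered_layout():
    """Each DEPT tuple is followed by the EMP tuples sharing its dept#."""
    storage = []
    for d in DEPT:
        storage.append(("DEPT", d))
        storage.extend(("EMP", e) for e in EMP if e[1] == d[0])
    return storage

def relation(storage, name):
    """A relation is a set of tuples; storage order is not part of it."""
    return {t for tag, t in storage if tag == name}

def join(storage):
    """Employee name paired with department name, matched on dept#."""
    return {(e[0], d[1])
            for d in relation(storage, "DEPT")
            for e in relation(storage, "EMP")
            if e[1] == d[0]}

same_answer = join(separate_layout()) == join(clustered_layout())
print(same_answer)
```

The two layouts differ only in which tuples sit next to each other, and
that is all a real system's clustering would change: how many disc
accesses a given query costs, not what it returns.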


> 3. Many hierarchical and Network DBMSs allow repeating fields and even
> repeating groups of fields.  Relational "purist" violently object to this
> on the grounds that it is not "normalized". 
> 
> However, consider the following scenario:
> I have a General Ledger application with 4000 account entries.  I need to
> keep track of 12 months worth of data for each account.  In the relational

Actually, they will refer to "First Normal Form".
The relational *model* says that data items should be atomic *with
respect to the application*.  What this means is actually pretty
vague.  Many *implementations* say that a data item should be some
sort of number, string, or timestamp.  Basically, yes, if there is
structure that your application is interested in, it should be
explicit in the relations.

In this case, since there are 12 months in every year, why not
have one relation with 12 separate attributes over the same domain?
If some of the values are known and others aren't, that's exactly
what NULL values are for.
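
A minimal sketch of that design, using Python's built-in sqlite3 module
as a stand-in for whatever system you have.  The table name, column
names, and amounts are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical names: the ledger table and its month columns are illustrative.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

conn = sqlite3.connect(":memory:")
month_columns = ", ".join(f"{m} REAL" for m in MONTHS)
conn.execute(f"CREATE TABLE ledger (account INTEGER PRIMARY KEY, {month_columns})")

# Only January to March are known for this account; the other nine stay NULL.
conn.execute(
    "INSERT INTO ledger (account, jan, feb, mar) VALUES (?, ?, ?, ?)",
    (1001, 120.0, 95.5, 30.0),
)

row = conn.execute("SELECT * FROM ledger WHERE account = ?", (1001,)).fetchone()
values = row[1:]                              # 12 monthly amounts, None where NULL
known = [v for v in values if v is not None]  # months actually recorded
ytd = sum(known)                              # total over the recorded months
print(len(values), len(known), ytd)
```

One tuple per account carries all 12 months, so 4,000 accounts stay
4,000 records, and the relation is still in First Normal Form because
each attribute holds a single value or NULL.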

> speed of retrieval obviously.  On average, it should take about 10 times
> as long to examine 40000 records as 4000 records.  Of course the records

Not if the 4,000 records are 10 times bigger than the 40,000.  What counts
is number of disc accesses.  Suppose that you have 4,000 records each
64 bytes long and 36,000 records each 12 bytes long, whereas if you
had been able to pack them together you'd have had 4,000 records each
112 bytes long.  Then the repeating-field version would have taken
448,000 bytes, and the "flat" version would take 688,000 bytes, a
ratio of about 1.54.  If you are examining all the information, the
slow-down would probably be about the same, a factor of 1.5, NOT a
factor of 10.  What the factor actually is depends on the layout your
particular data base system picks, of course.
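
The arithmetic above, as a short Python check.  The record counts and
sizes are the ones from the paragraph; the function name total_bytes is
hypothetical.

```python
def total_bytes(groups):
    """Sum count * size over (record_count, record_size_in_bytes) pairs."""
    return sum(count * size for count, size in groups)

# Repeating-field layout: 4,000 records of 112 bytes each.
packed = total_bytes([(4_000, 112)])

# "Flat" layout: 4,000 records of 64 bytes plus 36,000 records of 12 bytes.
flat = total_bytes([(4_000, 64), (36_000, 12)])

byte_ratio = flat / packed                 # what a full scan has to read
record_ratio = (4_000 + 36_000) / 4_000    # the misleading record-count ratio

print(packed, flat, round(byte_ratio, 2), record_ratio)
```

The record count goes up tenfold, but the volume of data to be read
goes up by only about half.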

But what is so special about repeating fields?  That's only warmed-over
COBOL.  Why not let me use any data structure my programming language
will support?  Let's see, arbitrary sized trees, logical variables,
arbitrary precision integers, ...  You mean COBOL doesn't do that?  Oh.
Why not let me store the triangular arrays, N-dimensional tables with
margins, experiment designs, and so on I use in GENSTAT?  You mean COBOL
doesn't do that?  Oh.  Look, you have to draw the line *somewhere*.

> ----------------------------MAIN POINT--------------------------------
> 
> Not everything in hierarchical/network technology is bad.  We should learn
> from both our successes as well as our mistakes.  I think that some of the
> good features of these "old" technologies should be salvaged, and 
> incorporated into relational implementations.
> 
Absolutely right.  (Er, what were those two successes, again?)