Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!zephyr.ens.tek.com!tektronix!sequent!dafuller
From: dafuller@sequent.UUCP (David Fuller)
Newsgroups: comp.databases
Subject: Re: Is RDBMS unproven technology?  (Flames to follow....)
Message-ID: <40132@sequent.UUCP>
Date: 6 Aug 90 21:22:57 GMT
References: <1073@ashton.UUCP> <10371@sybase.sybase.com>
Reply-To: dafuller@sequent.UUCP (David Fuller)
Organization: Sequent Computer Systems, Inc
Lines: 100

In article <10371@sybase.sybase.com> tim@ohday.sybase.com (Tim Wood) writes:
>In article <1073@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
>>Are relational databases an unproven technology regarding
>>performance?  
>>
>>....A key tenet of the report is that RDBMS technology has been
>>available for 20 years but still has not been proved in large,
>>complex applications.  The report notes that users associate
>>these products with poor system performance, even though they may
>>be flexible and easier to implement.  The article then goes on 
>>to cite a firm that is reluctant to replace IMS with DB2, and
>>discusses other sites that use a mixture of relational and
>>possibly non-relational systems.

Some random thoughts from someone who's in the trenches and deals with
less than, uhh, theoretical arguments...

In my experience with Very Large Databases, the DBMS type is less
important than the quality of the individual system's implementation.
The 10% of the time you spend developing is quickly dwarfed by the
requirement to plan for and provide a stable applications environment.

To wit: the typical SQL-based RDBMS is abstracted far enough from what's
going on down deep to permit gross errors in implementation.  I've 
looked at systems which fetched 100,000 records and threw away every
one except the single tuple of interest.  The fact that it was an RDBMS
was irrelevant.  You coulda been doing IMS or FOCUS and made that
mistake.
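To make that failure mode concrete, here is a minimal sketch using
Python's built-in sqlite3 module (my stand-in; the "accounts" table and
both function names are hypothetical).  Both functions return the same
answer; the difference is how many rows get dragged across the interface
before the answer is found.

```python
import sqlite3

def make_db(n=100_000):
    # Hypothetical table; id is an INTEGER PRIMARY KEY, i.e. a B-tree key.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     ((i, f"owner{i}") for i in range(n)))
    return conn

def find_owner_slow(conn, wanted):
    # Anti-pattern: fetch every row, then throw away all but one.
    rows = conn.execute("SELECT id, owner FROM accounts").fetchall()
    matches = [owner for acct_id, owner in rows if acct_id == wanted]
    return (matches[0] if matches else None), len(rows)

def find_owner_fast(conn, wanted):
    # Tell the engine what you want: one key probe instead of a full fetch.
    row = conn.execute("SELECT owner FROM accounts WHERE id = ?",
                       (wanted,)).fetchone()
    return (row[0] if row else None), (1 if row else 0)

conn = make_db()
print(find_owner_slow(conn, 99_999))  # ('owner99999', 100000)
print(find_owner_fast(conn, 99_999))  # ('owner99999', 1)
```

Nothing about the relational model forced the first version; the same
mistake is available in any DBMS that lets you read a file front to back.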

Axiom #1: There is no substitute for planning.

>
>Relational systems have so far been deployed in smaller-scale
>applications than have hierarchical and network systems.  This is due
>to several factors: relational is "newer" (that is, the technology
>existed long before successful commercial products) and the older
>database architectures were deployed in the days when nearly all
>commercial computing resources were centralized and operating in a
>batch-processing environment.  In that environment, updates and access
>to the database are relatively rigidly controlled.

Sure, relational is "new", but the basic access methods have not 
improved considerably; we still use B-trees and relative files and 
maybe hashed files.  The "relational" aspect is a layer above this.
I can write slow code in any environment, and there is nothing inherent
in the relational model that makes it slower than any other model.
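One way to see that the relational layer sits on top of the same old
access methods is to ask an engine which physical path it picked.  A
sketch using SQLite's EXPLAIN QUERY PLAN (SQLite is my stand-in, the
"vehicles" table is hypothetical, and the exact wording of the plan text
varies between SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the index is an ordinary B-tree underneath.
conn.execute("CREATE TABLE vehicles (plate TEXT, owner TEXT, model_year INTEGER)")
conn.execute("CREATE INDEX vehicles_plate ON vehicles (plate)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable
    # description of the access method chosen for that step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Same SQL shape, two very different access paths:
print(plan("SELECT owner FROM vehicles WHERE plate = 'ABC123'"))  # B-tree search
print(plan("SELECT owner FROM vehicles WHERE model_year = 1990"))  # sequential scan
```

The speed difference comes from whether a suitable B-tree exists, which
is an implementation decision, not a property of the relational model.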

The fact is that noble 3NF implementations almost always get mutated
by harsh reality: you end up generating "extract" tables and other
de facto optimizations once you do a simple calculation of how many I/Os
it's gonna take to support your subsecond, online application.

That's reality; you can either spend money on hardware or take a
hard-nosed approach to implementation.
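The "simple calculation" can be sketched as below.  Every number in it
is a hypothetical round figure chosen for illustration (fanout of 100,
a million rows per table, 50 transactions per second, 30 random I/Os per
second per disk), and it pessimistically assumes every B-tree level costs
one physical read with no buffer-cache hits.

```python
def btree_levels(rows, fanout):
    """Levels to descend to reach one key, assuming a full B-tree."""
    levels, reach = 1, fanout
    while reach < rows:
        reach *= fanout
        levels += 1
    return levels

def disks_needed(tables_touched, rows, fanout, txn_per_sec, ios_per_disk):
    # Assumption: one physical I/O per B-tree level per table (no caching).
    ios_per_txn = tables_touched * btree_levels(rows, fanout)
    ios_per_sec = ios_per_txn * txn_per_sec
    return ios_per_txn, -(-ios_per_sec // ios_per_disk)  # ceiling division

# Normalized design: the online transaction joins 4 tables.
print(disks_needed(4, 1_000_000, 100, 50, 30))  # (12, 20)
# Denormalized "extract" table: one lookup answers the same question.
print(disks_needed(1, 1_000_000, 100, 50, 30))  # (3, 5)
```

Under these assumptions the extract table cuts the disk count by a factor
of four, which is exactly the pressure that mutates a clean 3NF design.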

Second, the biggest horror for DBAs of big DBMSs is the unknown called
"ad-hoc queries".  On many platforms it is easy to hurt a production
system by issuing queries from hell that require massive sequential
scans and may never complete.  Big DBMS engines usually have strict
controls on adhocery: they either run such queries at low priority or
require that they run in batch.
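SQLite has no workload manager, but its progress-handler hook is enough
to sketch one crude kind of control: a budget on the work a query may do,
after which it is killed.  The function name and budget numbers below are
hypothetical; this is only a stand-in for the priority and batch controls
big engines provide, not how any of them actually work.

```python
import sqlite3

def run_ad_hoc(conn, sql, params=(), max_instructions=1_000_000,
               check_every=1_000):
    """Run a query under a crude work budget; return rows, or None if killed."""
    used = 0

    def governor():
        nonlocal used
        used += check_every
        # A non-zero return value tells SQLite to abort the statement.
        return 1 if used > max_instructions else 0

    conn.set_progress_handler(governor, check_every)
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError:
        if used > max_instructions:
            return None  # aborted by the governor
        raise  # some other error; don't hide it
    finally:
        conn.set_progress_handler(None, check_every)  # remove the governor
```

A point lookup on a key finishes well inside the budget, while an
unqualified three-way cross join (the query from hell) gets cut off
instead of eating the machine.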

In fact, lots of big systems do overnight extracts and provide a separate
online system for decision support.  Rarely do these systems permit
queries against "live" data, simply because supporting the surge load
that ad-hoc queries cause in current implementations costs too much money.
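A minimal sketch of the overnight-extract pattern, again with sqlite3 as
a stand-in and with hypothetical table and function names: the live
database is read once in bulk off-hours, and ad-hoc users query only the
summary in a separate reporting database, which stays frozen until the
next extract.

```python
import sqlite3

def nightly_extract(live, reporting):
    """Rebuild a summary table in a separate reporting database."""
    # One bulk read of live data; ad-hoc users never touch `orders` directly.
    rows = live.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region"
    ).fetchall()
    reporting.execute("DROP TABLE IF EXISTS order_summary")
    reporting.execute(
        "CREATE TABLE order_summary "
        "(region TEXT PRIMARY KEY, n INTEGER, total INTEGER)"
    )
    reporting.executemany("INSERT INTO order_summary VALUES (?, ?, ?)", rows)
    reporting.commit()
```

The trade is staleness for predictability: decision-support users see
yesterday's numbers, and the online system never sees their load.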

Axiom #2: Ad-hoc means unpredictable, which is fundamentally at odds
with the goals of a production system.  No current DBMS or implementation
knows how to balance the two automatically in a truly large implementation.

(I have not seen the DBMS yet that sends me mail and counsels "Dave, I've
been reviewing access patterns and I really think you should consider a
clustered index...")

...

In conclusion:

1) There's no free lunch.  Until we find a more expressive mechanism
   for revealing the user's intent to the DBMS, we're going to live
   with controls over what a particular user can do.  We need to be
   able to carefully control the plowing of new furrows through a DBMS
   while still handling heads-down data entry with predictable speed.

2) Experience at Tandem shows that a true SQL RDBMS doesn't have to be
   slower; in fact, the State of California has committed to NonStop SQL
   for its entire vehicle database on the strength of some strenuous
   benchmarks.

3) We are a long way from creating DBMS systems into which data
   can be poured and then relied on to balance access and update needs.
   No matter what your platform, it will take intelligence and
   forethought to create a successful implementation.

Speaking for myself, as always...

-- 
Dave Fuller				   
Sequent Computer Systems		  Think of this as the hyper-signature.
(312) 318-0050 (humans)			  It means all things to all people.
{uunet,sun,...}!sequent!dafuller