Path: utzoo!attcan!uunet!jarthur!usc!rutgers!att!ulysses!swfc
From: swfc@ulysses.att.com (Shu-Wie F Chen)
Newsgroups: comp.databases
Subject: Re: Is RDBMS unproven technology?  (Flames to follow....)
Message-ID: <13545@ulysses.att.com>
Date: 7 Aug 90 15:03:22 GMT
References: <1073@ashton.UUCP> <10371@sybase.sybase.com> <13532@ulysses.att.com> <10419@sybase.sybase.com>
Sender: netnews@ulysses.att.com
Reply-To: swfc@ulysses.att.com (Shu-Wie F Chen)
Organization: AT&T Bell Labs
Lines: 161

In article <10419@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
|>In article <13532@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes:
|>>In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
|>>|>
|>>|>Relational systems have so far been deployed in smaller-scale
|>>|>applications than have hierarchical and network systems.  
|>>
|>>I don't see ... why
|>>relational systems have so far only been deployed in smaller-scale
|>>applications.
|>
|>What I'm driving at is that relational has not developed in a DP/MIS
|>context, and DP/MIS is where most of the large-scale business applications
|>have traditionally resided.  Relational is the architecture of choice for
|>the "bottom-up" development of organizational databases, where local DP
|>departments are creating relational databases to manage their local
|>operations, and looking for ways to tie all those local databases
|>together.
|>

Yes, relational databases are easier to implement (from a DBA's point of view).

|>>|>The appeal of relational systems has been the promise of flexible
|>>|>access to the database by users far removed from the DP department.
|>>
|>>RDBMSs have made two contributions:
|>>1. non-procedural access
|>>2. data independence
|>
|>True, but most users running canned applications won't be as aware of 
|>these features as applications programmers, who benefit most from them.  
|>I was really only discussing the end-users, since they are the largest
|>group of database utilizers in an organization.  Your comment is
|>correct and rounds out my point.

If you are talking about end users and canned applications, the model
used isn't that important.  If you talk about the programmers who
implement the canned application, then it is a different story. 
Frankly, I am now confused.  Your previous arguments made sense for
application programmers, but now you say you were really talking about
end users.

|>>|>Relational systems lend themselves well to distributed database, where
|>>|>by definition there will be fewer, if any, centralized [servers]
|>>
|>>Huh?  What definition?
|>
|>If a database is distributed, then the database state is maintained by
|>more than one server.  The limiting case is where every machine on
|>the network is of similar size and maintains an equal part of the database.  
|>A more likely scenario is a server hierarchy, such as in telephone exchanges.
|>

My question arose because you gave no reason why you thought relational
systems were better for distributed databases.  I then gave my reason below.

|>>I think relational systems lend themselves well to distributed databases
|>>because they are set-oriented, rather than navigational systems like the
|>>hierarchical and network models.  
|>
|>That's what I was driving at.  Thanks for the words beyond "so many words".
|>

|>>|>... Today's technology is proving (already has, actually) that the
|>>|>assertion that relational is slow is out-of-date.  What's more,
|>>
|>>I think that that assertion was proven incorrect about 10-15 years ago.
|>
|>It was proven that RDBMS COULD be as fast as existing navigational systems,
|>but there haven't been competitive products till recently.  "Proof"
|>for many folks requires no less than a released (or announced :-) product.
|>

Well, have there been any released competitive products?  (By
competitive, I don't mean better than other *relational* DBMSs, but
better than any *other* DBMSs.)

|>
|>>... 1000TPS is high-performance.  12TPS (or 34) is acceptable.
|>
|>Show me someone getting 1000TPS on a Sun 3/280.  What must not exist in a
|>product is a performance ceiling above which throughput stops growing
|>(linearly) with increase in platform scale.  That's the essence of 
|>the perceived "relational bug."

This was really a cheap shot on my part.  I was referring to Kai Li's
main-memory database system at Princeton which I believe achieved 1000
TPS.  No, it was not on a Sun 3/280...

|>
|>>... [J]oins are a real big performance killer for relational systems.  
|>
|>Not if they are pre-optimized or pre-computed.

What do you mean by pre-optimized or pre-computed?  What if I performed
a join that was not pre-optimized or pre-computed?

|>
|>>So there is some substance behind the users
|>>associating relational "... products with poor system performance, even
|>>though they may be flexible and easier to implement."[from the original
|>>posting on the British report]
|>
|>Sure, the substance is based on historical knowledge.  That knowledge
|>is being obsoleted by the onset of RDBMSs that scale well.  I believe
|>users will be able to have both DP-scale performance and ease of use in RDBMS
|>in the near future.
|>

How many RDBMSs scale well (besides Sybase, of course ;-)?  Better yet,
how many RDBMSs scale?

|>>But to answer Tom's question on whether "relational" has to mean "overhead":
|>>Relational does not mean overhead, but since it provides more "features"
|>>(flexible, easier to implement, easier to use(?)), some overhead *must*
|>>be incurred.
|>
|>The question is where to place that overhead.  That's one problem we
|>(Sybase anyway) are trying to solve.
|>
|>>I think a good discussion would be over where the overheads are.  For
|>>starters, relational query compilation has to be smarter.  
|>
|>Hmm, I've been developing the opinion that query compilation is a largely
|>solved problem (cost-based optimizers, etc.), but that fundamental things
|>like I/O management and access methods policies need a lot more work
|>in RDBMS.  So sounds like we have a good discussion ahead of us :-) .

I/O management and access methods policies are orthogonal to the data
model.  These issues are just as important in navigational models.

The reason I suggested query compilation as a point of study is that in
navigational systems, the application programmer has to know the
physical layout of the database files in order to write code that could
navigate.  The programmer has to know about the clustering, the indices,
what pointers to chase, etc.  (Please correct me if I am wrong about this. 
I have never had the opportunity to program on a navigational system.) 
Therefore, a *good* application programmer would know the best way to
access the database for a given query and could write optimal code.  On
the other hand, in the relational model, application programmers are
encouraged not to know the underlying physical layout of the database. 
They are dependent on the query compiler to map their logical view and
operations to physical operations.  I don't believe current query compilers
match the expertise of programmers who hand-craft this mapping.
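
To make the contrast concrete, here is a minimal sketch in Python.  It is
only an illustration under stated assumptions: the toy data, the class
names (Dept, Emp), and the pointer fields (first_emp, next_in_dept) are
all hypothetical, and the "navigational" half is a plain linked structure
in memory, not any real hierarchical or network DBMS.  The relational
half uses SQLite through Python's standard sqlite3 module.

```python
import sqlite3

# Hypothetical toy data: (emp_id, name, dept_id).
DEPARTMENTS = {10: "Research", 20: "Sales"}
EMPLOYEES = [(1, "Ada", 10), (2, "Grace", 10), (3, "Edgar", 20)]


class Dept:
    def __init__(self, dept_id, name):
        self.dept_id = dept_id
        self.name = name
        self.first_emp = None      # head of the owner -> member chain


class Emp:
    def __init__(self, emp_id, name):
        self.emp_id = emp_id
        self.name = name
        self.next_in_dept = None   # next member of the same department


def build_network():
    """Build the pointer structure the navigational programmer must know."""
    depts = {d: Dept(d, n) for d, n in DEPARTMENTS.items()}
    # Push onto the front in reverse so each chain keeps the original order.
    for emp_id, name, dept_id in reversed(EMPLOYEES):
        emp = Emp(emp_id, name)
        owner = depts[dept_id]
        emp.next_in_dept = owner.first_emp
        owner.first_emp = emp
    return depts


def navigational_names(depts, dept_id):
    """Procedural access: the programmer picks the path and chases pointers."""
    names = []
    emp = depts[dept_id].first_emp
    while emp is not None:
        names.append(emp.name)
        emp = emp.next_in_dept
    return names


def build_relational():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)"
    )
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMPLOYEES)
    return conn


def relational_names(conn, dept_id):
    """Declarative access: state what is wanted; the DBMS picks the path."""
    rows = conn.execute(
        "SELECT name FROM emp WHERE dept_id = ? ORDER BY emp_id", (dept_id,)
    )
    return [row[0] for row in rows]
```

Both functions return the same names for a given department.  The
difference is who chooses the access path: in the first, the programmer
does, and the code breaks if the chain layout changes; in the second, the
query compiler does, and the quality of the result depends on it.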

It is certainly easier to talk about relational things since a
declarative language is used instead of a procedural one.  But the
penalty of a declarative language is that it must be translated to a
procedural one.  Though RDBMSs (and in particular, Sybase) can use
precompiled queries to improve performance, this does not solve the
problem of ad-hoc queries.
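
The translation step can be observed directly in a modern system.  The
sketch below is an assumption-laden illustration, not a statement about
Sybase or any 1990 product: it uses SQLite's EXPLAIN QUERY PLAN through
Python's sqlite3 module, with a hypothetical table (emp) and index
(idx_emp_dept), to show the same declarative query being mapped to
different physical strategies depending on what access paths exist.

```python
import sqlite3

# Hypothetical ad-hoc query; the table and column names are made up.
AD_HOC = "SELECT name FROM emp WHERE dept_id = ?"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?, ?)",
        [(i, "emp%d" % i, i % 5) for i in range(100)],
    )
    return conn


def query_plan(conn, sql, params=()):
    """Return the human-readable steps SQLite chose for a declarative query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the textual description of the step.
    return [row[-1] for row in rows]


conn = make_db()

# With no index on dept_id, the only available path is a scan of the table.
plan_before = query_plan(conn, AD_HOC, (3,))

# Adding an access path changes the chosen plan; the query text does not change.
conn.execute("CREATE INDEX idx_emp_dept ON emp (dept_id)")
plan_after = query_plan(conn, AD_HOC, (3,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so it
is not reproduced here.  The point is that the compiler runs this
planning step for every query it has not seen before, which is the cost
an ad-hoc query cannot avoid.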

BTW, is there such a thing as an ad-hoc query in navigational systems?

Cheers,
*swfc