Path: utzoo!attcan!uunet!cimshop!davidm
From: cimshop!davidm@uunet.UU.NET (David S. Masterson)
Newsgroups: comp.databases
Subject: Re: Comment on the "Third-Generation Database System Manifesto"
Message-ID: <CIMSHOP!DAVIDM.90Sep23204350@uunet.UU.NET>
Date: 24 Sep 90 03:43:50 GMT
References: <21178@hercules.csl.sri.com>
Sender: davidm@cimshop.UUCP
Distribution: comp
Organization: Consilium Inc., Mountain View, California.
Lines: 244
In-reply-to: donovan@julius.csl.sri.com's message of 21 Sep 90 17:33:43 GMT

In article <21178@hercules.csl.sri.com> donovan@julius.csl.sri.com 
(Donovan Hsieh) writes:

   Recently, the paper "Third-Generation Database System Manifesto" was
   published in the 1990 ACM SIGMOD Conference Proceedings.
   [...deleted...]			The paper was also partly written in
   response to an earlier published paper, "The Object-Oriented Database
   System Manifesto,"

Both papers were very interesting.  Too bad there's such a schism in the
database community.

   Proposition 1.1 of the manifesto says that "A third generation DBMS must
   have a rich type system." I agree that entirely new database systems such
   as OODBs are not needed to support abstract data types (ADT). But I doubt
   that ALL types can be added to or extended from the current relational
   systems. Stretching existing relational systems beyond their inherent
   limits would most likely cause an inefficient implementation.  I feel that
   a pure OODB approach is more suitable and will be better able to provide
   full ADT features that are compatible with the existing object-oriented
   programming languages, such as C++.

Examples please.  There's too much "feeling" in this paragraph.  By the way,
one criticism I have of the OO paper is the lack of credible reasoning on why
a system implementing the relational model (the current relational systems are
not really *completely* relational) is not a basic OODB that can be built
upon to full object-oriented status.

   Proposition 1.3 of the manifesto says that "Functions, including database
   Procedures and methods, and encapsulation are a good idea." However, it
   makes the criticism that some OODBs require users to use only functions to
   access data elements (attributes) of a collection (object instance). In
   fact, there are some OODB systems that allow an object class to specify
   public and private attributes, where a public attribute can be directly
   accessed by database query languages, and a private attribute can only be
   accessed through pre-defined methods to protect object integrity.

You contradict yourself.  "Some OODBs require" means just that -- some.  Also,
what is the definition of "object integrity" in this case, and how do these
methods differ from "constraints and triggers"?  (IMHO, they don't.)

   Proposition 1.4 of the manifesto says that "Unique identifiers (UIDs) for
   records should be assigned by the DBMS only if a user-defined primary key
   is not available". It also argues that a human-readable, immutable primary
   key in relational systems is superior to the UID or OID used by OODBs. A
   UID in an OODB has different meanings and purposes than a primary key in a
   relational system. A UID guarantees that no two object instances will
   contain the same ID during the lifetime of the system.

   Also, it is very costly when a primary key is CHANGED in a relational
   database, such as when someone's SS# is initially entered incorrectly.  It
   must then be changed everywhere it was used as a foreign key. If a person
   is referenced by UID, the SS# only has to be changed in one place.

Actually, the immutable, system-assigned primary key was a central part of the
Codd/Date RM/T model (Date's Introduction to Database Systems V2), so the UID
concept doesn't disagree with the relational model.  This is one point where,
IMHO, the manifesto disagrees with the model.  IMO, though, the question of
foreign key updating could be handled by the proper use of constraints and
triggers.
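
To make the constraint idea concrete, here is a minimal sketch using
Python's sqlite3 module and a declarative ON UPDATE CASCADE rule.  The
person/paycheck schema and the fix_ssn helper are made up for illustration,
and cascading referential actions are an assumption about the DBMS at hand,
not something the 1990 products necessarily offered.

```python
import sqlite3

def fix_ssn(conn, old_ssn, new_ssn):
    """Correct a mistyped primary key in ONE place; the FK rule propagates it."""
    conn.execute("UPDATE person SET ssn = ? WHERE ssn = ?", (new_ssn, old_ssn))

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    -- Hypothetical schema: paycheck rows refer to a person by SS#.
    CREATE TABLE person (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE paycheck (
        id     INTEGER PRIMARY KEY,
        ssn    TEXT REFERENCES person(ssn) ON UPDATE CASCADE,
        amount INTEGER
    );
    INSERT INTO person VALUES ('123-45-6780', 'Pat');
    INSERT INTO paycheck (ssn, amount)
        VALUES ('123-45-6780', 100), ('123-45-6780', 200);
""")

fix_ssn(conn, "123-45-6780", "123-45-6789")
rows = conn.execute("SELECT ssn FROM paycheck ORDER BY id").fetchall()
print(rows)  # every referencing row now carries the corrected key
```

The application issues a single UPDATE; the propagation cost is still paid,
but by the DBMS rather than by hand-written application code.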

   Proposition 2.1 of the manifesto says that "Essentially, all programmatic
   access to a database should be through a non-procedural, high-level access
   language".  It argues that the navigational approach used in OODBs is
   undesirable and inefficient compared with the use of non-procedural query
   languages in relational systems. I feel that this proposition is rather
   misleading. The manifesto claims that a well-written and well-tuned query
   optimizer can almost always produce a better execution method than a human.
   A query optimizer could probably do a better job for repetitive and
   straightforward accesses. However, there are cases where human navigation
   is required and a query optimizer cannot foresee all patterns and usages.

Perhaps true, but the manifesto was addressing the tradeoffs of the two
approaches in that the majority of the cases in an information management
system will tend toward the ad-hoc, so a query optimizer will generally do
better.  It's the same argument that was made for 3GLs over Assembler.

   An example is computing the transitive closure of a given parent object.
   First of all, the standard relational algebra does not support a query like
   "Find all children belonging to a given parent" in a single query
   expression (although some extended relational systems allow queries to
   compute transitive closures). 

Actually, Codd's new book addresses this with the recursive outer-join, so the
new "standard" for the relational model addresses this.  Now if we could only
get the relational vendors to realize this.
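
As a rough illustration of closure in a single declarative query, the sketch
below uses SQL's WITH RECURSIVE as implemented by SQLite.  This is a
stand-in, not Codd's recursive join operator, and the parent_child table and
the descendants helper are hypothetical names.

```python
import sqlite3

def descendants(conn, root):
    """Return every descendant of `root` using one recursive query."""
    sql = """
        WITH RECURSIVE closure(name) AS (
            SELECT child FROM parent_child WHERE parent = ?
            UNION                      -- UNION drops duplicates, so cycles end
            SELECT pc.child
            FROM parent_child AS pc
            JOIN closure AS c ON pc.parent = c.name
        )
        SELECT name FROM closure ORDER BY name
    """
    return [row[0] for row in conn.execute(sql, (root,))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent_child (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO parent_child VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e"), ("x", "y")],
)

result = descendants(conn, "a")
print(result)
```

The whole traversal is handed to the DBMS as one statement, so there is no
per-iteration query to re-optimize in the application.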

   				A common solution is to implement a "for
   loop" in the application code to compute the closure one record at a time.
   The result is that the "select" query must be optimized for each loop (some
   smarter query optimizers will detect the looped query and store its
   optimized query graph and execution method in pre-compiled modules so that
   they can be reused). Furthermore, the query optimizer cannot take advantage
   of buffered records because the next query will use the previous child
   value as its current parent search value that is most likely not in the
   same buffer (or page).

On the other hand, another common approach used is to do an n-way outer-join
from parent to child to grandchild where n is the known number of levels in
the hierarchy.  This is the best that the current generation of relational
systems can do and it does allow the query optimizer to do some optimizations.
However, if the hierarchy is sparsely populated, this isn't that good.
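
A small sketch of the n-way outer-join approach for n = 3, again with
Python's sqlite3 module.  The parent_child table and the three_levels helper
are hypothetical; the point is only the shape of the query and the NULL
padding that a sparse hierarchy produces.

```python
import sqlite3

def three_levels(conn, root):
    """Child, grandchild and great-grandchild of `root` in one fixed-depth join."""
    sql = """
        SELECT p.child, c.child, g.child
        FROM parent_child AS p
        LEFT OUTER JOIN parent_child AS c ON c.parent = p.child
        LEFT OUTER JOIN parent_child AS g ON g.parent = c.child
        WHERE p.parent = ?
        ORDER BY p.child, c.child, g.child
    """
    return conn.execute(sql, (root,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent_child (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO parent_child VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e")],
)

paths = three_levels(conn, "a")
for path in paths:
    print(path)  # branches that stop early are padded with None (SQL NULL)
```

The depth is baked into the query text: a fourth level would need another
join, and every short branch comes back padded with NULLs.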

   			On the other hand, an OODB user could use procedure
   calls to write the same loop without going through time-consuming
   optimizations and could dereference child pointers recursively. If the OODB
   schema defines a clustering based on this reference mode, users will be
   able to gain even more performance with fewer disk accesses because most
   child objects will have been cached into the buffer during the initial
   access.

A relational system can take advantage of the same clustering idea.  "Time
consuming optimizations", IMHO, are in the eye of the beholder.  An OODB
wouldn't have the full understanding of relationships that can occur in a
database and, so, couldn't take advantage of a recursive join operation
(unless the OODB was a relational DB).

   Another example would be a traversal of objects that involves computation,
   like a CAD application where some optimization of the connections between
   objects involves computation in the application language. I would argue
   that (1) navigation is much more "natural" for these computations than
   using a mixture of queries and programming, and (2) the mixture is
   inefficient for the reasons suggested above.

But the Manifesto's Prop 1.1 states the need for a "rich type system".  This
should include all the computational capability needed to support the new
type.  So, the "true" third generation system should merge more seamlessly the
programming idea with the query idea.

   As for the impact of schema evolution, I agree that the use of "views" in
   relational databases offers good insulation for applications from changes
   to the database schema definition. However, specifying the data elements
   with a declarative query language does not guarantee insulation if the
   primitive data element definition is changed. Also, some OODBs support
   "derived" objects, which provide a service like views. (Derived objects are
   defined procedurally or declaratively by a set of pre-conditions and
   post-conditions to instantiate or modify the objects.)

I think views are better thought of as a composition operation than a
derivation operation.  The current generation of relational database systems,
IMHO, does not really support enough of the view concept to truly insulate
applications from the database design (lack of multi-table updates).

   In the same proposition, the manifesto questioned the performance benefit
   for OODBs that use low-level calls to navigate individual objects. It also
   criticized CAD programmers as being close-minded for not using query
   optimizers provided by database systems. I feel that both arguments are
   rather misleading. First, there are various techniques proposed by many
   OODB researchers to address and resolve the performance issue. For example,
   a direct memory map technique currently used by one OODB vendor has
   reported tremendous performance gains over other indexed or hash-based
   dereferencing techniques, such as those that were mentioned in the
   manifesto. Numerous published cases have also reported poor performance
   when using off-the-shelf relational databases to support object navigation,
   such as in the closure computation example described earlier.

Do you think that relational systems could not improve themselves with the
above solutions?

   Proposition 2.4 of the manifesto says that "Performance indicators have
   almost nothing to do with data models and must not appear in them." I
   disagree with this claim. Although performance is heavily influenced by
   individual implementation techniques, there exist inherent limitations on
   the performance achievable for the underlying data models. For example, the
   relational model explicitly disallows the storing of ordered tuples. This
   makes it very inefficient to represent lists, and users are forced to sort
   on a sequence number implemented by their applications.

What?!?  True, relations are inherently non-ordered things in the model, but
that does not eliminate indexing in an implementation of the relational model.
Therefore, the optimizer can determine that an "order by" has no work to do
because the tuples are already ordered by the index in the needed fashion.
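
One way to see this is to ask an optimizer for its plan before and after an
index exists.  The sketch below uses SQLite through Python's sqlite3 module.
The list_items table, the list_order index and the helper names are
hypothetical, and the plan wording is SQLite-specific and may differ between
versions.

```python
import sqlite3

def plan(conn, sql):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def needs_sort(steps):
    """True if the plan contains an explicit sorting step for ORDER BY."""
    return any("ORDER BY" in step for step in steps)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list_items (list_id INTEGER, seq INTEGER, item TEXT)")
query = "SELECT item FROM list_items WHERE list_id = 1 ORDER BY seq"

before = plan(conn, query)   # no index yet: the plan includes a sort step
conn.execute("CREATE INDEX list_order ON list_items (list_id, seq)")
after = plan(conn, query)    # index supplies the order: no sort step

print(needs_sort(before), needs_sort(after))
```

The relation itself stays unordered in the model; the ordering lives in the
access path, which is exactly where the optimizer can exploit it.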

   It is always possible to extend existing database models with new features
   and constraints through arbitrary implementations. But the end result would
   be undesirable if the extension exceeds the limitations of the model, or
   lacks the support of a formal mathematical representation.

As has been seen, current relational implementations are in no way approaching
the limits of the relational model (see Codd's new book).

   Propositions 3.1 and 3.2 of the manifesto say that "Third generation DBMSs must
   be accessible from multiple HLLs" and "Persistent X for a variety of Xs is
   a good idea. They will all be supported on top of a single DBMS by compiler
   extensions and a (more or less) complex run time system". In theory, I
   agree that next-generation databases (either third-generation or OODB)
   should be accessible from multiple HLLs (High Level Languages), and the
   DBMS should provide multiple run-time type translations between
   declarative query languages and HLLs.  However it is impractical for DBMSs
   to support all HLLs. For example, many MIS programmers are interested in
   adopting new object-oriented technologies (that is, to use object-oriented
   design methodology and object-oriented programming languages) to implement
   new MIS applications if they are given the opportunity to do so rather than
   revamping and retrofitting existing COBOL code. If they are given the
   opportunity to choose a DBMS to match with their new object-oriented
   applications, most likely they will use a fully supported OODB product
   because it provides a better match.

However, as has been seen, the "existing COBOL code" doesn't go away as it is
usually the revenue producing code.  Therefore, the adoption of new
technologies (object-oriented or otherwise) usually requires a "ramping up" to
full implementation with old code being adjusted to "try out" the new ideas.

   In relational databases, the type "impedance mismatch" between SQL and HLLs
   has long been criticized as being inefficient and unnatural. Even if the
   third-generation DBMSs provide brilliant ways to bridge the gap between all
   HLLs and SQL, new object-oriented users will always opt for the OODB
   because it is a natural fit.

Even relational people criticize SQL as not being really relational.  I don't
believe the manifesto called for standardizing on SQL (but I don't have it in
front of me).

   As for OODBs, although the lack of declarative query languages and a formal
   object algebra/calculus make it less intuitive for end users to use
   currently, many researchers have proposed different solutions and
   approaches to resolve this deficiency. We must allow more time for this new
   technology to be refined and improved, just as it took more than a decade
   for relational databases to become mature and popular, and replace the old
   network and hierarchical databases.

I agree as long as the ultimate OODB programmer (not the OODBMS programmer)
isn't really just a network or hierarchical database programmer with a new
name. 

   In summary, I feel that there is room for both technologies to co-exist,
   and new database models will always be proposed to address existing
   deficiencies.  We will probably see some fusion of both database approaches
   in the near future that will benefit database users. In the long run, I
   foresee OODBs replacing (extended) relational DBMSs in selected market
   segments. I would also predict that the next wave after OODBs will be
   fully-integrated, intelligent database (or knowledge-based) systems that
   will combine both AI and database technologies.

I also see the database technologies merging in the future, but I also see
more complete implementations of the current systems solving many of the
problems that have been seen.  I wouldn't abandon the current technology quite
yet.   ;-)
--
====================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mtn. View, CA  94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"