Path: utzoo!attcan!uunet!cimshop!davidm
From: cimshop!davidm@uunet.UU.NET (David S. Masterson)
Newsgroups: comp.databases
Subject: Re: Comment on the "Third-Generation Database System Manifesto"
Message-ID:
Date: 24 Sep 90 03:43:50 GMT
References: <21178@hercules.csl.sri.com>
Sender: davidm@cimshop.UUCP
Distribution: comp
Organization: Consilium Inc., Mountain View, California.
Lines: 244
In-reply-to: donovan@julius.csl.sri.com's message of 21 Sep 90 17:33:43 GMT

In article <21178@hercules.csl.sri.com> donovan@julius.csl.sri.com (Donovan Hsieh) writes:

> Recently, the paper "Third-Generation Database System Manifesto" was published in the 1990 ACM SIGMOD Conference Proceedings.
> [...deleted...]
> The paper was also partly written in response to an earlier published paper, "The Object-Oriented Database System Manifesto,"

Both papers were very interesting. Too bad there's such a schism in the database community.

> Proposition 1.1 of the manifesto says that "A third generation DBMS must have a rich type system." I agree that entirely new database systems such as OODBs are not needed to support abstract data types (ADT). But I doubt that ALL types can be added to or extended from the current relational systems. Stretching existing relational systems beyond their inherent limits would most likely cause an inefficient implementation. I feel that a pure OODB approach is more suitable and will be better able to provide full ADT features that are compatible with the existing object-oriented programming languages, such as C++.

Examples please. There's too much "feeling" in this paragraph. By the way, one criticism I have of the OO paper is the lack of credible reasoning on why a system implementing the relational model (the current relational systems are not really *completely* relational) is not a basic OODB that can be built upon to full object-oriented status.

> Proposition 1.3 of the manifesto says that "Functions, including database procedures and methods, and encapsulation are a good idea." However, it makes the criticism that some OODBs require users to use only functions to access data elements (attributes) of a collection (object instance). In fact, there are some OODB systems that allow an object class to specify public and private attributes, where a public attribute can be directly accessed by database query languages, and a private attribute can only be accessed through pre-defined methods to protect object integrity.

You contradict yourself. "Some OODBs require" means just that -- some. Also, what is the definition of "object integrity" in this case, and how do these methods differ from "constraints and triggers"? (IMHO, there is no difference.)

> Proposition 1.4 of the manifesto says that "Unique identifiers (UIDs) for records should be assigned by the DBMS only if a user-defined primary key is not available". It also argues that a human-readable, immutable primary key in relational systems is superior to the UID or OID used by OODBs. A UID in an OODB has different meanings and purposes than a primary key in a relational system. A UID guarantees that no two object instances will contain the same ID during the lifetime of the system. Also, it is very costly when a primary key is CHANGED in a relational database, such as when someone's SS# is initially entered incorrectly. It must then be changed everywhere it was used as a foreign key. If a person is referenced by UID, the SS# only has to be changed in one place.

Actually, the immutable, system-assigned primary key was a central part of the Codd/Date RM/T model (Date's Introduction to Database Systems V2), so the UID concept doesn't disagree with the relational model. This is one point where, IMHO, the manifesto disagrees with the model. IMO, though, the question of foreign key updating could be handled by the proper use of constraints and triggers.
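
To make the key-change point concrete, here is a minimal sketch, assuming SQLite driven through Python's sqlite3 module (not a system anyone in this thread was discussing). All table and column names are hypothetical. It contrasts a natural-key design, where a declared ON UPDATE CASCADE constraint carries a corrected SS# to the referencing rows, with a surrogate-UID design, where the SS# is stored in exactly one place.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create both designs side by side in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Natural key: the SS# is the primary key and is copied into child rows.
        CREATE TABLE person_nat (ssn TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE account_nat (
            acct_no   INTEGER PRIMARY KEY,
            owner_ssn TEXT REFERENCES person_nat(ssn) ON UPDATE CASCADE
        );

        -- Surrogate key: child rows reference a system-assigned UID instead.
        CREATE TABLE person_uid (uid INTEGER PRIMARY KEY, ssn TEXT UNIQUE, name TEXT);
        CREATE TABLE account_uid (
            acct_no   INTEGER PRIMARY KEY,
            owner_uid INTEGER REFERENCES person_uid(uid)
        );
        """
    )
    wrong_ssn = "123-45-6789"  # entered incorrectly, to be fixed later
    conn.execute("INSERT INTO person_nat VALUES (?, ?)", (wrong_ssn, "Pat"))
    conn.executemany("INSERT INTO account_nat VALUES (?, ?)",
                     [(1, wrong_ssn), (2, wrong_ssn)])
    conn.execute("INSERT INTO person_uid VALUES (1, ?, ?)", (wrong_ssn, "Pat"))
    conn.executemany("INSERT INTO account_uid VALUES (?, ?)", [(1, 1), (2, 1)])
    conn.commit()
    return conn


def correct_ssn(conn: sqlite3.Connection, old: str, new: str) -> None:
    """Fix the SS# once in each design; the constraint does the rest."""
    conn.execute("UPDATE person_nat SET ssn = ? WHERE ssn = ?", (new, old))
    conn.execute("UPDATE person_uid SET ssn = ? WHERE ssn = ?", (new, old))
    conn.commit()


def natural_refs(conn: sqlite3.Connection) -> list:
    """Foreign-key values held by the natural-key child table."""
    return [r[0] for r in conn.execute(
        "SELECT owner_ssn FROM account_nat ORDER BY acct_no")]


def surrogate_refs(conn: sqlite3.Connection) -> list:
    """Foreign-key values held by the surrogate-key child table."""
    return [r[0] for r in conn.execute(
        "SELECT owner_uid FROM account_uid ORDER BY acct_no")]
```

In both designs the application issues a single UPDATE. The difference is where the work happens: with the natural key the DBMS rewrites every referencing row on the application's behalf, while with the UID no referencing row is touched at all.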

> Proposition 2.1 of the manifesto says that "Essentially, all programmatic access to a database should be through a non-procedural, high-level access language". It argues that the navigational approach used in OODBs is undesirable and inefficient compared with the use of non-procedural query languages in relational systems. I feel that this proposition is rather misleading. The manifesto claims that a well-written and well-tuned query optimizer can almost always produce a better execution method than a human. A query optimizer could probably do a better job for repetitive and straightforward accesses. However, there are cases where human navigation is required and a query optimizer cannot foresee all patterns and usages.

Perhaps true, but the manifesto was addressing the tradeoffs of the two approaches: the majority of the cases in an information management system will tend toward the ad hoc, so a query optimizer will generally do better. It's the same argument that was made for 3GLs over Assembler.

> An example is computing the transitive closure of a given parent object. First of all, the standard relational algebra does not support a query like "Find all children belonging to a given parent" in a single query expression (although some extended relational systems allow queries to compute transitive closures).

Actually, Codd's new book addresses this with the recursive outer-join, so the new "standard" for the relational model addresses this. Now if we could only get the relational vendors to realize this.

> A common solution is to implement a "for loop" in the application code to compute the closure one record at a time. The result is that the "select" query must be optimized for each loop (some smarter query optimizers will detect the looped query and store its optimized query graph and execution method in pre-compiled modules so that they can be reused).
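
The two ways of computing the closure described above can be sketched as follows. This is only an illustration, assuming SQLite through Python's sqlite3 module; the `child_of` table, its sample rows, and the function names are hypothetical. The recursive common table expression is a later SQL feature used here as a stand-in for the general idea of a recursive join inside the DBMS; it is not Codd's operator.

```python
import sqlite3

# Hypothetical parent/child edges; ("x", "y") is unrelated to "root".
EDGES = [("root", "a"), ("root", "b"), ("a", "c"), ("c", "d"), ("x", "y")]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE child_of (parent TEXT, child TEXT)")
    conn.execute("CREATE INDEX child_of_parent ON child_of (parent)")
    conn.executemany("INSERT INTO child_of VALUES (?, ?)", EDGES)
    conn.commit()
    return conn


def descendants_loop(conn: sqlite3.Connection, root: str):
    """Application-side loop: one SELECT per visited node.

    Returns the set of descendants and the number of queries issued.
    """
    found, frontier, queries = set(), [root], 0
    while frontier:
        node = frontier.pop()
        queries += 1
        rows = conn.execute(
            "SELECT child FROM child_of WHERE parent = ?", (node,))
        for (child,) in rows:
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found, queries


def descendants_recursive(conn: sqlite3.Connection, root: str) -> set:
    """Single declarative query: the DBMS iterates to the fixed point."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT child FROM child_of WHERE parent = ?
            UNION
            SELECT c.child
            FROM child_of AS c JOIN reach AS r ON c.parent = r.node
        )
        SELECT node FROM reach
        """,
        (root,),
    )
    return {node for (node,) in rows}
```

Both functions return the same set. The loop sends one statement per node reached, so its cost grows with the size of the closure, while the recursive form hands the whole problem to the DBMS in one statement.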
> Furthermore, the query optimizer cannot take advantage of buffered records because the next query will use the previous child value as its current parent search value that is most likely not in the same buffer (or page).

On the other hand, another common approach used is to do an n-way outer-join from parent to child to grandchild where n is the known number of levels in the hierarchy. This is the best that the current generation of relational systems can do and it does allow the query optimizer to do some optimizations. However, if the hierarchy is sparsely populated, this isn't that good.

> On the other hand, an OODB user could use procedure calls to write the same loop without going through time-consuming optimizations and could dereference child pointers recursively. If the OODB schema defines a clustering based on this reference mode, users will be able to gain even more performance with fewer disk accesses because most child objects will have been cached into the buffer during the initial access.

A relational system can take advantage of the same clustering idea. "Time consuming optimizations", IMHO, are in the eye of the beholder. An OODB wouldn't have the full understanding of relationships that can occur in a database and, so, couldn't take advantage of a recursive join operation (unless the OODB was a relational DB).

> Another example would be a traversal of objects that involves computation, like a CAD application where some optimization of the connections between objects involves computation in the application language. I would argue that (1) navigation is much more "natural" for these computations than using a mixture of queries and programming, and (2) the mixture is inefficient for the reasons suggested above.

But the Manifesto's Prop 1.1 states the need for a "rich type system". This should include all the computational capability needed to support the new type.
So, the "true" third generation system should merge the programming idea more seamlessly with the query idea.

> As for the impact of schema evolution, I agree that the use of "views" in relational databases offers good insulation for applications from changes to the database schema definition. However, specifying the data elements with a declarative query language does not guarantee insulation if the primitive data element definition is changed. Also, some OODBs support "derived" objects, which provide a service like views. (Derived objects are defined procedurally or declaratively by a set of pre-conditions and post-conditions to instantiate or modify the objects.)

I think views are better thought of as a composition operation than a derivation operation. The current generation of relational database systems, IMHO, does not really support enough of the view concept to truly insulate applications from the database design (lack of multi-table updates).

> In the same proposition, the manifesto questioned the performance benefit for OODBs that use low-level calls to navigate individual objects. It also criticized CAD programmers as being close-minded for not using query optimizers provided by database systems. I feel that both arguments are rather misleading. First, there are various techniques proposed by many OODB researchers to address and resolve the performance issue. For example, a direct memory map technique currently used by one OODB vendor has reported tremendous performance gains over other indexed or hash-based dereferencing techniques, such as those that were mentioned in the manifesto. Numerous published cases have also reported poor performance when using off-the-shelf relational databases to support object navigation, such as in the closure computation example described earlier.

Do you think that relational systems could not improve themselves with the above solutions?

> Proposition 2.4 of the manifesto says that "Performance indicators have almost nothing to do with data models and must not appear in them." I disagree with this claim. Although performance is heavily influenced by individual implementation techniques, there exist inherent limitations on the performance achievable for the underlying data models. For example, the relational model explicitly disallows the storing of ordered tuples. This makes it very inefficient to represent lists, and users are forced to sort on a sequence number implemented by their applications.

What?!? True, relations are inherently non-ordered things in the model, but that does not eliminate indexing in an implementation of the relational model. Therefore, the optimizer can determine that an "order by" has no work to do because the tuples are already ordered by the index in the needed fashion.

> It is always possible to extend existing database models with new features and constraints through arbitrary implementations. But the end result would be undesirable if the extension exceeds the limitations of the model, or lacks the support of a formal mathematical representation.

As has been seen, current relational implementations are in no way approaching the limits of the relational model (see Codd's new book).

> Propositions 3.1 and 3.2 of the manifesto say that "Third generation DBMSs must be accessible from multiple HLLs" and "Persistent X for a variety of Xs is a good idea. They will all be supported on top of a single DBMS by compiler extensions and a (more or less) complex run time system". In theory, I agree that next-generation databases (either third-generation or OODB) should be accessible from multiple HLLs (High Level Languages), and the DBMS should provide multiple run-time type translations between declarative query languages and HLLs. However, it is impractical for DBMSs to support all HLLs.
> For example, many MIS programmers are interested in adopting new object-oriented technologies (that is, to use object-oriented design methodology and object-oriented programming languages) to implement new MIS applications if they are given the opportunity to do so rather than revamping and retrofitting existing COBOL code. If they are given the opportunity to choose a DBMS to match with their new object-oriented applications, most likely they will use a fully supported OODB product because it provides a better match.

However, as has been seen, the "existing COBOL code" doesn't go away as it is usually the revenue producing code. Therefore, the adoption of new technologies (object-oriented or otherwise) usually requires a "ramping up" to full implementation with old code being adjusted to "try out" the new ideas.

> In relational databases, the type "impedance mismatch" between SQL and HLLs has long been criticized as being inefficient and unnatural. Even if the third-generation DBMSs provide brilliant ways to bridge the gap between all HLLs and SQL, new object-oriented users will always opt for the OODB because it is a natural fit.

Even relational people criticize SQL as not being really relational. I don't believe the manifesto called for standardizing on SQL (but I don't have it in front of me).

> As for OODBs, although the lack of declarative query languages and a formal object algebra/calculus makes them less intuitive for end users to use currently, many researchers have proposed different solutions and approaches to resolve this deficiency. We must allow more time for this new technology to be refined and improved, just as it took more than a decade for relational databases to become mature and popular, and replace the old network and hierarchical databases.

I agree as long as the ultimate OODB programmer (not the OODBMS programmer) isn't really just a network or hierarchical database programmer with a new name.

> In summary, I feel that there is room for both technologies to co-exist, and new database models will always be proposed to address existing deficiencies. We will probably see some fusion of both database approaches in the near future that will benefit database users. In the long run, I foresee OODBs replacing (extended) relational DBMSs in selected market segments. I would also predict that the next wave after OODBs will be fully-integrated, intelligent database (or knowledge-based) systems that will combine both AI and database technologies.

I also see the database technologies merging in the future, but I also see more complete implementations of the current systems solving many of the problems that have been seen. I wouldn't abandon the current technology quite yet. ;-)

--
====================================================================
David Masterson                         Consilium, Inc.
uunet!cimshop!davidm                    Mtn. View, CA 94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"