Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!zephyr.ens.tek.com!tektronix!sequent!dafuller
From: dafuller@sequent.UUCP (David Fuller)
Newsgroups: comp.databases
Subject: Re: Is RDBMS unproven technology? (Flames to follow....)
Message-ID: <40132@sequent.UUCP>
Date: 6 Aug 90 21:22:57 GMT
References: <1073@ashton.UUCP> <10371@sybase.sybase.com>
Reply-To: dafuller@sequent.UUCP (David Fuller)
Organization: Sequent Computer Systems, Inc
Lines: 100

In article <10371@sybase.sybase.com> tim@ohday.sybase.com (Tim Wood) writes:
>In article <1073@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
>>Are relational databases an unproven technology regarding
>>performance?
>>
>>....A key tenet of the report is that RDBMS technology has been
>>available for 20 years but still has not been proved in large,
>>complex applications. The report notes that users associate
>>these products with poor system performance, even though they may
>>be flexible and easier to implement. The article then goes on
>>to cite a firm that is reluctant to replace IMS with DB2, and
>>discusses other sites that use a mixture of relational and
>>possibly non-relational systems.

Some random thoughts from someone who's in the trenches and deals
with less than, uhh, theoretical arguments...

In my experience with Very Large Databases, the DBMS type is less
important than the quality of the individual system's implementation.
The 10% of the time you spend developing is quickly subsumed by the
requirement to plan for and provide a stable applications environment.

To wit: The typical SQL-based RDBMS is abstract enough from what's
going on down deep to permit gross errors in implementation. I've
looked at systems which fetched 100,000 records and threw away every
one except the single tuple of interest. The fact that it was an
RDBMS was irrelevant. You coulda been doing IMS or FOCUS and made
that mistake.

Axiom 1: There is no substitute for planning.
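
To make the "fetched 100,000 records and threw away every one except
the single tuple of interest" mistake concrete, here is a minimal
sketch using Python's standard sqlite3 module. The accounts table, its
columns, and all function names are made up for illustration; they are
assumptions, not taken from any system described here.

```python
import sqlite3


def build_demo_db(n_rows=100_000):
    """Create a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
    conn.executemany(
        "INSERT INTO accounts (id, owner) VALUES (?, ?)",
        ((i, f"owner-{i}") for i in range(n_rows)),
    )
    conn.commit()
    return conn


def find_owner_wasteful(conn, wanted_id):
    """Pull every row to the application and discard all but one."""
    rows_fetched = 0
    owner = None
    for row_id, row_owner in conn.execute("SELECT id, owner FROM accounts"):
        rows_fetched += 1
        if row_id == wanted_id:
            owner = row_owner
    return owner, rows_fetched


def find_owner_selective(conn, wanted_id):
    """Let the DBMS apply the predicate, so only matching rows come back."""
    rows = conn.execute(
        "SELECT owner FROM accounts WHERE id = ?", (wanted_id,)
    ).fetchall()
    owner = rows[0][0] if rows else None
    return owner, len(rows)
```

Both functions return the same owner; the second value in each result
is the number of rows that crossed from the DBMS into the application,
which is where the two approaches differ.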
>
>Relational systems have so far been deployed in smaller-scale
>applications than have hierarchical and network systems. This is due
>to several factors: relational is "newer" (that is, the technology
>existed long before successful commercial products) and the older
>database architectures were deployed in the days when nearly all
>commercial computing resources were centralized and operating in a
>batch-processing environment. In that environment, updates and access
>to the database are relatively rigidly controlled.

Sure, relational is "new", but the basic access methods have not
considerably improved; we still use B-trees and relative files and
maybe hashed files. The "relational" aspect is a layer above this.

I can write slow code in any environment; and there is nothing
inherent to the relational model which makes it slower than any
other model.

The fact is that noble 3NF implementations almost always get mutated
by harsh reality: that you end up generating "extract" tables and
other de-facto optimizations once you do a simple calculation of how
many I/Os it's gonna take to support your subsecond, online
application. That's reality; you can either spend money for hardware
or take a hardnosed approach to implementation.

Second, the biggest horror to big DBMS DBAs is the unknown called
"ad-hoc queries". It is easy to hurt a production system on many
platforms by issuing queries from hell that can't ever complete but
require massive sequential scans. Big DBMS engines usually have
strict controls on adhocery and either prioritize them low or require
they complete in batch.

In fact, lots of big systems do overnite extracts and provide an
online system to promote decision support. Rarely do these systems
permit queries to "live" data simply because supporting the surge
load caused by adhoc in current implementations costs too much money.

Axiom #2: Ad-hoc means unpredictable, which represents a basic
incongruity against the goal of production.
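
As an illustration of the kind of "strict controls on adhocery"
mentioned above, here is a minimal sketch that cuts off a runaway
query after a crude work budget. It uses SQLite's progress handler via
Python's sqlite3 module, which is a stand-in chosen only because it is
easy to run; the run_with_budget name and the budget numbers are
assumptions, and big production engines have their own mechanisms.

```python
import sqlite3


def run_with_budget(conn, sql, params=(), max_ticks=1_000, ops_per_tick=1_000):
    """Run a query, aborting once it exceeds a crude work budget.

    Returns the fetched rows, or None if the query was cut off.
    The budget is roughly max_ticks * ops_per_tick SQLite VM instructions.
    """
    ticks = 0

    def on_tick():
        nonlocal ticks
        ticks += 1
        # A nonzero return value tells SQLite to abort the statement.
        return 1 if ticks > max_ticks else 0

    conn.set_progress_handler(on_tick, ops_per_tick)
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.DatabaseError:
        if ticks > max_ticks:
            return None  # over budget: treat as "rejected ad-hoc query"
        raise  # some other failure (bad SQL, etc.): do not hide it
    finally:
        # Remove the handler so later statements run unrestricted.
        conn.set_progress_handler(None, 0)
```

A cheap lookup finishes well inside the budget and returns its rows,
while a query that grinds through millions of rows comes back as None
and could be rerouted to a batch queue instead.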
No current DBMS or implementation knows how to balance the two in a
truly large implementation automatically. (I have not seen the DBMS
yet that sends me mail and counsels "Dave, I've been reviewing access
patterns and I really think you should consider a clustered index...")

...

In conclusion:

1) There's no free lunch. Until we find a more expressive mechanism
   for revealing the intent of the user to the DBMS then we're going
   to live with controls over what a particular user can do. We need
   to be able to control plowing of new furrows thru a DBMS carefully
   versus handling heads-down data entry with predictable speed.

2) Experience at Tandem shows that a true SQL RDBMS doesn't have to
   be slower; in fact, the State of California has committed to
   NonStop SQL for their entire vehicle database based on some
   strenuous benchmarks.

3) We are a long ways away from creating DBMS systems into which
   data can be poured and then relied on to balance access and
   update needs. No matter what your implementation, it will take
   intelligence and forethought to create a successful implementation.

Speaking for myself, as always...
-- 
Dave Fuller
Sequent Computer Systems       Think of this as the hyper-signature.
(312) 318-0050 (humans)        It means all things to all people.
{uunet,sun,...}!sequent!dafuller