Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!ucbcad!pasteur!postgres!larry From: larry@postgres.uucp (Larry Rowe) Newsgroups: comp.databases Subject: Re: Data vs Application oriented design Keywords: relational,network,comparison Message-ID: <89@pasteur.Berkeley.Edu> Date: 8 Jan 88 02:06:27 GMT References: <12673@pyramid.pyramid.com> <69@coot.AUSTIN.LOCKHEED.COM> Sender: news@pasteur.Berkeley.Edu Reply-To: larry@postgres.UUCP (Larry Rowe) Organization: Postgres Research Group, UC Berkeley Lines: 133 recently there have been several messages about relational -vs- non-relational systems and the difficulty of developing production applications in relational systems. i resisted ``putting my 2 cents worth'' in, but couldn't, so... 1. problem with program iteration in SQL. this is a problem with the design of SQL. Embedded QUEL has a retrieve construct that looks something like: RETRIEVE (name=emp.name, salary=emp.salary) WHERE emp.salary > 40000 { } that executes the host language stmts once for each record that qualifies in the emp table. the program variables name and salary are set to the appropriate values in each qualifying record. the designers of embedded SQL didn't put this construct into their research prototype, so it wasn't part of the product. least i sound too much like ``quel was wonderful, sql sucks,'' i should mention that Univeristy INGRES did not allow nested queries (i.e., the host language stmts above could not include another QUEL stmt) because of the way INGRES was first implemented on a PDP-11 (remember them, 64K byte processes) in 4 or 5 processes. it was just too complicated to do at the same time as writing all the rest of the system. 2. the iteration issue has been solved by just about every programming language designed with new query language constructs (i.e., no SQL compatible). I developed a language at Berkeley, named Rigel, that ran on top of Unversity INGRES that allowed the example above to be coded as for e in emp where e.salary > 40000 do end where references to fields in qualifying records could be written ``e.name'' in . not hard to do in a new language. almost impossible to do when embedding a query language into an existing programming language (unless you want to write a complete parser for the programming language). 3. relational -vs- non-relational. this issue is a little surprising at this late date. relational systems use many of the same implementation strategies that non-relational systems used. sometimes they work, sometimes they don't. for example, someone mentioned the idea of storing precomputed joins (i.e., storing records of different types close to each other). ORACLE implemented a version of this several years ago and it didn't work very well. (at first, their implementation was bad, but i thought they cleaned up later.) from what i could tell, it didn't really improve the performance of their system. now, why could this happen? basically, for one of several reasons: 1. their implementation wasn't good. 2. performance was dominated by overhead elsewhere in the system (e.g., parsing, planning, or executing the commands OR getting records from disk page and converting/copying them to the application program OR application program processing of the return records (e.g., converting to print rep, drawing on screen, etc.) dominated elapsed time). 3. people didn't run the query that this implementation optimizes very often (more below). so, why does this strategy not work better. well, let's look at what queries it improves. let's suppose we have two relations DEPT(dname,...) and EMP(name,...,dept) and that i precompute join (DEPT,EMP, EMP.dept=DEPT.dname) what queries can take advantage of this structure? 1. ret(emp.all) where qual(emp) NOPE 2. ret(dept.all) where qual(dept) NOPE 3. ret(emp.all,dept.all) where qual(emp,dept) and emp.dept-dept.dname MAYBE whether or not #3 goes faster depends on how many dept objects (i.e., dept record + all emp records) are fetched. if it is a complete scan, the query will do *exactly the same number of page reads* that the partition per table implementation will perform. hence, no advantage. what if the query fetches just one dept object? this query should go faster because it will require fewer page reads (1 instead of 2 -- remember appropriate storage structure design will cluster emp's on dept and hash/btree dept). so, how much different will this make? consider a 1 VAX780 machine. typical relational systems take approx. 70 msec to execute a null query (i.e., get through communication to the backend and process null query and return). i/o's are maybe 25 msec. and it takes 3 msecs to process each return record. so time to fetch dept object is: 70 + 25*p + n*3 suppose n=25. that means p=1 ==> 170 msec and p=2 ==> 205msec (i.e., 20% faster). now, can you detect 20% faster queries when the time is less than 1/4 of a second? probably not. now, assuming you buy my arguement above (which you're free to disagree with), why are relational systems slow when compared to non-relational systems? several reasons. first, the non-relational systems didn't compile and plan queries at run-time. second, the dbms code was linked into the user program so that calls from the application program were subroutine calls not remote procedure calls (i.e., pass msg and switch processes). third, some relational products do not have efficient access methods (e.g., do they support hash, can data be stored in storage structure or is the storage structure really just an index, was the program and database optimized for performance, etc.). fourth, many times a performance problem is a locking problem not a storage or query processing problem. maybe there's a critical lock/section of code that limits the performance of the systems. etc, etc. i could go on. 4. relational performance but, are relational systems really slower? my experience with benchmarks between commercial network systems and relational systems on VAX's is NO. i frankly was very surprised in the early 1980's RTI-INGRES and ORACLE did as well as they did against SEED, DBMS-32, and other hybrid systems. in fact, relational systems can be excellent tools for building production applications. (btw, to the fellow who questioned whether real production systems can be implemented with relational systems, the answer is yes. many companies are running their entire business, or key segments of their business or manufacturing DP using relational systems. moreover, tandem's DBMS is essentially a relational system and they've made a very good business out of on-line xact processing applications -- heard of the new york stock exchange. wonder why their computer systems didn't crash when the market crashed and trading volumes > 500M shares traded; 3-5 times average. reason was it was a tandem system.) enough preaching. what problems do relational systems have that probably account for the complaints registered? as with most new technologies, it has been oversold. novice users are not going to write production applications that are bug free and highly efficient. moreover, experienced programmers are going to have to learn some new ways of building and optimizing systems. and finally, relational products are going to have to mature so that they can satisfy the needs of more users. i believe the biggest problem that relational products have today is providing the tools to make it easier to solve the performance problems that people invariably fall into (e.g., database design tools - both logical and physical design, on-line performance monitors, application design tools to simplify the construction of OLTP applications (see tandem pathway product or ibm CICS product), etc.). larry rowe