Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!ucbcad!pasteur!postgres!larry
From: larry@postgres.uucp (Larry Rowe)
Newsgroups: comp.databases
Subject: Re: Data vs Application oriented design
Keywords: relational,network,comparison
Message-ID: <89@pasteur.Berkeley.Edu>
Date: 8 Jan 88 02:06:27 GMT
References: <12673@pyramid.pyramid.com> <69@coot.AUSTIN.LOCKHEED.COM>
Sender: news@pasteur.Berkeley.Edu
Reply-To: larry@postgres.UUCP (Larry Rowe)
Organization: Postgres Research Group, UC Berkeley
Lines: 133

<munch line>
recently there have been several messages about relational -vs- non-relational
systems and the difficulty of developing production applications in relational
systems.  i resisted ``putting my 2 cents worth'' in, but couldn't, so...

1. problem with program iteration in SQL.
  this is a problem with the design of SQL.  Embedded QUEL has a retrieve
construct that looks something  like:

	RETRIEVE (name=emp.name, salary=emp.salary)
	WHERE emp.salary > 40000
	{
		<host language stmts>
	}

that executes the host language stmts once for each record that qualifies
in the emp table.  the program variables name and salary are set to the 
appropriate values in each qualifying record.
  the designers of embedded SQL didn't put this construct into their research
prototype, so it wasn't part of the product.  least i sound too much like 
``quel was wonderful, sql sucks,'' i should mention that Univeristy INGRES
did not allow nested queries (i.e., the host language stmts above could not
include another QUEL stmt) because of the way INGRES was first implemented
on a PDP-11 (remember them, 64K byte processes) in 4 or 5 processes.  it
was just too complicated to do at the same time as writing all the rest of
the system.

2. the iteration issue has been solved by just about every programming language
designed with new query language constructs (i.e., no SQL compatible).  I
developed a language at Berkeley, named Rigel, that ran on top of Unversity
INGRES that allowed the example above to be coded as

	for e in emp where e.salary > 40000 do
		<stmts>
	end

where references to fields in qualifying records could be written ``e.name''
in <stmts>.  not hard to do in a new language.  almost impossible to do when
embedding a query language into an existing programming language (unless you
want to write a complete parser for the programming language).

3. relational -vs- non-relational.
  this issue is a little surprising at this late date.  relational systems
use many of the same implementation strategies that non-relational systems
used.  sometimes they work, sometimes they don't.  for example, someone
mentioned the idea of storing precomputed joins (i.e., storing records of
different types close to each other).  ORACLE implemented a version of this
several years ago and it didn't work very well.  (at first, their 
implementation was bad, but i thought they cleaned up later.)  from what
i could tell, it didn't really improve the performance of their system.  now,
why could this happen?  basically, for one of several reasons:

	1. their implementation wasn't good.
	2. performance was dominated by overhead elsewhere in the system
	   (e.g., parsing, planning, or executing the commands OR getting
	   records from disk page and converting/copying them to the 
	   application program OR application program processing of the return
	   records (e.g., converting to print rep, drawing on screen, etc.)
	   dominated elapsed time).
	3. people didn't run the query that this implementation optimizes very
	   often (more below).

so, why does this strategy not work better.  well, let's look at what queries
it improves.  let's suppose we have two relations DEPT(dname,...) and 
EMP(name,...,dept) and that i precompute 
	join (DEPT,EMP, EMP.dept=DEPT.dname)
what queries can take advantage of this structure?

	1. ret(emp.all) where qual(emp)		NOPE
	2. ret(dept.all) where qual(dept)	NOPE
	3. ret(emp.all,dept.all) 
	   where qual(emp,dept) and emp.dept-dept.dname	MAYBE

whether or not #3 goes faster depends on how many dept objects (i.e., dept
record + all emp records) are fetched.  if it is a complete scan, the query
will do *exactly the same number of page reads* that the partition per table
implementation will perform.  hence, no advantage.  what if the query fetches
just one dept object?  this query should go faster because it will require
fewer page reads (1 instead of 2 -- remember appropriate storage structure
design will cluster emp's on dept and hash/btree dept).  so, how much
different will this make?  consider a 1 VAX780 machine.  typical relational
systems take approx. 70 msec to execute a null query (i.e., get through
communication to the backend and process null query and return).  i/o's
are maybe 25 msec.  and it takes 3 msecs to process each return record.
so time to fetch dept object is:
	70 + 25*p + n*3
suppose n=25.  that means p=1 ==> 170 msec and p=2 ==> 205msec (i.e., 20%
faster).  now, can you detect 20% faster queries when the time is less than
1/4 of a second?  probably not.

now, assuming you buy my arguement above (which you're free to disagree with),
why are relational systems slow when compared to non-relational systems?  
several reasons.  first, the non-relational systems didn't compile and plan
queries at run-time.  second, the dbms code was linked into the user program
so that calls from the application program were subroutine calls not remote
procedure calls (i.e., pass msg and switch processes).  third, some relational
products do not have efficient access methods (e.g., do they support hash,
can data be stored in storage structure or is the storage structure really
just an index, was the program and database optimized for performance, etc.).
fourth, many times a performance problem is a locking problem not a storage
or query processing problem.  maybe there's a critical lock/section of code
that limits the performance of the systems.  etc, etc.  i could go on.

4. relational performance
  but, are relational systems really slower?  my experience with benchmarks
between commercial network systems and relational systems on VAX's is NO.
i frankly was very surprised in the early 1980's RTI-INGRES and ORACLE
did as well as they did against SEED, DBMS-32, and other hybrid systems.

in fact, relational systems can be excellent tools for building production
applications.  (btw, to the fellow who questioned whether real production
systems can be implemented with relational systems, the answer is yes.  many
companies are running their entire business, or key segments of their business
or manufacturing DP using relational systems.  moreover, tandem's DBMS is
essentially a relational system and they've made a very good business out
of on-line xact processing applications -- heard of the new york stock exchange.
wonder why their computer systems didn't crash when the market crashed and
trading volumes > 500M shares traded; 3-5 times average.  reason was it was
a tandem system.)

enough preaching.  what problems do relational systems have that probably
account for the complaints registered?  as with most new technologies, it has
been oversold.  novice users are not going to write production applications
that are bug free and highly efficient.  moreover, experienced programmers
are going to have to learn some new ways of building and optimizing systems.
and finally, relational products are going to have to mature so that they can
satisfy the needs of more users.  i believe the biggest problem that relational
products have today is providing the tools to make it easier to solve the
performance problems that people invariably fall into (e.g., database design
tools - both logical and physical design, on-line performance monitors,
application design tools to simplify the construction of OLTP applications
(see tandem pathway product or ibm CICS product), etc.).
	larry rowe