Path: utzoo!utgpu!watmath!att!pacbell!rtech!menace!dennism
From: dennism@menace.rtech.COM (Dennis Moore (x2435, 1080-276) INGRES/teamwork)
Newsgroups: comp.databases
Subject: Re: Extended RDB vs OODB
Summary: Where do these facts come from?
Keywords: OODB C++ RDBMS
Message-ID: <3367@rtech.rtech.com>
Date: 15 Aug 89 16:57:35 GMT
References: <3560052@wdl1.UUCP> <411@odi.ODI.COM> <458@cimshop.UUCP> <2177@cadillac.CAD.MCC.COM> <20@dgis.daitc.mil> <2230@cadillac.CAD.MCC.COM>
Sender: news@rtech.rtech.com
Reply-To: dennism@menace.UUCP (Dennis Moore (x2435, 1080-276) INGRES/teamwork)
Organization: Relational Technology, Inc. (Opinions expressed are the writers own)
Lines: 113

In article 3452 of comp.databases, speyer@joy.cad.mcc.com (Bruce Speyer) writes:
>In article <20@dgis.daitc.mil> jkrueger@dgis.daitc.mil (Jonathan Krueger) writes:
>>speyer@joy.cad.mcc.com (Bruce Speyer) writes:
>>
>>>If an application must cross its process boundary in order to
>>>communicate with the database system it probably is at least two orders
>>>of magnitude too slow.  That is why all of the C++ based OODBMS efforts
>>>are using the application memory heap for the cache.
>>
>>Could you provide some performance measurement data that qualify
>>and quantify this assertion?
>>
>>-- Jon
>
>No, I don't have the numbers or the time to work them up.  Perhaps somebody else
>could provide actual statistics and even disprove my assertion.  It would be
>interesting to hear from somebody involved with the HP Iris system which is
>based upon a relational database.
>

It is true that changing contexts takes a small number of milliseconds,
depending primarily on the architecture of the CPU (e.g. an 80x86 takes a
long time because it is a segmented architecture, while a 68x00 takes the
same amount of time for a kernel call as for a non-kernel call).  However,
you must do a context switch to call a C++ library routine or to call a
database routine, so there is not much difference there.  The real
difference in response time in most current DBMS's (OO *OR* R) is that they
are client-server (or multi-server, in the case of INGRES -- caveat: I work
for RTI and INGRES is our product).  This means that to access data you use
IPC (inter-process communication) rather than a function call.  IPC
generally is much slower than a function call, but let's not forget one
*MAJOR* saver here -- the SAME server can serve literally hundreds of users.
If each had its own linked copy of the C++ data access routines, there would
be so much swapping/paging going on on the host that nothing would get done.
Even if shared libraries were used, each user would have her own data
segments etc., and would use many more resources than the DBMS does
currently.  Therefore, I have no issue with the claim that a single-user
system is better off with a highly tuned, memory-hogging, specialized access
method than with an RDBMS.

>About 3 years ago I tried putting an electronic information model on top of a
>relational system.  It took about 30-40 times longer to netlist a circuit then
>it did using a fairly inefficient internally developed memory-based database
>system.  An operation such as packaging the electronics is much worse since it
>must transverse much more of the electronic information model and be constantly
>refering to the library portion of the model which was distributed to another
>database (making the join operation much more expensive).
>

Excuse me, have you heard of distributed database?  INGRES*STAR would allow
you to keep your packaging information in a separate "database," and still
do joins just as if the data were in the same database.  The concept of "a
database" (as opposed to "a different database") basically goes away, as the
user can pick and choose tables from multiple "databases" to be in one STAR
database.
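As a sketch only (the table and column names are made up for illustration,
and I am skipping over the step that registers the remote table into the
STAR database), a join against a library table that physically lives in
another database reads just like a local join:

    SELECT C.COMPONENT#, L.PACKAGE_TYPE
    FROM   COMPONENTS C, LIB_PARTS L   -- LIB_PARTS lives in the "library" database
    WHERE  C.PART# = L.PART#
    AND    C.DIAGRAM# = :diagram_number;

The application just writes the join; the distributed DBMS worries about
which database each table actually lives in.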
Maybe the reason it was slow was that you didn't know what you were doing.
Let me posit a different architecture for your electronic information model.
Could you have read all the data into memory from an RDBMS and performed the
same manipulations in-core that you did in your system?  The advantage of
this architecture is that you can lock the records while you are
manipulating them (with THREE WORDS ("FOR DIRECT UPDATE"), as opposed to
many lines of code), you get all the transaction processing capabilities of
the DBMS (e.g. rollback, savepoints, commit), you get all the utilities of
the DBMS, etc.  To put it in a few words, YOU GET THE *MS* FROM THE DBMS,
and you do your own processing.

>Compare the cost of processing a tuple at a time to a C++ style database. If
>the object is in-memory then optimally an indirect reference and a test is all
>that is required to transverse a relation or access an attribute.
>

What a surprise!  In INGRES, there is a concept called a TABLE FIELD (NOTE --
many other databases (such as Gupta's RESULT SETS, Sybase's SETS, etc.) have
the same concept with other terms).  You select a SET AT A TIME, NOT A TUPLE
AT A TIME, into the TABLE FIELD.  BTW, do you know that a database oriented
to TUPLE AT A TIME processing is not relational?  By definition, a relational
database can process a SET AT A TIME.  For instance, if the diagram tuple has
a surrogate key DIAGRAM#, which appears as a foreign key in the components
table (which I will call COMPONENTS), then you could find all the components
of a diagram with the following SQL statement:

    SELECT * FROM COMPONENTS
    WHERE  DIAGRAM# = :diagram_number;

where diagram_number is a C variable (for instance) containing the number of
the host diagram.  The results of this select could be stored in a table
field and manipulated in core.  BTW, all the table field manipulations (e.g.
INSERTROW, DELETEROW, etc.) are in our language, so you don't have to write
list processing classes -- we already did.
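The same set orientation applies to changes, not just retrievals.  As a
sketch (the PLACED column is made up, and this is plain illustrative SQL
rather than exact INGRES syntax), marking every component of the diagram as
placed is one statement, with no per-tuple loop in the application:

    UPDATE COMPONENTS
    SET    PLACED = 'Y'                -- PLACED is a hypothetical status column
    WHERE  DIAGRAM# = :diagram_number;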
So, in summary, whether you use an OO system or an RDBMS (which has OO
features and capabilities), you can process the data in memory.  You STILL
have to get that data from disk and to disk SOMETIME, and the RDBMS will be
better at that.  In addition, the RDBMS already comes with the in-memory
manipulation features.  The RDBMS also protects against hardware and
software (e.g. the break key) failures, and provides you with the capability
to start off a process and then back out if you don't like the results.  The
RDBMS is optimized to provide consistency and concurrency for the data.

The OO "faction" here keeps talking about what RDBMS's don't do, and yet
every example so far has been doable with an RDBMS today.  I am *SURE* that
there *ARE* things that an OODB can do that an RDBMS can't, but RDBMS's are
developing new features faster and faster (there are more people in
engineering in *MY* company than in their whole company).  I would like to
point out that only two people are doing this rather poor defense of the
entire OODB industry.  After all, if OO were not a good idea, we wouldn't be
developing even more OO features now.

>My apologies for not being able to back up my statements with benchmarks.

'Nuff said ...

>Bruce Speyer / MCC CAD Program                  WORK: [512] 338-3668
>3500 W. Balcones Center Dr., Austin, TX. 78759  ARPA: speyer@mcc.com
>
>

-- Dennis Moore, my own opinions, etc etc etc