Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!dimacs.rutgers.edu!seismo!beno.CSS.GOV
From: jean@beno.CSS.GOV (Jean Anderson)
Newsgroups: comp.databases
Subject: Re: Join Contest
Message-ID: <49001@seismo.CSS.GOV>
Date: 18 Jul 90 01:23:21 GMT
Sender: usenet@seismo.CSS.GOV
Distribution: usa
Organization: Center for Seismic Studies, Arlington, VA
Lines: 50

Mike Tossy writes:
> In article <CIMSHOP!DAVIDM.90Jul15215312@uunet.UU.NET>, cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
> > In article <12072@blia.BLI.COM> miket@blia.BLI.COM (Mike Tossy) writes:
> > 
> >    My observation is that the number of tables used by an application 
> >    increases dramaticly if the database design is well normalized.
> > 
> > Is that good or bad?  (Speaking from your observations that is :-)
> 
> I think it is good.  The normal reason to denormalize is to gain
> performance.  Assuming that your RDBMS is good at joins this is not
> necesary, and you can even gain speed by normalizing because the resulting

I think "well normalized" is a matter or style and frequently gets confused
with excessive decomposition.  I have seen tables that were so decomposed 
they consisted of little more than primary and foreign keys.  This may be OK
for an application which is entirely canned, but it makes the logical design
artificially complex for any database which has to support interactive SQL 
users.

Also, my experience with two different products so far tells me that the higher
the number of joins, the longer the query takes to run.  With a large amount
of data or complex queries, this can become prohibitive.  I may get flamed 
off the net, but I try to normalize to third normal form, then denormalize 
for performance.  Here are several ways I back off:

   1	If there is a one to one relationship between two tables, they
	share the same primary key and are frequently retrieved together,
	I consider making them a single table so long as the table doesn't 
	get too wide. "Too wide" is a matter of style; I start getting 
	uncomfortable with tables that have more than 20-25 columns.

   2	If a join is used frequently to do a simple lookup, I include
	that field as redundant data and eliminate the join.  I take a
	serious look at it to see if this will introduce update anomalies.

   3	I use repeating columns to store aggregate information rather than 
	have it be computed on the fly.

I realize that breaking rules like this would drive a purist nuts, but I 
answer to a user community that daily wants to see it all go faster with
the current product on the current platform.

   -	Jean Anderson
	SAIC, Geophysics Division
	jean@seismo.css.gov

            +++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++   Any opinions are mine, not my employer's. +++++++++++++++++
            +++++++++++++++++++++++++++++++++++++++++++++++++