Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!decwrl!nsc!voder!blia!miket
From: miket@blia.sharebase.com (Mike Tossy)
Newsgroups: comp.databases
Subject: Re: SQL Duplicate Row Deletion ???
Message-ID: <13794@blia.sharebase.com>
Date: 4 Apr 91 19:14:22 GMT
References: <91091.141528SYSPMZT@GECRDVM1.BITNET> <1991Apr3.085710.57@cim-vax.honeywell.com>
Organization: ShareBase Corp, Los Gatos, CA
Lines: 58

In article <1991Apr3.085710.57@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com writes:
> In article <1991Apr3.010838.2063@eng.ufl.edu>, tsao@helios.tcad.ee.ufl.edu ( Tsao) writes:
> > In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
> > |> I've made a nice mistake loading data twice over several days into a table, and
> > ......
> > 
> > 	Would anybody give me an example justifying the need for duplicated tuples
> > in a table ?
> > 
> >  
> 
> In a "clean" database implementation there is NO excuse for such behavior,
> but alas it is simple/machine efficient? for DBMS vendors to do it this way.
> 
> Since the process of relational modelling means that ultimately the 
> tuple/relation is supposed to be in third normal form (or a close
> approximation) and thus should have one and only one set of attributes that
> uniquely identify the object. This set of attributes then should automatically
> have a unique index. None of the popular DBMS enforce this.

ShareBase III (an ANSI SQL based system) does create a unique index (by
default) on the primary key.  Unfortunatly the ANSI Standard requires that
duplicate rows be permitted - a really dumb idea - so we allow duplicate
rows if the DBA requests (Not that there is much useful that can be done
with them (dirty data excluded).

ShareBase II is QUEL based, like Ingres and others, and only permits
duplicates when there is no sorting (clustered) index.  This is mostly an
expense issue - finding duplicates without indices can be expensive.  Lots of
other things, like garbage collection, typically break when there is no
clustered index.  All our manuals and classes stress that a clustered index
was required on every table for proper system operation.  I'll guess this is
a good rule for the other QUEL based systems.

By the way, duplicates is one of the fundimental semantic differences between
SQL based and QUEL based systems.  QUEL defines duplicate tuples (rows) as
having no meaning and the optimizer is allowed (encouraged?!) to delete
duplicates.  SQL says that duplicates have meaning (although it doesn't give
you a way to manipulate them!) and the optimizer can only delete duplicates
at user request (using the distinct clause).  Thus projections which do not
include the primary key (including most aggregates) can give very different
results under SQL semantics and QUEL semantics.  And, just because you are
using SQL syntax does not mean you are getting SQL semantics.  Most of the
QUEL based systems long ago delivered SQL; but with QUEL semantics!  It is
the semantics (not the syntax) that keeps several of the vendors from passing
the NIST ANSI SQL tests.

> ...
>
> Bipin Prasad      bprasad@honcim1.honeywell.com

--
      >>>>>>  The above statements are only my opinions <<<<<<

Mike Tossy					ShareBase Coropration
miket@sharebase.com				14600 Wichester Blvd
(408) 378-7575 ext2200				Los Gatos, CA 95030
	(ShareBase is a subsidiary of Teradata Corportation)