Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!decwrl!nsc!voder!blia!miket From: miket@blia.sharebase.com (Mike Tossy) Newsgroups: comp.databases Subject: Re: SQL Duplicate Row Deletion ??? Message-ID: <13794@blia.sharebase.com> Date: 4 Apr 91 19:14:22 GMT References: <91091.141528SYSPMZT@GECRDVM1.BITNET> <1991Apr3.085710.57@cim-vax.honeywell.com> Organization: ShareBase Corp, Los Gatos, CA Lines: 58 In article <1991Apr3.085710.57@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com writes: > In article <1991Apr3.010838.2063@eng.ufl.edu>, tsao@helios.tcad.ee.ufl.edu ( Tsao) writes: > > In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes: > > |> I've made a nice mistake loading data twice over several days into a table, and > > ...... > > > > Would anybody give me an example justifying the need for duplicated tuples > > in a table ? > > > > > > In a "clean" database implementation there is NO excuse for such behavior, > but alas it is simple/machine efficient? for DBMS vendors to do it this way. > > Since the process of relational modelling means that ultimately the > tuple/relation is supposed to be in third normal form (or a close > approximation) and thus should have one and only one set of attributes that > uniquely identify the object. This set of attributes then should automatically > have a unique index. None of the popular DBMS enforce this. ShareBase III (an ANSI SQL based system) does create a unique index (by default) on the primary key. Unfortunatly the ANSI Standard requires that duplicate rows be permitted - a really dumb idea - so we allow duplicate rows if the DBA requests (Not that there is much useful that can be done with them (dirty data excluded). ShareBase II is QUEL based, like Ingres and others, and only permits duplicates when there is no sorting (clustered) index. This is mostly an expense issue - finding duplicates without indices can be expensive. Lots of other things, like garbage collection, typically break when there is no clustered index. All our manuals and classes stress that a clustered index was required on every table for proper system operation. I'll guess this is a good rule for the other QUEL based systems. By the way, duplicates is one of the fundimental semantic differences between SQL based and QUEL based systems. QUEL defines duplicate tuples (rows) as having no meaning and the optimizer is allowed (encouraged?!) to delete duplicates. SQL says that duplicates have meaning (although it doesn't give you a way to manipulate them!) and the optimizer can only delete duplicates at user request (using the distinct clause). Thus projections which do not include the primary key (including most aggregates) can give very different results under SQL semantics and QUEL semantics. And, just because you are using SQL syntax does not mean you are getting SQL semantics. Most of the QUEL based systems long ago delivered SQL; but with QUEL semantics! It is the semantics (not the syntax) that keeps several of the vendors from passing the NIST ANSI SQL tests. > ... > > Bipin Prasad bprasad@honcim1.honeywell.com -- >>>>>> The above statements are only my opinions <<<<<< Mike Tossy ShareBase Coropration miket@sharebase.com 14600 Wichester Blvd (408) 378-7575 ext2200 Los Gatos, CA 95030 (ShareBase is a subsidiary of Teradata Corportation)