Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!src.honeywell.com!cim-vax.honeywell.com!tdoyle
From: tdoyle@cim-vax.honeywell.com
Newsgroups: comp.databases
Subject: Re: SQL Duplicate Row Deletion ???
Message-ID: <1991Apr5.074844.59@cim-vax.honeywell.com>
Date: 5 Apr 91 13:48:44 GMT
References: <91091.141528SYSPMZT@GECRDVM1.BITNET> <1991Apr3.010838.2063@eng.ufl.edu> <1991Apr3.085710.57@cim-vax.honeywell.com> <1991Apr4.004438.15592@inel.gov>
Organization: Honeywell CIS
Lines: 87

In article <1991Apr4.004438.15592@inel.gov>, cdm@gem-hy.Inel.GOV (Dale Cook) writes:
> 
> Sorry, but I must disagree.  Attributes do not _identify_ an instance of an
> entity, they merely describe it.  A subtle, but important, distinction.
> There is nothing in data modelling that says multiple instances of an entity
> or relationship cannot have identical attributes.

Attributes are the ONLY way to describe an object. You may CHOOSE to use the
rowid as an additional attribute for the sake of convenience (or in some 
extreme case that I have not encountered, a necessity), but don't try
to pass that off as data modelling.

Note that the person who originally posted the request has now created a 
unique index to avoid the problem.

> 
> |> Another problem with the popular DBMS's is that they don't treat null values
> |> correctly. If the attributes in the unique index are null, then multiple
> |> entries with null should be permitted (using the theory that null !=
> null, i.e
> |> the two nulls are different).
> |> 
> 
> I don't follow this.
> 

Most reasonable databases treat NULL as a special value for an attribute. 
It means that the value is UNKNOWN. This special value is different from
0 for numeric data and blank for strings. Thus if you had the following
data in your table:

          Person	balance ($)
            A		10
	    B		20
	    C		30
	    D		NULL
      	    E		NULL

SELECT COUNT(*) WHERE balance > 0        will return 3
SELECT AVG(balance)                      will return 20    not 12

Since one null is not the same as another null, balance of D is not the same
as the balance for D. This has implications on how JOINS betweens tables are
executed, how selections are performed, how boolean logic is performed, etc.
(NULL != NULL, result of comparision (NULL > NULL) is unknown).

Most major relational databases do this well/correctly.

WHERE THEY FAIL IS WHEN the attribute that has null values, is part of
the primary key. Then they treat NULL as a single value (which is how NULL is
represented internally in the relational DBMSs). This really hoses things.

As an example: 

	Following is a table where employee number is the key:

	emp-no	name	status	salary   ...etc
        (unique
	index)

	1	A	...    ...............
	2       B       ....     ...........
	NULL    C	....................      New hire, emp-no pending
	NULL    D	....................      New hire, emp-no pending

Or take this example, where purchase requisitions are assigned purchase order
number:

   Tables:  1. Purchase Order  (unique index POnumber)
	    2. Invoices		(unique index POnumber, InvoiceNumber)

Table 1. Purchase Order:

	POnumber  Status amount	vendor limit clause ... etc
        (unique
	 index)

	1	expire	20000  ...................
	2	current	20000  ...................
	NULL	new	20000  ...................  POnumber unaasigned waiting
	NULL	new	20000  ...................  for signoff process
-----------------------------------------------------------------


You get the picture.

Bipin Prasad      bprasad@honcim1.honeywell.com