Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!src.honeywell.com!cim-vax.honeywell.com!tdoyle From: tdoyle@cim-vax.honeywell.com Newsgroups: comp.databases Subject: Re: SQL Duplicate Row Deletion ??? Message-ID: <1991Apr5.074844.59@cim-vax.honeywell.com> Date: 5 Apr 91 13:48:44 GMT References: <91091.141528SYSPMZT@GECRDVM1.BITNET> <1991Apr3.010838.2063@eng.ufl.edu> <1991Apr3.085710.57@cim-vax.honeywell.com> <1991Apr4.004438.15592@inel.gov> Organization: Honeywell CIS Lines: 87 In article <1991Apr4.004438.15592@inel.gov>, cdm@gem-hy.Inel.GOV (Dale Cook) writes: > > Sorry, but I must disagree. Attributes do not _identify_ an instance of an > entity, they merely describe it. A subtle, but important, distinction. > There is nothing in data modelling that says multiple instances of an entity > or relationship cannot have identical attributes. Attributes are the ONLY way to describe an object. You may CHOOSE to use the rowid as an additional attribute for the sake of convenience (or in some extreme case that I have not encountered, a necessity), but don't try to pass that off as data modelling. Note that the person who originally posted the request has now created a unique index to avoid the problem. > > |> Another problem with the popular DBMS's is that they don't treat null values > |> correctly. If the attributes in the unique index are null, then multiple > |> entries with null should be permitted (using the theory that null != > null, i.e > |> the two nulls are different). > |> > > I don't follow this. > Most reasonable databases treat NULL as a special value for an attribute. It means that the value is UNKNOWN. This special value is different from 0 for numeric data and blank for strings. Thus if you had the following data in your table: Person balance ($) A 10 B 20 C 30 D NULL E NULL SELECT COUNT(*) WHERE balance > 0 will return 3 SELECT AVG(balance) will return 20 not 12 Since one null is not the same as another null, balance of D is not the same as the balance for D. This has implications on how JOINS betweens tables are executed, how selections are performed, how boolean logic is performed, etc. (NULL != NULL, result of comparision (NULL > NULL) is unknown). Most major relational databases do this well/correctly. WHERE THEY FAIL IS WHEN the attribute that has null values, is part of the primary key. Then they treat NULL as a single value (which is how NULL is represented internally in the relational DBMSs). This really hoses things. As an example: Following is a table where employee number is the key: emp-no name status salary ...etc (unique index) 1 A ... ............... 2 B .... ........... NULL C .................... New hire, emp-no pending NULL D .................... New hire, emp-no pending Or take this example, where purchase requisitions are assigned purchase order number: Tables: 1. Purchase Order (unique index POnumber) 2. Invoices (unique index POnumber, InvoiceNumber) Table 1. Purchase Order: POnumber Status amount vendor limit clause ... etc (unique index) 1 expire 20000 ................... 2 current 20000 ................... NULL new 20000 ................... POnumber unaasigned waiting NULL new 20000 ................... for signoff process ----------------------------------------------------------------- You get the picture. Bipin Prasad bprasad@honcim1.honeywell.com