Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uwm.edu!src.honeywell.com!cim-vax.honeywell.com!tdoyle
From: tdoyle@cim-vax.honeywell.com
Newsgroups: comp.databases
Subject: Re: SQL Duplicate Row Deletion ???
Message-ID: <1991Apr3.160345.58@cim-vax.honeywell.com>
Date: 3 Apr 91 22:03:45 GMT
References: <91091.141528SYSPMZT@GECRDVM1.BITNET> <1991Apr3.010838.2063@eng.ufl.edu>
Organization: Honeywell CIS
Lines: 21

In article , drack@titan.tsd.arlut.utexas.edu (Dave Rackley) writes:
>
> I cannot justify the need for them, but in real-time data collection duplicates
> are often created by multiple sensors at a single source. It then becomes a
> data reduction/analysis issue to remove duplicates, while recording the fact
> that duplicate data captures occurred.
>
> Why let this happen? Our data is always dirty--we have to clean it up before
> the customer gets his hands on it!

This may be a little esoteric, but I would suggest that the table is not in
third normal form. If the table represents measurements from various sensors,
then an attribute is missing from it, namely sensor-id. If the table simply
represents sequential measurements, then a date-time or counter attribute may
be required.

Once the database "correctly" models the world, it is quite possible (when the
user's view of the object does not include one of the primary identifiers) that
a tuple presented in the view corresponds to more than one tuple in the
underlying relation. But this does not excuse the original deficiency of the
relational DBMSs.
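A minimal sketch of the modelling argument above, using SQLite through Python's standard sqlite3 module purely for illustration (not the poster's DBMS). The table `measurement`, its columns `sensor_id`, `measured_at` and `value`, and the view `reading` are hypothetical names. With sensor-id and a timestamp forming the key, two sensors capturing the same reading are distinct tuples, a true duplicate is rejected, and a view that omits sensor-id shows one tuple per reading while counting how many captures it stands for.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Build a hypothetical sensor table keyed on (sensor_id, measured_at)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE measurement (
               sensor_id   INTEGER NOT NULL,  -- hypothetical: the missing attribute
               measured_at TEXT    NOT NULL,  -- hypothetical: date-time attribute
               value       REAL    NOT NULL,
               PRIMARY KEY (sensor_id, measured_at)
           )"""
    )
    rows = [
        (1, "1991-04-03T22:00:00", 17.5),
        (2, "1991-04-03T22:00:00", 17.5),  # same reading, captured by a second sensor
        (1, "1991-04-03T22:01:00", 17.8),
    ]
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", rows)
    # User view without sensor_id: several base tuples collapse into one view
    # tuple, and `captures` records that duplicate captures occurred.
    conn.execute(
        """CREATE VIEW reading AS
           SELECT measured_at, value, COUNT(*) AS captures
           FROM measurement
           GROUP BY measured_at, value"""
    )
    return conn


def insert_is_rejected(conn: sqlite3.Connection, row: tuple) -> bool:
    """Return True if the primary key refuses the row as a duplicate."""
    try:
        conn.execute("INSERT INTO measurement VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    db = build_demo()
    for r in db.execute("SELECT measured_at, value, captures FROM reading ORDER BY measured_at"):
        print(r)
    print(insert_is_rejected(db, (1, "1991-04-03T22:00:00", 17.5)))
```

The design choice here is to keep every capture in the base relation and do the "data reduction" in the view, so nothing is deleted and the duplicate count stays queryable.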