Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!ucbvax!bloom-beacon!eru!hagbard!sunic!mcsun!ukc!keele!csa18
From: csa18@seq1.keele.ac.uk (R.J. Husmo)
Newsgroups: comp.databases
Subject: Re: Duplicated tuples problem (Oracle/SQL)
Message-ID: <775@keele.keele.ac.uk>
Date: 2 Jan 91 12:14:14 GMT
References: <772@keele.keele.ac.uk> <CIMSHOP!DAVIDM.90Dec21095239@uunet.UU.NET> <12224@sybase.sybase.com>
Reply-To: csa18@seq1.kl.ac.uk (R.J. Husmo)
Organization: /usr/local/lib/news/organization
Lines: 53

In article <12224@sybase.sybase.com> you write:
>In article <CIMSHOP!DAVIDM.90Dec21095239@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>>>>>>> On 21 Dec 90 10:13:14 GMT, csa18@seq1.keele.ac.uk (R.J. Husmo) said:
>>
>>R.J.> Q: Is there a way to automatically delete duplicate tuples from a
>>R.J.> table/all tables in a db using SQL?
>>
>>The typical way is to put a unique index on the primary key of each table.
>
>i don't think this answers his/her question.  it sounds to me like s/he has
>duplicates in his/her tables and needs to get rid of them.
[...]

Indeed. Spot on. Let me describe the scenario, as there may be a clever
way of solving a specific problem which does not have a general solution.

Our DBA, who has only been a DBA for a few months, was given the task of
enlarging the storage space for our DB. Having looked in the DB manuals,
he decided that the easiest/safest way to do this would be to export all
the tables and their associated data, and then replace the old DB with a 
new and larger version.

When setting up the new DB, he included all the system tables and data.
He then imported the old tables and data. This meant that some of the
system tuples were loaded twice. So when doing, for example, a 'help'
each help topic appears twice. At this point our DBA asked for help. He
spent three days going through the process above, and does not wish to do
so again. So I posted the query here, as it seemed like an interesting 
problem.

We are using ORACLE RDBMS V6.0.27.8.2, by the way.
>|>
>|>          1.  Create a temporary table with the same structure as the
>|>              one containing the duplicates.
>|>
>|>          2.  INSERT INTO new_tab
>|>              SELECT DISTINCT *
>|>              FROM dup_tab;
>|>
>|>          3.  Delete all records from dup_tab and refill it from new_tab.
>|>
We thought of doing this, but there are some problems:
	We do not know the format of the tables involved
	Nor do we know the name of all the tables involved, although
		I suspect that we can quite easily find out
	There are rather a lot of tables involved (61 candidates)
We thought of some other ways of doing it, too, some of which included
some rather creative shell script writing, but none seemed to automate 
the process sufficiently (As you pointed out in your posting).
The more I think about it, the more I think I'll convince our DBA to put
off writing his thesis and do it again. Properly, this time.

Reidar.