Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!pacbell.com!ucsd!qualcom.qualcomm.com!harvey.qualcomm.com!greg
From: greg@harvey.qualcomm.com (Greg Noel)
Newsgroups: comp.databases
Subject: Re: Optimal SQL query needed
Summary: Summary of replies
Message-ID: <1991Mar20.234523.6677@qualcomm.com>
Date: 20 Mar 91 23:45:23 GMT
References: <1991Mar13.224746.5293@qualcomm.com>
Sender: news@qualcomm.com
Followup-To: sender
Organization: Qualcomm, Inc., San Diego, CA
Lines: 75

I wrote:
>I have an Oracle data base table, with three columns, which I will call A, B,
>and C.  ....  I want to define a view on this table that only consists
>of the most recent active record for each value of A.  How do I define this
>view such that the lookup is optimal?  Intuitively, what I would like to tell
>the data base engine is to look up all records with the same value of A,
>riffle through them, and select the one with the largest B.C value.

I received several replies to this; my thanks for the interest.  Here's what
was suggested:

J. Pitt <jmp@stan.xx.swin.OZ.AU> (isn't OZ.AU redundant???) asks:
>Have you tried the obvious ?
>	SELECT A, B, MAX(C)
>	FROM   TBL
>	GROUP BY A,B;
>(include other columns as required of course)

Unfortunately, it's the last line that's the killer -- it doesn't seem to
be possible to include other columns, except as summary functions (such as
the MAX or MIN functions).  What I need is the record corresponding to
MAX(B).MAX(C).

Mike Tossy <miket@blia.sharebase.com> suggests adding another column B' that
contains the end of the range for which the record is valid.  He notes:
>This solution is not without cost.  Inserting a new record becomes a longer
>process - you must update the old current record also.  But, the update
>should be very fast.  ....

In a similar vein, someone at U.Iowa <cmdkrj@pmvax.weeg.uiowa.edu> and Doug
Manatt <manatt@sirius.llnl.gov> both suggest adding a flag to the table that
marks the current record.

These are nice solutions, but ones that won't work in this case.  Originally,
we had a historical table containing date ranges and a current table with the
most-recent value.  It was a maintenance nightmare, and this design is an
attempt to eliminate some of those difficulties.  (Mike worries that we might
be put off by the extra disk space involved; that's the least of our worries!)
There are two reasons why these won't work:
    -	I'm not the data base administrator; I can't add another column just
	to improve the processing of a single program.  And because of the
	maintenance hassle described in the next point, it's probably not
	worth the complexity elsewhere in the system.
    -	There are several programs that update this table, and because of the
	order in which the data may be processed, the update is not always to
	the "current" record -- that is, sometimes it is the history that is
	being changed.  The updating programs must be very careful to notice
	when they are not updating the current record and correctly splice in
	the dates.
I do agree with the point Mike makes that this is the "true" solution to the
data base problem; we may consider adopting it at a later date.  Cmdkrj also
quotes Codd, "The only time not to normalize is when you can make an efficiency
gain."  I agree, and this might be one of the times.

Lastly, Edward Screven <escreven@us.oracle.com> suggests:
>    select A, B, C from T t1
>    where not exists
>    (select 'x' from T t2
>     where t2.A =  t1.A
>       and (t2.B > t1.B or
>            (t2.B >= t1.B and 
>             t2.C >  t1.C)));

I've looked at this until I'm cross-eyed and still have no intuition as to
why this works, but it does.  It correctly selects the most-recent record.
It's not optimal in that it joins the table with itself (requiring two indexed
lookups to select on a value from column A), but it does capture both of the
constraints from columns B and C in a single sub-query.

For the moment, I'm using this last solution, but once things settle down a
bit from this conversion, I plan to examine a solution involving date ranges.

Thanks for all the response!
-- 
Greg Noel, Unix Guru         greg@qualcomm.com  or  greg@noel.cts.com