Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!pacbell.com!ucsd!qualcom.qualcomm.com!harvey.qualcomm.com!greg From: greg@harvey.qualcomm.com (Greg Noel) Newsgroups: comp.databases Subject: Re: Optimal SQL query needed Summary: Summary of replies Message-ID: <1991Mar20.234523.6677@qualcomm.com> Date: 20 Mar 91 23:45:23 GMT References: <1991Mar13.224746.5293@qualcomm.com> Sender: news@qualcomm.com Followup-To: sender Organization: Qualcomm, Inc., San Diego, CA Lines: 75 I wrote: >I have an Oracle data base table, with three columns, which I will call A, B, >and C. .... I want to define a view on this table that only consists >of the most recent active record for each value of A. How do I define this >view such that the lookup is optimal? Intuitively, what I would like to tell >the data base engine is to look up all records with the same value of A, >riffle through them, and select the one with the largest B.C value. I received several replies to this; my thanks for the interest. Here's what was suggested: J. Pitt (isn't OZ.AU redundant???) asks: >Have you tried the obvious ? > SELECT A, B, MAX(C) > FROM TBL > GROUP BY A,B; >(include other columns as required of course) Unfortunately, it's the last line that's the killer -- it doesn't seem to be possible to include other columns, except as summary functions (such as the MAX or MIN functions). What I need is the record corresponding to MAX(B).MAX(C). Mike Tossy suggests adding another column B' that contains the end of the range for which the record is valid. He notes: >This solution is not without cost. Inserting a new record becomes a longer >process - you must update the old current record also. But, the update >should be very fast. .... In a similar vein, someone at U.Iowa and Doug Manatt both suggest adding a flag to the table that marks the current record. These are nice solutions, but ones that won't work in this case. Originally, we had a historical table containing date ranges and a current table with the most-recent value. It was a maintenance nightmare, and this design is an attempt to eliminate some of those difficulties. (Mike worries that we might be put off by the extra disk space involved; that's the least of our worries!) There are two reasons why these won't work: - I'm not the data base administrator; I can't add another column just to improve the processing of a single program. And because of the maintenance hassle described in the next point, it's probably not worth the complexity elsewhere in the system. - There are several programs that update this table, and because of the order in which the data may be processed, the update is not always to the "current" record -- that is, sometimes it is the history that is being changed. The updating programs must be very careful to notice when they are not updating the current record and correctly splice in the dates. I do agree with the point Mike makes that this is the "true" solution to the data base problem; we may consider adopting it at a later date. Cmdkrj also quotes Codd, "The only time not to normalize is when you can make an efficiency gain." I agree, and this might be one of the times. Lastly, Edward Screven suggests: > select A, B, C from T t1 > where not exists > (select 'x' from T t2 > where t2.A = t1.A > and (t2.B > t1.B or > (t2.B >= t1.B and > t2.C > t1.C))); I've looked at this until I'm cross-eyed and still have no intuition as to why this works, but it does. It correctly selects the most-recent record. It's not optimal in that it joins the table with itself (requiring two indexed lookups to select on a value from column A), but it does capture both of the constraints from columns B and C in a single sub-query. For the moment, I'm using this last solution, but once things settle down a bit from this conversion, I plan to examine a solution involving date ranges. Thanks for all the response! -- Greg Noel, Unix Guru greg@qualcomm.com or greg@noel.cts.com