Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!bcm!dimacs.rutgers.edu!seismo!beno.CSS.GOV From: jean@beno.CSS.GOV (Jean Anderson) Newsgroups: comp.databases Subject: Re: smart queries and scientists Message-ID: <49244@seismo.CSS.GOV> Date: 5 Dec 90 06:11:57 GMT Sender: usenet@seismo.CSS.GOV Organization: Center for Seismic Studies, Arlington, VA Lines: 80 References: <1453@beaudin.UUCP> <49175@seismo.CSS.GOV> DC> Query optimization is a tough task - with or without statistics - and I DC> look forward to the day when Oracle makes it a standard on our platform, DC> but for now, building a finely structured database and developing smart DC> queries are our key to success. JB> If you have the time, would you mind posting/mailing an english (non-sql) JB> example of an inefficient query and then an example of a more efficient JB> query? You may choose an extreme case to make a point, but a real life JB> example would be more educational. Thank you for your input. I'm in Dale's developement arm in San Diego and I have a real life seismology query that I think fits your request. We have communications code that transfers raw signaling processing data from the ORACLE database in Norway to the Expert System database in D.C. on an hourly basis. [Note in the sample below we don't transfer a count of the records, we transfer the actual records. The actual transfer involves 5 tables, not 1, so the time delay to transfer gets more interesting.] The seismologist complained that the transfer was taking too long for the amount of data in the table (only several thousand rows). The detection table has one index on 'time', another on 'sta'. He wanted only two out of four stations transferred for a given time period: SQL> set timing on SQL> select count(*) from detection 2 where sta in ('ARC', 'NRS') 3* and time > 648519000 and time <= 648524000; COUNT(*) ---------- 28 Elapsed: 00:01:33.48 I have SQL_TRACE set to TRUE on our development database so I tkprof'd the trace file and discovered ORACLE did two separate scans of the sta index and ignored the time index. I disabled ORACLE's use of the sta index by applying a function to the sta column, forcing ORACLE to use the index on time: SQL> select count(*) from detection 2 where upper(sta) in ('ARC', 'NRS') 3* and time > 648519000.000 and time <= 648524000; COUNT(*) ---------- 28 Elapsed: 00:00:00.61 After going to the IOUG in September and attending the session on "How ORACLE uses indexes", it is now obvious to me what happened. ORACLE prefers a constant check above all and will instantly chose that index first. Since doesn't gather statistics yet, it can't know that the distribution of values for time is nearly unique and makes the better index. The distribution of values is about 25% of the table per sta; time is nearly unique. The end result--we eliminated the index on sta (we may create a concatenated index on time/sta). But the seismologist gagged when I told him what the fix to his performance problem was. Our scientists have to be smarter about their SQL queries--and they aren't. They don't have time to be. Their goal in life is to do seismology, not become expert SQL-ers. - Jean Anderson DBA, SAIC Geophysics Division jean@seismo.css.gov -or- jean@esosun.css.gov (619)458-2727 +++++++++++++++++++++++++++++++++++++++++++++++++ ++ Any opinions are mine, not my employer's. ++ +++++++++++++++++++++++++++++++++++++++++++++++++ Brought to you by Super Global Mega Corp .com