Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!amdahl!rtech!squid!robf From: robf@squid.rtech.com (Robert Fair) Newsgroups: comp.databases Subject: Re: optimising sql Message-ID: <4646@rtech.rtech.com> Date: 2 Feb 90 21:15:42 GMT References: <81@arkham.enet.dec.com> Sender: news@rtech.rtech.com Reply-To: robf@squid.UUCP (Robert Fair) Distribution: usa Organization: Ingres Corporation, Alameda CA 94501 Lines: 55 >From: may@28182.dec.com (Patrick May) writes >>In article <1990Jan31.010510.15242@welch.jhu.edu>, francois@welch.jhu.edu (Francois Schiettecatte) writes... >>Does anyone out there know of any books which might help >>me to optimise sql statements which are to be sent to a > >An issue of _Database Programming and Design_ sometime around October or >so 1988 had an article on this. I'll try and dig up a more precise >reference. > >Their major points concerned avoiding the use of IN, ANY, and ALL in the >WHERE clause -- the same functionality can be acheived with EXISTS and >at the same time force the use of an index if available. There were >also some suggestions on ordering the table names in the FROM clause, >but these were more implementation dependent. 1) The order in the FROM clause should make NO DIFFERENCE to the query plan generated if the DBMS has any kind of decent optimizer. Having said that some DBMS products today still don't have proper optimizers, so you may still be forced to munge with the FROM clause to get decent performance. 2) IN, ANY, ALL are indeed equivilent to EXISTS in many cases, but that doesn't mean you shoudn't use them. If the DBMS has a good optimizer equivilent expressions should usually generate very similar query plans. It sounds like the book was written after using a DBMS which had an optimizer only capable of handling EXISTS effectively. So, look for a product which has a smart optimizer and save yourself a lot of hassle. Things to look for include: - A statistical optimizer which has data distribution information available for smart use of indexes. - Flattening subqueries into joins automatically when appropriate - Multiple join strategies (tuple substitution, sort-merge etc) - A cost-based method which thinks of things like CPU overhead and disk IO when choosing the best query plan. If you have a distributed database (based on several different machines) the optimizer should also be capable of allowing for things like network overhead etc Given a smart optimizer which turns your SQL into an intelligent query plan, a more important issue become your database schema (layout) - do you have appropriate indexes/storage structures on columns which can use them ? Is the data normalized appropriately ? etc If you have a huge database with absolutely no indexes the smartest optimizer in the world won't help (although it'll find the best way of getting the result without using any indexes !) Robert L. Fair Ingres Corperation Technical Support Group.