Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!amdahl!rtech!squid!robf
From: robf@squid.rtech.com (Robert Fair)
Newsgroups: comp.databases
Subject: Re: optimising sql
Message-ID: <4646@rtech.rtech.com>
Date: 2 Feb 90 21:15:42 GMT
References: <81@arkham.enet.dec.com>
Sender: news@rtech.rtech.com
Reply-To: robf@squid.UUCP (Robert Fair)
Distribution: usa
Organization: Ingres Corporation, Alameda CA 94501
Lines: 55

>From: may@28182.dec.com (Patrick May) writes
>>In article <1990Jan31.010510.15242@welch.jhu.edu>, francois@welch.jhu.edu (Francois Schiettecatte) writes...
>>Does anyone out there know of any books which might help
>>me to optimise sql statements which are to be sent to a 
>
>An issue of _Database Programming and Design_ sometime around October or
>so 1988 had an article on this.  I'll try and dig up a more precise
>reference.
>
>Their major points concerned avoiding the use of IN, ANY, and ALL in the
>WHERE clause -- the same functionality can be acheived with EXISTS and
>at the same time force the use of an index if available.  There were
>also some suggestions on ordering the table names in the FROM clause,
>but these were more implementation dependent.

1) The order in the FROM clause should make NO DIFFERENCE to the query plan
   generated if the DBMS has any kind of decent optimizer. Having said that
   some DBMS products today still don't have proper optimizers, so you
   may still be forced to munge with the FROM clause to get decent
   performance. 

2) IN, ANY, ALL are indeed equivilent to EXISTS in many cases, but that
   doesn't  mean you shoudn't use them. If the DBMS has a good optimizer
   equivilent expressions should usually generate very similar
   query plans. It sounds like the book was written after using a
   DBMS which had an optimizer only capable of handling EXISTS effectively.

So, look for a product which has a smart optimizer and save yourself
a lot of hassle. Things to look for include:

- A statistical optimizer which has data distribution information
  available for smart use of indexes.
   
- Flattening subqueries into joins automatically when appropriate

- Multiple join strategies (tuple substitution, sort-merge etc)

- A cost-based method which thinks of things like CPU overhead
  and disk IO when choosing the best query plan.

If you have a distributed database (based on several different machines)
the optimizer should also be capable of allowing for things like
network overhead etc

Given a smart optimizer which turns your SQL into an intelligent query
plan, a more important issue become your database schema (layout) -
do you have appropriate indexes/storage structures on columns which can use
them ? Is the data normalized appropriately ? etc If you have a huge
database with absolutely no indexes the smartest optimizer in the world
won't help (although it'll find the best way of getting  the result
without using any indexes !)

Robert L. Fair
Ingres Corperation
Technical Support Group.