Path: utzoo!utgpu!attcan!uunet!husc6!uwvax!rutgers!apple!bionet!agate!helios.ee.lbl.gov!lll-tis!daitc!jkrueger@daitc.daitc.mil
From: jkrueger@daitc.daitc.mil (Jonathan Krueger)
Newsgroups: comp.databases
Subject: Re: Is order important in SQL?
Keywords: SQL Unify performance join-query
Message-ID: <229@daitc.daitc.mil>
Date: 7 Nov 88 04:50:45 GMT
References: <344@sunny.UUCP>
Sender: jkrueger@daitc.daitc.mil
Reply-To: jkrueger@daitc.daitc.mil (Jonathan Krueger)
Distribution: na
Organization: Defense Applied Information Technology Center
Lines: 115
In-reply-to: poage@sunny.UUCP (Tom Poage)

In article <344@sunny.UUCP>, poage@sunny (Tom Poage) writes:
>	Is the order of entries in the "select", "from", and 
>	"where" clauses important in terms of performance?

If order affects performance, your system lacks a query optimizer.
I've heard there are still vendor products for which this is true.
Empirical testing can reveal such shoddy products.  Lack of evidence
does not constitute an endorsement, however.

The following are two QUEL queries put to RTI INGRES to answer the
question "who among us can afford his preferred style of wallpaper?":

range of w is wallpaper		/* styles and costs */
range of p is prefer		/* people's style preferences */
range of b is budgets		/* names of budgets and balences in them */
range of a is authorized	/* who's authorized to draw from what budget */

/* one way to put the query */
retrieve
	(b.account, w.cost, w.color, p.person)
where
	b.account = a.account and
	w.cost <= b.balence and
	w.color = p.color

/* semantically identical way (produces identical result relation) */
retrieve
	(w.color, w.cost, p.person, b.account)
where
	w.cost <= b.balence and
	b.account = a.account and
	w.color = p.color

As you see, order of qualifications, including those for joins, is
varied.  Order of columns named in the target list is varied, also
including those later joined on.  (Translate to SQL if you like, it
doesn't make any difference.)  Results look like this (abbreviated):

|color          |cost                |person              |accoun|
|----------------------------------------------------------------|
|blue           |              $10.00|bill                |disc  |
|blue           |              $10.00|darryl              |disc  |
|----------------------------------------------------------------|

|accoun|cost                |color          |person              |
|----------------------------------------------------------------|
|disc  |              $10.00|blue           |bill                |
|disc  |              $10.00|blue           |darryl              |
|----------------------------------------------------------------|

In other words, go for the cheap blue stuff, then you can hide it in
your discretionary account :-)

As an unsupported feature, RTI offers a map of its query execution
plan (QEP).  This shows in what order it plans to perform the search.
The QEP for the query above is shown below.  Which query?  Doesn't
matter.  Order of searching is quite independent of ordering of table
and column names in the two queries.  This confirms that RTI has a
query optimizer.

If your product doesn't provide a way to display QEP's or their ilk,
you can measure cpu time required.  You'll have to establish a "fuzz
factor", a threshold for differences smaller than which are not
considered significant.  You'll find this a tedious process, not
nearly as straightforward as saving QEP's to files and diff'ing them.

The QEP generated for either query is:

                        Join(account)
                        Sorted(account)
                        Pages 1 Tups 25
                        D12 C0
             /                                  \
            Proj-rest                           Sort(account)
            Sorted(account)                     Pages 1 Tups 25
            Pages 1 Tups 8                      D8 C0
            D4 C0
 /                                   /
authorized                          Cart-Prod
B-Tree(account)                     Heap
Pages 4 Tups 8                      Pages 1 Tups 25
                                    D8 C0
                         /                      \
                        Join(color)             budgets     
                        Heap                    Heap
                        Pages 1 Tups 5          Pages 1 Tups 5
                        D3 C0
             /                      \
            Proj-rest               Sort(color)
            Heap                    Pages 1 Tups 5
            Pages 1 Tups 5          D1 C0
            D2 C0
 /                       /
wallpaper               Proj-rest
Heap                    Heap
Pages 2 Tups 5          Pages 1 Tups 5
                        D1 C0
             /
            prefer      
            Heap
            Pages 1 Tups 5


You read this from botttom to top.  E.g. the first thing it plans to
do is a projection-restriction on prefer (w.color = p.color) before
jointing prefer to wallpaper.  The last thing it plans to do is join
budgets to authorized account on account.  "Heap" is the unkeyed,
unindexed storage structure of the table; i.e., not B-Tree, ISAM, or
Hash.  Note therefore that I'm doing a three-way join, two equijoins
and a reljoin, on columns not understood to be primary or secondary
keys.  Simpler tasks have more readable QEP's.  But then they wouldn't
be interesting enough to vary order in.

-- Jon
--