Path: utzoo!utgpu!attcan!uunet!husc6!uwvax!rutgers!apple!bionet!agate!helios.ee.lbl.gov!lll-tis!daitc!jkrueger@daitc.daitc.mil From: jkrueger@daitc.daitc.mil (Jonathan Krueger) Newsgroups: comp.databases Subject: Re: Is order important in SQL? Keywords: SQL Unify performance join-query Message-ID: <229@daitc.daitc.mil> Date: 7 Nov 88 04:50:45 GMT References: <344@sunny.UUCP> Sender: jkrueger@daitc.daitc.mil Reply-To: jkrueger@daitc.daitc.mil (Jonathan Krueger) Distribution: na Organization: Defense Applied Information Technology Center Lines: 115 In-reply-to: poage@sunny.UUCP (Tom Poage) In article <344@sunny.UUCP>, poage@sunny (Tom Poage) writes: > Is the order of entries in the "select", "from", and > "where" clauses important in terms of performance? If order affects performance, your system lacks a query optimizer. I've heard there are still vendor products for which this is true. Empirical testing can reveal such shoddy products. Lack of evidence does not constitute an endorsement, however. The following are two QUEL queries put to RTI INGRES to answer the question "who among us can afford his preferred style of wallpaper?": range of w is wallpaper /* styles and costs */ range of p is prefer /* people's style preferences */ range of b is budgets /* names of budgets and balences in them */ range of a is authorized /* who's authorized to draw from what budget */ /* one way to put the query */ retrieve (b.account, w.cost, w.color, p.person) where b.account = a.account and w.cost <= b.balence and w.color = p.color /* semantically identical way (produces identical result relation) */ retrieve (w.color, w.cost, p.person, b.account) where w.cost <= b.balence and b.account = a.account and w.color = p.color As you see, order of qualifications, including those for joins, is varied. Order of columns named in the target list is varied, also including those later joined on. (Translate to SQL if you like, it doesn't make any difference.) Results look like this (abbreviated): |color |cost |person |accoun| |----------------------------------------------------------------| |blue | $10.00|bill |disc | |blue | $10.00|darryl |disc | |----------------------------------------------------------------| |accoun|cost |color |person | |----------------------------------------------------------------| |disc | $10.00|blue |bill | |disc | $10.00|blue |darryl | |----------------------------------------------------------------| In other words, go for the cheap blue stuff, then you can hide it in your discretionary account :-) As an unsupported feature, RTI offers a map of its query execution plan (QEP). This shows in what order it plans to perform the search. The QEP for the query above is shown below. Which query? Doesn't matter. Order of searching is quite independent of ordering of table and column names in the two queries. This confirms that RTI has a query optimizer. If your product doesn't provide a way to display QEP's or their ilk, you can measure cpu time required. You'll have to establish a "fuzz factor", a threshold for differences smaller than which are not considered significant. You'll find this a tedious process, not nearly as straightforward as saving QEP's to files and diff'ing them. The QEP generated for either query is: Join(account) Sorted(account) Pages 1 Tups 25 D12 C0 / \ Proj-rest Sort(account) Sorted(account) Pages 1 Tups 25 Pages 1 Tups 8 D8 C0 D4 C0 / / authorized Cart-Prod B-Tree(account) Heap Pages 4 Tups 8 Pages 1 Tups 25 D8 C0 / \ Join(color) budgets Heap Heap Pages 1 Tups 5 Pages 1 Tups 5 D3 C0 / \ Proj-rest Sort(color) Heap Pages 1 Tups 5 Pages 1 Tups 5 D1 C0 D2 C0 / / wallpaper Proj-rest Heap Heap Pages 2 Tups 5 Pages 1 Tups 5 D1 C0 / prefer Heap Pages 1 Tups 5 You read this from botttom to top. E.g. the first thing it plans to do is a projection-restriction on prefer (w.color = p.color) before jointing prefer to wallpaper. The last thing it plans to do is join budgets to authorized account on account. "Heap" is the unkeyed, unindexed storage structure of the table; i.e., not B-Tree, ISAM, or Hash. Note therefore that I'm doing a three-way join, two equijoins and a reljoin, on columns not understood to be primary or secondary keys. Simpler tasks have more readable QEP's. But then they wouldn't be interesting enough to vary order in. -- Jon --