Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!voder!cullsj!gupta From: gupta@cullsj.UUCP (Yogesh Gupta) Newsgroups: comp.databases Subject: Re: Sybase bug? Summary: Subquery is not the same as join! Message-ID: <638@cullsj.UUCP> Date: 10 Jul 89 18:59:57 GMT References: <637@cullsj.UUCP> <4982@sybase.sybase.com> Organization: Cullinet Software, San Jose, CA Lines: 117 In article <4982@sybase.sybase.com>, jeffl@sybase.Sybase.COM (Jeff Lichtman) writes: > >> > select * from hello where a = 1 or a in (select b from world) > > > > Thus, the predicate a = 1 should be applied to each row of the table > > hello, and would be true for the one row that exists in table hello... > > Thus, the result should contain 1 row. > > > > Yogesh Gupta. > > The result of a SQL query is supposed to be as if the following operations > were performed (in this order): > > 1) Cartesian product You missed a very important point though - The cartesian product is of only those tables that are in the . (see lengthy ANSI quote below). Thus, in our query, there is NO cartesian product involved! > 2) Restriction (where clause) > 3) Projection (select list) > > (Note that this doesn't necessarily describe the actual processing strategy.) > In this case, the Cartesian product of the tables contains no rows, so the > answer contains no rows. Again, there is no Cartesian product involved. The following is a long discussion which tries to explain how subqueries are different from simple joins: What is the difference between the following the queries? Q1. Select a from hello where a = 1 or a in (select b from world); Q2. Select a from hello, world where a = 1 or a = b; Note that the behaviour of the subquery is different in the sense that there is NO initial cartesian product! Thus, to quote ANSI again (X3.135-1986): ::= [] [] [] General Rules 1) If all the optional clauses are omitted, then the table is the result of the . Otherwise, each specified clause is applied to the result of the previously specified clause and the table is the result of of the application of the last specified clause. ... General Rules: 2) Case: a) If the contains a single
, then the result of the is the table identified by that
. General Rules: 1) Let R denote the result of the . 2) The is applied to each row of R. The result of the is a table of those rows of R for which the result of the is true. 3) Each in the is effectively executed for each row of R and the results used in the application of the to the given row of R. If any executed contains an outer reference ... From the above, the logical order of processing (not necessarily the actual order of execution) of the query Q1 is as follows: Step A) The has the table "hello". The result of the from clause is the single row with the value of "a" being 1. Step B) The contains two predicates. Applying each of them to the value of the single row, the first predicate (a = 1) yields true while the second (subquery) yields false. The result of True OR False is True. The the result of applying the is also the single row with the value of "a" being 1. However, the logical order of processing of the query Q2 is as follows: Step A) The has the tables "hello" and "world". The result of the from clause is the extended cartesian product of the two tables. As the number of rows in the table "world" is zero, the result has no rows. I hope this makes it clear. (I shouldn't have to explain this to two people from Sybase, should I :-) Yogesh Gupta. Cullinet Software, Inc. ----- The opinions expressed in this article are those of the author only and do not represent those of Cullinet.