Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!decwrl!ucbvax!mtxinu!sybase!jeffl@sybase.Sybase.COM From: jeffl@sybase.Sybase.COM (Jeff Lichtman) Newsgroups: comp.databases Subject: Re: Informix sql sum/group by question Message-ID: <3462@sybase.sybase.com> Date: 22 Mar 89 07:46:11 GMT References: <131@bsadrc.UUCP> Sender: news@sybase.sybase.com Distribution: na Lines: 121 > select sum(amt), amt from tr_dat where tr_vendor = 5; > > Informix gives me an error saying the amt must be in a group by > statement.... > > My question is why? Why should I have to have the amt in a group by > clause? > -- > Darrel R Carver - A clever quote goes here. Have we got any in Informix does it according to the ANSI standard. Sybase doesn't have this restriction, but I know why it exists, and I will try to explain. Loosely, the rule is that if you have aggregates in your query, no columns may appear in the query that are not inside an aggregate or part of the group by clause. The rule makes it impossible to write queries that do not associate aggregated values with un-aggregated columns. In ANSI SQL, the result of a query is either grouped or not grouped. If it is not grouped, all the aggregates in the query produce single values. If it is grouped, each aggregate produces one value for each group. Suppose you have the following table, TABX: COL1 | COL2 | COL3 ------------------ 1 | 1 | 1 2 | 1 | 1 3 | 1 | 2 4 | 2 | 1 5 | 2 | 2 The query: select sum(COL1) from TABX will produce: ----- 15 The query: select sum(COL1), COL2 from TABX group by COL2 will produce: | COL2 ------------ 6 | 1 9 | 2 Now, what should the following produce: select sum(COL1), COL2 from TABX There is no association between the aggregate and COL2. In ANSI SQL, not only is the syntax illegal, but there is no definition for what this should produce if it were legal. In Sybase, we give the answer: | COL2 ------------ 15 | 1 15 | 2 15 | 3 15 | 4 15 | 5 That is, since there is no grouping, Sybase simply duplicates the scalar (single-value) result for each value of COL2. This is the only way I can think of to interpret this query, but I can't imagine why someone would want this result. It would make more sense to do two selects: select sum(COL1) from TABX select COL1 from TABX Now, try to think of what the following query should produce: select sum(COL1), COL2, COL3 from TABX group by COL2 This query is also illegal under ANSI SQL. COL3 is neither in an aggregate nor in the "group by" clause. Not only does the standard make this illegal, it doesn't define what such a query should do if it were legal, because there is no association between COL3 and the groups produced by the query. Sybase produces this result: | COL2 | COL3 ------------------- 6 | 1 | 1 6 | 1 | 1 6 | 1 | 2 9 | 2 | 1 9 | 2 | 2 Eh? How do we get that? In Sybase, if you have a query with columns that aren't in the "group by" clause or an aggregate, we first form an intermediate result (which you can see is the same as if we had left COL3 out of the query): | COL2 ------------ 6 | 1 9 | 2 and then join this with the base table (TABX) on the grouped column (COL2). I maintain that this is a reasonable way to interpret this query - since there is no explicit association of the aggregate values with the column SUM3, we match each aggregate value with all of the rows where the grouping columns are equal. I think most people would admit that there is no intuitively obvious result for such a query. Not only that, but it's hard to implement the interpretation I have given above. That's why ANSI and many products disallow it. --- Jeff Lichtman at Sybase {mtxinu,pacbell}!sybase!jeffl -or- jeffl@sybase.com "Saints should always be judged guilty until they are proved innocent..."