Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!decwrl!ucbvax!mtxinu!sybase!jeffl@sybase.Sybase.COM
From: jeffl@sybase.Sybase.COM (Jeff Lichtman)
Newsgroups: comp.databases
Subject: Re: Informix sql sum/group by question
Message-ID: <3462@sybase.sybase.com>
Date: 22 Mar 89 07:46:11 GMT
References: <131@bsadrc.UUCP>
Sender: news@sybase.sybase.com
Distribution: na
Lines: 121

> 	select sum(amt), amt from tr_dat where tr_vendor = 5;
> 
> Informix gives me an error saying the amt must be in a group by
> statement....
> 
> My question is why?  Why should I have to have the amt in a group by
> clause?
> -- 
> Darrel R Carver	            -	A clever quote goes here.  Have we got any in

Informix does it according to the ANSI standard.  Sybase doesn't have this
restriction, but I know why it exists, and I will try to explain.

Loosely, the rule is that if you have aggregates in your query, no columns may
appear in the query that are not inside an aggregate or part of the
group by clause.  The rule makes it impossible to write queries that do not
associate aggregated values with un-aggregated columns.

In ANSI SQL, the result of a query is either grouped or not grouped.  If it
is not grouped, all the aggregates in the query produce single values.  If
it is grouped, each aggregate produces one value for each group.

Suppose you have the following table, TABX:

	COL1 | COL2 | COL3
	------------------
	  1  |   1  |  1
	  2  |   1  |  1
	  3  |   1  |  2
	  4  |   2  |  1
	  5  |   2  |  2

The query:

	select sum(COL1) from TABX

will produce:

	-----
	 15

The query:

	select sum(COL1), COL2 from TABX
	group by COL2

will produce:

	     | COL2
	------------
	  6  |  1
	  9  |  2

Now, what should the following produce:

	select sum(COL1), COL2 from TABX

There is no association between the aggregate and COL2.  In ANSI SQL,
not only is the syntax illegal, but there is no definition for what this
should produce if it were legal.  In Sybase, we give the answer:

	     | COL2
	------------
	  15 |  1
	  15 |  2
	  15 |  3
	  15 |  4
	  15 |  5

That is, since there is no grouping, Sybase simply duplicates the scalar
(single-value) result for each value of COL2.  This is the only way I
can think of to interpret this query, but I can't imagine why someone
would want this result.  It would make more sense to do two selects:

	select sum(COL1) from TABX
	select COL1 from TABX

Now, try to think of what the following query should produce:

	select sum(COL1), COL2, COL3 from TABX
	group by COL2

This query is also illegal under ANSI SQL.  COL3 is neither in an aggregate
nor in the "group by" clause.  Not only does the standard make this illegal,
it doesn't define what such a query should do if it were legal, because
there is no association between COL3 and the groups produced by the query.

Sybase produces this result:

	     | COL2 | COL3
	-------------------
	  6  |   1  |  1
	  6  |   1  |  1
	  6  |   1  |  2
	  9  |   2  |  1
	  9  |   2  |  2

Eh?  How do we get that?

In Sybase, if you have a query with columns that aren't in the "group by"
clause or an aggregate, we first form an intermediate result (which you
can see is the same as if we had left COL3 out of the query):

	     | COL2
	------------
	  6  |  1
	  9  |  2
	
and then join this with the base table (TABX) on the grouped column (COL2).
I maintain that this is a reasonable way to interpret this query - since
there is no explicit association of the aggregate values with the column
SUM3, we match each aggregate value with all of the rows where the grouping
columns are equal.

I think most people would admit that there is no intuitively obvious result
for such a query.  Not only that, but it's hard to implement the interpretation
I have given above.  That's why ANSI and many products disallow it.
---
Jeff Lichtman at Sybase
{mtxinu,pacbell}!sybase!jeffl  -or- jeffl@sybase.com
"Saints should always be judged guilty until they are proved innocent..."