Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!lll-tis!ames!pasteur!postgres!larry
From: larry@postgres.uucp (Larry Rowe)
Newsgroups: comp.databases
Subject: Re: Informix 4GL Question?
Message-ID: <974@pasteur.Berkeley.Edu>
Date: 24 Feb 88 08:37:24 GMT
References: <714@uel.uel.co.uk> <2314@geac.UUCP>
Sender: news@pasteur.Berkeley.Edu
Reply-To: larry@postgres.UUCP (Larry Rowe)
Organization: Postgres Research Group, UC Berkeley
Lines: 51
Keywords: Informix 4GL

the referenced messages asked about relational calculus queries that
answer the queries:
	1. get the first occurrence of some predicate
	2. get the first N occurrences of some predicate
	3. return true/false if an occurrence exists
QUEL has an aggregate function that does numbers 1 and 3.  the ANY
aggregate terminates when the first record is returned.  i believe
it returns 1 if a record is found, 0 otherwise.  for example,
the following query returns 1 if there is anybody in the toy department:

	RETRIEVE (x=ANY(emp.name where emp.dept= "toy"))

assuming that the emp relation has 100K tuples, the query terminates 
when it finds the first person in the toy department.  in other words
it terminates the scan implied by the aggregate function.  i'm not certain
whether university ingres or rti ingres will use an index on the
dept attribute to avoid scanning the primary data table, but i suspect 
they do.

this illustrates query 3.  query 1 is simple to do, just put the ANY
aggregate in the predicate rather than the target list.  for example,

	RETRIEVE (emp.all)
	WHERE ANY(emp.name by emp.dept where emp.dept= "toy") = 1

the by-clause is needed to link the tuple variable in the ANY
aggregate to the record constructed in the target list.

query 2 is not easy.  it can be solved for fixed N by doing a series of
nested aggregates (not possible in SQL) but this is really gruesome.  a
better strategy would be to have a specific operator to implement this
type of query.  maybe something like

	FIRST $n OF (RETRIEVE ...)

of course, this should probably also be an aggregate so you can count the
number of departments in which the 5th highest salary is greater than $10K...

one problem with this new formulation is that it may or may not lead to
more efficient processing.  suppose you asked for the 5th element from
a retrieve.  if the retrieved data doesn't have to be sorted you could
gain a substantial win by stopping the query early as in the ANY aggregate
above.  however, if the data had to be sorted (e.g., get the N-th highest),
i doubt that you would save very much because you still have to process
the query and sort the return set to determine the N-th highest (unless the
data was presorted in some storage structure).  once you have to sort, you
won't save much time if the answer set is very large.  my sense is that
the sorted version is needed much more often than the non-sorted version
when you ask for something where N is not equal to 1. 

sigh, yet another little query optimization to put into your optimizers.
	larry