Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!lll-tis!ames!pasteur!postgres!larry From: larry@postgres.uucp (Larry Rowe) Newsgroups: comp.databases Subject: Re: Informix 4GL Question? Message-ID: <974@pasteur.Berkeley.Edu> Date: 24 Feb 88 08:37:24 GMT References: <714@uel.uel.co.uk> <2314@geac.UUCP> Sender: news@pasteur.Berkeley.Edu Reply-To: larry@postgres.UUCP (Larry Rowe) Organization: Postgres Research Group, UC Berkeley Lines: 51 Keywords: Informix 4GL the referenced messages asked about relational calculus queries that answer the queries: 1. get the first occurrence of some predicate 2. get the first N occurrences of some predicate 3. return true/false if an occurrence exists QUEL has an aggregate function that does numbers 1 and 3. the ANY aggregate terminates when the first record is returned. i believe it returns 1 if a record is found, 0 otherwise. for example, the following query returns 1 if there is anybody in the toy department: RETRIEVE (x=ANY(emp.name where emp.dept= "toy")) assuming that the emp relation has 100K tuples, the query terminates when it finds the first person in the toy department. in other words it terminates the scan implied by the aggregate function. i'm not certain whether university ingres or rti ingres will use an index on the dept attribute to avoid scanning the primary data table, but i suspect they do. this illustrates query 3. query 1 is simple to do, just put the ANY aggregate in the predicate rather than the target list. for example, RETRIEVE (emp.all) WHERE ANY(emp.name by emp.dept where emp.dept= "toy") = 1 the by-clause is needed to link the tuple variable in the ANY aggregate to the record constructed in the target list. query 2 is not easy. it can be solved for fixed N by doing a series of nested aggregates (not possible in SQL) but this is really gruesome. a better strategy would be to have a specific operator to implement this type of query. maybe something like FIRST $n OF (RETRIEVE ...) of course, this should probably also be an aggregate so you can count the number of departments in which the 5th highest salary is greater than $10K... one problem with this new formulation is that it may or may not lead to more efficient processing. suppose you asked for the 5th element from a retrieve. if the retrieved data doesn't have to be sorted you could gain a substantial win by stopping the query early as in the ANY aggregate above. however, if the data had to be sorted (e.g., get the N-th highest), i doubt that you would save very much because you still have to process the query and sort the return set to determine the N-th highest (unless the data was presorted in some storage structure). once you have to sort, you won't save much time if the answer set is very large. my sense is that the sorted version is needed much more often than the non-sorted version when you ask for something where N is not equal to 1. sigh, yet another little query optimization to put into your optimizers. larry