Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!decwrl!labrea!rutgers!columbia!garfield.columbia.edu!andy
From: andy@garfield.columbia.edu (Andy Lowry)
Newsgroups: comp.databases
Subject: Re: Informix 4GL Question?
Message-ID: <5351@columbia.edu>
Date: 26 Feb 88 19:51:11 GMT
References: <714@uel.uel.co.uk> <2314@geac.UUCP> <974@pasteur.Berkeley.Edu>
Sender: nobody@columbia.edu
Reply-To: andy@garfield.columbia.edu.UUCP (Andy Lowry)
Organization: Columbia University CS Department
Lines: 29
Keywords: Informix 4GL
Summary: Order statistics in linear time

In article <974@pasteur.Berkeley.Edu> larry@postgres.UUCP (Larry Rowe) writes:
>the referenced messages asked about relational calculus queries that
>answer the queries:
>	1. get the first occurrence of some predicate
>	2. get the first N occurrences of some predicate
>	3. return true/false if an occurrence exists

Regarding the 2nd type of query:
>...  if the retrieved data doesn't have to be sorted you could
>gain a substantial win by stopping the query early as in the ANY aggregate
>above.  however, if the data had to be sorted (e.g., get the N-th highest),
>i doubt that you would save very much because you still have to process
>the query and sort the return set to determine the N-th highest (unless the
>data was presorted in some storage structure).

Note that the problem of finding the Nth largest (or smallest) element
in a list is a linear-time operation, and does not require sorting the
list.  An O(n) number-of-comparisons algorithm is presented in section
3.6 of Aho, Hopcroft & Ullman's "The Design and Analysis of Computer
Algorithms" (Addison-Wesley, Reading MA, 1974).  Even without
expending the effort of learning and implementing this algorithm, some
savings can be achieved using well-known sorting algorithms that yield
a correctly placed element on each iteration (like bubble sort or heap
sort).  The sort procedure could simply leave the tail end of the list
unsorted once the first N elements have been correctly placed.  (The
resulting algorithms will not be linear in n, but the operation will
be cheaper than a full sort using the same algorithm).

-Andy Lowry