Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!decwrl!labrea!rutgers!columbia!garfield.columbia.edu!andy From: andy@garfield.columbia.edu (Andy Lowry) Newsgroups: comp.databases Subject: Re: Informix 4GL Question? Message-ID: <5351@columbia.edu> Date: 26 Feb 88 19:51:11 GMT References: <714@uel.uel.co.uk> <2314@geac.UUCP> <974@pasteur.Berkeley.Edu> Sender: nobody@columbia.edu Reply-To: andy@garfield.columbia.edu.UUCP (Andy Lowry) Organization: Columbia University CS Department Lines: 29 Keywords: Informix 4GL Summary: Order statistics in linear time In article <974@pasteur.Berkeley.Edu> larry@postgres.UUCP (Larry Rowe) writes: >the referenced messages asked about relational calculus queries that >answer the queries: > 1. get the first occurrence of some predicate > 2. get the first N occurrences of some predicate > 3. return true/false if an occurrence exists Regarding the 2nd type of query: >... if the retrieved data doesn't have to be sorted you could >gain a substantial win by stopping the query early as in the ANY aggregate >above. however, if the data had to be sorted (e.g., get the N-th highest), >i doubt that you would save very much because you still have to process >the query and sort the return set to determine the N-th highest (unless the >data was presorted in some storage structure). Note that the problem of finding the Nth largest (or smallest) element in a list is a linear-time operation, and does not require sorting the list. An O(n) number-of-comparisons algorithm is presented in section 3.6 of Aho, Hopcroft & Ullman's "The Design and Analysis of Computer Algorithms" (Addison-Wesley, Reading MA, 1974). Even without expending the effort of learning and implementing this algorithm, some savings can be achieved using well-known sorting algorithms that yield a correctly placed element on each iteration (like bubble sort or heap sort). The sort procedure could simply leave the tail end of the list unsorted once the first N elements have been correctly placed. (The resulting algorithms will not be linear in n, but the operation will be cheaper than a full sort using the same algorithm). -Andy Lowry