Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!haven!vrdxhq!daitc!daitc.daitc.mil
From: jkrueger@daitc.daitc.mil (Jonathan Krueger)
Newsgroups: comp.databases
Subject: Re: How do you...
Message-ID: <466@daitc.daitc.mil>
Date: 28 Mar 89 21:13:52 GMT
References: <1707@anasaz.UUCP>
Sender: jkrueger@daitc.daitc.mil
Reply-To: jkrueger@daitc.daitc.mil (Jonathan Krueger)
Organization: Defense Applied Information Technology Center
Lines: 60
In-reply-to: john@anasaz.UUCP (John Moore)

In article <1707@anasaz.UUCP>, john@anasaz (John Moore) writes:
>I have a relation that consists of a large (>500,000) set of last and
>first names, and an application that needs to do partial searches of
>the list. For example, the query might be to find names starting with
>"mo". A further restriction is that it must be in alphabetical order.
>Finally (and the only tough one for me) is that I only want the first
>20 names (for display on a screen). There may be 10,000 names that
>meet the criteria. Also, the list is dynamic - names are continuously
>being added and deleted by multiple processes.
>
>If the relation is indexed on last name, it should be possible to
>do this, but how do I (in SQL in general, and Informix-Turbo in particular):
>
>(1) Make sure that the set of all tuples matching the query not be sorted
>    before I get the records (i.e. it is prohibitively expensive if the
>    database engine extracts a temporary file of the 10,000
>    names, sorts them, and then presents them to my process).

No problem, just don't ask for the sort.

The relational model specifically does not specify top-to-bottom
ordering of rows.  If a particular implementation guarentees retrieval
in a default sorting as a side effect of a storage structure,
retrievals may return rows in a sort order with no extra overhead But
this is non-portable, of course.  But you said you didn't care if they
were sorted, you just wanted no extra cost for it.  BTW, the database
engine that can't sort 10K names efficiently is not worth bothering
about.

>(2) Limit the query to the first 20 names - it is also too expensive
>    if I get 10,000 names and just throw away the last 9980 of them.
>(3) Later on, without holding open a cursor, get the NEXT 20 names.

No way to express "first n matches" in relational calculus.
Practically, when embedded query languages define how retrievals
connect to the semantics of host language loops, there may be a safe
way to break off the query, or to perform other work for each row
retrieved, or for each group of rows retrieved.  For example:

#define	BUNCH	20	/* how many rows to get at a time */
#define	LIMIT	200	/* if this many or more returned, abort */
##	char	last[20], first[20];
	int	ctr = 0;
##	range of n is names
##	retrieve (last = n.lname, first = n.fname)
##	{
		if (++ctr > LIMIT)
##			endretrieve
		if (ctr%BUNCH == 0)
			/* do stuff */
##	}

Note that this works only if the "stuff" can be embedded inline.  If
you need instead the semantics of a function that gets called
externally and returns the next 20 rows each time, this syntax isn't
powerful enough.  You need something that "saves context".  In SQL
such context savers are known as cursors.

-- Jon
--