Xref: utzoo comp.object:81 comp.databases:3815
Path: utzoo!attcan!uunet!odi!jack
From: jack@odi.com (Jack Orenstein)
Newsgroups: comp.object,comp.databases
Subject: Re: Query optimization in OODB
Message-ID: <1989Oct6.153505.24492@odi.com>
Date: 6 Oct 89 15:35:05 GMT
References: <16958@watdragon.waterloo.edu>
Reply-To: jack@odi.com (Jack Orenstein)
Distribution: comp
Organization: Object Design Inc., Burlington, MA
Lines: 60

In article <16958@watdragon.waterloo.edu> zmhasan@watdragon.waterloo.edu (Ziaul Masum Hasan) writes:
>Could any one out there explain me why the use of a method in a query         
>predicate (<attribute-name operator value>), as an attribute-name or
>as an operator, cause difficulties with query compilation and optimization ?

Query optimization depends on the use of pre-computed results, usually
in the form of an index. For example, suppose you have a set of
people, indexed by name. Then the name index is a data structure that
provides for rapid search on name, and returns a reference to one
person, (or more, if names are not unique). In order to be accurate,
the name index must be kept up to date. If a new person is added to
the set of people, a new entry is needed in the index. If a person's
name changes, this must be reflected in the index.

It is the need to keep an index up to date that makes it difficult to
index on the output from a method or function. Suppose that the same
set of people is indexed by income(), where income() is a function
computed from various pieces of information associated with each
person. It is difficult to predict when the output from the income()
function changes as a result of changes to information about a person.
For example, income() is sensitive to changes in interest rates given
by banks where the person has an account. In order for the index to be
accurate, it would have to be updated in response to each such change.

Another difficulty is due to functions whose value changes rapidly.
Even if the index could, in principle, be kept up to date, it would
not be feasible to do so. E.g., an index on age(), where age is
computed from a stored birthdate and a time-of-day clock would have to
be updated for each clock tick (unless the index maintainer is very
smart). 

If all dependencies can be located, e.g. by analyzing the source code
of the income() function, and of functions it calls, then index
updating would be possible, but I don't know of anyone doing this in
the general case. (Also, this approach just doesn't work if the
function value depends on random numbers, input, or anything not
stored in the database.) Indexes on stored values, which may be
references to other objects, are much easier to update since the
relevant updates are fairly obvious. For further information, see the
paper by Maier & Stein (Stein & Maier?) in the book edited by Shriver
& Wegner (Wegner & Shriver?) and the paper by Shekita and Carey from
the EXODUS group in the 1989 SIGMOD proceedings. These papers talk
about index updating more than optimization.

As for a method appearing as an operator: this is less of a problem,
as long as certain properties are guaranteed. For example, =, <, and >
can be defined for rational numbers, and an index keyed by instances
of a rational number class can be created. As long as the optimizer
knows (because the class definer asserts it) that the functions
implementing =, <, and > for rational numbers behave "as expected",
optimization can proceed. See some of the early POSTGRES papers for
more information along these lines.

Finally, I'm not sure what you mean by difficulties in compilation.
Optimization is the hard problem, but if the optimizer gets stuck, a
"brute-force" strategy can be used.


Jack Orenstein
Object Design, Inc.