Xref: utzoo comp.object:81 comp.databases:3815 Path: utzoo!attcan!uunet!odi!jack From: jack@odi.com (Jack Orenstein) Newsgroups: comp.object,comp.databases Subject: Re: Query optimization in OODB Message-ID: <1989Oct6.153505.24492@odi.com> Date: 6 Oct 89 15:35:05 GMT References: <16958@watdragon.waterloo.edu> Reply-To: jack@odi.com (Jack Orenstein) Distribution: comp Organization: Object Design Inc., Burlington, MA Lines: 60 In article <16958@watdragon.waterloo.edu> zmhasan@watdragon.waterloo.edu (Ziaul Masum Hasan) writes: >Could any one out there explain me why the use of a method in a query >predicate (), as an attribute-name or >as an operator, cause difficulties with query compilation and optimization ? Query optimization depends on the use of pre-computed results, usually in the form of an index. For example, suppose you have a set of people, indexed by name. Then the name index is a data structure that provides for rapid search on name, and returns a reference to one person, (or more, if names are not unique). In order to be accurate, the name index must be kept up to date. If a new person is added to the set of people, a new entry is needed in the index. If a person's name changes, this must be reflected in the index. It is the need to keep an index up to date that makes it difficult to index on the output from a method or function. Suppose that the same set of people is indexed by income(), where income() is a function computed from various pieces of information associated with each person. It is difficult to predict when the output from the income() function changes as a result of changes to information about a person. For example, income() is sensitive to changes in interest rates given by banks where the person has an account. In order for the index to be accurate, it would have to be updated in response to each such change. Another difficulty is due to functions whose value changes rapidly. Even if the index could, in principle, be kept up to date, it would not be feasible to do so. E.g., an index on age(), where age is computed from a stored birthdate and a time-of-day clock would have to be updated for each clock tick (unless the index maintainer is very smart). If all dependencies can be located, e.g. by analyzing the source code of the income() function, and of functions it calls, then index updating would be possible, but I don't know of anyone doing this in the general case. (Also, this approach just doesn't work if the function value depends on random numbers, input, or anything not stored in the database.) Indexes on stored values, which may be references to other objects, are much easier to update since the relevant updates are fairly obvious. For further information, see the paper by Maier & Stein (Stein & Maier?) in the book edited by Shriver & Wegner (Wegner & Shriver?) and the paper by Shekita and Carey from the EXODUS group in the 1989 SIGMOD proceedings. These papers talk about index updating more than optimization. As for a method appearing as an operator: this is less of a problem, as long as certain properties are guaranteed. For example, =, <, and > can be defined for rational numbers, and an index keyed by instances of a rational number class can be created. As long as the optimizer knows (because the class definer asserts it) that the functions implementing =, <, and > for rational numbers behave "as expected", optimization can proceed. See some of the early POSTGRES papers for more information along these lines. Finally, I'm not sure what you mean by difficulties in compilation. Optimization is the hard problem, but if the optimizer gets stuck, a "brute-force" strategy can be used. Jack Orenstein Object Design, Inc.