Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!gem.mps.ohio-state.edu!ginosko!uunet!munnari.oz.au!bruce!goanna!ajk
From: ajk@goanna.oz (Dr Alan Kent)
Newsgroups: comp.databases
Subject: Re: Parsing Query Languages in the Client or Server
Message-ID: <2445@goanna.oz>
Date: 8 Oct 89 05:26:41 GMT
References: <6155@sybase.sybase.com> <17450@pasteur.Berkeley.EDU> <6253@sybase.sybase.com>
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 97

I thought I would add my 2 cents worth to this discussion.

We here at RMIT (Royal Melbourne Institute of Technology) have a
research project (Titan+) which involves building a database system
which supports complex objects. I personally dont think it really
classifies as object oriented, but it does support nested tables and
good text support. We plan to also include structured or tuple types
and links (like refs in LauRel). The system is very definitely
experimental, although the good bits are slowly to be merged with
a commerical database system produced locally. We intend to use the
database system as a basis for a document retrieval package.

Currently we construct expression trees and send them down the communication
channel. As we saw it (when we made the decision), the advantages were

(1) Emedded query language commands only needed to be parsed once when
    the application program was compiled (though some checking still needs
    to be done to make sure the schema had not changed).
(2) There would be lower communication overheads (very debatable!)
(3) Query trees for complex dynamic conditions would be easier to construct.
    Consider a program dynamically forming a query such as
	SELECT *
	FROM PARTS
	WHERE cond1 AND (cond2 OR cond3) OR cond4
    The problem with creating this text dynamically is knowing when you
    need the string "AND", "OR", parenthises etc. This is much easier
    with expression trees (or so we have found in practice).
(4) Sending strings or binary data is easy as you dont have to be careful
    of quote characters in strings (e.g. consider   'John's car').

The current programming interface consists of a large library of small
functions for building expression trees.  For a quick and dirty embedded
query language preprocessor, embedded queries were mapped to expressions
which returned expression tree pointers. To actually insert constants
into the tree involves calling another function to replace identifier
nodes with constant value nodes. (As a side benefit, identifier nodes
can also be replaced with any other subtree allowing conditions to
be constructed dynamically and inserted into the where clause).

For example
	qry = $ SELECT * FROM PARTS WHERE COLOR = thecolor $;
	qry = tqlsubst(qry, "thecolor", color);

Anyway, after using this approach for a while now, we have found the
following:

(1) When using sockets or pipes etc., the size of the message is negligable.
    The cost of doing a system call or context switch outweighs other costs.
(2) Dynamically formed conditions are fairly rare, but have proved
    *extremely* useful on occasion (especially when dealing with text).
(3) You have to learn a lot to construct expression trees manually in a
    program and is a great source of bugs.
(4) People faint when they see the number of functions provided to build
    expression trees (but the range of functions do make life easier).
(5) Debugging is easier for users when then generate text queries rather
    than expression trees. Even the simple embedded query language above
    was a god-send after building trees manually.
(6) We now have the parser as function in the library that given a string
    returns and expression tree.
(7) Users can see the query generated by say a forms package which can
    help them learn a query language.
(8) We do not use standard SQL as it is not powerful enough to handle the
    data types we support. We now have in fact two query languages to
    access a database - Standard SQL (which can only access some of the
    tables) and our own special query language. They both produce the
    same expression tree structure. If the query language was built
    into the kernel, we would have had to turn SQL into our query language
    at the textual level (which would be harder, but certainly not
    impossible).

One reason we have not gone the approach of most embedded SQL systems
of using actual variable names embedded in the query language
(e.g. SELECT color INTO :variable FROM PARTS) is that while
this works fine for integers and strings, the more complex data
structures we support (such as nested tables) do not map easily
to programming language data types. (Ideas anyone?)

A QUESTION:
  What is the best way to send user entered constants in a query language
  when those constants may contain the string delimiter characters?
  Having a function to turn a string into a string constant is my best
  solution so far (ie. a function given "John's Car" returns "'John\'s Car'")

IN SUMMARY, we have found the expression tree interface to be very flexible,
but dangerous and difficult to learn to use. You can always hide these
details from users if you want to. We however are not a pure relational
database and so have different types of problems.  Also being only an
experimental system, we have not had much user feedback.

PS: We are still using the expression tree interface for the time being.
    Hope this was of some interest to someone.

Dr Alan Kent
Dept. of Comp. Sci, RMIT, 	 ACSnet: ajk@goanna.cs.rmit.oz.au
GPO BOX 2476 V,                  ARPA: ajk%goanna.cs.rmit.oz.au@uunet.uu.net
Melbourne, 3001 		 CSNET: ajk%goanna.cs.rmit.oz.au@australia
AUSTRALIA                        UUCP: ...!uunet!goanna.cs.rmit.oz.au!ajk