Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!gem.mps.ohio-state.edu!ginosko!uunet!munnari.oz.au!bruce!goanna!ajk From: ajk@goanna.oz (Dr Alan Kent) Newsgroups: comp.databases Subject: Re: Parsing Query Languages in the Client or Server Message-ID: <2445@goanna.oz> Date: 8 Oct 89 05:26:41 GMT References: <6155@sybase.sybase.com> <17450@pasteur.Berkeley.EDU> <6253@sybase.sybase.com> Organization: Comp Sci, RMIT, Melbourne, Australia Lines: 97 I thought I would add my 2 cents worth to this discussion. We here at RMIT (Royal Melbourne Institute of Technology) have a research project (Titan+) which involves building a database system which supports complex objects. I personally dont think it really classifies as object oriented, but it does support nested tables and good text support. We plan to also include structured or tuple types and links (like refs in LauRel). The system is very definitely experimental, although the good bits are slowly to be merged with a commerical database system produced locally. We intend to use the database system as a basis for a document retrieval package. Currently we construct expression trees and send them down the communication channel. As we saw it (when we made the decision), the advantages were (1) Emedded query language commands only needed to be parsed once when the application program was compiled (though some checking still needs to be done to make sure the schema had not changed). (2) There would be lower communication overheads (very debatable!) (3) Query trees for complex dynamic conditions would be easier to construct. Consider a program dynamically forming a query such as SELECT * FROM PARTS WHERE cond1 AND (cond2 OR cond3) OR cond4 The problem with creating this text dynamically is knowing when you need the string "AND", "OR", parenthises etc. This is much easier with expression trees (or so we have found in practice). (4) Sending strings or binary data is easy as you dont have to be careful of quote characters in strings (e.g. consider 'John's car'). The current programming interface consists of a large library of small functions for building expression trees. For a quick and dirty embedded query language preprocessor, embedded queries were mapped to expressions which returned expression tree pointers. To actually insert constants into the tree involves calling another function to replace identifier nodes with constant value nodes. (As a side benefit, identifier nodes can also be replaced with any other subtree allowing conditions to be constructed dynamically and inserted into the where clause). For example qry = $ SELECT * FROM PARTS WHERE COLOR = thecolor $; qry = tqlsubst(qry, "thecolor", color); Anyway, after using this approach for a while now, we have found the following: (1) When using sockets or pipes etc., the size of the message is negligable. The cost of doing a system call or context switch outweighs other costs. (2) Dynamically formed conditions are fairly rare, but have proved *extremely* useful on occasion (especially when dealing with text). (3) You have to learn a lot to construct expression trees manually in a program and is a great source of bugs. (4) People faint when they see the number of functions provided to build expression trees (but the range of functions do make life easier). (5) Debugging is easier for users when then generate text queries rather than expression trees. Even the simple embedded query language above was a god-send after building trees manually. (6) We now have the parser as function in the library that given a string returns and expression tree. (7) Users can see the query generated by say a forms package which can help them learn a query language. (8) We do not use standard SQL as it is not powerful enough to handle the data types we support. We now have in fact two query languages to access a database - Standard SQL (which can only access some of the tables) and our own special query language. They both produce the same expression tree structure. If the query language was built into the kernel, we would have had to turn SQL into our query language at the textual level (which would be harder, but certainly not impossible). One reason we have not gone the approach of most embedded SQL systems of using actual variable names embedded in the query language (e.g. SELECT color INTO :variable FROM PARTS) is that while this works fine for integers and strings, the more complex data structures we support (such as nested tables) do not map easily to programming language data types. (Ideas anyone?) A QUESTION: What is the best way to send user entered constants in a query language when those constants may contain the string delimiter characters? Having a function to turn a string into a string constant is my best solution so far (ie. a function given "John's Car" returns "'John\'s Car'") IN SUMMARY, we have found the expression tree interface to be very flexible, but dangerous and difficult to learn to use. You can always hide these details from users if you want to. We however are not a pure relational database and so have different types of problems. Also being only an experimental system, we have not had much user feedback. PS: We are still using the expression tree interface for the time being. Hope this was of some interest to someone. Dr Alan Kent Dept. of Comp. Sci, RMIT, ACSnet: ajk@goanna.cs.rmit.oz.au GPO BOX 2476 V, ARPA: ajk%goanna.cs.rmit.oz.au@uunet.uu.net Melbourne, 3001 CSNET: ajk%goanna.cs.rmit.oz.au@australia AUSTRALIA UUCP: ...!uunet!goanna.cs.rmit.oz.au!ajk