Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.csd.uwm.edu!cs.utexas.edu!uunet!odi!jack
From: jack@odi.com (Jack Orenstein)
Newsgroups: comp.databases
Subject: OO DBMSs (was Re: Extended RDB vs OODB)
Message-ID: <1989Aug21.132525.3179@odi.com>
Date: 21 Aug 89 13:25:25 GMT
References: <3560052@wdl1.UUCP> <411@odi.ODI.COM> <19@dgis.daitc.mil> <1989Aug14.140128.15094@odi.com> <27@dgis.daitc.mil>
Reply-To: jack@odi.com (Jack Orenstein)
Organization: Object Design Inc., Burlington, MA
Lines: 105

Here are replies to some recent questions that have come up in the OO
DBMS discussions. The answers are, for the most part, specific to the
OO DBMS being built at Object Design, but will often, I believe, apply
to competitors' products as well.

David Masterson writes:

> Based on Jack Orenstein's message, I have a couple of questions:
>
> 1. In implementing an OODB on top of C++ using the notion of
>    persistent and transient type objects, when you refer to
>    information in the OODB, is it always by an object identifier?
>    How, therefore, would you find objects meeting some qualification
>    if you don't know their identifiers? Is this even a type of query
>    you would ask in an OODB world? (You ALWAYS know the identifier
>    because even a qualification would be wrapped in an object which
>    contains the identifier?)

It will often be the case that object ids are known because they are
stored in persistent variables. For example, a persistent variable of
type part* stores the id of a part. In other cases, an id will not be
known, but properties of the object can be described as part of a
query. Queries are expressed using existing C++ syntax for control
(i.e. boolean) expressions. For example, given a set of parts (which
may contain both transient and persistent instances), queries can be
written to ask for all parts whose weight exceeds a given amount, all
parts containing a given sub-part, all parts contained in a given
part, etc. Compound queries can be expressed also, e.g.

    find all parts containing a frammis-joint linkage that were
    manufactured by Acme.

> 2. Again using the architecture of persistent and transient objects,
>    is a persistent object ever in memory? Or is it just a transient
>    copy of a persistent object that is in memory? Then, how are
>    persistent objects created?

Yes, the persistent objects themselves are manipulated by
applications. Copying isn't good enough since a copy of an object has
a different identity. (This might not be true in other languages, but
the idea of equating an object's id with its address is fundamental to
C++. Of course, it is possible to define a base "object" class, define
it to have an "id" data member, redefine initialization, =, and == to
work off this id, and then use "object" to derive all other classes,
but the space and time overhead will be significant.) One example of
the difficulties that arise is that pointers to an object do not point
to copies of the object. Copies of objects can be made, as is usually
the case in C++, and the semantics of C++ are preserved. I.e., the
copy is a distinct object.

From D. C. Martin:

> Dan Weinreb of ODI writes:
>
>> There should not be any special declaration for "pointers to
>> persistent" or "pointers to possibly persistent" data as distinct
>> from ordinary pointers.
>
> It would be nice if no one ever had to consider if a pointer was
> persistent or non-persistent, but someone will have to build the
> access methods and other low-level interface routines to your
> storage manager in order to provide this type of "pointer swizzling"
> to the application developer. At UW - Madison the Exodus Project is
> developing a language called E, which is a persistent C++ language
> designed to allow an individual to write his or her own access
> methods, and to a certain extent pointers to resident objects are
> equivalent to persistent. However, for this equivalency the pointer
> types must be DB pointers, i.e. dbchar* != char*, but a persistent
> dbchar* is equivalent to a non-persistent dbchar*.

We are very familiar with the Exodus project, and with the E language.
While the type system of E is far preferable to that of a typical
host-language/DBMS combination, it still has two distinct, but
"parallel", type systems, and programmers have to be careful about the
use of db types. In our product, there will be a single type system,
that of C++. There is no fundamental reason why persistent and
transient types have to be distinguished in the language used by the
application programmer. Unfortunately, the details of how "swizzling"
works are proprietary, so I can't discuss the issue.

>> In particular, de-referencing a pointer has exactly the same
>> semantics and syntax regardless of whether the objects are
>> persistent or transient. In general, data manipulation (storing,
>> fetching, testing, adding, printing, field extraction, function
>> calling, casting) looks exactly as it does for normal C++.
>
> What about dereferencing a pointer to a 40mb image? Does this mean
> bringing the entire image into core? There must be some low-level
> routines to allow the application programmer to inform the language
> that certain special methods should be used to store, fetch, etc...
> for special datatypes.

I'll have to take the 5th again, but I will say that there is no need
to bring in the entire 40mb image just to retrieve one byte of it.

Jack Orenstein
Object Design, Inc.
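
To make the query and identity points above a little more concrete,
here is a minimal sketch in ordinary, present-day standard C++. It is
not the Object Design interface, which this post does not describe.
The Part struct, its fields, and the helpers select, contains_subpart,
acme_parts_with_linkage and same_object are all hypothetical names
invented for illustration, and nothing here is persistent. It only
shows a query written as a plain C++ boolean expression over a set of
parts, and identity taken to be an object's address.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical part type; the field names are assumptions.
struct Part {
    std::string name;
    std::string manufacturer;
    double weight;
    std::vector<const Part*> subparts;  // sub-parts referenced by address
};

// True if `p` directly contains a sub-part with the given name.
bool contains_subpart(const Part& p, const std::string& name) {
    for (const Part* s : p.subparts) {
        if (s->name == name) return true;
    }
    return false;
}

// Stand-in for a query: return the address of every part in `parts`
// for which the boolean expression wrapped in `pred` is true.
std::vector<const Part*> select(
        const std::vector<const Part*>& parts,
        const std::function<bool(const Part&)>& pred) {
    std::vector<const Part*> result;
    for (const Part* p : parts) {
        if (pred(*p)) result.push_back(p);
    }
    return result;
}

// The compound query from the text: parts containing a frammis-joint
// linkage that were manufactured by Acme.
std::vector<const Part*> acme_parts_with_linkage(
        const std::vector<const Part*>& parts) {
    return select(parts, [](const Part& p) {
        return contains_subpart(p, "frammis-joint linkage") &&
               p.manufacturer == "Acme";
    });
}

// Identity as address: a copy holds the same field values but is a
// distinct object, so this returns false for an object and its copy.
bool same_object(const Part& a, const Part& b) {
    return &a == &b;
}
```

Because select returns addresses rather than copies, the results refer
to the very objects that were in the set. A copy made with
"Part copy = original;" has the same field values, yet same_object
reports it as a different object, which is the C++ behaviour described
in the answer to question 2.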