Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!rutgers!netnews.upenn.edu!grad2.cis.upenn.edu!aaron From: aaron@grad2.cis.upenn.edu (Aaron Watters) Newsgroups: comp.databases Subject: Re: outerjoins Message-ID: <28495@netnews.upenn.edu> Date: 14 Aug 90 21:57:40 GMT References: <28462@netnews.upenn.edu> <13592@ulysses.att.com> Sender: news@netnews.upenn.edu Reply-To: aaron@grad2.cis.upenn.edu.UUCP (Aaron Watters) Organization: University of Pennsylvania Lines: 42 In article <13592@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes: > >I don't quite see why the outerjoin lies on the fringes of the *standard >relational paradigm*. What differentiates it from other operators with >respect to the paradigm? > The outerjoin(s) introduce null values. If you treat null values as you would any other value then then the outerjoin doesn't cause any problems. More intelligent systems (for example Informix -- since this is the only manual I have handy) treat null values as a special `unknown' value and, justifiably, do not define certain relational operations on relations that include nulls in certain circumstances. Consider a single relation Parental(parent,child) where either parent or child is allowed to be null whenever the parents or children for an individual aren't known. Using datalog type notatoin the natural way to find grandparents is with the rule Grandparent(Person) if Parental(Person,Parent) and Parental(Parent,Child). But this will give the wrong answer if we treat nulls as ordinary values, what you need, apparently, is CertainGparent(Person) if Parental(Person,Parent) and Parental(Parent,Child) and Parent<>NULL. which gives you people who are certainly grandparents, or alternatively MaybeGparent(Person) if Parental(Person,NULL). MaybeGparent(Person) if Parental(Person,Parent) and Parental(Parent,Child). which gives you people who might well be grandparents. Since the first rule is dangerous and misleading in the presence of NULLs, some relational systems require the user to explicitly handle NULLs, as in the latter two examples (correctly, I think). If anyone is interested in extensions to the relational approach that deal with nulls in a more automatic and uniform manner, let me know. -aaron (PS: Apologies to those not used to datalog-type notations. Also, this problem occurs if we interpret NULLs in other ways.)