Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!van-bc!tfic.bc.ca!clh From: clh@tfic.bc.ca (Chris Hermansen) Newsgroups: comp.databases Subject: Re: Outerjoin implementation? Message-ID: <1990Oct2.174006.11192@tfic.bc.ca> Date: 2 Oct 90 17:40:06 GMT References: <1990Sep20.161204.13014@tfic.bc.ca> <11034@sybase.sybase.com> <1990Sep28.231042.5117@tfic.bc.ca> <30354@netnews.upenn.edu> Reply-To: clh@tacitus.UUCP (Chris Hermansen) Distribution: comp Organization: Timberline Forest Inventory Consultants Lines: 86 In article <30354@netnews.upenn.edu> aaron@grad2.cis.upenn.edu.UUCP (Aaron Watters) writes: >[Chris Hermanson argues that nulls are bad...] Geeze, ^^^^^^^^^ it's HermansEn, ok? :-) :-) :-) >In article <1990Sep28.231042.5117@tfic.bc.ca> clh@tacitus.UUCP (Chris Hermansen) writes: >>For example, a field named widget_count is never (realistically) going to >>have a value less than zero, so if you need an "undefined" value for it, use >>-1, or -99999999, or some other nice number. Similarly, a widget_status >>value of "unknown" tells me more than NULL - with NULL, I can't decide whether >>the status is genuinely unknown, or whether the system just doesn't know it. > >While admitting that this last point may be too deep for me, let's >once again consider an example. What if we have two stockrooms >each of which contains a number of widgets. Room A has 10 widgets >and noone knows how many widgets are in room B. The total number of >widgets in both rooms is therefore either 9 or -99999989, of course. That's definitely a problem. One solution is to make the widget count NULL in room B, but then you have either 10 (I think this is the `standard') or NULL (ie undefined; I'm pretty sure this is not the `standard') for the sum. Putting myself in the shoes of the person writing the query, I would say that none of the above answers is very enlightening, unless all I really want to know is if we have five or six (but not eleven!) widgets. My argument was not really that NULLs are pro forma bad, just that I personally don't like the three-state logic that goes with them (and other out-of-band data values) in most cases. I guess I also started to flame about undefined values in general. At the risk of overstating my case, think of the manager who discovers that only ten widgets are available (at $50,000 each) when s/he needs twenty; s/he orders ten more from his/her supplier, ships the ten s/he has, indicates a back-order for his/her customer, and the next time s/he checks the database, finds out there were thirty more in room B. Meanwhile, his/her customer is p*ssed off at the back order and re-orders the other ten from the competition. OK, so that's a bit of a joke, too; my point is that MANY (not ALL) applications shouldn't admit a default value for fields, because the implied semantics of a field (eg widget_count) tend to suggest to the user that there will always be a `reasonable'. David Masterson(?sp) suggested that it might be better to allow defaults (eg NULLs) at data entry time (since, presumably, there are many cases where the data entry person may be unable to supply the missing value, and may even do something weird like enter 5 just to complete the form). I would claim that the best thing to do is bounce the form back to the originator for fixing. >That was a joke, I presume you realize, but how does one get around >this summation anomaly? It seems you need to have something analogous >to the following: > If all widgetcounts are positive then X=sum(widgetcount) > otherwise X=-1. >Furthermore, if you have a large number of conventions like this >more complex queries (especially if you have complex views) can >become quite baroque. If you use actual null values, however, >the queries are still somewhat baroque, but easier to understand. > >A personal interest of mine is the question of whether the >programmer can ignore the possibility of nulls entirely and have >the system handle them in a logically uniform way -- which, I >argue is not currently done automatically. -aaron. I would agree; I think your If... is the best solution (ie, if there are NULLs or other out-of-band values in the set, then the sum, average, etc should probably be out-of-band as well). I don't think this is the way the standard reads; I'm also fairly sure that select sum(widget_count) from fooey and select avg(widget_count) from fooey will behave as though the rows with NULLs in widget count were not really in the database. Certainly this is logical; whether you can therefore ignore the presence of NULLs is really up to you :-) Chris Hermansen Timberline Forest Inventory Consultants Voice: 1 604 733 0731 302 - 958 West 8th Avenue FAX: 1 604 733 0634 Vancouver B.C. CANADA clh@tfic.bc.ca V5Z 1E5 C'est ma facon de parler.