Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!van-bc!tfic.bc.ca!clh From: clh@tfic.bc.ca (Chris Hermansen) Newsgroups: comp.databases Subject: Re: Outerjoin implementation? Message-ID: <1990Sep28.225442.4995@tfic.bc.ca> Date: 28 Sep 90 22:54:42 GMT References: <1422@meaddata.meaddata.com> Reply-To: clh@tacitus.UUCP (Chris Hermansen) Distribution: comp Organization: Timberline Forest Inventory Consultants Lines: 136 In article <1422@meaddata.meaddata.com> meaddata!gordon@uunet.uu.net writes: > >In article <1990Sep20.161204.13014@tfic.bc.ca>, clh@tfic.bc.ca (Chris >Hermansen) writes: >|> In article >cimshop!davidm@uunet.UU.NET (David S. Masterson) writes: >|> >In article pcg@cs.aber.ac.uk >|> >(Piercarlo Grandi) writes: [stuff deleted regarding use of nulls, particularly in outer joins] >Absolutely, as a matter of fact, I think the NULL is completely necessary for >the accurate representation of data. If, for example, you have a integer >attribute called status, and you don't currently know the value, what >do you assign to status? I guess you could look through the possible domain >of status and then pick a value outside that domain to represent unknown, >but then you have to heavily document this so maintenance people know what >you did, don't you think? OK, I'll bite... suppose I have a variable called status. Why is NULL better than a value of "unknown" (assuming it's a character value)? I would claim that in fact "unknown" is superior, as it is at least somewhat self documenting (NULL, by definition, is NOT); furthermore, I can easily spot the "unknown" rows in any printout or report; finally, I can easily count the rows of particular status in the table with one select statement (whereas I need two with NULL values allowed, as rows with NULL values do not group together). [more stuff deleted] >Well the above is really placing two distinct constraints on the update. >Constraint 1: qqsv cannot have a known value of 0. >Constraint 2: qqsv cannot have an unknown value. > >If qqsv was guaranteed to be known, you can always declare it as *not null*. This may relate to the Informix SQL comment below. >Then, Constraint 2 would become unecessary. If you suggest an arbitrary value >for unknowns, what would the statement be? > > ubdate foo set bar = 37 where qqsv != 0 and qqsv != -99999999 > >This is easier than using null? I don't think so. I don't think it's any more difficult! Maybe this isn't the best example, but I guess it depends on why the NULLs are in qqsv in the first place. If they got there because of a low level of quality control in the data input phase, I might prefer to see the data input stuff (screen forms or whatever) cleaned up; if they got there because someone genuinely wanted to show that qqsv didn't have a value entered into it, I guess it's really a tossup as to whether one uses NULL or some other value (that could potentially have more `local' meaning to the application). As you say above, outer joins make a strong case for the existence of a "well defined" undefined value. > >|> >|> [text deleted...] >|> >|> Note that I use Informix SQL, so your lack of mileage may vary on the above > > >????? The above statement makes no sense. ????? Aww, c'mon! All I'm saying is that I'm basing my comments on the RDBMS I use, not on the ANSI standard, nor on Quel/Ingres which was mentioned earlier on. >Maybe you have difficulty undersanding the proper use of null. It means, "I >don't know what this value is." I'll use an example from Date, "... a >shipment record might contain a null quantity value (we know that the >shipment exists, but we do not know the quantity shipped)..." I'd be the first to admit that I may have difficulty understanding the use of NULL. But think about your example; why should the input screen have let someone enter the fact that a shipment was made without knowing the number of items shipped? How would you propose to send the customer a bill, if you don't know whether you sent her/him 3 SPARCStations or 3000? Now suppose that you're the boss of the shipping department, and you find out that your programmer decided to allow NULLs in the amount_shipped field by casually noting that your warehouse contained no more SPARCStations because someone on the loading dock didn't bother to fill in the waybill properly and the fast-knuckled keypuncher blithely skipped over the amount_shipped field, so not only are you going to have to call 200 customers to find out how many you sent them, but you're going to have to call head office and tell them that your order for more stock should have had more than zero items on it. And in any case, I would claim that a zero would do as well as a NULL here, since most shipping departments would hardly bother to ship zero units (let alone generate a shipping report with zero units on it). If you're doing an outer join, a better(?) example of NULLs cropping up might be orders entered but not yet shipped... > >If you find that you have lots of nulls, then maybe your schema is not >properly >normalized. Try checking for transitive dependencies, I have found that they >can contribute to an excessive amount of nulls. Naw, all my schemae are perfectly normalized :-) :-) :-) >Null is a very important semantic concept. There have been many times I would >have loved to have null when writing 3GL applications (I had to rely on -999). >As for additional constraints, anytime you deal with an uknown you have take >extra measures to account for it. > >What do you think? Well, back to the status = "unknown" vs status is null; I don't feel that there is any innate superiority of the null value here, as long as one has control over the value of the field (ie, it's not the result of an outer join). Seems to me that one is semantically using two fields in a lot of cases: qqsv and qqsv_status. Finally, if I have a table with 6,000,000 records in it, with status NULL in 5,999,999 and status = "complete" in the other, and I use ISQL to execute the following: output to printer select status,count(*) from foo group by status order by status; I will end up with 100,000 PAGES of output on my laser printer. On the other hand, if those 5,999,999 records have status = "unknown", I'll get one page. Again, I (feel I) should emphasize that this particular behaviour may not occur in ALL SQLs, or even be predicted by the ANSI standard. Chris Hermansen Timberline Forest Inventory Consultants Voice: 1 604 733 0731 302 - 958 West 8th Avenue FAX: 1 604 733 0634 Vancouver B.C. CANADA clh@tfic.bc.ca V5Z 1E5 C'est ma facon de parler.