Path: utzoo!attcan!uunet!meaddata!gordon From: gordon@meaddata.com (Gordon Edwards) Newsgroups: comp.databases Subject: Re: Outerjoin implementation? Message-ID: <1490@meaddata.meaddata.com> Date: 1 Oct 90 15:04:51 GMT References: <1422@meaddata.meaddata.com> <1990Sep28.225442.4995@tfic.bc.ca> Sender: usenet@meaddata.com Reply-To: meaddata!gordon@uunet.uu.net Distribution: comp Organization: Mead Data Central, Dayton OH Lines: 185 In article <1990Sep28.225442.4995@tfic.bc.ca>, clh@tfic.bc.ca (Chris Hermansen) writes: |> In article <1422@meaddata.meaddata.com> meaddata!gordon@uunet.uu.net writes: |> [lots of stuff deleted. -gordon] |> in the table with one select statement (whereas I need two with NULL values allowed, |> as rows with NULL values do not group together). Really? This must be a quirk of Informix. I have used Informix a little, but most of my time has been spent in Sybase and Ingres, both of which group NULLs. |> |> [more stuff deleted] |> |> >Well the above is really placing two distinct constraints on the update. |> >Constraint 1: qqsv cannot have a known value of 0. |> >Constraint 2: qqsv cannot have an unknown value. |> > |> >If qqsv was guaranteed to be known, you can always declare it as *not null*. |> |> This may relate to the Informix SQL comment below. |> |> >Then, Constraint 2 would become unecessary. If you suggest an arbitrary value |> >for unknowns, what would the statement be? |> > |> > ubdate foo set bar = 37 where qqsv != 0 and qqsv != -99999999 |> > |> >This is easier than using null? I don't think so. |> |> I don't think it's any more difficult! Maybe this isn't the best example, but |> I guess it depends on why the NULLs are in qqsv in the first place. If they |> got there because of a low level of quality control in the data input phase, |> I might prefer to see the data input stuff (screen forms or whatever) cleaned |> up; if they got there because someone genuinely wanted to show that qqsv didn't |> have a value entered into it, I guess it's really a tossup as to whether one |> uses NULL or some other value (that could potentially have more `local' meaning |> to the application). As the DBA, it might not be possible to control applications. After the system is running for two months, a manager might want to interface a spreadsheet to the DBMS. At that point, unless you have constraints in your DBMS, you have no control. |> |> As you say above, outer joins make a strong case for the existence of a |> "well defined" undefined value. |> |> > |> >|> |> >|> [text deleted...] |> >|> |> >|> Note that I use Informix SQL, so your lack of mileage may vary on the above |> > |> > |> >????? The above statement makes no sense. ????? |> |> Aww, c'mon! All I'm saying is that I'm basing my comments on the RDBMS I use, |> not on the ANSI standard, nor on Quel/Ingres which was mentioned earlier on. |> OK. It sort of sounded like, "Informix is great and you other losers..." I just wanted to double check. |> >Maybe you have difficulty undersanding the proper use of null. It means, "I |> >don't know what this value is." I'll use an example from Date, "... a |> >shipment record might contain a null quantity value (we know that the |> >shipment exists, but we do not know the quantity shipped)..." Previous comment by me was a bit unfair, I apologize. |> |> I'd be the first to admit that I may have difficulty understanding the use of |> NULL. But think about your example; why should the input screen have let |> someone enter the fact that a shipment was made without knowing the number |> of items shipped? How would you propose to send the customer a bill, if |> you don't know whether you sent her/him 3 SPARCStations or 3000? Now suppose |> that you're the boss of the shipping department, and you find out that your |> programmer decided to allow NULLs in the amount_shipped field by casually |> noting that your warehouse contained no more SPARCStations because someone |> on the loading dock didn't bother to fill in the waybill properly and the |> fast-knuckled keypuncher blithely skipped over the amount_shipped field, |> so not only are you going to have to call 200 customers to find out how |> many you sent them, but you're going to have to call head office and |> tell them that your order for more stock should have had more than zero |> items on it. OK, so Date's example sucks. :-) Once again, as the DBA, you might not have control over front-end constraint enforcement. In your example above, I would have a number of checks to catch such situations (eg. end of day routines to catch NULL shipment entries, dock verification of shipment quantities, etc). |> |> And in any case, I would claim that a zero would do as well as a NULL here, since |> most shipping departments would hardly bother to ship zero units (let alone |> generate a shipping report with zero units on it). |> |> |> If you're doing an outer join, a better(?) example of NULLs cropping up might be |> orders entered but not yet shipped... |> |> > |> >If you find that you have lots of nulls, then maybe your schema is not |> >properly |> >normalized. Try checking for transitive dependencies, I have found that they |> >can contribute to an excessive amount of nulls. |> |> Naw, all my schemae are perfectly normalized :-) :-) :-) Isn't everyones? 8-0 |> |> >Null is a very important semantic concept. There have been many times I would |> >have loved to have null when writing 3GL applications (I had to rely on -999). |> >As for additional constraints, anytime you deal with an uknown you have take |> >extra measures to account for it. |> > |> >What do you think? |> |> Well, back to the status = "unknown" vs status is null; I don't feel that there |> is any innate superiority of the null value here, as long as one has control over |> the value of the field (ie, it's not the result of an outer join). Seems to me |> that one is semantically using two fields in a lot of cases: qqsv and qqsv_status. |> |> Finally, if I have a table with 6,000,000 records in it, with status NULL in 5,999,999 |> and status = "complete" in the other, and I use ISQL to execute the following: |> |> output to printer |> select status,count(*) from foo |> group by status |> order by status; |> |> I will end up with 100,000 PAGES of output on my laser printer. On the other hand, |> if those 5,999,999 records have status = "unknown", I'll get one page. |> |> Again, I (feel I) should emphasize that this particular behaviour may not occur in |> ALL SQLs, or even be predicted by the ANSI standard. |> You previous example may indeed be specific to Informix, and if this were true globally, I would put unknown in also. :-) I still maintain NULLs are important, however, in light of past conversations, I think I should add some remarks. 1. As I said before, if defaults are appropriate, use them. 2. I agree with those who say NULLs are not perfect. 3. If anything, specific UNKNOWN and INAPPROPRIATE values should be added. I thought your previous post recommended that the concept of NULL was wrong. If you are arguing that NULL needs to be more specific, then I probably agree with you. Sorry, for any misunderstandings. -- Gordon (gordon@meaddata.com)