Path: utzoo!attcan!uunet!meaddata!gordon
From: gordon@meaddata.com (Gordon Edwards)
Newsgroups: comp.databases
Subject: Re: Outerjoin implementation?
Message-ID: <1490@meaddata.meaddata.com>
Date: 1 Oct 90 15:04:51 GMT
References: <1422@meaddata.meaddata.com> <1990Sep28.225442.4995@tfic.bc.ca>
Sender: usenet@meaddata.com
Reply-To: meaddata!gordon@uunet.uu.net
Distribution: comp
Organization: Mead Data Central, Dayton OH
Lines: 185


In article <1990Sep28.225442.4995@tfic.bc.ca>, clh@tfic.bc.ca (Chris
Hermansen) writes:
|> In article <1422@meaddata.meaddata.com> meaddata!gordon@uunet.uu.net writes:
|> 
[lots of stuff deleted. -gordon]

|> in the table with one select statement (whereas I need two with NULL
values allowed,
|> as rows with NULL values do not group together).

Really?  This must be a quirk of Informix.  I have used Informix a little, but
most of my time has been spent in Sybase and Ingres, both of which group
NULLs.  
|> 
|> [more stuff deleted]
|> 
|> >Well the above is really placing two distinct constraints on the update.
|> >Constraint 1:  qqsv cannot have a known value of 0.
|> >Constraint 2:  qqsv cannot have an unknown value.
|> >
|> >If qqsv was guaranteed to be known, you can always declare it as
*not null*.
|> 
|> This may relate to the Informix SQL comment below.
|> 
|> >Then, Constraint 2 would become unecessary.  If you suggest an
arbitrary value
|> >for unknowns, what would the statement be?
|> >
|> >	ubdate foo set bar = 37 where qqsv != 0 and qqsv != -99999999
|> >
|> >This is easier than using null?  I don't think so.
|> 
|> I don't think it's any more difficult!  Maybe this isn't the best
example, but
|> I guess it depends on why the NULLs are in qqsv in the first place.  If they
|> got there because of a low level of quality control in the data input phase,
|> I might prefer to see the data input stuff (screen forms or whatever)
cleaned
|> up; if they got there because someone genuinely wanted to show that
qqsv didn't
|> have a value entered into it, I guess it's really a tossup as to whether one
|> uses NULL or some other value (that could potentially have more
`local' meaning
|> to the application).

As the DBA, it might not be possible to control applications.  After the system
is running for two months, a manager might want to interface a spreadsheet to 
the DBMS.  At that point, unless you have constraints in your DBMS, you have no
control.

|> 
|> As you say above, outer joins make a strong case for the existence of a
|> "well defined" undefined value.
|> 
|> >
|> >|> 
|> >|> [text deleted...]
|> >|> 
|> >|> Note that I use Informix SQL, so your lack of mileage may vary on
the above
|> >
|> >
|> >????? The above statement makes no sense. ?????
|> 
|> Aww, c'mon!  All I'm saying is that I'm basing my comments on the
RDBMS I use,
|> not on the ANSI standard, nor on Quel/Ingres which was mentioned earlier on.
|> 

OK.  It sort of sounded like, "Informix is great and you other losers..."  I
just wanted to double check.

|> >Maybe you have difficulty undersanding the proper use of null.  It
means, "I
|> >don't know what this value is."  I'll use an example from Date, "... a 
|> >shipment record might contain a null quantity value (we know that the 
|> >shipment exists, but we do not know the quantity shipped)..."

Previous comment by me was a bit unfair, I apologize.

|> 
|> I'd be the first to admit that I may have difficulty understanding
the use of
|> NULL.  But think about your example; why should the input screen have let
|> someone enter the fact that a shipment was made without knowing the number
|> of items shipped?  How would you propose to send the customer a bill, if
|> you don't know whether you sent her/him 3 SPARCStations or 3000? Now suppose
|> that you're the boss of the shipping department, and you find out that your
|> programmer decided to allow NULLs in the amount_shipped field by casually
|> noting that your warehouse contained no more SPARCStations because someone
|> on the loading dock didn't bother to fill in the waybill properly and the
|> fast-knuckled keypuncher blithely skipped over the amount_shipped field,
|> so not only are you going to have to call 200 customers to find out how
|> many you sent them, but you're going to have to call head office and
|> tell them that your order for more stock should have had more than zero
|> items on it.

OK, so Date's example sucks. :-)  Once again, as the DBA, you might not have
control over front-end constraint enforcement.  In your example above, I would
have a number of checks to catch such situations (eg. end of day routines to
catch NULL shipment entries, dock verification of shipment quantities, etc).

|> 
|> And in any case, I would claim that a zero would do as well as a NULL
here, since
|> most shipping departments would hardly bother to ship zero units (let alone
|> generate a shipping report with zero units on it).
|> 
|> 
|> If you're doing an outer join, a better(?) example of NULLs cropping
up might be
|> orders entered but not yet shipped...
|> 
|> >
|> >If you find that you have lots of nulls, then maybe your schema is not
|> >properly 
|> >normalized.  Try checking for transitive dependencies, I have found
that they
|> >can contribute to an excessive amount of nulls.
|> 
|> Naw, all my schemae are perfectly normalized :-) :-) :-)

Isn't everyones? 8-0

|> 
|> >Null is a very important semantic concept.  There have been many
times I would
|> >have loved to have null when writing 3GL applications (I had to rely
on -999).
|> >As for additional constraints, anytime you deal with an uknown you
have take
|> >extra measures to account for it.
|> >
|> >What do you think?
|> 
|> Well, back to the status = "unknown" vs status is null; I don't feel
that there
|> is any innate superiority of the null value here, as long as one has
control over
|> the value of the field (ie, it's not the result of an outer join). 
Seems to me
|> that one is semantically using two fields in a lot of cases: qqsv and
qqsv_status.
|> 
|> Finally, if I have a table with 6,000,000 records in it, with status
NULL in 5,999,999
|> and status = "complete" in the other, and I use ISQL to execute the
following:
|> 
|> 	output to printer
|> 	select status,count(*) from foo
|> 	group by status
|> 	order by status;
|> 
|> I will end up with 100,000 PAGES of output on my laser printer.  On
the other hand,
|> if those 5,999,999 records have status = "unknown", I'll get one page.
|> 
|> Again, I (feel I) should emphasize that this particular behaviour may
not occur in
|> ALL SQLs, or even be predicted by the ANSI standard.
|> 

You previous example may indeed be specific to Informix, and if this were true
globally, I would put unknown in also. :-)

I still maintain NULLs are important, however, in light of past conversations,
I think I should add some remarks.

1.  As I said before, if defaults are appropriate, use them.

2.  I agree with those who say NULLs are not perfect.

3.  If anything, specific UNKNOWN and INAPPROPRIATE values should be added.

I thought your previous post recommended that the concept of NULL was
wrong.  If
you are arguing that NULL needs to be more specific, then I probably agree with
you.

Sorry, for any misunderstandings.

-- Gordon     (gordon@meaddata.com)