Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!think.com!hsdndev!dartvax!L.Carl.Pedersen
From: L.Carl.Pedersen@dartmouth.edu (L. Carl Pedersen)
Newsgroups: comp.databases
Subject: Re: Relational Databases (and why they don't exist)
Message-ID: <1991Jun11.143708.15641@dartvax.dartmouth.edu>
Date: 11 Jun 91 14:37:08 GMT
References: <1991Jun07.063749.7904@chinet.chi.il.us> 
 <1991Jun08.124337.29813@uvmark.uucp> <2421@ccadfa.adfa.oz.au> 
 <25200@oolong.la.locus.com>
Sender: news@dartvax.dartmouth.edu (The News Manager)
Organization: Dartmouth College, Hanover, NH
Lines: 44

In article <25200@oolong.la.locus.com>
jfr@locus.com (Jon Rosen) writes:

> In article <2421@ccadfa.adfa.oz.au> ghm@ccadfa.adfa.oz.au (Geoff Miller) writes:
  (stuff deleted)
> >Normalisation does have advantages for more that the database theorist,
> >though.  Remember that there is a basic law of computing that, if you
> >give a user the chance to enter the same piece of information in two
> >different places you will inevitably end up with two different versions.
> >Normalisation is largely about making sure that each piece of information
> >is stored once and once only.
>   
>  Not true... normalization is mostly about reducing a data model to its
>  component parts... what this does is generally eliminate redundant
>  ATTRIBUTEs, but KEYS are not only repeated, but often repeated
>  many times...  In fact, I have done a comparison of several large
>  systems that were implemented in RAMIS, a mostly hierarchical 4GL
>  database and then reimplemented in DB2... The use of a hierarchy was
>  far more capable of eliminating redundancies than the relational
>  version...

I agree with the Geoff.  Normalization is primarily intended to
eliminate update anomolies.  In fact, the various normal forms all have
associated update anomolies.

Jon is certainly right, though, that keys end up being repeated all
over the place.  However, I think a relational purist would argue that
keys should be totally meaningless - internally generated surrogate
keys, so that you would never have to change them.  You would not, for
example, use a Social Security Number as a key, since it might turn out
to be wrong.  An SSN would be just another attribute of a person.  Some
people actually implement things this way, but, of course, this
approach has its own problems.  

I'm not certain about this, but I think Codd has published a proposal
for a relational system that essentially requires the use of surrogate
keys, but also manages them more or less automatically, making them a
more palatable.  And, of course, some of the newer object-oriented
systems address this issue by allowing one to reference the object
directly rather than a key pointing to the object.

I'd be interested in people's experience with using surrogate keys,
i.e., internally generated keys that are never revealed to the user,
instead of real-world keys like SSN's.