Path: utzoo!attcan!uunet!husc6!cs.utexas.edu!natinst!bloom!bobd From: bobd@bloom.UUCP (Bob Donaldson) Newsgroups: comp.databases Subject: Referential Integrity Keywords: CSDate, not just BLOCK Message-ID: <560@bloom.UUCP> Date: 22 Dec 88 17:25:47 GMT Lines: 135 The discussion on binary data types has prompted me to ask another feature- related question - Why don't any of the RDBMS packages we all know and love REALLY support referential integrity?? HOLD ON - before you all chime in with 'but MINE does', let me say that I do NOT mean the simplistic approach that is so often implemented that merely prevents you from inserting or deleting a record that would violate the principle. Let me explain. C.S. Date is the source for most of my terminology, but let me clarify what I mean so you can disagree with that, rather than what you think I might have meant. Foreign Key The attribute in relation A that uniquely identifies a tuple in some (probably different) relation B. Target The specific tuple identified by some instance of a foreign key BLOCK An action taken by the system that prevents a user-specified action from completing if it violates referential integrity CASCADE An action taken by the system that propagates a user-specified action to other tuples and/or relations in order to maintain referential integrity NULLIFY An action taken by the system to set foreign keys to NULL in order to maintain referential integrity without cascading. NOTE - SEE BELOW ON EXISTENCE DEPENDENCY BEFORE YOU JUMP TO A CONFUSION. Referential Integrity "If an attribute is designated as a foreign key, then the database may not assume a state such that that attribute contains a value which does not exist in some target tuple." Existence-Dependence Note that the definition of referential integrity does NOT rule out NULL foreign keys. This is because some real-world applications require existence INdependence, rather than existence dependence. This can best be illustrated with an example. A student may sign up for a course, and the information about this might be in the ENROLLMENT table. Clearly, if either the student or the course do not exist, the ENROLLMENT entry will be incon- sistent. The same student may (or may not) have an advisor. This information might be stored in the STUDENT table. Clearly, if he HAS an advisor, the advisor must exist (i.e. he must have an entry in the PROFESSOR table), but it is also possible that no advisor is declared (STUDENT.advisor = NULL). The first case illustrates that ENROLLMENT is existence-dependent on STUDENT and COURSE. The second ilustrates that STUDENT is existence-independent of PROFESSOR. NOW, if anybody is still with me - Here is what I WANT: create table ENROLLMENT attribute student_id foreign key into STUDENT (id) /* referential integrity; join field is STUDENT.id */ delete of target CASCADES /* If the student quits, delete the enrollment too */ modify of target BLOCKED /* Don't allow the student id to change (at least if he is currently enrolled */ attribute course_id foreign key into COURSE (num) /* referential integrity; join field is COURSE.num */ delete of target CASCADES /* If you cancel the course, get rid of the associated enrollments */ modify of target CASCADES /* If you renumber the courses, fix the enrollments too */ attribute ... ... create table STUDENT attribute student_id attribute advisor foreign key into PROFESSOR (name) /* referential integrity; join field is PROFESSOR.name */ delete of target NULLIFIES /* If my advisor quits, let him (just don't throw me out too) */ modify of target CASCADES /* If my professor suffers an identity crisis and changes his name, don't leave me hanging there without an advisor */ attribute ... ... I obviously stretched a little for these examples; PLEASE DON'T CRITIQUE THE DATABASE DESIGN! That's not the point. My point is that while I recognize that such things as CASCADES are both demanding in terms of resources and also potentially dangerous, I maintain that the database designer NEEDS to specify these various nuances of referential integrity SOMEHOW. Currently, we must choose from several unattractive options: #1 Document the decision and pray that all the application programmers build the code into all of their applications AND strictly control all SQL documentation so that no real users will get in and foul something up. #2 Implement the subset of RefInt provided by your favorite RDBMS and get an answering machine for your phone to handle user complaints when they have to do something like renumber all the COURSES (or some other keystone relation). #3 Use record numbers or some such for ALL keys and try to claim (with a straight face) that you are implementing a relational model #4 Hire somebody to check the database daily for inconsistencies and figure out how to fix them. Any comments? criticisms? PRODUCTS??? I'm braced for the response this is sure to draw. -=- Bob Donaldson ...!cs.utexas.edu!natinst!radian!bobd Radian Corporation ...!sun!texsun!radian!bobd PO Box 201088 Austin, TX 78720 (512) 454-4797 Views expressed are my own, not necessarily those of my employer.