Path: utzoo!utgpu!watmath!att!pacbell!ames!claris!apple!motcsd!hpda!hpcupt1!hpindwa!donovan
From: donovan@hpindwa.HP.COM (Donovan Hsieh)
Newsgroups: comp.databases
Subject: Re: Re: A few words on the "normalization"
Message-ID: <36270009@hpindwa.HP.COM>
Date: 7 Aug 89 18:17:53 GMT
References: <1707@ethz.UUCP>
Organization: Hewlett-Packard, Cupertino CA
Lines: 55


In the note, Robert Marti <marti@ethz.UUCP> at ETH Zuerich wrote :

> I see your point, and it's certainly valid.  However -- as I am sure you
> know -- a designer does not have to specify ALL functional and multi-
> valued dependencies to achieve correct third or fourth normal form.
> All she needs to do is provide a superset of an FD cover (or superset of
> a dependency basis in the case of MVDs) from which all FDs (MVDs) can be
> inferred.  For real world examples, finding these FDs and MVDs is
> essentially the same thing as finding the relationships between different
> entity types with their association cardinalities in an ER model.
> In either case, you get an incorrect model of the world if the designer
> overlooks some relationships between attributes (in the relational model)
> or entity types (in the ER model).

In theory, the suggested way to design a fully normalized relational database
scheme is as follows :

	1. Formulate all possible set of FDs & MVDs from the real world 
	   enterprise
	2. Compute the "optimal covers" from the set of FDs & MVDs to derive
	   the optimal set of FDs & MVDs. Note that a minimal set is not
	   necessarily optimal.	
	3. Apply whatever algorithms you may choose to normalize the set
	   of optimal FDs & MVDs to the highest possible normal form.

But in reality, the difficulties are :

	1. You'll probably never be able to prove that you have obtained
	   all possible FDs & MVDs for the real world enterprise except for
	   those toy databases. Not to mention what is the superset of the
	   the FDs & MVDs. It is different to say that "I have found the
	   complete set of FDs & MVDs" and "I can produce the superset of
	   FDS & MVDs for a GIVEN set of FDs & MVDs from which all original
	   dependencies can be inferred".

	2. At least for the current time, there is probably no polinomial
	   time algorithm for finding an "optimal cover" for a set of FDs &
	   MVDs. This is a NP-complete problem.

	3. Even if you have successfully completed the above two steps, the
	   net result of a mechanical normalization process does not always
	   produce the most efficient schema for real world queries. This is
	   due to that all normalization algorithms are purely "syntactical"
	   instead of "semantic". They do not have the knowledge of what are
	   the real world database requirements (from the perspective of
	   query execution efficiency or referential integrity). This is
	   why "Semantic" and "Object-Oriented" data models have become
	   popular.


Donovan Hsieh
Hewlett-Packard
Business Network Division