Path: utzoo!utgpu!watmath!att!pacbell!ames!claris!apple!motcsd!hpda!hpcupt1!hpindwa!donovan From: donovan@hpindwa.HP.COM (Donovan Hsieh) Newsgroups: comp.databases Subject: Re: Re: A few words on the "normalization" Message-ID: <36270009@hpindwa.HP.COM> Date: 7 Aug 89 18:17:53 GMT References: <1707@ethz.UUCP> Organization: Hewlett-Packard, Cupertino CA Lines: 55 In the note, Robert Marti at ETH Zuerich wrote : > I see your point, and it's certainly valid. However -- as I am sure you > know -- a designer does not have to specify ALL functional and multi- > valued dependencies to achieve correct third or fourth normal form. > All she needs to do is provide a superset of an FD cover (or superset of > a dependency basis in the case of MVDs) from which all FDs (MVDs) can be > inferred. For real world examples, finding these FDs and MVDs is > essentially the same thing as finding the relationships between different > entity types with their association cardinalities in an ER model. > In either case, you get an incorrect model of the world if the designer > overlooks some relationships between attributes (in the relational model) > or entity types (in the ER model). In theory, the suggested way to design a fully normalized relational database scheme is as follows : 1. Formulate all possible set of FDs & MVDs from the real world enterprise 2. Compute the "optimal covers" from the set of FDs & MVDs to derive the optimal set of FDs & MVDs. Note that a minimal set is not necessarily optimal. 3. Apply whatever algorithms you may choose to normalize the set of optimal FDs & MVDs to the highest possible normal form. But in reality, the difficulties are : 1. You'll probably never be able to prove that you have obtained all possible FDs & MVDs for the real world enterprise except for those toy databases. Not to mention what is the superset of the the FDs & MVDs. It is different to say that "I have found the complete set of FDs & MVDs" and "I can produce the superset of FDS & MVDs for a GIVEN set of FDs & MVDs from which all original dependencies can be inferred". 2. At least for the current time, there is probably no polinomial time algorithm for finding an "optimal cover" for a set of FDs & MVDs. This is a NP-complete problem. 3. Even if you have successfully completed the above two steps, the net result of a mechanical normalization process does not always produce the most efficient schema for real world queries. This is due to that all normalization algorithms are purely "syntactical" instead of "semantic". They do not have the knowledge of what are the real world database requirements (from the perspective of query execution efficiency or referential integrity). This is why "Semantic" and "Object-Oriented" data models have become popular. Donovan Hsieh Hewlett-Packard Business Network Division