Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!sri-spam!ames!amdahl!rtech!jeff From: jeff@rtech.UUCP Newsgroups: comp.databases Subject: Re: Normalized Databases Message-ID: <739@rtech.UUCP> Date: Thu, 2-Apr-87 01:45:58 EST Article-I.D.: rtech.739 Posted: Thu Apr 2 01:45:58 1987 Date-Received: Sat, 4-Apr-87 13:30:51 EST References: <43239@beno.seismo.CSS.GOV> Distribution: na Organization: Relational Technology, Alameda CA Lines: 114 Keywords: normal form, relational in article <43239@beno.seismo.CSS.GOV>, tiberio@seismo.CSS.GOV (Mike Tiberio) says: > > At our facility we have a certain large investment in a relational database > design that is not in the highest normal form. We have some folks proposing > that we redesign our database (and no doubt out database interface software) > to a greater or highest normal form. They state we will enjoy the following > qualities when this is done. > > 1) The amount of space required to store the data is lowest. This is usually true, but not necessarily so. > 2) The ability to update the database is greatest. > > 3) Explaining the database is simplest. Sometimes you can run into update problems in unnormalized databases, because a single relationship can be stored in more than one place. For example, suppose you are keeping track of parts, suppliers of parts, and orders from those suppliers, and one of your tables looks like this: Supplier | City | Part | Color | Quantity ----------------------------------------- J1 | NY | p1 | red | 100 J1 | NY | p2 | white | 200 S6 | LA | p2 | white | 50 L3 | DC | p2 | white | 300 L3 | DC | p3 | blue | 100 Now suppose the supplier J1 moves from NY to SF. You will have to update J1's city in two places. Suppose you want to keep track of a new supplier, R2, who doesn't currently supply you with anything. There will have to be a row in the table in which part, color, and quantity are blank. Suppose supplier S6 stops supplying the part p2. To delete that row from the table would erase all information about S6, which you don't necessarily want to do; instead, you must recognize the case where there is only one row left for this supplier, and blank out the part, color, and quantity for that row. The problem here is normalization. Without going into any detail on what it means, here is the above information in third normal form: Supplier Table Part Table Supplier | City Part | Color --------------- ------------ J1 | NY p1 | red S6 | LA p2 | white L3 | DC p3 | white On-order Table Supplier | Part | Quantity -------------------------- J1 | p1 | 100 J1 | p2 | 200 S6 | p2 | 50 L3 | p2 | 300 L3 | p3 | 100 Normalization is sometimes explained with a bunch of fancy jargon, but what it boils down to is this: Every table should stand for a single thing. In the example, the problem with the original table is that it stands for three things: suppliers, parts, and orders. Jamming these together into one table causes problems when we want to update some aspect of one of the things. A normalized database is easier to understand because the individual tables are simpler: each one stands for only one thing. The only thing that may become harder is figuring out how to join the various tables together; the main way to fight against this is to make sure that each of your tables stands for some real thing in the problem you're trying to solve (if one is not careful, one can come up with tables that don't model the problem, but instead model the underlying algorithm; it's a good idea to design your data first, and then figure out your algorithms). This makes it sound easier than it really is. Figuring out what your "things" are can be difficult. > What I am asking you folks in netland is for some guidance. I like the > intellectual concept of normal form relational databases, but I need to > know if there are any good reasons not to go that route. The main argument against normalization is performance. Normalization involves splitting unnormalized tables up into smaller tables. But this means you will have to join the tables to recreate the relationships in the original, unnormalized form. Joins are slower than fetches from single tables. In the above example, suppose that you often need to know which parts are coming from which cities. In the normalized database, you would have to join the supplier and the on-order tables. If this took too much time, you might decide to unnormalize your database and merge the two tables into one: Supplier-Order Table Supplier | City | Part | Quantity --------------------------------- J1 | NY | p1 | 100 J1 | NY | p2 | 200 S6 | LA | p2 | 50 L3 | DC | p2 | 300 L3 | DC | p3 | 100 Some of the update and understanding problems come back, but it will run faster for the operation we're interested in. -- Jeff Lichtman at rtech (Relational Technology, Inc.) "Saints should always be judged guilty until they are proved innocent..." {amdahl, sun}!rtech!jeff {ucbvax, decvax}!mtxinu!rtech!jeff