Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!dciem!nrcaer!cognos!garyp From: garyp@cognos.uucp (Gary Puckering) Newsgroups: comp.databases Subject: Re: Database problem of the week Message-ID: <1304@smokey.UUCP> Date: Fri, 21-Aug-87 10:54:19 EDT Article-I.D.: smokey.1304 Posted: Fri Aug 21 10:54:19 1987 Date-Received: Sun, 23-Aug-87 10:12:39 EDT References: <923@woton.UUCP> Reply-To: garyp@cognos.UUCP (Gary Puckering) Organization: Cognos Inc., Ottawa, Canada Lines: 66 In article <923@woton.UUCP> riddle@woton.UUCP (Prentiss Riddle ) writes: >Currently the data is stored in a single large Informix file in which each >record consists of a key and 52 floating point lab values. The key is a >composite of a study group number, a subject number and a time. > > +-----------+-------------+------+-------+-------+- -+--------+ > | group no. | subject no. | time | val 1 | val 2 | ... | val 52 | > +-----------+-------------+------+-------+-------+- -+--------+ > >So far so good. What makes me wonder if this is the best way to store the >data is that over half of the lab values are nulls (stored as an impossibly >large real number). In some of the fields, as a matter of fact, over 95% of >the records have no data. One reason for this is that half of the fields >contain what are called "extra" variables: since different study groups are >subjected to different experiments, the meanings of these "extra" fields >differ from study group to study group (which is *very* confusing!) and most >study groups don't need them all. But even in the fields with fixed >meanings, measurements are not always recorded for every subject at every >time point. One technique you could use is to try to classify the study groups into different subclasses each with its own unique set of properties. For example: experiment(expId,groupNo,subjectNo,time) animalExperiment(animalExpId,whiteCellCount,respirationRate,basalTemp) birdExperiment(birdExpId,wingBeatRate) plantExperiment(plantExpId,transpirationRate,soil_ph) In this example, Ive introduced a surrogate key, expId, to substitute for the compound key (groupNo,subjectNo,time). This surrogate can simply be a unique number generated whenever a new experiment is inserted. In addition, you will probably want to define a unique index on the columns (groupNo,subjectNo,time). The other relations are specializations of the experiment relation. In fact, their primary keys (animalExpId, birdExpId and plantExpId) are defined on the same domain as expId. If an experiment is an animal experiment, then you will insert a row into both the experiment relation and the animalExperiment relation. The expId and the animalExpId will have the same value. If it is a bird experiment (a bird is an animal) then you'll insert a row into the experiment relation, the animalExperiment relation and the birdExperiment relation. Again, the id's will have the same value. To access your, data, you can join these relations in various ways. For example: select * from planExperiment p, experiment e where p.plantExpId = e.expId will get you all the plant experiment information. Using this technique, you can probably eliminate the use of nulls to represent inapplicable properties (when you have this case, you probably have a new subclass). This leaves the null feature free to be used to represent an unknown or missing value. -- Gary Puckering 3755 Riverside Dr. Cognos Incorporated Ottawa, Ontario {allegra,decvax,ihnp4,linus,pyramid} (613) 738-1440 CANADA K1G 3N3 !utzoo!dciem!nrcaer!cognos!garyp