Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!uwm.edu!zaphod.mps.ohio-state.edu!samsung!munnari.oz.au!metro!usage.csd.unsw.oz.au!ccadfa!ghm
From: ghm@ccadfa.adfa.oz.au (Geoff Miller)
Newsgroups: comp.databases
Subject: Re: Support for imprecise data: survey
Message-ID: <1785@ccadfa.adfa.oz.au>
Date: 2 Aug 90 00:34:02 GMT
References: <26231@usc.edu>
Organization: Computer Centre, University College, UNSW, ADFA, Canberra, Australia
Lines: 45

ami@kodkod.usc.edu (Ami Motro) writes:

>Hello database experts,

>I am interested in finding the level of support (if any) in present commercial
>database systems for IMPRECISE DATA....

>Things I would like to know include, how does the user describe the imprecise
>data to the system? How does the system retrieve in the presence of imprecise
>data? Can the user specify imprecision in queries? And so on....

I'm currently working with Prime "Information", which is a Pick variant with some extra goodies, but I think my comments would be quite valid for generic Pick.

I'm not quite sure whether you are referring to recording imprecise data (to take your example, entering some character in a database to indicate that a person has a phone number although it is not known) or whether you are talking about imprecise queries on precise data. The first would appear to be largely a matter of how you define your database and subsequent queries - the second can get a bit more interesting.

One application on which we work is a military history database which records data on individual servicemen. We have no control over the raw data, which are scanned from the original records, so along with scanning errors (which we mostly detect) we have problems arising from the inconsistency of the original records. You might be surprised at how many ways the rank of Private can be recorded, let alone the number of equivalent ranks in specialist units (bombardier, fusilier, ...).

What we have had to do in a number of cases is to select by exclusion, so that we exclude the records which obviously do not fit a particular criterion and then look at what we have left and at how the criteria can be refined. This can take many iterations, and sometimes we have to make the final selections by hand from a displayed list. In general we have found this approach to work, although admittedly it can get a bit tedious.

We have also found it much better to use a series of SELECT statements rather than building up one enormous query (each SELECT works only on the records returned by the previous one). "Information" does of course support selections on the basis of 'NE ""' (not equal to null) and pattern matching and partial matching, so I don't think we have had any insoluble problems in this area.

Geoff Miller (ghm@cc.adfa.oz.au)
Computer Centre, Australian Defence Force Academy
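
For anyone who wants to see the successive-narrowing idea in concrete form, here is a minimal sketch in Python rather than Pick. The record layout, the rank spellings, the exclusion patterns and the helper names (select, exclude_matching) are all invented for illustration; this is neither Prime "Information" syntax nor our actual procedure.

```python
import re

# Hypothetical sample records; the field names and rank spellings are
# invented for illustration and are not taken from the real database.
RECORDS = [
    {"id": 1, "rank": "Pte"},
    {"id": 2, "rank": "PRIVATE"},
    {"id": 3, "rank": "Pvt."},
    {"id": 4, "rank": "Sgt"},
    {"id": 5, "rank": "Lieut"},
    {"id": 6, "rank": ""},
    {"id": 7, "rank": "Fusilier"},
    {"id": 8, "rank": "Cpl"},
]


def select(records, keep):
    """Return only the records for which keep(record) is true."""
    return [r for r in records if keep(r)]


def exclude_matching(records, field, pattern):
    """Drop records whose field matches the (case-insensitive) pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return select(records, lambda r: not rx.search(r[field]))


# Step 1: keep records with a non-empty rank (the analogue of NE "").
step1 = select(RECORDS, lambda r: r["rank"] != "")

# Step 2: exclude ranks that obviously do not fit (NCOs, in this example).
step2 = exclude_matching(step1, "rank", r"^(sgt|serg|cpl|corp)")

# Step 3: after inspecting what is left, refine by excluding officers too.
step3 = exclude_matching(step2, "rank", r"^(lieut|lt|capt|maj)")

remaining_ids = [r["id"] for r in step3]
print(remaining_ids)
```

Each step works only on the list returned by the previous one, so the intermediate results can be inspected before deciding what to exclude next, and whatever survives the last step can still be checked by hand.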