Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!uwm.edu!zaphod.mps.ohio-state.edu!samsung!munnari.oz.au!metro!usage.csd.unsw.oz.au!ccadfa!ghm
From: ghm@ccadfa.adfa.oz.au (Geoff Miller)
Newsgroups: comp.databases
Subject: Re: Support for imprecise data: survey
Message-ID: <1785@ccadfa.adfa.oz.au>
Date: 2 Aug 90 00:34:02 GMT
References: <26231@usc.edu>
Organization: Computer Centre, University College, UNSW, ADFA, Canberra, Australia
Lines: 45

ami@kodkod.usc.edu (Ami Motro) writes:

>Hello database experts,

>I am interested in finding the level of support (if any) in present commercial
>database systems for IMPRECISE DATA....

>Things I would like to know include, how does the user describe the imprecise
>data to the system? How does the system retrieve in the presence of imprecise
>data?  Can the user specify imprecision in queries?  And so on....

I'm currently working with Prime "Information", which is a Pick variant with
some extra goodies, but I think my comments would be quite valid for generic
Pick.

I'm not quite sure whether you are referring to recording imprecise data (to
take your example, entering some character in a database to indicate that a 
person has a phone number although it is not known) or whether you are talking
about imprecise queries on precise data.  The first would appear to be largely
a matter of how you define your database and subsequent queries; the second
can get a bit more interesting.

One application on which we work is a military history database which records
data on individual servicemen.  We have no control over the raw data, which
are scanned from the original records, so along with scanning errors (which 
we mostly detect) we have problems arising from the inconsistency of the 
original records.  You might be surprised at how many ways the rank of 
Private can be recorded, let alone the number of equivalent ranks in 
specialist units (bombardier, fusilier, ...).  What we have had to do in 
a number of cases is to select by exclusion, so that we exclude the records
which obviously do not fit a particular criterion and then look at what we
have left and at how the criteria can be refined.  This can take many 
iterations, and sometimes we have to make the final selections by hand 
from a displayed list.
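
As a rough illustration of selecting by exclusion, here is a small Python
sketch.  It is not "Information" or Pick syntax; the sample rank strings,
the exclusion patterns and the helper name exclude() are all made up for
the example.  Each pass drops records that obviously do not fit, and
whatever remains is the list you would inspect before refining further.

```python
import re

# Hypothetical rank strings, as they might look after scanning.
records = [
    {"id": 1, "rank": "Pte"},
    {"id": 2, "rank": "PRIVATE"},
    {"id": 3, "rank": "Sgt"},
    {"id": 4, "rank": "Fusilier"},
    {"id": 5, "rank": "Pvt."},
    {"id": 6, "rank": "Lieut"},
    {"id": 7, "rank": "Cpl"},
]


def exclude(recs, pattern):
    """Return the records whose rank does NOT match the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [r for r in recs if not rx.search(r["rank"])]


# Assumed exclusion patterns: ranks that are clearly not Private.
# In practice this list grows over many iterations.
exclusions = [r"^(sgt|serg)", r"^(lt|lieut)", r"^(cpl|corp)"]

remaining = records
for pattern in exclusions:
    remaining = exclude(remaining, pattern)

# What is left still needs a look by eye before the next refinement.
candidate_ids = [r["id"] for r in remaining]
print(candidate_ids)
```

Note that "Fusilier" survives every pass: exclusion only narrows the
field, and the odd cases are left for a further pattern or a hand check.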

In general we have found this approach to work, although admittedly it can get
a bit tedious.  We have also found it much better to use a series of SELECT
statements rather than building up one enormous query (each SELECT works only
on the records returned by the previous one).  "Information" does of course
support selections on the basis of 'NE ""' (not equal to null), pattern
matching, and partial matching, so I don't think we have had any insoluble
problems in this area.
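
The chaining idea can be sketched in Python as follows.  Again this is
only an illustration under assumptions, not "Information" syntax: the
field names, the sample records and the select() helper are invented.
Each step sees only the records that the previous step returned, which
keeps every individual test small and easy to check.

```python
from fnmatch import fnmatchcase

# Hypothetical records; an empty string stands in for a null field.
records = [
    {"id": 1, "rank": "Pte", "unit": "Unit A"},
    {"id": 2, "rank": "", "unit": "Unit A"},
    {"id": 3, "rank": "Private", "unit": ""},
    {"id": 4, "rank": "Sgt", "unit": "Unit B"},
    {"id": 5, "rank": "PTE.", "unit": "Unit B"},
]


def select(active, predicate):
    """Narrow the active list to the records satisfying the predicate."""
    return [r for r in active if predicate(r)]


# Step 1: rank is present (in the spirit of 'NE ""').
step1 = select(records, lambda r: r["rank"] != "")

# Step 2: of those, unit is present.
step2 = select(step1, lambda r: r["unit"] != "")

# Step 3: of those, rank starts with "P" (a simple partial match).
step3 = select(step2, lambda r: fnmatchcase(r["rank"].upper(), "P*"))

print([r["id"] for r in step3])
```

Keeping the intermediate lists (step1, step2, ...) around also makes it
easy to see which test threw out a record you expected to keep.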

Geoff Miller  (ghm@cc.adfa.oz.au)
Computer Centre, Australian Defence Force Academy