Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!elroy.jpl.nasa.gov!decwrl!sgi!shinobu!odin!odin.corp.sgi.com!hargrove
From: hargrove@bee.corp.sgi.com (Mark Hargrove)
Newsgroups: comp.databases
Subject: Re: >Fault-tolerant Information Recall
Message-ID: <HARGROVE.90Apr11181954@bee.corp.sgi.com>
Date: 11 Apr 90 17:19:54 GMT
References: <1990Apr3.200220.9513@sctc.com>
	<HARGROVE.90Apr7093808@bee.corp.sgi.com>
	<1990Apr8.195542.14214@sctc.com>
Sender: news@odin.corp.sgi.com
Distribution: comp.databases
Organization: Silicon Graphics, Inc., Mtn. View, CA
Lines: 81
In-reply-to: endrizzi@sctc.com's message of 8 Apr 90 19:55:42 GMT

In article <1990Apr8.195542.14214@sctc.com> endrizzi@sctc.com (Michael Endrizzi ) writes:

 
   hargrove@bee.corp.sgi.com (Mark Hargrove) writes:

   >In article <1990Apr3.200220.9513@sctc.com> endrizzi@sctc.com (Michael Endrizzi ) writes:

   >constructing "approximate" searches using regex() (or similar) style
   >patterns.  A query language that did approximate matching for me
   >automatically is a bit scary.  How would you control the degree of

   You are not the only one to express this.  I cannot figure out why.
   If I said that I have a spelling checker, and it will tell you which
   word is wrong, but WON'T offer alternatives that are "close" to the
   word in question, would you find this useful???  

As a matter of fact, yes, I would (and do) find such a spelling checker
useful.  That doesn't imply that I wouldn't find one that made 
suggestions useful as well.
 

   This is what current database technology offers. If you are not
   sure that the data or the query is perfect, there is no way to locate
   information that is "approximate" to the incomplete and/or incorrect
   information that you feed it.

   regex() searches are very positionally dependent. They cannot detect
   errors such as transposed characters very well. You bring up another
   interesting point. I feel that our model offers much superior recall
   rates with none of the "*h()|\(+adfadf?\)"  gibberish involved in
   regexp searching.

Maybe I missed something earlier.  Just what the heck *is* your model
anyway?  Are you offering an extension to a query language (like SQL)
that will return "matches" that are "close" in the same sense that a
spelling checker suggests "close" words?  Or are you offering an extension
that would return "Palo Alto", when I searched on CITY_NAME="Menlo Park"?

   Another question. You said you have spent hours performing regexp()
   searches, but then turned around and said you don't find our model
   useful. Explain this anomaly please.

It only appears to be an anomaly when you take what I said out of
context. I didn't say that I didn't find your model useful.  On the
contrary, I *started* my posting by saying that I thought this
was a powerful concept.  I'm simply worried that you're oversimplifying 
the problems with defining "approximate" matches.

   >"approximation?"  I could argue that every row in a database was

   And I would totally agree with you!!! 

   >How do you intend to define "approximately?"

   In our model it is a parameter to the search process.  There are
   2 metrics of recall in information retrieval:

	   1) Recall Rate
	   2) Precision Rate

   Recall rate is the number of records retrieved over all the
   records in the database.

   Precision rate is the percentage of records retrieved that are
   relevant (definition of relevant depends on user, application,
   scenario).

   We have developed a metric that combines the two of these into
   a single metric called the Retrieval Rate. This is a user defined
   rate.  The user can adjust this rate to make the search process
   only return 100% exact matches which would make the search process
   act like a traditional RDBMS, or can lower this rate to search
   for "approximate" matches.  


I'm sorry, but I guess I'm too dumb to understand what this means.  How
do you decide *which* records to retrieve?  What is your approximation
function?  I simply don't believe you can define a general function which
will do "approximate" matching, UNLESS YOU'RE ONLY WORRYING ABOUT SPELLING
DIFFERENCES.  Is this all your "system" does?  If so, it's still a step
forward in retrieval problems, but not particularly revolutionary.
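
To make the spelling-only case concrete, here is a minimal sketch in Python
of the kind of matching I have in mind.  It is my own illustration, not the
model under discussion: the names osa_distance, approximate_matches, and
max_distance are hypothetical, and the measure used is the optimal string
alignment distance (single-character edits plus adjacent transpositions),
standing in for whatever approximation function the real system uses.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Counts insertions, deletions, substitutions, and swaps of two
    adjacent characters, each at a cost of 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all i characters of a
    for j in range(cols):
        d[0][j] = j          # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "ra" typed as "ar".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def approximate_matches(query, values, max_distance):
    """Return the values within max_distance edits of query.

    max_distance is the hypothetical user-set knob: 0 means an exact
    (case-insensitive) match, larger values admit looser matches.
    """
    q = query.lower()
    return [v for v in values if osa_distance(q, v.lower()) <= max_distance]


cities = ["Menlo Park", "Palo Alto"]
print(approximate_matches("Menlo Prak", cities, 0))
print(approximate_matches("Menlo Prak", cities, 1))
print(approximate_matches("Menlo Prak", cities, 10))
```

With max_distance=0 the misspelled query finds nothing; with 1 it finds
"Menlo Park"; with 10 it returns every row, "Palo Alto" included.  That is
exactly the control problem I'm worried about: the knob is easy to build for
spelling, and says nothing about whether the looser matches are relevant.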