Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!rutgers!umn-d-ub!cs.umn.edu!sctc.com!endrizzi
From: endrizzi@sctc.com (Michael Endrizzi )
Newsgroups: comp.databases
Subject: Re: >Fault-tolerant Information Recall
Message-ID: <1990Apr8.195542.14214@sctc.com>
Date: 8 Apr 90 19:55:42 GMT
References: <1990Apr3.200220.9513@sctc.com> <HARGROVE.90Apr7093808@bee.corp.sgi.com>
Distribution: comp.databases
Organization: Secure Computing Technology Corporation
Lines: 73

hargrove@bee.corp.sgi.com (Mark Hargrove) writes:

>In article <1990Apr3.200220.9513@sctc.com> endrizzi@sctc.com (Michael Endrizzi ) writes:

>constructing "approximate" searches using regex() (or similar) style
>patterns.  A query language that did approximate matching for me
>automatically is a bit scary.  How would you control the degree of

You are not the only one to express this.  I cannot figure out why.
If I said that I have a spelling checker, and it will tell you which
word is wrong, but WON'T offer alternatives that are "close" to the
word in question, would you find this useful?

This is what current database technology offers. If you are not
sure that the data or the query is perfect, there is no way to locate
information that is "close" to the incomplete and/or incorrect
information that you feed it.

regex() searches are very positionally dependent. They cannot detect
errors such as transposed characters very well. You bring up another
interesting point. I feel that our model offers much superior recall
rates with none of the "*h()|\(+adfadf?\)"  gibberish involved in
regex() searching.
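
To illustrate the transposition point, here is a minimal Python sketch.
It is not the model described in this post: the edit-distance function
(optimal string alignment, a restricted Damerau-Levenshtein variant),
the sample names, and the one-edit cutoff are all assumptions chosen
for the example.

```python
import re

def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: counts insertions, deletions,
    substitutions, and swaps of two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Adjacent transposition, e.g. "ir" typed instead of "ri".
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]

names = ["Endrizzi", "Hargrove", "Enderson"]   # hypothetical stored data
query = "Endirzzi"                             # "r" and "i" transposed

# A wildcard pattern built from the misspelled query still pins every
# other character to a fixed position, so it misses the stored name.
regex_hits = [n for n in names if re.fullmatch("Endi.zzi", n)]

# Allowing one edit tolerates the transposition.
fuzzy_hits = [n for n in names if edit_distance(query, n) <= 1]

print(regex_hits)
print(fuzzy_hits)
```

The regex only helps if you already know where the error is; the
distance-based comparison does not need that knowledge.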

Another question. You said you have spent hours performing regex()
searches, but then turned around and said you don't find our model
useful. Please explain this anomaly.

>"approximation?"  I could argue that every row in a database was

And I would totally agree with you!!! 

>How do you intend to define "approximately?"

In our model it is a parameter to the search process.  There are
two standard metrics in information retrieval:

	1) Recall Rate
	2) Precision Rate

Recall rate is the number of relevant records retrieved over all
the relevant records in the database.

Precision rate is the percentage of records retrieved that are
relevant (definition of relevant depends on user, application,
scenario).
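
As a small worked example of the two metrics, using the usual
information-retrieval definitions (recall = relevant retrieved over
all relevant, precision = relevant retrieved over all retrieved), here
is a Python sketch. The function name and the record ids are made up
for illustration.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    retrieved: ids of the records the search returned
    relevant:  ids of the records the user considers relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant   # relevant records that were found
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical query: 4 relevant records exist, 5 records come back,
# and 3 of the 5 are relevant.
relevant = {1, 2, 3, 4}
retrieved = {2, 3, 4, 7, 9}
recall, precision = recall_and_precision(retrieved, relevant)
print(recall, precision)   # 3/4 and 3/5
```

Loosening a search usually raises recall and lowers precision, which
is why the two are normally reported together.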

We have developed a metric that combines these two into a single
metric called the Retrieval Rate. This rate is user-defined. The
user can raise it so that the search process returns only 100% exact
matches, which makes the search act like a traditional RDBMS, or
lower it to search for "approximate" matches.
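
The effect of a user-adjustable threshold can be sketched as below.
This post does not say how the Retrieval Rate is computed, so the
sketch substitutes the similarity ratio from Python's standard difflib
module as a stand-in score. The function name, the sample records, and
the 0.8 setting are assumptions for illustration only.

```python
from difflib import SequenceMatcher

def search(records, query, threshold=1.0):
    """Return the records whose similarity to query is >= threshold.

    threshold is a stand-in for a user-set rate: 1.0 keeps only exact
    matches, lower values admit progressively looser matches.
    """
    results = []
    for record in records:
        score = SequenceMatcher(None, query, record).ratio()
        if score >= threshold:
            results.append(record)
    return results

records = ["Endrizzi", "Hargrove", "Enderson"]   # hypothetical data

exact_only = search(records, "Endrizzi", threshold=1.0)
strict_typo = search(records, "Endirzzi", threshold=1.0)
loose_typo = search(records, "Endirzzi", threshold=0.8)

print(exact_only)    # behaves like an exact-match lookup
print(strict_typo)   # the misspelled query finds nothing at 1.0
print(loose_typo)    # lowering the threshold recovers the record
```

At 1.0 the search behaves like a conventional equality test; the
threshold is the single knob that decides how much "approximation" the
user is willing to accept.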


					Thank you for responding,

						Dreez

=================================================================
=================================================================
               Michael J. Endrizzi
	Secure Computing Technology Corp.
	   1210 W. County Road E #100
	      Arden Hills, Mn. 55112
	        endrizzi@sctc.com
	          (612) 482-7425
	
*Disclaimer: The opinions expressed above are not of my employer
             but of the American people.
=================================================================
=================================================================