Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!rutgers!umn-d-ub!cs.umn.edu!sctc.com!endrizzi
From: endrizzi@sctc.com (Michael Endrizzi )
Newsgroups: comp.databases
Subject: Re: >Fault-tolerant Information Recall
Message-ID: <1990Apr8.195542.14214@sctc.com>
Date: 8 Apr 90 19:55:42 GMT
References: <1990Apr3.200220.9513@sctc.com>
Distribution: comp.databases
Organization: Secure Computing Technology Corporation
Lines: 73

hargrove@bee.corp.sgi.com (Mark Hargrove) writes:
>In article <1990Apr3.200220.9513@sctc.com> endrizzi@sctc.com (Michael Endrizzi ) writes:

>constructing "approximate" searches using regex() (or similar) style
>patterns. A query language that did approximate matching for me
>automatically is a bit scary. How would you control the degree of

You are not the only one to express this, and I cannot figure out why.
If I said I had a spelling checker that tells you which word is wrong
but WON'T offer alternatives that are "close" to the word in question,
would you find it useful??? This is what current database technology
offers. If you are not sure that the data or the query is perfect,
there is no way to locate information that is "approximate" to the
incomplete and/or incorrect information that you feed it.

regex() searches are very positionally dependent. They cannot detect
errors such as transposed characters very well.

You bring up another interesting point. I feel that our model offers
far superior recall rates with none of the "*h()|\(+adfadf?\)"
gibberish involved in regexp searching.

Another question. You said you have spent hours performing regexp()
searches, but then turned around and said you don't find our model
useful. Please explain this anomaly.

>"approximation?" I could argue that every row in a database was

And I would totally agree with you!!!

>How do you intend to define "approximately?"

In our model it is a parameter to the search process.
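To make the idea of "approximately" as a search parameter concrete,
here is a minimal sketch. It is NOT our model; it swaps in a textbook
technique (optimal string alignment distance, which counts a swap of
two adjacent characters as a single error) purely for illustration.
The names osa_distance, similarity, approximate_search and threshold
are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertion, deletion, substitution,
    and transposition of two adjacent characters as one error each."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "rd" versus "dr".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """Map the distance onto 0.0..1.0, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


def approximate_search(query, records, threshold):
    """Return the records whose similarity to the query is at least
    threshold (assumed user-supplied), best matches first."""
    q = query.lower()
    scored = [(similarity(q, r.lower()), r) for r in records]
    hits = [(s, r) for s, r in scored if s >= threshold]
    return [r for s, r in sorted(hits, key=lambda pair: -pair[0])]


names = ["Endrizzi", "Hargrove", "Enrdizzi", "Endrizi"]
exact_only = approximate_search("endrizzi", names, threshold=1.0)
with_near_misses = approximate_search("endrizzi", names, threshold=0.8)
```

With threshold=1.0 only the exact (case-insensitive) match comes back,
which is how a traditional lookup behaves. Lowering the threshold also
admits the transposed and the one-letter-short spellings, with no
pattern to write by hand.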

There are two metrics of retrieval quality in information retrieval:

	1) Recall Rate
	2) Precision Rate

Recall rate is the fraction of the relevant records in the database
that are actually retrieved. Precision rate is the percentage of
records retrieved that are relevant (the definition of relevant depends
on the user, application, and scenario).

We have developed a metric that combines these two into a single
metric called the Retrieval Rate. This is a user-defined rate. The
user can set it so that the search process returns only 100% exact
matches, which makes it act like a traditional RDBMS, or can lower it
to search for "approximate" matches.

Thank you for responding,
Dreez

=================================================================
=================================================================
Michael J. Endrizzi
Secure Computing Technology Corp.
1210 W. County Road E #100
Arden Hills, Mn. 55112
endrizzi@sctc.com
(612) 482-7425
*Disclaimer: The opinions expressed above are not of my employer
but of the American people.
=================================================================
=================================================================
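
P.S. For anyone who wants to check the two rates on their own data,
here is a small sketch using the standard information-retrieval
definitions (recall = relevant records retrieved / all relevant
records; precision = relevant records retrieved / all records
retrieved). The function name and the sample record ids are made up
for illustration. It does not compute the Retrieval Rate, since the
formula for that is not given in this post.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the records the search returned
    relevant:  ids of the records the user considers relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant records actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 records returned, 3 relevant records exist,
# and 2 of the returned records are among the relevant ones.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 4, 6})
```

Here half of what came back is relevant (precision 0.5) and two of the
three relevant records were found (recall 2/3). Loosening a match
threshold typically raises recall and lowers precision.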