Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!elroy.jpl.nasa.gov!decwrl!sgi!shinobu!odin!odin.corp.sgi.com!hargrove
From: hargrove@bee.corp.sgi.com (Mark Hargrove)
Newsgroups: comp.databases
Subject: Re: >Fault-tolerant Information Recall
Message-ID: 
Date: 11 Apr 90 17:19:54 GMT
References: <1990Apr3.200220.9513@sctc.com> <1990Apr8.195542.14214@sctc.com>
Sender: news@odin.corp.sgi.com
Distribution: comp.databases
Organization: Silicon Graphics, Inc., Mtn. View, CA
Lines: 81
In-reply-to: endrizzi@sctc.com's message of 8 Apr 90 19:55:42 GMT

In article <1990Apr8.195542.14214@sctc.com> endrizzi@sctc.com (Michael Endrizzi ) writes:

   hargrove@bee.corp.sgi.com (Mark Hargrove) writes:
   >In article <1990Apr3.200220.9513@sctc.com> endrizzi@sctc.com (Michael Endrizzi ) writes:
   >constructing "approximate" searches using regex() (or similar) style
   >patterns. A query language that did approximate matching for me
   >automatically is a bit scary. How would you control the degree of

   You are not the only one to express this. I cannot figure out why.
   If I said that I have a spelling checker that will tell you which
   word is wrong but WON'T offer alternatives that are "close" to the
   word in question, would you find this useful???

As a matter of fact, yes, I would (and do) find such a spelling checker
useful. That doesn't imply that I wouldn't find one that made
suggestions useful as well.

   This is what current database technology offers. If you are not sure
   that the data or the query is perfect, there is no way to locate
   information that is "approximate" to the incomplete and/or incorrect
   information that you feed it.

   regex() searches are very positionally dependent. They cannot detect
   errors such as transposed characters very well.

   You bring up another interesting point. I feel that our model offers
   much superior recall rates with none of the "*h()|\(+adfadf?\)" gibberish
   involved in regexp searching.

Maybe I missed something earlier.
Just what the heck *is* your model anyway? Are you offering an extension
to a query language (like SQL) that will return "matches" that are
"close" in the same sense that a spelling checker suggests "close" words?
Or are you offering an extension that would return "Palo Alto" when I
searched on CITY_NAME="Menlo Park"?

   Another question. You said you have spent hours performing regexp()
   searches, but then turn around and say you don't find our model
   useful. Explain this anomaly, please.

It only appears to be an anomaly when you take what I said out of
context. I didn't say that I didn't find your model useful. On the
contrary, I *started* my posting by saying that I thought this was a
powerful concept. I'm simply worried that you're oversimplifying the
problems with defining "approximate" matches.

   >"approximation?" I could argue that every row in a database was

   And I would totally agree with you!!!

   >How do you intend to define "approximately?"

   In our model it is a parameter to the search process. There are two
   standard metrics of retrieval quality in information retrieval:

     1) Recall Rate
     2) Precision Rate

   Recall rate is the number of relevant records retrieved over all the
   relevant records in the database. Precision rate is the percentage of
   retrieved records that are relevant (the definition of relevant depends
   on the user, application, and scenario). We have developed a metric
   that combines the two into a single metric called the Retrieval Rate.
   This is a user-defined rate. The user can raise it so that the search
   process returns only 100% exact matches, which makes it act like a
   traditional RDBMS, or lower it to search for "approximate" matches.

I'm sorry, but I guess I'm too dumb to understand what this means. How
do you decide *which* records to retrieve? What is your approximation
function? I simply don't believe you can define a general function
that will do "approximate" matching, UNLESS YOU'RE ONLY WORRYING ABOUT
SPELLING DIFFERENCES. Is this all your "system" does?
If so, it's still a step forward in retrieval problems, but not particularly revolutionary.
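For concreteness, here is a minimal Python sketch of the narrowest kind of "approximation function" discussed above: spelling-level matching. It is NOT the model from the thread, which is never specified. It assumes optimal string alignment (OSA) distance, a restricted form of Damerau-Levenshtein distance that counts an adjacent transposition as a single edit, and a hypothetical `max_edits` threshold standing in for a user-adjustable rate. The function names and the sample CITY_NAME rows are made up for illustration. Precision and recall are computed in their standard sense.

```python
# Hypothetical sketch, NOT the model discussed in this thread.
# Assumes OSA distance as the approximation function and a max_edits
# threshold as the user-adjustable knob (0 = exact match).


def osa_distance(a: str, b: str) -> int:
    """Count insertions, deletions, substitutions and adjacent
    transpositions needed to turn a into b (each costs 1)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def approximate_select(rows, query, max_edits):
    """Hypothetical query: rows within max_edits of query (case-insensitive).
    max_edits=0 behaves like an exact-match WHERE clause."""
    q = query.lower()
    return [r for r in rows if osa_distance(r.lower(), q) <= max_edits]


def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / all retrieved;
    recall = relevant retrieved / all relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up CITY_NAME values, two of them with transposed letters.
cities = ["Menlo Park", "Palo Alto", "Mountain View",
          "Menlo Prak", "Melno Park"]
# Assume the user counts every spelling of Menlo Park as relevant.
relevant = ["Menlo Park", "Menlo Prak", "Melno Park"]

for max_edits in (0, 1):
    result = approximate_select(cities, "Menlo Park", max_edits)
    print(max_edits, result, precision_recall(result, relevant))
```

With `max_edits=0` only the exact row comes back (recall 1/3). With `max_edits=1` the two transposed spellings are caught as well, the case a positional regex handles poorly. No small threshold will ever return "Palo Alto" for "Menlo Park", though: string distance only covers spelling differences, and semantic closeness would need a different function altogether.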