Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!uunet!crdgw1!brspyr1!tim
From: tim@brspyr1.BRS.Com (Tim Northrup)
Newsgroups: comp.databases
Subject: Re: Text archive for newspapers needed
Keywords: database archive newspaper free text
Message-ID: <7007@brspyr1.BRS.Com>
Date: 16 Oct 90 22:21:14 GMT
References: <24599@uflorida.cis.ufl.EDU> <1411@media01.UUCP> <295@accur8.UUCP> <1990Oct4.164051.908@sq.sq.com>
Organization: BRS Information Technologies, Latham NY
Lines: 38

lee@sq.sq.com (Liam R. E. Quin) writes:
>>In article <1411@media01.UUCP> pkr@media01.UUCP (Peter Kriens) writes:
>> We [need] a very good text archival system for newspaper based on Unix.
>> [...] "huge" databases ( of 4 to 5 gigabyte) should be possible.

... Lots of good things to think about when shopping for a full-text
    information retrieval package ...

>I might also consider
>* BRS Search, because it's one of the ``market leaders'', although my
>  experience is that this is one of the packages with a 300% index...

ACKKK!!!

This is certainly not our experience here, or with any of our current
customers (that I know of, anyway).

Our typical loaded/indexed database (with the 'C' based version of the
product, which is what I am involved with) is 120-150% of the size of the
original input text. In most cases, the original text can be discarded
after loading. This usually results in a 20-50% overhead, nowhere near
the 300% mentioned. (As a quick example, we have Grolier's AAE loaded:
input file ~65meg, indexed ~80meg).

Of course, your mileage may vary depending upon the data, but a 300% index
is very, very, very, VERY RARE with BRS/Search.

Now, if you're keeping the original text around and counting that as part
of the index size, that's another matter (and not quite fair, IMHO).

>Lee
>--
>Liam R. E. Quin, lee@sq.com, SoftQuad Inc., Toronto, +1 (416) 963-8337

--
Tim Northrup
BRS Software Products, Inc.
1200 Route 7, Latham NY 12110
UUCP: uunet!crdgw1!brspyr1!tim
ARPA: tim@brspyr1.BRS.Com
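
For anyone who wants to check the arithmetic in the post, here is a minimal sketch. The function name `index_stats` is made up for illustration and has nothing to do with BRS/Search itself. It takes only the two sizes quoted above (~65 meg of input, ~80 meg indexed) and assumes the original text is discarded after loading, as the post describes.

```python
def index_stats(input_mb, indexed_mb):
    """Return (indexed size as % of input, overhead % over the input).

    Hypothetical helper: assumes the original text is discarded after
    loading, so the indexed database is the only copy left on disk.
    """
    ratio_pct = 100.0 * indexed_mb / input_mb
    overhead_pct = ratio_pct - 100.0
    return ratio_pct, overhead_pct


# Figures quoted in the post for the Grolier's AAE example (approximate).
ratio, overhead = index_stats(65, 80)
print(f"indexed database is about {ratio:.0f}% of the input "
      f"({overhead:.0f}% overhead)")
```

With the quoted sizes this comes to roughly 123% of the input, which sits inside the 120-150% range given in the post.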