Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!mo
From: mo@uunet.UU.NET (Mike O'Dell)
Newsgroups: comp.arch
Subject: Serial Search Machines
Message-ID: <2667@uunet.UU.NET>
Date: Wed, 28-Oct-87 15:41:19 EST
Article-I.D.: uunet.2667
Posted: Wed Oct 28 15:41:19 1987
Date-Received: Sat, 31-Oct-87 08:05:10 EST
Organization: UUNET Communications Services, Arlington, VA
Lines: 38

Sequential text search machines have been around for a long time, but recently they have become affordable for many applications. The Gould Hypersearch, the new small GEScan, and the TRW FDF-II are just a few examples of this new generation of machines.

They all suffer from a few interesting problems, however. Like everything highly specialized, they can win very big and lose very big. Their biggest asset is that they can scan data with little or no preprocessing. This is tremendous for tasks like reading a bunch of wire-service news feeds looking for text matching a specific profile (contains certain terms, etc.), because they can do it in real time by simply scanning the data as it goes by.

For other tasks, like searching large resident collections, that strength can become a great weakness, because they must read through all the data for each and every search. If the collection is, say, one to two gigabytes, which is rather modest, dumping that much data off the filesystem and through the box for each query quickly gets pretty painful.

More traditional search systems build some kind of index in one pass and then "save the work" by consulting the index instead of the data to process queries. Building these indices is often computationally and I/O intensive, but if the system must answer many such queries with great frequency, the indices win because only a tenth of 1% as much data must be funnelled through the computer to perform the search. This is why large commercial systems like Lockheed DIALOG, BRS, NEXIS and MEDIS all use index-based systems.
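To make the tradeoff concrete, here is a minimal Python sketch of the two approaches on a toy collection. Everything in it is an assumption for illustration: the documents, the function names, and the whitespace tokenizing are made up, and it says nothing about how the hardware boxes or the commercial systems named above actually work.

```python
from collections import defaultdict

# Hypothetical toy collection; ids and text are invented for illustration.
DOCS = {
    1: "gould announces new text search hardware",
    2: "wire service reports on trade talks",
    3: "index based retrieval systems answer queries quickly",
    4: "trade talks stall as wire service strike continues",
}


def sequential_search(docs, term):
    """Read every document on every query, the way a serial scanner must.

    Returns (sorted matching ids, number of characters read).
    """
    hits = []
    chars_read = 0
    for doc_id, text in docs.items():
        chars_read += len(text)
        if term in text.split():
            hits.append(doc_id)
    return sorted(hits), chars_read


def build_index(docs):
    """One up-front pass over the data: map each term to the ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def index_search(index, term):
    """Answer a query from the term's posting list alone, without rereading the data."""
    return sorted(index.get(term, set()))


index = build_index(DOCS)                 # cost paid once
scan_hits, chars_read = sequential_search(DOCS, "trade")
print("scan: ", scan_hits, "after reading", chars_read, "characters")
print("index:", index_search(index, "trade"))
```

Both searches return the same documents. The difference is that the scan rereads the whole collection for each query, while the index lookup touches one posting list after a single build pass. The "tenth of 1%" figure comes from the post itself; this sketch does not try to measure it.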
So, like everything, sequential search machines can be used and abused. They definitely have a place in the world, but they are not always an obvious replacement for index-based software systems.

	-Mike O'Dell