Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!mo
From: mo@uunet.UU.NET (Mike O'Dell)
Newsgroups: comp.arch
Subject: Serial Search Machines
Message-ID: <2667@uunet.UU.NET>
Date: Wed, 28-Oct-87 15:41:19 EST
Article-I.D.: uunet.2667
Posted: Wed Oct 28 15:41:19 1987
Date-Received: Sat, 31-Oct-87 08:05:10 EST
Organization: UUNET Communications Services, Arlington, VA
Lines: 38

Sequential text search machines have been around for a long
time, but recently they have started to become affordable
for many applications.  The Gould Hypersearch, the new small
GEScan, and the TRW FDF-II are just a few examples of this
new generation of machines.  They all suffer from a few interesting
problems, however.  Like everything highly specialized, they
can win very big and lose very big.

Their biggest asset is that they can scan data with little or
no preprocessing.  This is tremendous for tasks like reading
a bunch of wireservice news feeds looking for text matching
a specific profile (contains certain terms, etc.) because
they can do it in real time by simply scanning the data as
it goes by.  For other tasks like searching large resident
collections, however, their strength can become a great
weakness because they must read through all the data for
each and every search.  If the collection is, say, one to two
gigabytes, which is rather modest, dumping that much data
off the filesystem and through the box for each query quickly
gets pretty painful.
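
As a rough illustration of the scanning approach, here is a minimal
Python sketch.  It is not how any of the machines above actually work
internally; the function name scan_stream, the "all terms must appear"
rule, and the sample feed are assumptions made up for the example.
The point is only that no preprocessing is needed, and that every
document is read in full for every profile.

```python
import re

def scan_stream(documents, profile_terms):
    """Yield each document that contains every term in the profile.

    No index is built: each document is read in full as it goes by.
    (Hypothetical matching rule: all terms must appear as whole words.)
    """
    wanted = {t.lower() for t in profile_terms}
    for doc in documents:
        words = set(re.findall(r"[a-z0-9]+", doc.lower()))
        if wanted <= words:
            yield doc

# Made-up sample feed standing in for a wire-service stream.
feed = [
    "Gould announces new search hardware",
    "Wheat prices fall on export news",
    "New hardware speeds text search",
]
hits = list(scan_stream(feed, ["search", "hardware"]))
```

Running a second query means calling scan_stream again and reading
the whole feed again, which is exactly the cost that hurts on a large
resident collection.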

More traditional search systems build some kind of index
in one pass and then "save the work" by consulting the index
instead of the data to process queries.  Building these
indices is often computationally and I/O intensive, but
if the system must answer many such queries with great
frequency, the indices win because only a tenth of 1% as
much data must be funnelled through the computer to perform
the search.  This is why large commercial systems like
Lockheed DIALOG, BRS, NEXIS and MEDIS all use index-based
systems.  
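
For contrast, a minimal sketch of the index-based approach in Python.
This is a plain inverted index (word -> set of document numbers), which
is only one kind of index and is not claimed to be what DIALOG, BRS,
NEXIS or MEDIS use.  The names tokenize, build_index and query, and the
sample collection, are assumptions for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """One pass over the data: map each word to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def query(index, terms):
    """Return sorted ids of docs containing all terms, reading only the index."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Made-up sample collection.
collection = [
    "Gould announces new search hardware",
    "Wheat prices fall on export news",
    "New hardware speeds text search",
]
index = build_index(collection)          # expensive step, done once
result = query(index, ["search", "hardware"])
```

The expensive pass over the text happens once in build_index; each
later call to query touches only the posting sets for the terms asked
about, not the documents themselves.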

So, like everything, sequential search machines can be used
and abused.  They definitely have a place in the world, but
they are not always an obvious replacement for index-based
software systems.

	-Mike O'Dell