Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site wjh12.UUCP
Path: utzoo!linus!vaxine!wjh12!grc
From: grc@wjh12.UUCP (crane)
Newsgroups: net.general,net.unix-wizards
Subject: UNIX system to house 140 mbyte unformatted textual dbase?
Message-ID: <476@wjh12.UUCP>
Date: Sun, 27-May-84 19:56:09 EDT
Article-I.D.: wjh12.476
Posted: Sun May 27 19:56:09 1984
Date-Received: Wed, 30-May-84 00:12:43 EDT
Organization: Harvard University PSR, Cambridge MA
Lines: 37

We have an unformatted textual database currently comprising 140 mbytes
of text, which will grow to about 500 mbytes within the next two years.
Inverted indices (50% overhead on top of the 140 mbytes of text) have
been developed, but for some applications (such as fixed phrases or
combinations of common words) it is necessary to perform a linear
search on the entire corpus.

a) I am interested in benchmarks to see how fast different machines can
perform linear searches. In particular, I would like to know how fast
the command "egrep xxx /usr/dict/words" (where /usr/dict/words ~= 200K)
runs on a GOULD, PYRAMID, ZILOG or different 68K-based systems. We have
access to a VAX 11/750 and 780, a PDP 11/44 and a PIXEL 100. Benchmarks
from any other systems would be greatly appreciated.

The PIXEL is quite fast in core, but the disks are ruinously slow: an
otherwise idle PIXEL 100 (with 40 mbyte disks) can only spend 30% of
its time on an egrep. The rest of the time it is evidently twiddling
electrons waiting for more disk blocks. Does anybody out there have a
Sun with the Fujitsu Eagle?

This dbase has a limited clientele, and the machine would not need to
field more than 4 searches or so at a time, but we could easily use a
more powerful system and would just as soon not dedicate a system to
this database.

b) Does anyone out there know of any good way to deal with searching
this much data on a UNIX system? Any experiments in distributed
processing that could provide wide access cheaply? This is a read-only
dbase, so we could avoid the UNIX file system and store the data in big
blocks on a raw file system. Has anyone got some special hardware
hanging off of a UNIX system to perform this kind of task?

Gregory Crane
Harvard University
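
For anyone sending timings, here is a rough sketch of how a 200K egrep figure might be scaled up to the full corpus. It is only back-of-the-envelope arithmetic under stated assumptions: the function names are made up for illustration, a megabyte is taken as 10^6 bytes, the 1.0 second sample timing is a placeholder and not a measurement, and the scaling is assumed to be linear. A 200K file may sit entirely in the buffer cache, so a real 140 mbyte scan will probably be slower than this suggests.

```python
# Back-of-the-envelope sizing for a linear search over the corpus.
# All names here are illustrative; the sample timing is a placeholder.

MB = 1_000_000  # assumption: decimal megabytes


def index_bytes(text_bytes, overhead=0.50):
    """Size of the inverted indices, read as 50% on top of the text."""
    return text_bytes * overhead


def scan_cpu_seconds(corpus_bytes, sample_bytes, sample_seconds):
    """Scale a small benchmark up to the corpus, assuming linear cost."""
    return corpus_bytes / sample_bytes * sample_seconds


def elapsed_seconds(cpu_seconds, cpu_fraction):
    """Wall-clock time when the CPU is busy only cpu_fraction of the time,
    e.g. 0.30 for a machine that spends the rest waiting on disk."""
    return cpu_seconds / cpu_fraction


if __name__ == "__main__":
    for text_mb in (140, 500):
        text = text_mb * MB
        total_mb = (text + index_bytes(text)) / MB
        print(f"{text_mb} mbytes text + indices = {total_mb:.0f} mbytes on disk")

    # Placeholder: suppose egrep over the 200K word list took 1.0 s of CPU.
    cpu = scan_cpu_seconds(140 * MB, 200_000, 1.0)
    print(f"CPU time for 140 mbytes: {cpu:.0f} s")
    print(f"elapsed at 30% CPU use:  {elapsed_seconds(cpu, 0.30):.0f} s")
```

Substituting a measured egrep time for the placeholder gives a first estimate per machine. The last function shows why the disk matters as much as the processor: at 30% CPU utilization the elapsed time is more than three times the CPU time.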