Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!murtoa.cs.mu.oz.au!munnari.oz.au!vuwcomp!dsiramd!csnz!paul
From: paul@csnz.co.nz (Paul Gillingwater)
Newsgroups: comp.databases
Subject: Re: Free Text Databases
Summary: Recommend BRS/Search
Keywords: world
Message-ID: <41@csnz.co.nz>
Date: 5 Jun 89 19:27:44 GMT
References: <1158@itivax.iti.org> <76@sc2a.unige.ch> <640@bloom.UUCP>
Reply-To: paul@csnz.co.nz (Paul Gillingwater)
Organization: Computer Sciences of NZ Limited
Lines: 74

In article <640@bloom.UUCP> bobd@bloom.UUCP (Bob Donaldson) writes:
+In article <76@sc2a.unige.ch>, fisher@sc2a.unige.ch (Markus Fischer) writes:
+> In article <1158@itivax.iti.org>, kam@itivax.iti.org (Keith A. McNabb) writes:
+> > Would anyone be able to recommend a good, efficient, and fairly
+> > powerful FREE TEXT database for a PC-AT compatible (MS-DOS) or
+> > Sun/UNIX environment? It should have no restrictions on record
+> > length and should allow pre-existing flat ASCII files to be
+> > easily incorporated. At the same time, it should support the
+> > definition of various fields, so that searches may be more
+> > selectively qualified, and it should support numeric operators.
+>
+> ... I simply used the
+> word processor I was accustomed to : WordPerfect. This might seem a little
+> strange, but it really has a subset of database functions : sort, extract
+> by conditions, numerical variables and modify structure... Of course, one
+> doesn't expect full statistical functions, or even calculations...
+>
+> The main problem is actually that the structure of the data cannot be defined,
+> which means that the user must be careful to put each field in the right place.
+> In other words, you will have trouble if several people are to enter or edit
+> the data.
+>
+> The main advantage is of course the quality of printouts : you are working
+> with a real word processor ! (i.e., the fields can contain formatting
+> codes...)
+
+As a variation on this theme, I can suggest a hybrid. Use your favorite
+word processor to generate each "database" entry, then store the data in a
+'real' DBMS - I would suggest Empress/32 (runs under DOS & Sun/UNIX), since
+it handles large variable length fields quite well. A little preprocessing

Hmm... we are doing quite a bit of work with BRS/Search, which works
on many machines, from MS-DOS to Sun UNIX, DG/AOS etc. The same
files can be used by the DOS and UNIX versions without conversion.
We have tools that can import WordPerfect documents, with all
formatting codes intact.

The advantage of BRS over a "classic" RDBMS like Empress (hmm... she's
come up in the world since she was "Mistress"! :-) is the search
engine - every single significant word is searchable, because every
word is added to a dictionary and indexed. Field length is not a
problem - "paragraphs" may be 64 KB long, and there is no loss of
efficiency or wasted storage if you have one record with 20 bytes
and another with 20 KB (which is a problem with the "classic"
approach).

Summary: if you are working with large amounts of free text, use
the correct tool. Sure, the license fee is a bit steep, but you
get what you pay for, and it's a solid product. I like it because
I use it, not because I sell it.

>would both do some QA/QC on the data entry & data format, and also allow
>the extraction of fixed-length fields in the database which could then be
>indexed. The wordprocessor files would be stored complete in a variable
>length, unprocessed field (type = bulk in Empress). This allows you to
>include all of the formatting codes, etc. I expect that other vendors have
>similar capabilities, but check WHATEVER you choose carefully - I have found
>a lot of un-documented or well-hidden limitations in the use of these
>unstructured data types in some packages.
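
For anyone wondering what "every significant word is indexed" means
in practice, as opposed to indexing a few extracted fixed-length
fields, here is a toy inverted index in Python. It is only a sketch
of the general technique, not a description of how BRS/Search is
built internally. The stop-word list, the function names (tokenize,
build_index, search) and the sample records are all made up for
illustration.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list: words too common to be worth indexing.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "with"}


def tokenize(text):
    """Lower-case the text and yield each word that is not a stop word."""
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word not in STOP_WORDS:
            yield word


def build_index(records):
    """Map every significant word to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in tokenize(text):
            index[word].add(rec_id)
    return index


def search(index, query):
    """Return ids of records that contain every significant query word."""
    words = list(tokenize(query))
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Records of very different lengths are handled the same way.
records = {
    1: "Short note.",
    2: "A much longer record about importing WordPerfect documents "
       "with formatting codes intact.",
    3: "Formatting codes survive the import of a short document.",
}
index = build_index(records)
print(sorted(search(index, "formatting codes")))
```

Note that the record text is never cut up into fields here, and a
20-byte record costs no more per word than a 20 KB one. There is no
stemming in this sketch, so "import" and "importing" are treated as
different words.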
>
>
>-=-
>Bob Donaldson               ...!cs.utexas.edu!natinst!radian!bobd
>Radian Corporation          ...!sun!texsun!radian!bobd
>PO Box 201088
>Austin, TX 78720            (512) 454-4797
>
>Views expressed are my own, not necessarily those of my employer.

-- 
Paul Gillingwater, Computer Sciences of New Zealand Limited
Bang: ..!uunet!dsiramd!csnz!paul    Domain: paul@csnz.co.nz
Call Magic Tower BBS V21/23/22/22bis 24 hrs +0064 4 767 326