Xref: utzoo comp.sources.wanted:12626 comp.text:7045
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!math.lsa.umich.edu!math.lsa.umich.edu!emv
From: emv@math.lsa.umich.edu (Edward Vielmetti)
Newsgroups: comp.sources.wanted,comp.text
Subject: Re: Litterature search/index/information retrieval program wanted
Message-ID:
Date: 26 Jul 90 08:13:06 GMT
References: <1990Jul21.212945.15889@bbt.se>
Sender: usenet@math.lsa.umich.edu
Organization: University of Michigan Math Dept., Ann Arbor MI.
Lines: 58
In-Reply-To: pgd@bbt.se's message of 21 Jul 90 21:29:45 GMT

In article <1990Jul21.212945.15889@bbt.se> pgd@bbt.se (P.Garbha) writes:

   I am looking for some software for unix to make some kind of
   on-line literature reference search system. I have a >30MB
   database, with the full contents of a number of books. For this
   I want a system that sets up an index to every word in the books,
   and that lets me search in this to locate references from the
   books. For example, if I put in a search for "unix" and "pc", the
   program should come up with a list of all logical units
   (paragraphs, chapters, or something else) with these two words in
   (the same sentence, or near each other). From this list I can
   narrow in further, or look at the references, to finally get a
   printout.

The "Pat" software from Open Text Systems should be able to do what
you want. First, you have to mark up the documents to delimit
boundaries like paragraphs, pages, or chapters; the information may
be available in the texts as you have them (in which case no explicit
markup would be needed) or you may need to construct it. Pat likes
to use a simplified form of SGML to tag things. You then create an
index to the whole text (every word is easy), make a pass over it to
find the boundaries, and you're set.

It supports queries like the ones you describe. The command line
interface is kind of clunky, but there are some nice glossy X11
interfaces like the one used to present the OED.
Best of all, once you do the index (which takes a longish time and a
lot of disk) the lookups are amazingly fast, i.e. short enough that
you can run them interactively on a sparcstation 1 and not have time
to play hack in another window.

You might contact Tim Bray for nitty gritty like pricing.

--Ed

Edward Vielmetti, U of Michigan math dept
comp.archives moderator

>> telebit
  3: 76 matches
>> modem
  4: 300 matches
>> 3 near 4
  5: 39 matches
>> docs article including 5
  6: 5 matches
>> pr.docs.header
  1824104, ..From comp.archives Wed Feb 14 20:20:55 EST 1990
Path: jarvis.csri.toronto.edu!mailrus!uwm.edu!zaphod.mps.ohio-state.edu!math.lsa.umich.edu!emv
From: jeh@simpact.com
Newsgroups: comp.archives
Subject: [comp.os.vms] Re: is UUCP available for VMS? (yes! so is NEWS!)
....
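
For anyone who wants to experiment with the idea before pricing a commercial package, the word-index-plus-"near" style of query discussed above can be roughed out in a few lines of Python. This is only a sketch and makes no claim about how Pat itself is implemented. It assumes the logical units are paragraphs separated by blank lines, and the function names (split_units, build_index, units_with_all, units_with_near) are made up for illustration.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def split_units(text):
    """Split text into logical units; assumed here to be blank-line-separated paragraphs."""
    return [u.strip() for u in re.split(r"\n\s*\n", text) if u.strip()]


def build_index(units):
    """Return {word: {unit_number: [word positions within that unit]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for unit_no, unit in enumerate(units):
        for pos, match in enumerate(WORD.finditer(unit.lower())):
            index[match.group()][unit_no].append(pos)
    return index


def units_with_all(index, *words):
    """Unit numbers that contain every one of the given words."""
    hits = [set(index.get(w.lower(), {})) for w in words]
    return sorted(set.intersection(*hits)) if hits else []


def units_with_near(index, first, second, window=5):
    """Unit numbers where the two words occur within `window` words of each other."""
    a = index.get(first.lower(), {})
    b = index.get(second.lower(), {})
    result = []
    for unit_no in sorted(set(a) & set(b)):
        if any(abs(p - q) <= window for p in a[unit_no] for q in b[unit_no]):
            result.append(unit_no)
    return result


if __name__ == "__main__":
    sample = (
        "Unix runs on many machines. A PC can run Unix too.\n\n"
        "The PC market is large.\n\n"
        "Unix was developed at Bell Labs, long before anyone thought of a desktop PC."
    )
    index = build_index(split_units(sample))
    print(units_with_all(index, "unix", "pc"))             # [0, 2]
    print(units_with_near(index, "unix", "pc", window=3))  # [0]
```

Building the index is the slow part; each query afterwards is a handful of dictionary lookups, which is roughly why lookups can feel interactive once the indexing pass is done. A dictionary held in memory like this would not be a sensible way to handle 30MB of text on a small machine; it is only meant to show the shape of the queries.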