Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!oz.cis.ohio-state.edu!jgreely
From: jgreely@oz.cis.ohio-state.edu (J Greely)
Newsgroups: comp.sys.next
Subject: Re: Indexing in the Digital Librarian
Message-ID:
Date: 3 Aug 89 19:26:12 GMT
References: <9187@pucc.Princeton.EDU>
Sender: news@tut.cis.ohio-state.edu
Reply-To: J Greely
Organization: Ohio State University Computer and Information Science
Lines: 63
In-reply-to: SEB@pucc.Princeton.EDU's message of 3 Aug 89 15:17:02 GMT

In article <9187@pucc.Princeton.EDU> SEB@pucc.Princeton.EDU
 (Scott E. Barron) writes:
>I am trying to add a folder of large text files to the Digital Librarian on
>a NeXT. I cannot find any sufficient documentation for doing this, so the
>process I used is as follows:

Known problem.  I tried to index the Internet RFCs (which run to about
19 meg), and never could get them in.  Your procedure is correct, but
I don't think you're going to get results under 0.9.  If you want to
experiment further, try the shell-level program index(1).  It's not
terribly stable, but it gives a bit more flexibility in indexing.

>The files I am trying to index are very large, but not as big as the works
>of Shakespeare.

Do you mean that the total file size is smaller than 8 meg, or that
the individual files are no larger than any in Shakespeare?  The
largest directory I have successfully indexed had only a total of 2
meg of text, with file sizes ranging from 45 bytes to 250 Kbytes.  The
index is a respectable 800K.

>Yet, the DL locked up the first time I tried this, and the
>second time it completed the index but the index exceeded 18MB!! The index
>on Shakespeare is less than 4MB. What am I doing wrong?

It's possible that the text contains a great many words that seem
important enough to index (see pword(1)), or that the indexing options
were oddly set.  Another possibility is that the indexing program
failed, but claimed to succeed.

>Furthermore, the first time I tried to do a search on the newly created index,
>the system locked up.

Sounds like the third option!  Was this the same copy of the Librarian
that you created the index under?  Had any other aggressively
memory-hungry programs been running?

>My biggest complaint about the NeXT is the lack of thorough documentation and
>the need for non-UNIX ways to handle problems/errors.

I've found the existing documentation to be quite reasonable, despite
its very pre-release nature.  Meanwhile, parsing the second half of
your sentence (with some difficulty), I gather you want more visual
administration tools.  They are, as far as I know, in the process of
being written/debugged.  The goal is to have as few tasks as possible
that require "old-fashioned" Unix administration.  My back-and-forth
with the bug handling group about the console (excuse me, the "Mach
Window") has convinced me that they want *everything* done from the
Workspace, preferably in a visual fashion.

>Another problem is that there is no indicator light to show when the
>printer is receiving data or when the hard disk is being accessed.

Funny, I thought both of these were obvious.  When the cube is making
coffee, the hard disk is being accessed.  When the coffee beans are
being ground, the optical disk is in use.  When the screen freezes,
it's preparing to send data to the printer, and when the cargo jet
takes off, it's about to begin printing.  Simple.

NeXT Support Guru for sale or rent.  Inquire within.
-=-
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)