Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!think!husc6!rutgers!clyde!cbatt!cwruecmp!nitrex!rbl
From: rbl@nitrex.UUCP ( Dr. Robin Lake )
Newsgroups: comp.unix.questions,comp.unix.xenix,misc.wanted
Subject: Re: Unix/Xenix Software to make an Index
Message-ID: <391@nitrex.UUCP>
Date: Wed, 19-Nov-86 15:17:29 EST
Article-I.D.: nitrex.391
Posted: Wed Nov 19 15:17:29 1986
Date-Received: Thu, 20-Nov-86 01:33:33 EST
References: <127@rd1632.UUCP> <63@cogent.UUCP>
Reply-To: rbl@nitrex.UUCP ( Dr. Robin Lake )
Distribution: net
Organization: The Standard Oil Co., Cleveland
Lines: 33
Keywords: index, tree, text processing
Xref: mnetor comp.unix.questions:120 comp.unix.xenix:17 misc.wanted:234

There have been several requests regarding automatic generation of
indices from text. I did it once, about 10 years ago, when UNIX was
younger and more forgiving (as was I!). The logic of the program goes
as follows:

In the early (V6) days of UNIX, the tutorial on C included an example
program called "tree". It built a binary tree of words, counted the
occurrences of each word and then (at EOF on stdin) printed an
alphabetical list of the words and their counts.

A "straightforward" modification of this program involves changing the
data structure of each tree node to hold a list of page numbers in
place of the integer count of occurrences. As the incoming text is
scanned, keep track of the current page number. As each word occurs,
enter it into the binary tree (if it's new) and add the current page
number to that word's page number list. At EOF, traverse the tree and
print the words and their associated page numbers. Voila! An index!

The problem came when we ran out of memory on the PDP-11/45 we used
then (at a very different institution). We never took the time to work
out the problem of storing (sub)-trees in disk files and then
combining them at EOF. (Sounds like a great homework assignment for a
Data Structures course!)
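
For anyone who wants to try the page-list modification described above, here is a minimal sketch of how it might look. It is neither the tutorial's tree.c nor the lost indexing version. All names (struct page, struct node, enter, scan, print_index) are made up for illustration, and two things are assumed that the description does not specify: a form feed ('\f') in the input starts a new page, and words are folded to lower case.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXWORD 64              /* longer words are truncated */

struct page {                   /* one entry in a word's page list */
    int number;
    struct page *next;
};

struct node {                   /* page list replaces the integer count */
    char *word;
    struct page *first, *last;
    struct node *left, *right;
};

static void *xalloc(size_t n)   /* calloc that exits on failure */
{
    void *p = calloc(1, n);
    if (p == NULL) {
        perror("calloc");
        exit(1);
    }
    return p;
}

/* Append a page number unless it is already the latest entry. */
static void add_page(struct node *n, int number)
{
    struct page *p;

    if (n->last != NULL && n->last->number == number)
        return;
    p = xalloc(sizeof *p);
    p->number = number;
    if (n->last != NULL)
        n->last->next = p;
    else
        n->first = p;
    n->last = p;
}

/* Enter a word into the tree (if it's new) and record the page. */
struct node *enter(struct node *root, const char *word, int number)
{
    int cmp;

    if (root == NULL) {
        root = xalloc(sizeof *root);
        root->word = xalloc(strlen(word) + 1);
        strcpy(root->word, word);
        add_page(root, number);
        return root;
    }
    cmp = strcmp(word, root->word);
    if (cmp == 0)
        add_page(root, number);
    else if (cmp < 0)
        root->left = enter(root->left, word, number);
    else
        root->right = enter(root->right, word, number);
    return root;
}

/* Scan text to EOF; ASSUMPTION: '\f' marks the start of a new page. */
struct node *scan(FILE *in, struct node *root)
{
    char word[MAXWORD];
    int c, len = 0, page = 1;

    do {
        c = getc(in);
        if (c != EOF && isalpha(c)) {
            if (len < MAXWORD - 1)
                word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {      /* a word just ended */
                word[len] = '\0';
                root = enter(root, word, page);
                len = 0;
            }
            if (c == '\f')
                page++;
        }
    } while (c != EOF);
    return root;
}

/* In-order traversal: words come out in alphabetical order. */
void print_index(const struct node *n, FILE *out)
{
    const struct page *p;

    if (n == NULL)
        return;
    print_index(n->left, out);
    fprintf(out, "%s", n->word);
    for (p = n->first; p != NULL; p = p->next)
        fprintf(out, "%s %d", p == n->first ? "" : ",", p->number);
    putc('\n', out);
    print_index(n->right, out);
}
```

A main() that calls print_index(scan(stdin, NULL), stdout) would turn this into a filter. Everything is held in memory until EOF, so the sketch has the same limitation described above: a large enough document will exhaust memory, and nothing here spills sub-trees to disk.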

Source (highly commented) to tree.c available via e-mail on request.
If enough (N > ?) requests come in, I'll post it. Sorry, but the
indexing version is not available, having gone to bit heaven years ago.

Questions to:
Robin Lake
Standard Oil R&D
(216)-581-5976
cbatt!nitrex!rbl
dexvax!cwruecmp!nitrex!rbl