Path: utzoo!mnetor!uunet!husc6!uwvax!oddjob!mimsy!eneevax!umd5!vrdxhq!bms-at!stuart
From: stuart@bms-at.UUCP (Stuart D. Gathman)
Newsgroups: news.software.b
Subject: news software speedup
Message-ID: <649@bms-at.UUCP>
Date: 22 Mar 88 20:17:41 GMT
Organization: Business Management Systems, Inc., Fairfax, VA
Lines: 44
Keywords: implementation ideas

I have been using the Bnews software for about 2 years now. It is very
useful for in-house as well as usenet purposes. I have had some ideas
about implementing the article storage. Please send flames via E-mail.
(Really!)

A major problem with the current system is scanning article headers in
many separate files. (And unix doesn't like big directories, to boot.)

My idea is to have 2 files per newsgroup directory (other than
subdirectories). All headers for a newsgroup would be in one file, with
offsets into another file containing all articles. Processing incoming
news would then be faster. Programs like 'vn' would be orders of
magnitude faster. The only problem is 'expire'.

I maintain that 'expire' would still be reasonable. For each newsgroup,
it would read the header file and write new versions of both files. It
can seek past articles to be deleted while copying the ones to be
retained to the new article file. When finished, move the new versions
into place. This needs to be done only one newsgroup at a time, so the
extra disk space needed is never more than the size of one newsgroup.
Not only that, but no history file is needed! (The actual arrival time
can be stored in the header file if desired.)

Since expire is run on a batch basis, its decreased performance is not
a problem. (At least compared with the current slow performance of
interactive programs.) The time to read the header file is comparable
to that required to read the current history files. It can be reduced
by storing only some headers in the header file. The rest can be stored
in the article file.

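To make the two-file layout and the expire pass concrete, here is a
minimal sketch in Python (chosen for brevity; B news itself is written
in C). The file names "headers" and "articles", the tab-separated record
format, and all function names are assumptions made for illustration and
are not part of any existing news software. It also assumes header
fields contain no tabs or newlines.

```python
import os

HEADERS = "headers"    # hypothetical name: one index line per article
ARTICLES = "articles"  # hypothetical name: all article texts, concatenated


def append_article(group_dir, message_id, subject, text, arrival):
    """Append the article text, then record its offset in the header file."""
    data = text.encode("utf-8")
    with open(os.path.join(group_dir, ARTICLES), "ab") as art:
        art.seek(0, os.SEEK_END)
        offset = art.tell()
        art.write(data)
    with open(os.path.join(group_dir, HEADERS), "a", encoding="utf-8") as hdr:
        hdr.write(f"{offset}\t{len(data)}\t{arrival}\t{message_id}\t{subject}\n")


def read_headers(group_dir):
    """Return (offset, length, arrival, message_id, subject) tuples."""
    records = []
    with open(os.path.join(group_dir, HEADERS), encoding="utf-8") as hdr:
        for line in hdr:
            offset, length, arrival, message_id, subject = (
                line.rstrip("\n").split("\t", 4))
            records.append(
                (int(offset), int(length), int(arrival), message_id, subject))
    return records


def read_article(group_dir, offset, length):
    """Fetch one article by seeking straight to its offset."""
    with open(os.path.join(group_dir, ARTICLES), "rb") as art:
        art.seek(offset)
        return art.read(length).decode("utf-8")


def expire(group_dir, cutoff):
    """Rewrite both files, keeping articles that arrived at or after cutoff."""
    old_hdr = os.path.join(group_dir, HEADERS)
    old_art = os.path.join(group_dir, ARTICLES)
    new_hdr, new_art = old_hdr + ".new", old_art + ".new"
    with open(old_art, "rb") as src, open(new_art, "wb") as dst, \
            open(new_hdr, "w", encoding="utf-8") as hdr:
        for offset, length, arrival, message_id, subject in read_headers(group_dir):
            if arrival < cutoff:
                continue  # expired: its bytes are skipped, never read
            src.seek(offset)
            new_offset = dst.tell()
            dst.write(src.read(length))
            hdr.write(
                f"{new_offset}\t{length}\t{arrival}\t{message_id}\t{subject}\n")
    # Move the new versions into place, one newsgroup at a time.
    os.replace(new_art, old_art)
    os.replace(new_hdr, old_hdr)
```

A reader like 'vn' would call read_headers() once per newsgroup instead
of opening every article file, and expire() needs temporary space only
for the group it is currently rewriting.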
Checking an incoming article to see if it is already present would be
faster than the current SysV scheme of 10 history files, because a
single header file will be much smaller than a 10th of the history
files. (A dbm file could still be used also.) This assumes that the
newsgroup information is consistent in duplicate articles. If only the
article ID should be relied on, a history database of some form is
still necessary.

This method also uses fewer inodes, which are a problem on small
systems.

The only improvement on this method I can think of would be to
introduce an indexing file structure that handles variable length
records. But this loses the advantage of simplicity.
--
Stuart D. Gathman	<..!{vrdxhq|daitc}!bms-at!stuart>
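
The duplicate check described above could look roughly like the
following Python sketch. The spool layout (one directory level per
newsgroup component, with a per-group file named "headers"), the
tab-separated record format with the Message-ID in the fourth field, and
the function names are all assumptions for illustration.

```python
import os


def load_message_ids(header_path):
    """Collect the Message-IDs recorded in one newsgroup's header file."""
    ids = set()
    if not os.path.exists(header_path):
        return ids
    with open(header_path, encoding="utf-8") as hdr:
        for line in hdr:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 4:
                ids.add(fields[3])  # assumed position of the Message-ID
    return ids


def is_duplicate(spool_dir, newsgroups, message_id):
    """True if any group named in the Newsgroups header already holds the ID."""
    for group in newsgroups.split(","):
        path = os.path.join(spool_dir, *group.strip().split("."), "headers")
        if message_id in load_message_ids(path):
            return True
    return False
```

Only the header files of the groups named in the incoming article are
read, which is where the speedup comes from. It is also why the check
depends on duplicates carrying the same Newsgroups line: a copy that
names a different group would not be detected.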