Path: utzoo!mnetor!uunet!husc6!uwvax!oddjob!mimsy!eneevax!umd5!vrdxhq!bms-at!stuart
From: stuart@bms-at.UUCP (Stuart D. Gathman)
Newsgroups: news.software.b
Subject: news software speedup
Message-ID: <649@bms-at.UUCP>
Date: 22 Mar 88 20:17:41 GMT
Organization: Business Management Systems, Inc., Fairfax, VA
Lines: 44
Keywords: implementation ideas

I have been using the Bnews software for about 2 years now.  It is
very useful for in-house as well as Usenet purposes.

I have had some ideas about implementing the article storage.  Please
send flames via E-mail.  (Really!)

A major problem with the current system is scanning article headers
in many separate files.  (And unix doesn't like big directories, to boot.)
My idea is to have 2 files per newsgroup directory (not counting
subdirectories).  All headers for a newsgroup would be in one file,
with offsets into a second file containing all the articles.
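
To make the layout concrete, here is a rough sketch in Python.  It is
only an illustration, not a proposal for the real format.  The file
names "headers" and "articles", the tab-separated record (offset,
length, arrival time, Message-ID, subject), and all function names
are assumptions of mine.

```python
import os

def append_article(group_dir, message_id, subject, text, arrival):
    """Append one article to the group; return its (offset, length)."""
    data = text.encode("utf-8")
    # The article file is just every article laid end to end.
    with open(os.path.join(group_dir, "articles"), "ab") as art:
        art.seek(0, os.SEEK_END)
        offset = art.tell()
        art.write(data)
    # The header file gets one short line pointing into the article file.
    with open(os.path.join(group_dir, "headers"), "a", encoding="utf-8") as hdr:
        hdr.write(f"{offset}\t{len(data)}\t{arrival}\t{message_id}\t{subject}\n")
    return offset, len(data)

def read_headers(group_dir):
    """One sequential read of one file, instead of one open per article."""
    entries = []
    with open(os.path.join(group_dir, "headers"), encoding="utf-8") as hdr:
        for line in hdr:
            offset, length, arrival, message_id, subject = \
                line.rstrip("\n").split("\t", 4)
            entries.append((int(offset), int(length), int(arrival),
                            message_id, subject))
    return entries

def read_article(group_dir, offset, length):
    """Fetch a single article body by seeking straight to its offset."""
    with open(os.path.join(group_dir, "articles"), "rb") as art:
        art.seek(offset)
        return art.read(length).decode("utf-8")
```

A reader such as 'vn' would call read_headers once per newsgroup and
read_article only for the articles the user actually selects.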

Processing incoming news would then be faster.  Programs like 'vn' would
be orders of magnitude faster.  The only problem is 'expire'.  I maintain
that 'expire' would still be reasonable.  It would work by reading the
header file and writing a new version for each newsgroup.  It can seek
past articles to be deleted while copying the ones to be retained
to a new file.  When finished, move the new versions into place.  This
needs to be done only one newsgroup at a time, so the extra disk space
needed is at most a copy of one newsgroup.  Not only that, but no
history file is needed!  (The
actual arrival time can be stored in the header file if desired.)  Since
expire is run on a batch basis, its decreased performance is not
a problem.  (At least compared with the current slow performance of
interactive programs.)
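
The expire pass described above might look roughly like this in
Python.  Again this is a sketch under assumptions of mine: the files
are called "headers" and "articles", each header line is
tab-separated with offset, length and arrival time as its first three
fields, and an article expires when its arrival time is before a
cutoff.

```python
import os

def expire_group(group_dir, cutoff):
    """Rewrite one group's two files, dropping articles older than cutoff.

    Returns the number of articles retained.
    """
    hdr_path = os.path.join(group_dir, "headers")
    art_path = os.path.join(group_dir, "articles")
    new_hdr_path = hdr_path + ".new"
    new_art_path = art_path + ".new"
    kept = 0
    with open(hdr_path, encoding="utf-8") as hdr, \
         open(art_path, "rb") as art, \
         open(new_hdr_path, "w", encoding="utf-8") as new_hdr, \
         open(new_art_path, "wb") as new_art:
        for line in hdr:
            offset, length, arrival, rest = line.rstrip("\n").split("\t", 3)
            if int(arrival) < cutoff:
                continue  # expired: its bytes are never read, just skipped
            # Seek to the retained article and copy it to the new file.
            art.seek(int(offset))
            new_offset = new_art.tell()
            new_art.write(art.read(int(length)))
            new_hdr.write(f"{new_offset}\t{length}\t{arrival}\t{rest}\n")
            kept += 1
    # Move the new versions into place.
    os.replace(new_art_path, art_path)
    os.replace(new_hdr_path, hdr_path)
    return kept
```

Between the two renames the old header file briefly points at offsets
in the new article file, so a real version would have to lock the
group against readers and incoming news while it does the swap.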

The time to read the header file is comparable
to that required to read the current history files.  It can be reduced
by storing only some headers in the header file.  The rest can be stored
in the article file.  Checking an incoming article to see if it is
already present would be faster than the current SysV scheme of the
10 history files, because a single header file will be much smaller than
a tenth of the history.  (A dbm file could still be used as well.)
This assumes that the newsgroup information is consistent in duplicate
articles.  If only the article ID can be relied on, a history
database of some form is still necessary.
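
A duplicate check along these lines could be sketched as below
(Python, illustration only).  I am assuming that each newsgroup maps
to a spool directory by turning dots into path separators, that the
header file is called "headers", and that the Message-ID is the
fourth tab-separated field of each line.  The function name is made
up.

```python
import os

def already_present(spool_dir, newsgroups, message_id):
    """Look for message_id in the header file of each listed newsgroup.

    This relies on the Newsgroups line being identical in every copy
    of an article; without that, a separate history database is needed.
    """
    for group in newsgroups.split(","):
        parts = group.strip().split(".")
        hdr_path = os.path.join(spool_dir, *parts, "headers")
        try:
            with open(hdr_path, encoding="utf-8") as hdr:
                for line in hdr:
                    fields = line.rstrip("\n").split("\t")
                    if len(fields) > 3 and fields[3] == message_id:
                        return True
        except FileNotFoundError:
            continue  # group has no header file here; try the next one
    return False
```

Only the header files of the groups named in the incoming article are
scanned, which is why this can beat a search through a site-wide
history file.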

This method also uses fewer inodes, which are scarce on small systems.

The only improvement on this method I can think of would be to introduce
an indexing file structure that handles variable length records.  But
this loses the advantage of simplicity.
-- 
Stuart D. Gathman	<stuart@bms-at.uucp>
			<..!{vrdxhq|daitc}!bms-at!stuart>