Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!oliveb!jerry
From: jerry@oliveb.UUCP (Jerry F Aguirre)
Newsgroups: news.software.b
Subject: Re: Is the history file really needed anymore?
Message-ID: <414@oliveb.UUCP>
Date: Wed, 21-Jan-87 15:35:25 EST
Article-I.D.: oliveb.414
Posted: Wed Jan 21 15:35:25 1987
Date-Received: Thu, 22-Jan-87 03:20:45 EST
References: <5504@ukma.ms.uky.csnet> <1307@ncr-sd.UUCP> <7529@utzoo.UUCP>
Reply-To: jerry@oliveb.UUCP (Jerry F Aguirre)
Organization: Olivetti ATC; Cupertino, Ca
Lines: 75
Keywords: netnews history dbm

In article <7529@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Eliminating the history file is, on the whole, a silly idea.  Precisely
>what benefits is the change supposed to produce?  If your news system
>lets articles get filed without putting them in the history file, this
>is a defect of the implementation, not the concept.  Ditto for foulups in
>coordination between inews and expire.  (What makes you think that the
>implementation of the new concept will be any better?  Of course, it won't
>foul up in precisely the same way...)

The map is not the territory.  Whenever you keep a separate index to
something there is the potential for the index to disagree with it.
With the index built into the data there is less potential for
disagreement.  In this case we have three different versions: the
history file, the dbm copy, and the articles themselves.

I see a constant series of complaints on the net about the history file
being out of sync.  The classic case seems to be running out of disk
space which causes the history file to wind up zero length.  It is not
quite clear to me what changes to the news software could reliably
handle the problem of running out of disk space.

With no history file, duplicates of the same article can arrive,
causing even more disk space problems as well as user annoyance.

>The assumption that article-ids always have reasonable formats is terribly
>naive.  Have any of the people suggesting this done *systematic* *surveys*
>of article-ids?  I thought not.  The vast majority of article-ids are in
>simple and reasonable format; it's the not-insignificant minority that
>aren't that causes trouble.  (I speak from experience.)

The format of the message IDs and the difficulty in translating them
into filenames is not a serious problem.  It is trivial to design a
translation that can handle any character value, even non-ASCII
characters.  For example: take the first character and split it into
two hex digits, make directories for those names (00 - ff).  Repeat
this with the next character until the number of files per directory is
reasonable.
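
One reading of the hex-directory idea can be sketched as follows.  The
function name hex_dirs and the depth parameter are made up for
illustration, and naming one directory level per character is an
assumption about what was meant, not a description of any existing news
software.

```python
def hex_dirs(message_id: bytes, depth: int = 2) -> str:
    """Build a directory path from the leading bytes of a message ID.

    Each of the first `depth` bytes becomes one directory level named by
    its two-digit hex value (00 - ff), so any byte value is acceptable.
    """
    levels = [format(byte, "02x") for byte in message_id[:depth]]
    return "/".join(levels)


# Two levels for the ID of this article: '<' is 0x3c and '4' is 0x34.
print(hex_dirs(b"<414@oliveb.UUCP>", depth=2))
```

Raising depth adds another level of up to 256 directories, which is how
the number of files per directory would be kept reasonable.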

As far as characters for the final file name, Unix allows any characters
in a filename except the '/' (and of course null).  Message IDs are
spec'ed to allow any character except the blank and null.  Translate any
'/' into blanks and you have a legal Unix file name.  Length of the ID
is more of a problem and could be handled by taking the first N (14)
characters, making a directory of that name, etc.
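
A minimal sketch of that translation, assuming the ID is cut into
N-character pieces and every piece but the last becomes a directory.
The name id_to_path is hypothetical, and the sketch does not guard
against a piece that happens to be "." or "..".

```python
def id_to_path(message_id: str, n: int = 14) -> str:
    """Turn a message ID into a relative Unix path.

    '/' is the only printable character a Unix filename cannot hold, so
    it is translated to a blank (which a message ID cannot contain).
    The result is cut into pieces of at most n characters; all pieces
    but the last name directories, the last names the file.
    """
    safe = message_id.replace("/", " ")
    pieces = [safe[i:i + n] for i in range(0, len(safe), n)]
    return "/".join(pieces)


# A 17-character ID becomes a 14-character directory plus a short file.
print(id_to_path("<414@oliveb.UUCP>"))
```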

Non-Unix systems would have different restrictions but the same
principle could be applied.  For instance VMS systems have a more
restricted filename character set but do not suffer from the directory
length problem.  An extension of the translate-characters-to-hex
strategy could be used.
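
As an illustration only, here is one way such a hex translation might
look.  The allowed set (upper case letters and digits) and the "_"
escape character are assumptions chosen for the example; they are not
the actual VMS filename rules.

```python
# Assumed "safe" characters for the example; not the real VMS rules.
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def hex_escape(message_id: bytes) -> str:
    """Keep allowed characters; write every other byte as '_' + two hex digits."""
    out = []
    for byte in message_id:
        ch = chr(byte)
        if ch in ALLOWED:
            out.append(ch)
        else:
            # '_' is not in ALLOWED, so the escaping stays unambiguous.
            out.append("_" + format(byte, "02X"))
    return "".join(out)


print(hex_escape(b"A/b"))
```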

If the article were created without write permission (mode 444) then the
create itself provides all the (portable) locking necessary to prevent
two "simultaneous" receptions of the same article.
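
The create-as-lock idea can be sketched as below.  Note the
substitution: the text relies on creat() failing when the file already
exists without write permission, while this sketch uses the O_EXCL flag
to get the same "first one wins" result (it also holds for the
superuser, who is not stopped by mode 444).  The name claim_article is
hypothetical.

```python
import os


def claim_article(path: str) -> bool:
    """Try to create the article file; return True only for the first caller.

    O_CREAT | O_EXCL makes the create fail if the file already exists,
    so a single call both checks for a duplicate and records the article.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o444)
    except FileExistsError:
        return False  # someone already received this article
    os.close(fd)
    return True
```

A second call with the same path returns False, which is the duplicate
check the history file would otherwise provide.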

As far as overhead, it is hard to understand how a single call to create
a file can have a prohibitive overhead.  That is all that is required to
check the "history" AND make the new entry.  Since a directory need
only be made for every N files (say 100), the overhead for that cannot
be significant.  I have heard some mention of "searching" directories for
each received article.  I don't understand what prompted someone to
think this.  How, exactly, does the proposed method differ from the
current method of creating an article (and possibly the directory for
it)?

I think that most of the same arguments could be made about the current
hierarchy for storing the news.  Right now the news software uses links
for posting to multiple news groups when it could just as easily use
index files to cross-post.  I am no expert on "notes" but doesn't it use
that kind of scheme?  I seem to remember postings arguing the same kind
of "advantages" for the notes system as are being given for the history
file.

					Jerry Aguirre
					Olivetti ATC