Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!samsung!olivea!jerry
From: jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre)
Newsgroups: news.software.b
Subject: Re: history.dbm contents?
Summary: dbm delete doesn't free space
Message-ID: <50803@olivea.atc.olivetti.com>
Date: 16 May 91 02:43:50 GMT
References: <62955@mcdchg.chg.mcd.mot.com>
Sender: news@olivea.atc.olivetti.com
Organization: Olivetti ATC; Cupertino, CA
Lines: 53

In article <62955@mcdchg.chg.mcd.mot.com> heiby@mcdchg.chg.mcd.mot.com (Ron Heiby) writes:
>While it might be really handy for some software that has a message-id
>and wants to convert that to pathname(s) or such to have that info in
>the DBM file that way, I can't find any software that actually makes
>use of that information.  I've written a fairly complete expire Perl

As mentioned, some news readers do make use of lookup by ID.  The idea
is that the reader can fetch the article named in the References
line.  Of course, the current trend is to include the entire article
being referenced, eliminating the need to find the "parent" article.
:-)

>script, which I'm about ready to start testing.  It occurs to me,
>though, that if the actual *data* stored in the DBM file isn't used by
>anyone, just the *key* (whether or not the key exists), then it
>doesn't really matter whether that information is updated by expire.
>If no one uses it, expire could simply delete keys for the ancient
>articles and leave all the others untouched.  I would think that this
>would make for a noticeable speed improvement.

Yes, you could store 0 bytes of data and still provide what duplicate
suppression requires.  That should result in a smaller history.pag
file.  But consider: you are storing perhaps 30 bytes of key and 4 bytes
of data.  Cutting each entry from 34 to 30 bytes is not going to make a
significant improvement.
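A quick check of that arithmetic, using the rough per-entry figures above (about 30 bytes of message-ID key plus a 4-byte offset). This sketch ignores dbm's own per-entry bookkeeping, so the real saving would be somewhat smaller still:

```python
# Rough per-entry payload of a history.pag record, using the estimates
# above. dbm's internal per-entry overhead is deliberately ignored.
KEY_BYTES = 30   # approximate message-ID length (estimate from the text)
DATA_BYTES = 4   # offset into the history text file


def saving_percent(key_bytes: int, data_bytes: int) -> float:
    """Percentage of per-entry payload saved by dropping the data field."""
    with_data = key_bytes + data_bytes
    return 100.0 * data_bytes / with_data


print(f"key+data: {KEY_BYTES + DATA_BYTES} bytes, "
      f"key only: {KEY_BYTES} bytes, "
      f"saving: {saving_percent(KEY_BYTES, DATA_BYTES):.1f}%")
# prints: key+data: 34 bytes, key only: 30 bytes, saving: 11.8%
```

Dropping the data field trims only about a ninth of each entry, because the key dominates the record.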

I have a "newalias" program for handling updates to my mail alias file
that does dbm adds and deletes instead of rebuilding the entire thing
from scratch.  It runs about 10 times faster that way.

But the history.pag file is a different case.  Even if we ignore the
period of inconsistency in the pointers into the text file that would
exist while it was being updated, there is a more significant
problem.  The history.pag file depends on being "sparse" and, as the
documentation says, deleting an entry does not free the disk block.
If you kept deleting entries, the distribution of keys would eventually
result in every disk block in the history.pag file actually being
allocated.
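A toy model can make the effect visible.  This is a hypothetical sketch, not the real ndbm code: it ignores block splitting and page capacity.  It simply assumes that each key hashes to one of a fixed number of blocks, that a block takes up disk space once anything has been written to it, and that deleting an entry never gives the block back.  The message-IDs, block count, and 500-article retention window are made up for illustration.

```python
import hashlib


class SparsePagModel:
    """Toy model of a sparse dbm .pag file (hypothetical, not real ndbm).

    The file is n_blocks long logically, but a block only occupies disk
    space once something has been written to it. Deleting an entry
    removes it from its block, and the block stays allocated.
    """

    def __init__(self, n_blocks: int):
        self.n_blocks = n_blocks
        self.blocks = {}  # block number -> set of keys; presence = allocated

    def _block_for(self, key: str) -> int:
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.n_blocks

    def store(self, key: str) -> None:
        self.blocks.setdefault(self._block_for(key), set()).add(key)

    def delete(self, key: str) -> None:
        b = self._block_for(key)
        if b in self.blocks:
            self.blocks[b].discard(key)  # entry gone, block still allocated

    def allocated_blocks(self) -> int:
        return len(self.blocks)

    def live_entries(self) -> int:
        return sum(len(keys) for keys in self.blocks.values())


# Incremental expire: add each new article, delete the one that ages out.
window = 500  # hypothetical number of articles kept in history
pag = SparsePagModel(n_blocks=4096)
for i in range(20000):
    pag.store(f"<{i}@example.site>")
    if i >= window:
        pag.delete(f"<{i - window}@example.site>")

# Rebuild from scratch with only the surviving entries.
fresh = SparsePagModel(n_blocks=4096)
for i in range(20000 - window, 20000):
    fresh.store(f"<{i}@example.site>")

print("live entries:", pag.live_entries())
print("allocated blocks (incremental):", pag.allocated_blocks(), "of", pag.n_blocks)
print("allocated blocks (rebuilt):", fresh.allocated_blocks())
```

In this model, thousands of distinct IDs pass through over time, so nearly every block ends up allocated even though only 500 entries are live.  The rebuilt copy touches at most 500 blocks.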

In other words, the physical size of the history.pag file would grow to
equal its logical size.  Given how the logical size of the history.pag
file shocks people until they find out it is not really that big, I think
this would not be a good idea.  It might be OK for a few runs, but at
some point one would want to rebuild from scratch.

I have been using the dbz package with my B news and it works great.
The history.pag file is lots smaller and expire runs about 10 times
faster.  The dbz package takes advantage of the fact that both the key
and the data are in the text file, so it only needs to store the offset.
Given that the key is lots bigger than the offset, this is a bigger win
than not storing the offset.
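A minimal sketch of the offset-only idea follows.  It is not dbz's actual code.  The history line layout (message-ID, tab, date, tab, article path), the sample entries, the slot count, and the use of linear probing are all assumptions made for illustration.  The index stores nothing but offsets.  A lookup hashes the key, then re-reads the text line at each candidate offset and compares its first field to confirm the match.

```python
import hashlib
from typing import List, Optional


class OffsetIndex:
    """Toy offset-only index over a history text buffer (hypothetical)."""

    EMPTY = -1

    def __init__(self, text: bytes, n_slots: int = 64):
        self.text = text
        self.slots: List[int] = [self.EMPTY] * n_slots

    def _hash(self, key: bytes) -> int:
        return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(self.slots)

    def _key_at(self, offset: int) -> bytes:
        # Assumed layout: the key is the first tab-separated field of the line.
        return self.text[offset:self.text.index(b"\t", offset)]

    def add(self, offset: int) -> None:
        i = self._hash(self._key_at(offset))
        for _ in range(len(self.slots)):
            if self.slots[i] == self.EMPTY:
                self.slots[i] = offset  # only the offset is stored, never the key
                return
            i = (i + 1) % len(self.slots)  # linear probing
        raise RuntimeError("index full")

    def fetch(self, key: bytes) -> Optional[int]:
        i = self._hash(key)
        for _ in range(len(self.slots)):
            off = self.slots[i]
            if off == self.EMPTY:
                return None
            if self._key_at(off) == key:  # confirm against the text file
                return off
            i = (i + 1) % len(self.slots)
        return None


# Hypothetical history text; IDs, dates, and paths are made up.
history = (b"<1@site.a>\t05/14/91\tnews.software.b/101\n"
           b"<2@site.b>\t05/15/91\tnews.software.b/102\n"
           b"<3@site.c>\t05/16/91\tnews.software.b/103\n")

index = OffsetIndex(history)
offset = 0
for line in history.splitlines(keepends=True):
    index.add(offset)
    offset += len(line)

off = index.fetch(b"<2@site.b>")
line = history[off:history.index(b"\n", off)]
print(line.split(b"\t")[2].decode())  # prints: news.software.b/102
```

In a C implementation each slot could be a fixed-size integer.  Using the figures above, that brings each entry down from roughly 34 bytes to roughly 4, compared with 30 for the key-only scheme.  The cost is an extra read of the text file on each probe that hits a candidate slot.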

				Jerry Aguirre