Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!samsung!olivea!jerry
From: jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre)
Newsgroups: news.software.b
Subject: Re: history.dbm contents?
Summary: dbm delete doesn't free space
Message-ID: <50803@olivea.atc.olivetti.com>
Date: 16 May 91 02:43:50 GMT
References: <62955@mcdchg.chg.mcd.mot.com>
Sender: news@olivea.atc.olivetti.com
Organization: Olivetti ATC; Cupertino, CA
Lines: 53

In article <62955@mcdchg.chg.mcd.mot.com> heiby@mcdchg.chg.mcd.mot.com (Ron Heiby) writes:
>While it might be really handy for some software that has a message-id
>and wants to convert that to pathname(s) or such to have that info in
>the DBM file that way, I can't find any software that actually makes
>use of that information. I've written a fairly complete expire Perl

As mentioned, some news readers do make use of lookup by ID. The idea is that it lets you read the article mentioned in the References line. Of course, the current trend is to include the entire referenced article, eliminating the need to find the "parent" article. :-)

>script, which I'm about ready to start testing. It occurs to me,
>though, that if the actual *data* stored in the DBM file isn't used by
>anyone, just the *key* (whether or not the key exists), then it
>doesn't really matter whether that information is updated by expire.
>If no one uses it, expire could simply delete keys for the ancient
>articles and leave all the others untouched. I would think that this
>would make for a noticeable speed improvement.

Yes, you could store 0 bytes of data and still fulfill what is required for duplicate suppression. That should result in a smaller history.pag file. But consider: you are storing perhaps 30 bytes of key and 4 bytes of data. Cutting back from 34 to 30 bytes is not going to make a significant improvement.
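To put numbers on that trade-off, here is a small Python sketch. A plain dict stands in for the dbm file, the function names are hypothetical, and the 30- and 4-byte figures are the rough estimates above; any per-entry bookkeeping that dbm itself adds is ignored. It shows that duplicate suppression only needs key presence, and that dropping the 4-byte data saves roughly an eighth of each record.

```python
# Rough per-record payload sizes from the discussion (approximate).
KEY_BYTES = 30   # a typical message-id
DATA_BYTES = 4   # the offset into the history text file


def fraction_saved(key_bytes: int, data_bytes: int) -> float:
    """Fraction of per-record payload saved by storing 0 bytes of data."""
    return data_bytes / (key_bytes + data_bytes)


def is_duplicate(store: dict, message_id: bytes) -> bool:
    """Hypothetical duplicate check: only the key's presence matters.

    A dict stands in for the dbm file; the stored value is empty.
    """
    if message_id in store:
        return True
    store[message_id] = b""  # zero bytes of data
    return False


if __name__ == "__main__":
    print(f"{fraction_saved(KEY_BYTES, DATA_BYTES):.1%}")
    seen = {}
    print(is_duplicate(seen, b"<62955@mcdchg.chg.mcd.mot.com>"))
    print(is_duplicate(seen, b"<62955@mcdchg.chg.mcd.mot.com>"))
```

Run as a script, this prints about 11.8%, then False for the first sighting and True for the repeat. That is a modest saving next to the cost of the key itself.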
I have a "newalias" program for handling updates to my mail alias file that does dbm adds and deletes instead of rebuilding the entire thing from scratch. It runs about 10 times faster that way.

But the history.pag file is a different case. Even if we ignore the period of inconsistency in the pointers into the text file that would exist while it was being updated, there is a more significant problem. The history.pag file depends on being "sparse" and, as the documentation says, deleting an entry does not free the disk block. If you went along adding and deleting entries, the hash distribution of the keys would eventually result in every disk block in the history.pag file actually being allocated. In other words, the physical size of the history.pag file would grow to equal its logical size. Given how the logical size of the history.pag file shocks people until they find out it is not really that big, I think this would not be a good idea. It might be OK a few times, but at some point one would want to rebuild from scratch.

I have been using the dbz package with my B news and it works great. The history.pag file is much smaller and expire is about 10 times faster. The dbz package takes advantage of the fact that both the key and the data are in the text file, so it only needs to store the offset. Given that the key is much bigger than the offset, this is a bigger win than dropping the offset would be.

Jerry Aguirre
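The offset-only idea behind dbz can be sketched as follows. This is a hypothetical in-memory Python illustration, not dbz's actual on-disk format or API: the names `OffsetIndex` and `build_history`, the linear probing, the crc32 hash, and the tab-separated line layout are all assumptions. Each table slot holds only a byte offset into the history text file; to confirm a match, the lookup seeks to that offset and compares the message-id stored there.

```python
import io
import zlib


class OffsetIndex:
    """Hypothetical sketch: a hash table that stores only byte offsets.

    Keys are never stored in the table; they are re-read from the
    history text file whenever a slot needs to be checked.
    """

    EMPTY = -1

    def __init__(self, history: io.BytesIO, size: int = 1024):
        self.history = history
        self.slots = [self.EMPTY] * size

    def _key_at(self, offset: int) -> bytes:
        # Assumed layout: the message-id is the first tab-separated field.
        self.history.seek(offset)
        return self.history.readline().split(b"\t", 1)[0]

    def _probe(self, key: bytes):
        # Linear probing from the key's hash (crc32 chosen arbitrarily).
        start = zlib.crc32(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def add(self, key: bytes, offset: int) -> None:
        for slot in self._probe(key):
            if self.slots[slot] == self.EMPTY:
                self.slots[slot] = offset
                return
        raise RuntimeError("index full")

    def lookup(self, key: bytes):
        """Return the line's offset in the text file, or None if absent."""
        for slot in self._probe(key):
            off = self.slots[slot]
            if off == self.EMPTY:
                return None
            if self._key_at(off) == key:  # one text-file read per candidate
                return off
        return None


def build_history(entries, size: int = 1024):
    """Write (message_id, rest) pairs as text lines and index their offsets."""
    history = io.BytesIO()
    positions = []
    for msgid, rest in entries:
        positions.append((msgid, history.tell()))
        history.write(msgid + b"\t" + rest + b"\n")
    index = OffsetIndex(history, size)
    for msgid, off in positions:
        index.add(msgid, off)
    return history, index
```

A hash collision costs one extra read of the text file, which is the price paid for keeping the roughly 30-byte key out of the index entirely.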