Path: utzoo!censor!dybbuk!yunexus!davecb
From: davecb@yunexus.UUCP (David Collier-Brown)
Newsgroups: news.software.b
Subject: Re: Dynamic "smart" expiration?
Message-ID: <6118@yunexus.UUCP>
Date: 30 Dec 89 01:32:41 GMT
References: <1989Dec27.033817.9953@smsc.sony.com> <1989Dec28.063932.13720@robohack.UUCP> <68634@looking.on.ca> <1989Dec29.213539.2801@utzoo.uucp>
Organization: York U. Computing Services
Lines: 88

>In article <68634@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>>Either way time based expire is a loser. The purpose of expire is to
>>keep down the amount of disk space (and sometimes inodes) used by
>>news, isn't it?

henry@utzoo.uucp (Henry Spencer) writes:
>[...] Given constraints on things
>like resource consumption, and a user preference for predictable behavior,
>it's not obvious that time-based expire is bad.

  To expand on the above a bit, news has never been a well-behaved user
of space, strictly because of the temporal dimension...  News is trying
to present information in a timely fashion, keep it around until the
majority of readers have a chance to read (and save) it, and then
discard it as "old news".  This is hard.  In the multi-site case, the
delays make it **very** hard.

  We approximate the discarding of news after use by the expiry scheme,
which is really trying to do two things:
	1) recover space (News appears to think it runs on an
	   infinite-disk-machine (:-))
	2) provide a simple rule to its client base: for example, "you
	   must read category c in one week, category s in 5 days and
	   the rest daily, or you will miss material".

  The parameterization in expire reflects the author's desire to have
the local site set the policy it needs, or not as the case may be.
Excessive concern with space (i.e., an implementation problem) can cause
the behavior of the system to be mysterious and unpredictable to its
users.

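  As a rough illustration of the "simple rule" above, here is a minimal
sketch of a purely time-based expire in Python.  The category names and
retention periods are just the ones from the example rule, and the
table and function names (RETENTION, is_expired, expire) are
hypothetical; nothing here is taken from a real expire implementation.

```python
from datetime import datetime, timedelta

# Hypothetical retention table.  Categories and periods come from the
# example rule in the text, not from any real expire control file.
RETENTION = {
    "c": timedelta(days=7),   # "category c in one week"
    "s": timedelta(days=5),   # "category s in 5 days"
}
DEFAULT_RETENTION = timedelta(days=1)   # "the rest daily"


def is_expired(category, arrived, now):
    """True once an article has outlived its category's retention period."""
    keep_for = RETENTION.get(category, DEFAULT_RETENTION)
    return now - arrived > keep_for


def expire(articles, now):
    """Split (id, category, arrival time) tuples into kept and expired ids."""
    kept, expired = [], []
    for art_id, category, arrived in articles:
        if is_expired(category, arrived, now):
            expired.append(art_id)
        else:
            kept.append(art_id)
    return kept, expired
```

  Note that disk space appears nowhere in this rule.  That is what makes
it predictable for readers, and also what lets a burst of traffic
overflow the disk.
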
  Regrettably, the variability of the news flow tends to crash up
against disk limits a lot, making a time-based expire dangerous: with
older news systems, news was simply lost when a burst arrived that
overflowed the disk.  [Something with which I am all too familiar].

  This leaves us with a contradiction: we have two needs, both quite
real, which draw us in opposite directions.  The reader needs the
illusion of reliability and regular expiry limits.  The system needs to
trade off space against flow.  This tends to make an elegant solution
hard.

  My best attack on the problem is to define a hierarchy of
requirements, and satisfy them in order:
	1) news shall not drop articles on the floor [0]
	2) articles shall be kept around for not less than the
	   "standard" time to forward them to directly-connected
	   systems, plus a safety factor [1+3]
	3) articles in groups which are NOT being read locally shall be
	   available for a period of time sufficient to allow a new
	   subscriber to find one or more articles in the group, so they
	   will not mistake the group for an inactive one.  [1+x, x
	   defined by mean time between messages]
	4) articles in groups which are being read locally shall be
	   kept for a period known to the readership, shall disappear
	   soon after that time and are in general unrecoverable after
	   they disappear. [1+y]

  This implies one can usefully probe users' .newsrc files to see
whether groups belong in category 3 or 4, but will have to deal in
policy to make other decisions:
	a) What groups do you send & receive, and how much space must
	   you have just for transfer, plus packing and handling?
	   (Indeed, must you have a separate uucp spool...).  What
	   agreements about new hierarchies & groups do you have with
	   your feeds?
	b) What groups and hierarchies do you provide locally?  (Why?)
	   What is your minimum residence time?  What minimum amount of
	   space must you provide for them, if all were considered
	   "unread"?
	c) What groups/hierarchies are currently read?  What additional
	   space is required per day of residency?
	d) What is the expected increase in volume and readership per
	   year?  What does that do to all of the above?
	e) Do you have categories of groups (i.e., comp vs talk)?  What
	   are your criteria for this categorization?  What will changes
	   in category cost in space?

  So most of the questions are non-technical... and less than exciting
to consider.

  At the technical level (as I implied before), the best model I can
suggest is paging, with expire (the reaper!) putting the messages on
the deletable list based on as complex a set of criteria as you'd like,
the news inspooler (space user-upper) trashing them to make room for
unpacked new articles, and an optional rescuer grabbing them back if
they are re-referenced later.  [This last is a gut-feel speculation on
my part].

--dave (out of time to write & ideas, simultaneously) c-b
-- 
David Collier-Brown,  | davecb@yunexus, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave
Willowdale, Ontario,  | Joyce C-B:
CANADA. 416-223-8968  |    He's so smart he's dumb.
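
  A minimal sketch of the paging model suggested in the last paragraph
of the article, in Python.  Everything here is an assumption made for
illustration: the Spool class, its method names, the arbitrary size
units and the oldest-first eviction order are all hypothetical, and the
choice to refuse an article when nothing is left to evict is just one
way to honour "news shall not drop articles on the floor".

```python
from collections import OrderedDict


class Spool:
    """Toy spool; sizes are in arbitrary units and all names are hypothetical."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.live = {}                   # article id -> size
        self.deletable = OrderedDict()   # article id -> size, oldest mark first

    def used(self):
        # Deletable articles still occupy disk until the inspooler evicts them.
        return sum(self.live.values()) + sum(self.deletable.values())

    def reap(self, is_expired):
        """The reaper: mark articles deletable without freeing any space."""
        for art_id in [a for a in self.live if is_expired(a)]:
            self.deletable[art_id] = self.live.pop(art_id)

    def rescue(self, art_id):
        """The rescuer: restore a re-referenced article if it is still on disk."""
        if art_id in self.deletable:
            self.live[art_id] = self.deletable.pop(art_id)
            return True
        return False

    def inspool(self, art_id, size):
        """The inspooler: evict deletable articles only as room is needed."""
        evicted = []
        while self.used() + size > self.capacity and self.deletable:
            old_id, _ = self.deletable.popitem(last=False)
            evicted.append(old_id)
        if self.used() + size > self.capacity:
            # Nothing left to evict: refuse loudly rather than lose the article.
            raise OSError("spool full: cannot accept " + art_id)
        self.live[art_id] = size
        return evicted
```

  For example, with a capacity of 10, two 4-unit articles that have
both been reaped stay on disk; rescuing one and then inspooling a
5-unit article evicts only the other.  The reaper's criteria can then
be as elaborate as you like, since space is actually reclaimed only
under pressure.
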