Path: utzoo!mnetor!uunet!unisoft!gethen!farren
From: farren@gethen.UUCP (Michael J. Farren)
Newsgroups: news.software.b
Subject: Re: news software speedup
Message-ID: <847@gethen.UUCP>
Date: 2 Apr 88 13:22:56 GMT
References: <649@bms-at.UUCP> <10150@ncc.UUCP> <546@fig.bbn.com> <46774@sun.uucp>
Reply-To: farren@gethen.UUCP (Michael J. Farren)
Organization: There's Unix there in Oakland
Lines: 65

In article <46774@sun.uucp> chuq@sun.UUCP (Chuq Von Rospach) writes:
[Discussing a means whereby the batch input files are held whole, while
the history file contains pointers to them]

>o You lose the "Expires:" header. Stuff that is supposed to stay longer
>  can't.

Why? If you modify the history file format (which you're going to have
to do anyhow), you could simply add an "expire-date" field, which could,
if you want to get fancy, be either the article's own expire date, or a
calculated one based on how long your site wants to keep that article.
Then, running expire is a simple scan of the history file, zapping the
pointers that are out-of-date. Once in a while, you'd want to do a
cross-check to see if all of the articles in a given batch are expired,
and if so, then remove that batch file. This could be made quite
efficient if the batching software pre-sorted the batches into some
newsgroup hierarchical order, such as soc. batches, comp. batches, etc.

>o You lose adjustable expirations. You can't expire talk.* faster, because
>  it's all stuck in with everything else.

See above. Nothing says you can't expire some articles differently than
others, it's just a matter of when you zap the history file (read:
index). And if you're batching in discrete groups, you can just expire
entire batch files, instead of individual articles (presuming you are
using a local expiration, rather than the date in the article).

>o It isn't clean for locally posted or non-batched articles. At the simplest
>  layer, they're simply batches with single articles.
>  But if you've got lots of local posting or non-batched articles
>  floating around, the system degenerates into a setup WORSE than the
>  current system, because the tree is completely flat. ooph.

True. However, locally posted articles could be held in a temporary
holding pattern, and the batch files generated when they're batched for
transmission could then replace them, and be handled just like the
others. Or, you could batch them up as they are posted, closing the
batch file and opening a new one when the first one got too large.
Non-batched articles would be a special case, and would have to be
batched as they arrived, perhaps in a special "non-batched" batch, for
local use only. If you're providing a full feed, they'd just get
batched up for the next site down anyhow. Also - how many non-batched
articles does a typical site see? I haven't seen any for months, but I
don't know if I'm typical or not.

You do lose some stuff with a scheme like this, such as the easy ability
to manipulate individual articles (you'd have to extract them
individually, which is a loss of efficiency), but you'd also gain some.
You would no longer necessarily have to maintain a fairly enormous
directory tree - batches could conceivably be kept in a much more
compact structure. If the history file contains the Subject: line, you
could build a utility quite easily which would allow "K"illing articles
by each user in a much more efficient manner than the present one of
looking at each individual article. And if you had enormous amounts of
CPU (well, I can dream, can't I? :-) you could even implement some sort
of compression scheme, allowing you to keep a lot more articles on-line
at any given time, or use less disk, whichever you preferred.

Cross-posted articles, by the way, would just be duplicate pointers to
the same batched article. A little less efficient than the present, but
not bad.

--
Michael J. Farren             | "INVESTIGATE your point of view, don't just
{ucbvax, uunet, hoptoad}!     | dogmatize it!  Reflect on it and re-evaluate
        unisoft!gethen!farren | it.  You may want to change your mind someday."
gethen!farren@lll-winken.llnl.gov ----- Tom Reingold, from alt.flame
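
[Illustration] The expire pass proposed in this article (scan the
history file, zap out-of-date pointers, then cross-check for batch files
with nothing live left in them) could look roughly like the Python
sketch below. This is only a sketch under assumptions: the HistoryEntry
record, its field names, the RETENTION_DAYS table, and the batch file
names are all hypothetical, and none of it is the history format of any
actual news software.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

DAY = 86400  # seconds in a day

# Hypothetical local policy: days to keep articles, by top-level hierarchy.
RETENTION_DAYS = {"talk": 3, "comp": 14}
DEFAULT_DAYS = 7


@dataclass(frozen=True)
class HistoryEntry:
    """One pointer in the (hypothetical) history index."""
    message_id: str  # e.g. "<847@gethen.UUCP>"
    newsgroup: str   # a cross-post is simply a second entry with the
                     # same batch and offset but a different newsgroup
    batch: str       # name of the batch file holding the article
    offset: int      # byte offset of the article inside that batch
    expires: int     # expire date, in seconds since the epoch


def expire_date(newsgroup: str, arrived: int,
                article_expires: Optional[int] = None) -> int:
    """Use the article's own expire date if it has one, otherwise
    calculate one from the local retention table."""
    if article_expires is not None:
        return article_expires
    top = newsgroup.split(".", 1)[0]
    return arrived + RETENTION_DAYS.get(top, DEFAULT_DAYS) * DAY


def expire_pass(history: Iterable[HistoryEntry],
                now: int) -> Tuple[List[HistoryEntry], Set[str]]:
    """Drop out-of-date pointers and report which batch files no
    longer have any live pointer and so could be removed."""
    entries = list(history)
    live = [e for e in entries if e.expires > now]
    dead_batches = {e.batch for e in entries} - {e.batch for e in live}
    return live, dead_batches
```

Calling expire_pass(history, now) returns the surviving entries plus the
set of batch names that could be deleted. If batches are pre-sorted by
hierarchy, the entries of a talk. batch tend to expire together, so the
cross-check mostly finds whole batches to remove.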