Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!ernie.Berkeley.EDU!jwl
From: jwl@ernie.Berkeley.EDU (James Wilbur Lewis)
Newsgroups: news.admin
Subject: Re: List of sites with broken Followup (No References) Software
Message-ID: <29126@ucbvax.BERKELEY.EDU>
Date: 10 May 89 07:46:31 GMT
References: <3222@looking.UUCP>
Sender: usenet@ucbvax.BERKELEY.EDU
Reply-To: jwl@ernie.Berkeley.EDU.UUCP (James Wilbur Lewis)
Organization: University of California, Berkeley
Lines: 53

In article <3222@looking.UUCP> brad@looking.UUCP (Brad Templeton) writes:
-Recently I wrote some software to make use of the References line in news
-articles. Much to my dismay, I found that most articles don't have a valid
-References: line! The reason is a small number of sites with broken posting
-programs that don't include a References: line on followups. I scanned
-65,000 news articles, and 3700 of them matched this expression:
-
-    if( !followup && subject has "^re:" )
-        -- bad article --
-(The subject starts with re: but there is no References: line)
-
-Now only 5%, that's not so bad, right? Wrong. Every followup to one of
-these bad articles ALSO has a broken reference chain, and is not linked to
-the parent. All it takes is just a few sites to make the References: line
-useless.
-
-(If your count is very small compared to your news output, it may be
-a local bug for some types of posting, or it may be users who simply
-typed in 're:' in a manual subject line.)
-
-65450 valid articles, 3696 invalid articles

In the #2 spot:

-  513 ucbvax.berkeley.edu

ucbvax is a news server for a bunch of heavily-used local machines, and
a lot of news gets posted from there. We use rn and Pnews. I think we
can safely assume that Eric Fair is a competent (!) netnews administrator.

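For concreteness, the test quoted above might be sketched roughly as follows. This is only an illustration, not Brad's actual program: the function name `classify`, the category labels, and the `<local@host>` pattern used to decide whether a References: line holds at least one plausible message-ID are all assumptions made here.

```python
import re
from email.parser import HeaderParser

# Assumption: a plausible message-ID has the shape <local@host>,
# with no whitespace or nested angle brackets inside it.
MSG_ID = re.compile(r"<[^<>\s@]+@[^<>\s@]+>")
RE_PREFIX = re.compile(r"^re:", re.IGNORECASE)


def classify(article_text):
    """Classify one article by its Subject: and References: headers.

    Returns one of (hypothetical labels):
      "basenote"  - subject does not start with "re:"
      "missing"   - looks like a followup, but has no References: header
      "malformed" - References: present, but holds no plausible message-ID
      "ok"        - followup with at least one plausible message-ID
    """
    headers = HeaderParser().parsestr(article_text, headersonly=True)
    subject = (headers.get("Subject") or "").strip()
    refs = headers.get("References")

    if not RE_PREFIX.match(subject):
        return "basenote"
    if refs is None:
        return "missing"
    if not MSG_ID.search(refs):
        return "malformed"
    return "ok"
```

Only the "missing" case corresponds to the quoted expression; counting "malformed" separately is an extra, and shows how many flagged articles had a References: line that was merely damaged rather than absent.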

I think there might be a couple of things going on that you may not have
considered:

Rather than people manually inserting a "re:" in the subject lines of
articles which are really basenotes, it is possible that they are deleting
the References: lines of followups, in order to avoid the dreaded "interp
buffer overflow" from rn. If this is really what is going on, I suggest
one of the following patches to the software:

(1) modify rn to allow arbitrarily long References: lines, or truncate
    them automatically when they get unwieldy, or

(2) only keep a reference to the article's immediate parent, since the
    remainder of the current References: line could be reconstructed
    from that.

Another explanation might be in your counting program -- what does it do
for malformed References: lines? Many people incorrectly replace every ">"
in the article with some alternate character, to defeat the inews 50%
included-text rule. This mangles the References: line as well, which,
while marginally less annoying than extraneous "inews fodder", is still
a bother.

-- Jim Lewis
   U.C. Berkeley
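
A footnote on suggestions (1) and (2), as a rough sketch only. Nothing here is rn or inews code; the function names, the `parent_of` mapping (message-ID to immediate parent's message-ID), and the `keep_last` and `max_depth` parameters are all hypothetical, chosen just to show the idea.

```python
def truncate_references(ids, keep_last):
    """Suggestion (1), second half: keep only the most recent IDs.

    `ids` is a list of message-IDs, oldest first. The older entries
    are simply dropped, so this is lossy on its own.
    """
    if keep_last <= 0:
        return []
    return list(ids[-keep_last:])


def reconstruct_references(msg_id, parent_of, max_depth=1000):
    """Suggestion (2): rebuild the full chain from parent-only links.

    `parent_of` maps a message-ID to its immediate parent's ID
    (assumed to come from a local index of articles). Returns the
    ancestors of `msg_id`, oldest first.
    """
    chain = []
    seen = {msg_id}  # guards against a corrupt, cyclic parent map
    current = parent_of.get(msg_id)
    while current is not None and current not in seen and len(chain) < max_depth:
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    chain.reverse()  # collected newest first; References: is oldest first
    return chain
```

The catch with (2) is that the walk stops at the first ancestor the local site does not have, so an expired or never-received article cuts off everything older than it.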