Path: utzoo!attcan!utgpu!watmath!mks.com!egisin
From: egisin@mks.com (Eric Gisin)
Newsgroups: news.admin
Subject: News Versions, another approach
Message-ID: <1989Nov15.213552.9499@mks.com>
Date: 15 Nov 89 21:35:52 GMT
Organization: Mortice Kern Systems Inc., Waterloo, Ontario, CANADA
Lines: 28

I wrote a short awk program that identifies news versions based on
Message-ID syntax. It isn't very accurate, because B news, Notes, VMS,
and P news all use the same syntax. I did discover that there are
several other versions that I could not identify, which I have called
E, F, and Other.

This analysis differs from the version-control-message method in two
ways. First, it only recognizes sites that have posted in the last two
weeks. Second, it also recognizes sites that do not respond to version
messages, which makes this method more reliable. Many C news sites
don't seem to respond. It also picks up many mailing list gateways.

Here are the results from running it on a 2.4MB history file (2 weeks).

	type	# distinct domains
	B	2162
	C	 477
	E	  36
	F	 448
	Other	 325
	Total	3448

Here are the REs I used in the awk program. ID is <$2@$3>.

	$2 ~ /^[0-9]+$/ {				# B news
	$2 ~ /^(1989|[A-Z0-9]*\.89)[JFMASOND]/ {	# C news
	$2 ~ /^[JFMASOND].*1989/ {			# E news
	$2 ~ /^[0-9.]+AA/ {				# F news
	{						# Other
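
The patterns above give only the match conditions. The actions and the
domain counting are not shown. What follows is a rough sketch of how the
whole classifier might fit together, transcribed into Python rather than
awk so it is easy to try out. Several things here are assumptions, not
part of the original program: that each history line contains one
Message-ID in angle brackets, that the first matching rule wins (as if
each awk action ended in `next`), that domains are compared without
regard to case, and the names `classify` and `count_domains`.

```python
import re
from collections import defaultdict

# Rules in the order listed in the post. The first match wins, which
# assumes each awk action ended with `next` (not shown in the post).
RULES = [
    ("B", re.compile(r"^[0-9]+$")),
    ("C", re.compile(r"^(1989|[A-Z0-9]*\.89)[JFMASOND]")),
    ("E", re.compile(r"^[JFMASOND].*1989")),
    ("F", re.compile(r"^[0-9.]+AA")),
]

# Splits <local@domain> into the parts the post calls $2 and $3.
MSGID = re.compile(r"<([^@<>]+)@([^@<>\s]+)>")


def classify(local_part):
    """Return the version label for the left-hand side of a Message-ID."""
    for name, pattern in RULES:
        if pattern.search(local_part):
            return name
    return "Other"


def count_domains(lines):
    """Count distinct Message-ID domains per version label."""
    domains = defaultdict(set)
    for line in lines:
        match = MSGID.search(line)
        if match is None:
            continue  # no Message-ID on this line
        local, domain = match.groups()
        # Lowercasing the domain is an assumption made for this sketch.
        domains[classify(local)].add(domain.lower())
    return {name: len(seen) for name, seen in domains.items()}
```

As a check, the Message-ID of this article, 1989Nov15.213552.9499,
matches the C news rule. Feeding the lines of a history file to
`count_domains` would give a per-type table of the kind shown above,
without the Total row.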