Xref: utzoo news.config:1220 news.admin:5773 alt.sources:608
Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!epimass!jbuck
From: jbuck@epimass.EPI.COM (Joe Buck)
Newsgroups: news.config,news.admin,alt.sources
Subject: Re: new survey to supplement arbitron. Please run this program.
Message-ID: <3215@epimass.EPI.COM>
Date: 19 May 89 22:08:10 GMT
References: <80@jove.dec.com>
Reply-To: jbuck@epimass.EPI.COM (Joe Buck)
Followup-To: news.config
Organization: Entropic Processing, Inc., Cupertino, CA
Lines: 108

Brian, your program, if invoked in the way you request, will process
crossposted articles N times, where N is the number of groups present.
Please, let's not waste net resources by conducting a large-scale
survey with a basic error in it.

Rather than do a "find" to locate article names, you can count
crossposted articles only once by reading the history file to obtain
article filenames.

Since this is going to alt.sources, I obviously need to include a
source: here is a perl program that eats a history file and spits out
a sorted list of host pairs, showing the links your news has travelled
through.

------------------------------ cut here ------------------------------
#! /usr/bin/perl

# This perl program scans through all the news on your spool
# (using the history file to find the articles) and prints
# out a sorted list of frequencies that each pair of hosts
# appears in the Path: headers.  That is, it determines how,
# on average, your news gets to you.
#
# If an argument is given, it is the name of a previous output
# of this program.  The figures are read in, and host pairs
# from articles newer than the input file are added in.
# So that this will work, the first line of the output of the
# program is of the form
# Last-ID: <5679@chinet.UUCP>
# (without the # sign).  It records the last Message-ID in the
# history file; to add new articles, we skip in the history file
# until we find the message-ID that matches "Last-ID".

$skip = 0;
if ($#ARGV >= 0) {
    $ofile = $ARGV[0];
    die "Can't open $ofile!\n" unless open (of, $ofile);
# First line must contain last msgid to use.
    $_ = <of>;
    ($key, $last_id) = split (' ');
    die "Invalid input file format!\n" if ($key ne "Last-ID:");
    $skip = 1;
# Read in the old file.
    while (<of>) {
        ($cnt, $pair) = split(' ');
        $pcount{$pair} = $cnt;
    }
}
# Let's go.
die "Can't open history file!\n" unless open (hist, "/usr/lib/news/history");
die "Can't cd to news spool directory!\n" unless chdir ("/usr/spool/news");

$np = $nlocal = 0;
while (<hist>) {
#
# $_ contains a line from the history file.  Parse it.
# Skip it if the article has been cancelled or expired
# If the $skip flag is true, we skip until we have the right msgid
#
    ($id, $date, $time, $file) = split (' ');
    next if ($file eq 'cancelled' || $file eq '');
    if ($skip) {
        if ($id eq $last_id) { $skip = 0; }
        next;
    }
#
# format of field is like comp.sources.unix/2345 .  Get ng and filename.
#
    ($ng, $n) = split (/\//, $file);
    $file =~ tr%.%/%;
#
# The following may be used to skip any local groups.  Here, we
# skip group names beginning with "epi" or "su".  Change to suit taste.
#
    next if $ng =~ /^epi|^su/;
    next unless open (art, $file);	# skip if cannot open file
#
# Article OK.  Get its path.
    while (<art>) {
        ($htype, $hvalue) = split (' ');
        if ($htype eq "Path:") {
# We have the path, in hvalue.
            $np++;
            @path = split (/!/, $hvalue);
# Handle locally posted articles.
            if ($#path < 2) { $nlocal++; last;}
# Create and count pairs.  The last path component is the poster's
# login, so the final host!user pair is deliberately left out.
            for ($i = 0; $i < $#path - 1; $i++) {
                $pair = $path[$i] . "!" . $path[$i+1];
                $pcount{$pair} += 1;
            }
            last;
        }
    }
}
# Make sure print message comes out before sort data.
$| = 1;
print "Last-ID: $id\n";
$| = 0;
# write the data out, sorted.  Open a pipe.
die "Can't exec sort!\n" unless open (sortf, "|sort -nr");

while (($pair, $n) = each (%pcount)) {
    printf sortf ("%6d %s\n", $n, $pair);
}
close sortf;
--
--
Joe Buck	jbuck@epimass.epi.com, uunet!epimass.epi.com!jbuck
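
For anyone who wants to check the pair-counting step without pointing
the program at a real spool, here is a minimal sketch of that step in
isolation. The sub name count_pairs and the host names in the sample
paths are made up for illustration and are not part of the program
above; the loop bounds are the same ones the program uses, so the
final host!user component is left out and a two-component path counts
as a local posting.

```perl
#! /usr/bin/perl
# Sketch only: count_pairs and the sample paths are hypothetical,
# made up to show the pair-counting loop on its own.

# Add the adjacent host pairs from one Path: value to the hash that
# $counts refers to.  Returns the number of pairs added.
sub count_pairs {
    my ($hvalue, $counts) = @_;
    my @path = split (/!/, $hvalue);
    # host!user (or shorter) means a locally posted article: no pairs.
    return 0 if ($#path < 2);
    # Stop one short of the end so the host!user pair is not counted.
    for (my $i = 0; $i < $#path - 1; $i++) {
        $counts->{$path[$i] . "!" . $path[$i+1]} += 1;
    }
    return $#path - 1;
}

%pcount = ();
count_pairs ("hosta!hostb!hostc!hostd!user1", \%pcount);
count_pairs ("hosta!hostb!hoste!user2", \%pcount);
count_pairs ("hosta!localuser", \%pcount);

# Print in the same "count pair" layout the program feeds to sort.
foreach $pair (sort keys %pcount) {
    printf ("%6d %s\n", $pcount{$pair}, $pair);
}
```

With these three sample paths, hosta!hostb is the only pair seen twice,
and nothing is recorded for the local posting.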