Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!ucbvax!iWarp.intel.com!news
From: merlyn@iWarp.intel.com (Randal L. Schwartz)
Newsgroups: news.software.b
Subject: Re: newshist problem?
Summary: a slightly improved version of nstats enclosed
Message-ID: <1991Jun1.193655.544@iWarp.intel.com>
Date: 1 Jun 91 19:36:55 GMT
References: <1991May30.194402.24916@ucselx.sdsu.edu> <1991May31.000943.10583@comp.vuw.ac.nz>
Sender: news@iWarp.intel.com
Reply-To: merlyn@iWarp.intel.com (Randal L. Schwartz)
Distribution: usa
Organization: Stonehenge; netaccess via Intel, Beaverton, Oregon, USA
Lines: 321
In-Reply-To: mark@comp.vuw.ac.nz (Mark Davies)
Nntp-Posting-Host: se.iwarp.intel.com

In article <1991May31.000943.10583@comp.vuw.ac.nz>, mark@comp (Mark Davies) writes:
| In the latest C News patches newshist changed to a shell script and stopped
| supporting the -- to signify end of arguments (from getopt). If you look
| in nstat where it invokes newshist you can simply take out the -- and
| things will work, however be warned performance will be terrible as you
| will have a separate invocation of dbz for each message-id in your log
| file.
|
| To get better performance I made the following change to nstat. Rather
| than use newshist I now use a little helper program called nhistls. I
| don't have the original nstats here any more so I can't give you diffs but
| the whole thing is pretty small so I will include it here:

Gack. Starting a sh that did a bunch of echoes seemed a little wasted,
so I did it in Perl. (The whole thing shows a zillion syscalls with
trace, so there's bound to be a few more optimizations.)

Here's my tweak...

################################################## snip snip
#!/usr/bin/perl
#
# Nstats - Print C news statistics via Perl
#
# Version 1.2 (10/17/89)
#
#
#
# Author's notes:
#
# Constructive comments and enhancements are solicited (flames are not).
# Please send suggestions or enhancements to denny@mcmi.
#
# Larry Wall has a Very Nice Work in Perl. Many thanks to him.
#
# Denny Page, 1989
#
#
#
# Program notes:
#
# The simplest usage is 'perl nstats ~news/log'. I leave you to find
# more complicated invocations.
#
# While a duplicate is actually a rejected message, it is treated
# separately here. Rejected messages herein are messages that are not
# subscribed to in the sys file or are excluded in the active file.
#
# Junked messages are not displayed in the system summaries. It's not
# your neighbor's fault that you are missing active file entries. If
# you are concerned about receiving junk groups, exclude them in your
# sys or active file. They will then be summarized :-).
#
# The reason for a newsgroup being bad is assigned only once. If the
# reason changes later in the log (such as the sys file being modified
# such that a newsgroup is no longer rejected, but rather is filed in
# junk), no notice will be taken.
#
# Calls to newshist are cached at 25. This may need to be adjusted at
# some sites.
#
# Sitenames are truncated to 15 characters. This could be done better.
#
#
# Output headers have the following meanings:
#
#   System   Name of the neighboring system.
#   Accept   Number of accepted articles from system.
#   Dup      Number of duplicate articles received from system.
#   Rej      Number of rejected articles from system.
#   Sent     Number of articles sent to system.
#   Sys%     Accepted (or duplicate or rejected) articles as a
#            percentage of total articles from that system.
#   Tot%     Accepted (or duplicate) articles as a percentage
#            of total accepted (or duplicate) articles.
#   Avl%     Number of articles sent as a percentage of total
#            available (accepted) articles.
#
############################################################
#
# Revision history:
#
# 09/24/89 dny  Initial version
# 09/28/89 dny  Added category totals
# 10/02/89 dny  Fixed link count bug in record_groups
# 10/03/89 dny  Cleaned up variable names
# 10/16/89 dny  Renamed variables - Perl 3.0
# 10/17/89 dny  Fixed bug in rejection counts
# 04/18/91 mark@comp.vuw.ac.nz
#               speedups replacing newshist now
#               that it is a shell script
# 06/01/91 merlyn@iWarp.intel.com
#               replaced mark's shell script with Perl code
#
############################################################


############################################################
# Record the category of a list of message-ids
sub record_groups {
    local(@ids) = @_;
    grep(!//, @ids);
    local($ids) = join("\n",@ids);
    # Ids still uncounted after the lookup are tallied as expired.
    # (The starting value is an assumption; the loop only decrements it.)
    $batchcnt = $#ids + 1;
    # The child side of the fork feeds the ids to dbz on stdin. The
    # inner here-document tag (DBZ_EOF) is an assumed name.
    open(newshist, "-|") || exec <<"PERL_EOF";
/usr/lib/newsbin/dbz -ix /usr/lib/news/history <<'DBZ_EOF'
$ids
DBZ_EOF
PERL_EOF
    while (<newshist>) {
        if (s/^.+\t.+\t(.+)\n$/$1/) {
            $batchcnt--;
            foreach $link (split(/ /)) {
                $link =~ s/^([^\.\/]+).*/$1/;
                $category{$link}++;
            }
        }
    }
    $category{"*expired*"} += $batchcnt;
    close(newshist);
}


############################################################

$#id_cache = -1;
while (<>) {
    ($from, $action, $message_id, $text) =
        /^.+\s(\S+)\s(.)\s<(.+)>\s(.*)$/;
    $from = substr($from, 0, 15);

    # Accepted message
    if ($action eq '+') {
        $accepted{$from}++;
        foreach $site (split(/ /, $text)) {
            $site = substr($site, 0, 15);
            $sent{$site}++;
        }
        $id_cache[++$#id_cache] = $message_id;
        unless ($#id_cache < 50) {
            do record_groups(@id_cache);
            $#id_cache = -1;
        }
        next;
    }

    elsif ($action eq '-') {
        # Duplicate
        if ($text eq 'duplicate') {
            $duplicates{$from}++;
            next;
        }

        $rejected{$from}++;

        # Group not in sys
        if ($text =~ s/no subscribed groups in `(.+)'/$1/) {
            foreach $group (split(/,/, $text)) {
                if ($badgroup{$group}++ == 0) {
                    $badgroup_reason{$group} = "not subscribed in sys";
                }
            }
            next;
        }

        # Group excluded in active
        elsif ($text =~ s/all groups `(.+)' excluded in active/$1/) {
            foreach $group (split(/,/, $text)) {
                if ($badgroup{$group}++ == 0) {
                    $badgroup_reason{$group} = "excluded in active";
                }
            }
            next;
        }
    }

    # Junked message
    elsif ($action eq 'j') {
        $junk{$from}++;
        if ($text =~ s/junked due to groups `(.+)'/$1/) {
            foreach $group (split(/,/, $text)) {
                if ($badgroup{$group}++ == 0) {
                    $badgroup_reason{$group} = "not in active (junked)";
                }
            }
            next;
        }
    }

    # Ignore ihave/sendme messages
    elsif ($action eq 'i') {next;}
    elsif ($action eq 's') {next;}

    # Unknown input line
    print $_;
}
if ($#id_cache >= 0) {
    do record_groups(@id_cache);
}


# Collect all sitenames and calc totals
foreach $system (keys(accepted)) {
    $systems{$system} = 1;
    $total_accepted += $accepted{$system};
}
foreach $system (keys(duplicates)) {
    $systems{$system} = 1;
    $total_duplicates += $duplicates{$system};
}
foreach $system (keys(rejected)) {
    $systems{$system} = 1;
    $total_rejected += $rejected{$system};
}
foreach $system (keys(sent)) {
    $systems{$system} = 1;
    $total_sent += $sent{$system};
}
$total_articles = $total_accepted + $total_duplicates + $total_rejected;


# Print system summaries
print "\nSystem Accept sys% tot% Dup sys% tot% Rej sys% Sent avl%\n";
foreach $system (sort keys(systems)) {
    $articles =
        $accepted{$system} + $duplicates{$system} + $rejected{$system};

    if ($accepted{$system} > 0) {
        $accepted_pct = ($accepted{$system} * 100) / $articles + 0.5;
        $accepted_totpct = ($accepted{$system} * 100) / $total_accepted + 0.5;
    }
    else {
        $accepted_pct = 0;
        $accepted_totpct = 0;
    }

    if ($duplicates{$system} > 0) {
        $duplicates_pct = ($duplicates{$system} * 100) / $articles + 0.5;
        $duplicates_totpct =
            ($duplicates{$system} * 100) / $total_duplicates + 0.5;
    }
    else {
        $duplicates_pct = 0;
        $duplicates_totpct = 0;
    }

    if ($rejected{$system} > 0) {
        $rejected_pct = ($rejected{$system} * 100) / $articles + 0.5;
    }
    else {
        $rejected_pct = 0;
    }

    if ($sent{$system} > 0) {
        $sent_pct = ($sent{$system} * 100) / $total_accepted + 0.5;
    }
    else {
        $sent_pct = 0;
    }

    printf "%-15s %5d %3d%% %3d%% %4d %3d%% %3d%% %4d %3d%% %5d %3d%%\n",
        $system, $accepted{$system}, $accepted_pct,
        $accepted_totpct, $duplicates{$system}, $duplicates_pct,
        $duplicates_totpct, $rejected{$system}, $rejected_pct,
        $sent{$system}, $sent_pct;
}

if ($total_accepted > 0) {
    $accepted_pct = ($total_accepted * 100) / $total_articles + 0.5;
}
else {
    $accepted_pct = 0;
}
if ($total_rejected > 0) {
    $rejected_pct = ($total_rejected * 100) / $total_articles + 0.5;
}
else {
    $rejected_pct = 0;
}
if ($total_duplicates > 0) {
    $duplicates_pct = ($total_duplicates * 100) / $total_articles + 0.5;
}
else {
    $duplicates_pct = 0;
}

printf "TOTALS %5d %3d%% %4d %3d%% %4d %3d%% %5d\n",
    $total_accepted, $accepted_pct, $total_duplicates, $duplicates_pct,
    $total_rejected, $rejected_pct, $total_sent;


# Display any bad newsgroups received
@keys = sort(keys(badgroup));
if ($#keys >= 0) {
    print "\n\nBad Newsgroups Articles Reason\n";
    foreach $group (@keys) {
        printf "%-35s %4d %s\n",
            $group, $badgroup{$group}, $badgroup_reason{$group};
    }
}

# Display news categories received
@keys = sort(keys(category));
if ($#keys >= 0) {
    print "\n\nCategories Received Articles\n";
    foreach $group (@keys) {
        printf "%-35s %4d\n", $group, $category{$group};
    }
}
################################################## snip snip

Just another Perl and Cnews hacker,
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/