Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site calma.UUCP
Path: utzoo!decvax!decwrl!sun!calma!adams
From: adams@calma.UUCP (Robert Adams)
Newsgroups: net.news.group
Subject: Re: Re: More stats as of Nov. 10, 1985
Message-ID: <79@calma.UUCP>
Date: Tue, 26-Nov-85 23:13:45 EST
Article-I.D.: calma.79
Posted: Tue Nov 26 23:13:45 1985
Date-Received: Wed, 27-Nov-85 22:25:24 EST
References: <740@seismo.CSS.GOV> <2182@umcp-cs.UUCP>
Reply-To: adams@calma.UUCP (Robert Adams)
Followup-To: <2265@amdahl.UUCP>
Distribution: na
Organization: GE/Calma Co., R&D Systems Engineering, Milpitas, CA
Lines: 67

> E. Michael Smith  ...!{hplabs,ihnp4,amd,nsc}!amdahl!ems
> It would be interesting to see what percentage of total
> volume and what percentage of each group volume was made up of
> headers and footers.  By what percent would total net volume
> drop if cute .signatures were eliminated and a standard
> disclaimer were appended to all articles.  (Rather than having
> each person come up with the obligatory disclaimer...)
>
> This should save at least the couple of percent that the major
> groups consume.

I wrote a program to scan all news on our system and gather such
statistics.  What follows is its output.  A "signature" is taken to
be everything after a line consisting of "-- "; that rule misses some
signatures, but it gives a rough figure.  "Included" lines (reported
as "inserted" in the output) are those beginning with ">".  The
average article length is skewed upward by a few large files, such
as maps and sources.  Notice that about 1/4 of the characters (24%)
are in headers, and that nearly 4% (3.8%) of all characters stored
are in the Path: line alone.
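The line classification described above can be sketched in a few lines of Python.  This is a minimal illustration, not the program that produced the output (which was not posted); the function name classify_article is hypothetical, and it assumes the header block ends at the first blank line and that the "-- " separator itself counts as body text.

```python
from collections import defaultdict

def classify_article(text):
    """Tally (lines, chars) per category for one article, plus per-header counts.

    Assumptions (hypothetical, not the original program's rules):
    - the header block ends at the first blank line;
    - a line that is exactly "-- " starts the signature; the separator itself
      is counted as body and every later line as signature;
    - outside the signature, a line starting with ">" is included text.
    Folded header continuation lines are not special-cased: each is tallied
    under whatever precedes its first colon (or the whole line).
    """
    counts = {cat: [0, 0] for cat in ("header", "signature", "included", "body")}
    headers = defaultdict(lambda: [0, 0])  # header name -> [occurrences, chars]
    in_header, in_signature = True, False
    for line in text.splitlines(keepends=True):
        stripped = line.rstrip("\n")
        if in_header and stripped == "":
            in_header = False              # blank line ends the header block
            cat = "body"
        elif in_header:
            cat = "header"
            name = stripped.split(":", 1)[0]   # e.g. "Path" from "Path: a!b"
            headers[name][0] += 1
            headers[name][1] += len(line)
        elif in_signature:
            cat = "signature"
        elif stripped == "-- ":
            in_signature = True            # signature starts on the next line
            cat = "body"
        elif stripped.startswith(">"):
            cat = "included"
        else:
            cat = "body"
        counts[cat][0] += 1
        counts[cat][1] += len(line)        # chars include the trailing newline
    return counts, dict(headers)

sample = "Path: a!b\nFrom: x\n\nHello\n> quoted\n-- \nsig line\n"
counts, headers = classify_article(sample)
print(counts)   # {'header': [2, 18], 'signature': [1, 9], 'included': [1, 9], 'body': [3, 11]}
print(headers)  # {'Path': [1, 10], 'From': [1, 8]}
```

Summing these tallies over every file in the news spool would give totals of the kind reported in the output; how the original program walked the spool is not stated.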
adams@calma.UUCP
-- 
Robert Adams
...!ucbvax!calma!adams

------------------ cut here ----------------
files = 9328, lines = 523997, characters = 21082117
average lines per file = 56, average chars per file = 2260
header lines = 119323, characters = 5099675, percent = 24%
signature lines = 28212, characters = 957140, percent = 5%
inserted lines = 47111, characters = 2401935, percent = 11%

                                               percent   percent
Header            occurrences  total chars  of headers  of total
Relay-Version            9327       522336       10.2%      2.5%
Posting-Version          9322       585931       11.5%      2.8%
Path                     9327       803948       15.8%      3.8%
From                     9327       344330        6.8%      1.6%
Newsgroups               9327       268699        5.3%      1.3%
Subject                  9327       362454        7.1%      1.7%
Message-ID               9327       285049        5.6%      1.4%
Date                     9327       260814        5.1%      1.2%
Article-I.D.                0            0        0.0%      0.0%
Posted                      0            0        0.0%      0.0%
Date-Received            9327       345084        6.8%      1.6%
References               5279       286359        5.6%      1.4%
Distribution             2743        46750        0.9%      0.2%
Organization             8531       393317        7.7%      1.9%
Lines                    9327        82888        1.6%      0.4%
Xref                     2381       126036        2.5%      0.6%
Approved                  394        12167        0.2%      0.1%
Nf-ID                     603        27965        0.5%      0.1%
Nf-From                   605        30231        0.6%      0.1%
Control                   275         8581        0.2%      0.0%
Reply-To                 2515       109362        2.1%      0.5%
Sender                   1150        34935        0.7%      0.2%
Xpath                      61         1379        0.0%      0.0%
Keywords                  560        17544        0.3%      0.1%
Summary                   727        17643        0.3%      0.1%
Followup-To               132         3405        0.1%      0.0%
Expires                    67         2054        0.0%      0.0%
Cc                          7           65        0.0%      0.0%
Apparently-To               5          160        0.0%      0.0%
In-reply-to                 1           63        0.0%      0.0%
This-Account                1           26        0.0%      0.0%
Reply-tp                    1           27        0.0%      0.0%
Original-Subject            3          179        0.0%      0.0%
Followups-to                2           50        0.0%      0.0%
other                       2          117        0.0%      0.0%
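Each row's two percentage columns are plain ratios: the header's character count divided by all header characters (5099675) and by all characters stored (21082117).  The short Python check below, added for illustration and not part of the original program, recomputes a few rows and the per-file averages from the totals above; that the averages come from integer division is an assumption consistent with the figures.

```python
# Totals copied from the program output above.
FILES = 9328
TOTAL_LINES = 523997
TOTAL_CHARS = 21082117    # every character in every file
HEADER_CHARS = 5099675    # characters in header lines only

def percent(part, whole):
    """Share of `whole`, as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

def row_percents(chars):
    """(percent of headers, percent of total) for one header's character count."""
    return percent(chars, HEADER_CHARS), percent(chars, TOTAL_CHARS)

print(TOTAL_LINES // FILES, TOTAL_CHARS // FILES)   # 56 2260
print(percent(HEADER_CHARS, TOTAL_CHARS))           # 24.2
for name, chars in [("Relay-Version", 522336), ("Posting-Version", 585931), ("Path", 803948)]:
    print(name, row_percents(chars))
# Relay-Version (10.2, 2.5)
# Posting-Version (11.5, 2.8)
# Path (15.8, 3.8)
```

The same two divisions, applied to every row, reproduce the table's last two columns to the precision shown.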