Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/17/84; site hao.UUCP
Path: utzoo!watmath!clyde!cbosgd!hplabs!hao!woods
From: woods@hao.UUCP (Greg Woods)
Newsgroups: net.news,net.news.group
Subject: Re: more interim results from worldwide net readership poll
Message-ID: <1998@hao.UUCP>
Date: Fri, 14-Mar-86 03:41:36 EST
Article-I.D.: hao.1998
Posted: Fri Mar 14 03:41:36 1986
Date-Received: Sat, 15-Mar-86 19:29:44 EST
References: <5192@glacier.ARPA> <1994@hao.UUCP> <5249@glacier.ARPA>
Organization: High Altitude Obs./NCAR, Boulder CO
Lines: 97
Xref: watmath net.news:4670 net.news.group:5209

> Baloney, Greg. Leaving out one site means nothing. Leaving out a backbone site
> means nothing. For that matter, leaving out 100 sites or 20% of the backbone,
> means nothing. 

   Fine. As long as you make this clear to everyone who would interpret
your results, then I have no problem with it.

> If a significant fraction of the net (perhaps more than 30%)
> cannot run the Bourne shell, then it is perhaps worth worrying about making
> a version of this data-gathering scheme that uses some other shell.

  And assuming, of course, that there is nothing you would like to find out
about that is related in *any* way to running or not running the Bourne shell.

> reporting in. By the time 30% of the net reports in, almost any 30%,

  This is a typical fallacy. It certainly DOES matter which 30%. After
all, the soapbox groups that hao no longer carries account for nearly
that much of the net traffic.

> I believe
> that the statistical quality of the readership data will be so much better
> than any other metric ever applied to the network

  True but irrelevant. All the other 'metrics' have been virtually non-
existent. Better than nothing isn't saying much.

> The biggest single problem with the network in its 6 years of existence
> has been that the loud, angry users get all the attention. 

   Not always true. Would you describe Spaf as loud and angry? :-)

> What I am doing
> is collecting data from and about people who would otherwise never respond.

   I applaud your effort and I support you 100%. Just don't go too far
and uphold your data as representing the whole network when you have a pitiful
2% of the data. A lot can happen in 98%. You first have to demonstrate
that there isn't a correlation between those who do/don't respond
to your survey (for whatever reason(s)) and whatever it is you are trying
to observe.
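
   A rough sketch of why that correlation matters, in Python. Every number
here is invented for illustration and none of it comes from the actual
survey. It assumes that users who can run the script read some group at a
different rate than users who cannot, and that only the former can respond.

```python
# Hypothetical illustration of non-response bias; all numbers are invented.

def estimate_from_respondents(groups):
    """Fraction of readers among those who responded.

    Each group is (population, response_rate, reader_fraction).
    """
    responded = sum(pop * resp for pop, resp, _ in groups)
    readers = sum(pop * resp * frac for pop, resp, frac in groups)
    return readers / responded

def true_fraction(groups):
    """Fraction of readers in the whole population."""
    total = sum(pop for pop, _, _ in groups)
    readers = sum(pop * frac for pop, _, frac in groups)
    return readers / total

# Assumed split: 70% of users can run the script and 3% of them respond;
# the other 30% cannot run it, so none of them respond.
groups = [
    (7000, 0.03, 0.10),  # can run the script; 10% read the group
    (3000, 0.00, 0.40),  # cannot run the script; 40% read the group
]

survey = estimate_from_respondents(groups)
actual = true_fraction(groups)
print(f"survey says {survey:.0%}, whole net is {actual:.0%}")
```

   With these made-up figures the respondents suggest 10% readership while
the whole population is at 19%. The gap comes entirely from who was able to
respond, not from how many did.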

> I now have information from 1000 people's .newsrc files, and complaints from
> 7 people that my survey is unfair because it didn't handle their wierd 
> special case properly 

   ...and God knows how many more who didn't complain or whose complaint
or even a legitimate response got lost in the morass of uucp mail.
I also do not consider inability to run one particular shell a
'weird special case'. You are clearly biased, and that bias is likely
to be reflected in any results you come up with. At least post a C
program for Chrissake, if you claim to represent a network of totally
varied UNIX sites. C is about the only thing close to a standard that
exists. And even that has its problems....

> I have no interest in
> hearing from all of the hackers. I want to hear from and about the people who
> would never dream of arguing about things like this, but who are net readers.

  If you are claiming to represent the whole network, why does it make any
effing difference what YOU are interested in? The hackers are a significant
portion of the readership!

> I claim that my shell script, even if it won't run on your machine, is picking
> up that data. 

  I challenge you to demonstrate that this is the case. I congratulate you
on starting the effort. No one else has even bothered to try, and you
deserve credit for that. But let's not get carried away. Any data
gathering scheme that depends on anything more than a C compiler that can
compile STANDARD C (i.e. no 20-character identifiers) can hardly be considered
as representing this entire network.
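
  As an aside on the identifier point, here is a small Python sketch of how
one might check a list of names for clashes under a compiler that keeps only
the first few characters of each identifier. The limit of 8 is an assumption
based on older C compilers, and the function and the sample names are
hypothetical.

```python
# Sketch: find identifiers that collide when a compiler keeps only the
# first `significant` characters. The default of 8 is an assumption
# based on older C compilers; adjust it for the compiler in question.
from collections import defaultdict

def colliding_identifiers(names, significant=8):
    """Map each truncated prefix to the distinct names that share it."""
    buckets = defaultdict(set)
    for name in names:
        buckets[name[:significant]].add(name)
    # Keep only prefixes shared by more than one distinct name.
    return {prefix: sorted(group)
            for prefix, group in buckets.items()
            if len(group) > 1}

# Hypothetical identifiers, invented for this example.
names = ["newsrc_read_count", "newsrc_read_total", "site_name"]
print(colliding_identifiers(names))
```

  The first two sample names share their first 8 characters, so such a
compiler would treat them as the same identifier.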

> This data is not perfect. I'll grant that. It might not even be accurate
> to within 20%. But it is 100 times more accurate than any other data anybody
> else has. Let's collect this round of data, and look at it, and then talk
> about ways of making marginal improvements on the data-gathering techniques.

  We need more than 'marginal' improvements. 1.4% is PITIFUL. I do grant
you that it's orders of magnitude better than anything we've had previously,
but that doesn't justify holding it up as representing the entire net.

P.S. Are you and I the only ones in on this? What does everyone else
think? Any concrete suggestions from the statistician types out there
as to how we might actually go about collecting a representative sample?

> Brian Reid
> Stanford
--
{ucbvax!hplabs | decvax!noao | mcvax!seismo | ihnp4!seismo}
       		        !hao!woods

CSNET: woods@ncar.csnet  ARPA: woods%ncar@CSNET-RELAY.ARPA

"If the game is lost, we're all the same; no one left to place or take the 
blame; Will we leave this place an empty stone, or a shining ball of earth,
we can call our home"