Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/17/84; site hao.UUCP
Path: utzoo!watmath!clyde!cbosgd!hplabs!hao!woods
From: woods@hao.UUCP (Greg Woods)
Newsgroups: net.news,net.news.group
Subject: Re: more interim results from worldwide net readership poll
Message-ID: <1998@hao.UUCP>
Date: Fri, 14-Mar-86 03:41:36 EST
Article-I.D.: hao.1998
Posted: Fri Mar 14 03:41:36 1986
Date-Received: Sat, 15-Mar-86 19:29:44 EST
References: <5192@glacier.ARPA> <1994@hao.UUCP> <5249@glacier.ARPA>
Organization: High Altitude Obs./NCAR, Boulder CO
Lines: 97
Xref: watmath net.news:4670 net.news.group:5209

> Baloney, Greg. Leaving out one site means nothing. Leaving out a backbone site
> means nothing. For that matter, leaving out 100 sites or 20% of the backbone,
> means nothing.

Fine. As long as you make this clear to everyone who would interpret your
results, then I have no problem with it.

> If a significant fraction of the net (perhaps more than 30%)
> cannot run the Bourne shell, then it is perhaps worth worrying about making
> a version of this data-gathering scheme that uses some other shell.

And assuming, of course, that there is nothing you would like to find out
about that is related in *any* way to running or not running the Bourne shell.

> reporting in. By the time 30% of the net reports in, almost any 30%,

This is a typical fallacy. It certainly DOES matter which 30%. After all,
the soapbox groups that hao no longer carries account for nearly that much
of the net traffic.

> I believe
> that the statistical quality of the readership data will be so much better
> than any other metric ever applied to the network

True but irrelevant. All the other 'metrics' have been virtually
nonexistent. Better than nothing isn't saying much.

> The biggest single problem with the network in its 6 years of existence
> has been that the loud, angry users get all the attention.

Not always true. Would you describe Spaf as loud and angry?
:-)

> What I am doing
> is collecting data from and about people who would otherwise never respond.

I applaud your effort and I support you 100%. Just don't go too far and
hold up your data as representing the whole network when you have a pitiful
2% of the data. A lot can happen in 98%. You first have to demonstrate that
there isn't a correlation between those who do/don't respond to your survey
(for whatever reason(s)) and whatever it is you are trying to observe.

> I now have information from 1000 people's .newsrc files, and complaints from
> 7 people that my survey is unfair because it didn't handle their weird
> special case properly

...and God knows how many more who didn't complain, or whose complaint or
even legitimate response got lost in the morass of uucp mail. I also do
not consider inability to run one particular shell a 'weird special case'.
You are clearly biased, and that bias is likely to be reflected in any
results you come up with. At least post a C program for Chrissake, if you
claim to represent a network of totally varied UNIX sites. C is about the
only thing close to a standard that exists. And even that has its
problems....

> I have no interest in
> hearing from all of the hackers. I want to hear from and about the people who
> would never dream of arguing about things like this, but who are net readers.

If you are claiming to represent the whole network, why does it make any
effing difference what YOU are interested in? The hackers are a significant
portion of the readership!

> I claim that my shell script, even if it won't run on your machine, is picking
> up that data.

I challenge you to demonstrate that this is the case. I congratulate you
on starting the effort. No one else has even bothered to try, and you
deserve credit for that. But let's not get carried away. Any data gathering
scheme that depends on anything more than a C compiler that can compile
STANDARD C (i.e.
no 20-character identifiers) can hardly be considered as representing this
entire network.

> This data is not perfect. I'll grant that. It might not even be accurate
> to within 20%. But it is 100 times more accurate than any other data anybody
> else has. Let's collect this round of data, and look at it, and then talk
> about ways of making marginal improvements on the data-gathering techniques.

We need more than 'marginal' improvements. 1.4% is PITIFUL. I do grant you
that it's orders of magnitude better than anything we've had previously,
but that doesn't justify holding it up as representing the entire net.

P.S. Are you and I the only ones in on this? What does everyone else think?
Any concrete suggestions from the statistician types out there as to how
we might actually go about collecting a representative sample?

> Brian Reid
> Stanford

--
{ucbvax!hplabs | decvax!noao | mcvax!seismo | ihnp4!seismo} !hao!woods
CSNET: woods@ncar.csnet  ARPA: woods%ncar@CSNET-RELAY.ARPA

"If the game is lost, we're all the same; no one left to place or take
the blame; Will we leave this place an empty stone, or a shining ball of
earth, we can call our home"