Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 (USS@Tek, v1.1) based on 4.3bsd-beta 6/6/85; site zeus.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!bellcore!decvax!decwrl!pyramid!hplabs!tektronix!teklds!zeus!bobr
From: bobr@zeus.UUCP (Robert Reed)
Newsgroups: net.news,net.news.group
Subject: Re: net readership poll (discussion)
Message-ID: <89@zeus.UUCP>
Date: Fri, 28-Mar-86 17:54:13 EST
Article-I.D.: zeus.89
Posted: Fri Mar 28 17:54:13 1986
Date-Received: Tue, 1-Apr-86 05:22:27 EST
References: <5192@glacier.ARPA> <1994@hao.UUCP> <5249@glacier.ARPA>
Reply-To: bobr@zeus.UUCP (Robert Reed)
Organization: CAE Systems Division, Tektronix, Inc., Beaverton, OR.
Lines: 35
Xref: watmath net.news:4723 net.news.group:5313

> Some groups have very low volume, such that it is possible for no articles
> to be current in the group when the survey is run.  If that were the case,
> the survey would show no readers when in fact many people may read the
> group.
>	David Eppstein, eppstein@cs.columbia.edu, seismo!columbia!cs!eppstein

> What effect, if any, is there from hosts that do not permit access to all
> newsgroups? 
>	wmartin@brl-smoke.ARPA (Will Martin )

These are both valid concerns if the number of individual samples is small,
but as the sample size increases, both of these anomalies will get lost in
the noise.  The major problems in such a survey are:

	1.  The possibility of error in the collection mechanism.  For
	example, a bug in the posted arbitron script could cause every
	site which reported to have intrinsic and random errors in its
	reported data.  A mere systematic error (e.g., consistently
	reporting half the number of readers in each site report), if
	detected, could be accounted for and weighted out of the sample.

	One possible source for this kind of error exists in the nature of
	the responders.  If arbitron is run without root privileges,
	readers whose home directories or .newsrc files are protected are
	counted as users but not readers.  Similarly, sites which have a set
	of machines, with accounts for all users but with preferred home
	machines for partitions of this user set, will have a similar skew
	in the user/readership ratio.  But in either of these cases the
	effect is systematic, reducing the percentages without skewing
	them towards any particular newsgroup.

	2.  Lack of sample size.  Most of the complaints about the
	readership poll have been concerns about skew from a set of
	samples which do not reflect the interests of the complainant.
	The easiest fix is to increase the sample set size.
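
Both points can be illustrated with a rough sketch.  This is not the
arbitron script, and the counts and the group name net.example are made up
for illustration.  The first function shows that a uniform undercount
(every group reported at half its true readership) leaves each group's
share of readers unchanged.  The second uses the textbook standard error of
a proportion, which assumes independent random samples (an assumption real
site reports may not satisfy), to show the noise shrinking as the sample
grows.

```python
from fractions import Fraction


def shares(counts):
    """Return each group's fraction of all counted readers."""
    total = sum(counts.values())
    return {group: Fraction(n, total) for group, n in counts.items()}


def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent samples."""
    return (p * (1 - p) / n) ** 0.5


# Hypothetical reader counts, not real survey data.
true_counts = {"net.news": 400, "net.news.group": 100, "net.example": 500}

# Uniform systematic error: every group is reported at half its readership.
undercounted = {group: n // 2 for group, n in true_counts.items()}

# The relative shares are identical, so no group is favored by the error.
same_shares = shares(true_counts) == shares(undercounted)

# Quadrupling the number of samples halves the standard error.
se_small = standard_error(0.1, 100)
se_large = standard_error(0.1, 400)
```

Here same_shares comes out true, and se_large is half of se_small.  Random
per-site errors are the harder case, since they cannot be divided out and
only average away as more sites report.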