Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version VT1.00C 11/1/84; site vortex.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!gamma!epsilon!zeta!sabre!petrus!bellcore!vortex!lauren
From: lauren@vortex.UUCP (Lauren Weinstein)
Newsgroups: net.news,net.news.group
Subject: polling and statistics
Message-ID: <896@vortex.UUCP>
Date: Tue, 18-Mar-86 13:45:57 EST
Article-I.D.: vortex.896
Posted: Tue Mar 18 13:45:57 1986
Date-Received: Wed, 19-Mar-86 04:08:58 EST
References: <1953@saber.UUCP>
Organization: Vortex Technology, Los Angeles
Lines: 62
Xref: watmath net.news:4680 net.news.group:5242

Just a technical point about polling.  Comparing Usenet polls of this
sort to Gallup, Nielsen (they're the ones with the boxes on the TVs,
not Arbitron), etc. is inappropriate.  The polling companies use VERY
carefully selected samples to enable small sample populations to
represent larger total populations.  The sort of things we see on the net
are what would be termed "self-selected" polls, where anyone can
participate or not by their own choice.  While the results of such self-select
polls (similar to 900 telephone polls, when you think about it) may
be interesting, they normally will not be applicable in a statistical
sense to larger populations.  In other words, self-select polls tell
you what the people who voted thought, or at least what they say they thought.
They don't say much about what the people who DIDN'T vote
are thinking or doing, so the results cannot safely be generalized.

They can still be fun, though--900 numbers sure are popular.

There are some definite "traps" in self-select polling.  For example,
doing consistency checks (looking for "uniform" sorts of responses)
across a self-select population may be misleading, since the respondents
may have selected themselves for reasons the polling entity cannot know.
Uniform responses show only that the people who chose to respond
resemble one another.  To put it bluntly, any conclusions drawn from a
self-select population must be considered at least partially suspect when
it comes to generalization.

As a statistics professor I once knew used to say, "If you didn't pick
the sample population yourself, according to valid statistical criteria,
nothing you do later [with a self-select population] can validate the
sample for generalization."

A classic case involves 900 polls.  Some statistics showed that
people responding to dial-in 900 polls tended to have incomes above a certain
level.  Consistency checks on the data indicated that this was true across
the entire population of callers, regardless of geographic region.  Of course,
to presume that this means that the U.S. population as a whole has that sort
of income would be incorrect--it simply means that the people who CHOSE
to CALL generally fit the higher income category.
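
To make the 900-poll case concrete, here is a minimal Python sketch.
Everything in it is an assumption made up for illustration: the log-normal
incomes, the four region names, the 5% versus 1% calling rates, and the
function name simulate_polls.  None of it comes from real polling data.
It compares a large self-selected group of callers against a small random
sample drawn from the same population.

```python
import random
import statistics

REGIONS = ["east", "west", "north", "south"]  # hypothetical region labels


def simulate_polls(seed=0, population_size=100_000, sample_size=1_000):
    """Compare a self-selected call-in poll with a small random sample."""
    rng = random.Random(seed)

    # Hypothetical population: (region, income) pairs, log-normal incomes.
    population = [
        (rng.choice(REGIONS), rng.lognormvariate(10.0, 0.6))
        for _ in range(population_size)
    ]
    incomes = [income for _, income in population]
    true_mean = statistics.fmean(incomes)
    median = statistics.median(incomes)

    # Self-selection (assumed): people above the median income call in
    # 5% of the time, everyone else 1% of the time.
    callers = [
        (region, income)
        for region, income in population
        if rng.random() < (0.05 if income > median else 0.01)
    ]

    # The pollster's alternative: a small sample chosen at random.
    sample = rng.sample(population, sample_size)

    return {
        "true_mean": true_mean,
        "caller_count": len(callers),
        "caller_mean": statistics.fmean(i for _, i in callers),
        "caller_mean_by_region": {
            r: statistics.fmean(i for reg, i in callers if reg == r)
            for r in REGIONS
        },
        "random_sample_mean": statistics.fmean(i for _, i in sample),
    }


if __name__ == "__main__":
    for key, value in simulate_polls().items():
        print(key, value)
```

Under these assumptions the callers look consistently well-off in every
region, while the much smaller random sample tracks the population far
more closely.  The consistency across regions says something about who
calls, not about the population as a whole.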

As for the glacier polls, I think they *are* interesting, but I would not
want to see any important decisions based on the information so
gleaned.  In particular, I feel that the cost figures could be off
by large amounts (not that they necessarily are, but they *could* be) since
nobody on the network really has even an approximate handle on the true
cost factors involved in netnews transmissions.  There are only guesses.
The group readership info, on the other hand, is of a less critical 
nature and can probably be taken as being indicative of the
reading habits of certain segments of the Usenet population--not
the entire population, but certain segments.

While I most certainly do *not* suggest that this book has anything
whatever to do with the glacier poll, I might recommend that persons
unfamiliar with some of the fine points of statistics and polling
read the classic book, "How to Lie with Statistics."  It is pretty much
required reading for anyone who wants to be able to interpret many
of the "statistics" we find quoted by the mass media these days.

Once again, I am definitely *not* suggesting that there is any lying or 
planned bias of any kind in the glacier poll.  I only bring up 
this book since it *is* good reading for people interested in
statistics in general.

--Lauren--