Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version VT1.00C 11/1/84; site vortex.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!gamma!epsilon!zeta!sabre!petrus!bellcore!vortex!lauren
From: lauren@vortex.UUCP (Lauren Weinstein)
Newsgroups: net.news,net.news.group
Subject: polling and statistics
Message-ID: <896@vortex.UUCP>
Date: Tue, 18-Mar-86 13:45:57 EST
Article-I.D.: vortex.896
Posted: Tue Mar 18 13:45:57 1986
Date-Received: Wed, 19-Mar-86 04:08:58 EST
References: <1953@saber.UUCP>
Organization: Vortex Technology, Los Angeles
Lines: 62
Xref: watmath net.news:4680 net.news.group:5242

Just a technical point about polling. Comparing Usenet polls of this sort to Gallup, Nielsen (they're the ones with the boxes on the TVs, not Arbitron), etc. is inappropriate. The polling companies use VERY carefully selected samples to enable small sample populations to represent larger total populations. The sort of polls we see on the net are what would be termed "self-selected" polls, where anyone can participate or not by their own choice. While the results of such self-select polls (similar to 900 telephone polls, when you think about it) may be interesting, they normally will not be applicable in a statistical sense to larger populations.

In other words, self-select polls tell you what the people voting thought or say they did. They don't say much about what the people who DIDN'T vote are thinking or doing--they simply do not have much statistical validity for generalization to larger populations. They can still be fun, though--900 numbers sure are popular.

There are some definite "traps" in self-select polling. For example, doing consistency checks (looking for "uniform" sorts of responses) across a self-select population may be misleading, since the self-select population may be selecting themselves due to unknowable (to the polling entity) factors.
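For anyone who wants to see the consistency-check trap in action, here is a rough sketch in Python. Everything in it is invented purely for illustration: the four regions, the 30% of the population in favor of some proposal, and the reply rates (5% for those in favor, 1% for everyone else). It is not a model of any real poll. It simply compares a self-selected sample against a sample picked at random from the same imaginary population.

```python
import random

REGIONS = ["east", "west", "north", "south"]  # hypothetical regions


def make_population(n, p_favor, rng):
    """Return n imaginary readers as (region, favors) pairs."""
    return [(rng.choice(REGIONS), rng.random() < p_favor) for _ in range(n)]


def self_selected(population, p_reply_favor, p_reply_other, rng):
    """Keep only the people who choose to reply.

    Assumption: people in favor reply more often than everyone else.
    """
    return [
        (region, favors)
        for region, favors in population
        if rng.random() < (p_reply_favor if favors else p_reply_other)
    ]


def share_in_favor(sample):
    """Fraction of a sample that is in favor."""
    return sum(favors for _, favors in sample) / len(sample)


def run(seed=1986):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    population = make_population(200_000, 0.30, rng)   # made-up numbers
    volunteers = self_selected(population, 0.05, 0.01, rng)
    chosen = rng.sample(population, 2_000)             # picked by the pollster
    by_region = {
        r: share_in_favor([p for p in volunteers if p[0] == r])
        for r in REGIONS
    }
    return {
        "population": share_in_favor(population),
        "random_sample": share_in_favor(chosen),
        "self_selected": share_in_favor(volunteers),
        "self_selected_by_region": by_region,
    }


if __name__ == "__main__":
    for name, value in run().items():
        print(name, value)
```

Under these assumptions the random sample should land close to the true 30%, while the self-selected figure comes out far higher, and it does so in every region. The regional agreement looks like a successful consistency check, yet it only shows that the same selection effect is at work everywhere.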
To put it bluntly, any conclusions drawn from a self-select population must be considered at least partially suspect when it comes to generalization. As a statistics professor I once knew used to say, "If you didn't pick the sample population yourself, according to valid statistical criteria, nothing you do later [with a self-select population] can validate the sample for generalization."

A classic case involves 900 polls. It was found from some statistics that people responding to dial-in 900 polls tended to have incomes above a certain level. Consistency checks on the data indicated that this was true across the entire population of callers, regardless of geographic region. Of course, to presume that this means that the U.S. population as a whole has that sort of income would be incorrect--it simply means that the people who CHOSE to CALL generally fit the higher income category.

As for the glacier polls, I think they *are* interesting, but I would not want to see any important decisions based on the information so gleaned. In particular, I feel that the cost factors could be incorrect by large amounts (not necessarily are, but *could* be) since nobody on the network really has even an approximate handle on the true cost factors involved in netnews transmissions. There are only guesses. The group readership info, on the other hand, is of a less critical nature and can probably be taken as being indicative of the reading habits of certain segments of the Usenet population--not the entire population, but certain segments.

While I most certainly do *not* suggest that this book has anything whatever to do with the glacier poll, I might recommend that persons unfamiliar with some of the fine points of statistics and polling read the classic book, "How to Lie with Statistics." It is pretty much required reading for anyone who wants to be able to interpret many of the "statistics" we find quoted by the mass media these days.
Once again, I am definitely *not* suggesting that there is any lying or planned bias of any kind in the glacier poll. I only bring up this book since it *is* good reading for people interested in statistics in general.

--Lauren--