Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!linus!decvax!decwrl!sun!chuq
From: chuq@sun.uucp (Chuq Von Rospach)
Newsgroups: net.news
Subject: Re: Statistics, polls: honest, no flames
Message-ID: <3389@sun.uucp>
Date: Fri, 21-Mar-86 16:53:01 EST
Article-I.D.: sun.3389
Posted: Fri Mar 21 16:53:01 1986
Date-Received: Sun, 30-Mar-86 08:09:22 EST
References: <2015@hao.UUCP>
Distribution: net
Organization: Third Person, Omniscient
Lines: 97

> ... and that people are so blind to the inaccuracies (very well
> explained by Lauren, so I have no need to repeat them) that they
> are ready to start using these results to determine what groups we
> keep and which we don't.

Lauren's article (as rebutted by Brian) was a LOT more inaccurate than
the statistics he attempted to discredit. Yes, I'm MORE than ready to
use the results of the statistics to try to streamline the net so that
it will benefit the majority of the readers. This, of course, has to be
distressing to people in the groups with exceptionally high volume and
very few readers, since what Brian has really done is blow away the
USENET attitudes regarding volume and utility -- there is now REAL
evidence that volume and readership are completely unconnected, and we
can track down (and potentially eliminate) the ego-based, write-mostly
groups.

> I think
> the data are quite enlightening; that so many want mod.movies,
> for example, suggests that maybe, just maybe, there is more
> support for moderated groups than the public discussions on the
> subject would tend to show. But this is a very GENERAL conclusion.

Actually, I think this conclusion is incorrect. In many cases the mod
groups have so little volume that people haven't gotten around to
unsubscribing from them yet.

> What scares ME is the possibility of axing certain groups based
> solely on these results, like 'hey, look, net.blah has the highest
> cost per reader, let's get rid of it'.

This may seem silly, but I think that it is logical for streamlining of
the net to be done by getting rid of the high volume/low readership
groups -- the most effect for the least netwide trauma (except to the
people who like to hear themselves type).

> but he also
> claims that his survey is exempt from the 'self-select sample' effects
> brought up by Lauren. I do not agree with that assessment. There are
> other self-select factors that everyone is ignoring.

I didn't realize you were trained in statistics. How would you recommend
improving the data then? No offense intended, but I prefer to listen to
the people trained in the discipline...

Now, before people accuse me of being too hard on Greg, let me make a
few points. I'm not bitching directly at Greg on this, but at some
attitudes that happen to be in his posting that seem to be generic on
the net:

First, Brian's stats are showing some real fallacies in the way things
are done on Usenet. One is the assumption that volume == utility, which
is being shown to be definitely not true. In many groups, a few very
vociferous users can completely overwhelm the rest of the readership.

Second, there is the implied 'it isn't good for me, so we can't do it'.
Eventually we're going to have to make decisions about what the net is
really here for, as volume and costs continue to rise. The LOGICAL thing
is to streamline that which affects the fewest users, which is difficult
to do currently because we've never before known who is reading things
-- only who is writing. We can now change that.

Third, there is a consistent problem on the net because people say
things like "I disagree with this and so it is wrong". Well, Brian knows
a LOT about statistics. He has access to some of the best statisticians
in the world at Stanford, and he's put a LOT of work into convincing
himself that these stats are valid.
Unless you know stats as well as he does and know what he has really
done, I can't think of a way in the world that you could convince me
that he is wrong, especially when (as is typical on the net) you have NO
facts to back your assertions.

--

The ONLY problem I have with Brian's stats is the amount of work they
take (on a net-wide basis) to implement. I don't think they are
practical to use on a regular basis as a way of making decisions on the
net. They are definitely useful for occasionally figuring out what is
going on out there, though, and I'd love to see them run every six
months or so.

I do think we need a new measure of group utility. Previously that
measure has been total volume. I suggest we consider using total volume
divided by the number of DIFFERENT posters over a given time. This could
be implemented easily as part of the newslist data at seismo, and will
give us a good ratio of total interest, assuming you believe that 1
megabyte posted by 20 people is more useful than 1 megabyte posted by
three people. There are some groups where this breaks down (especially
*.sources* and net.jokes, I would guess), but my assumption would be
that the total number of posters would be a good measure of the total
number of readers.

Comments?

chuq
--
:From catacombs of a past participle:        Chuq Von Rospach
chuqi%plaid@sun.ARPA  FidoNet: 125/84  CompuServe: 73317,635
{decwrl,decvax,hplabs,ihnp4,pyramid,seismo,ucbvax}!sun!plaid!chuq

I used to really worry about splitting my infinitives until I realized
that most people had never heard of them.
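
As a rough illustration of the volume-per-poster measure suggested above (total volume divided by the number of DIFFERENT posters over a given time), here is a minimal sketch in Python. The input format, the function name `volume_per_poster`, and the sample figures are all assumptions made up for this example; nothing here reflects how the newslist data at seismo is actually kept. A lower number means the same volume is spread over more posters.

```python
from collections import defaultdict


def volume_per_poster(articles):
    """Return bytes posted per distinct poster for each newsgroup.

    `articles` is a hypothetical iterable of (newsgroup, poster, size_in_bytes)
    tuples covering one measurement period.
    """
    volume = defaultdict(int)    # total bytes seen per group
    posters = defaultdict(set)   # distinct poster names per group
    for group, poster, size in articles:
        volume[group] += size
        posters[group].add(poster)
    # Every group in `volume` has at least one poster, so no division by zero.
    return {group: volume[group] / len(posters[group]) for group in volume}


# Made-up sample data: two groups, each with 2000 or 1000 bytes of traffic.
sample = [
    ("net.blah", "alice", 600),
    ("net.blah", "alice", 400),
    ("net.blah", "bob", 1000),
    ("net.other", "carol", 500),
    ("net.other", "dave", 300),
    ("net.other", "erin", 100),
    ("net.other", "frank", 100),
]

for group, ratio in sorted(volume_per_poster(sample).items()):
    print(group, ratio)
```

In this sample the first group carries its traffic on two posters and the second on four, so the second scores lower (more posters per byte), which is the direction the proposal treats as more useful.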