Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!linus!decvax!decwrl!sun!chuq
From: chuq@sun.uucp (Chuq Von Rospach)
Newsgroups: net.news
Subject: Re: Statistics, polls: honest, no flames
Message-ID: <3389@sun.uucp>
Date: Fri, 21-Mar-86 16:53:01 EST
Article-I.D.: sun.3389
Posted: Fri Mar 21 16:53:01 1986
Date-Received: Sun, 30-Mar-86 08:09:22 EST
References: <2015@hao.UUCP>
Distribution: net
Organization: Third Person, Omniscient
Lines: 97

>   ... and that people are so blind to the inaccuracies (very well
>   explained by Lauren, so I have no need to repeat them) that they
>   are ready to start using these results to determine what groups we
>   keep and which we don't.

Lauren's article (as rebutted by Brian) was a LOT more inaccurate than
the statistics he attempted to discredit. Yes, I'm MORE than ready to use
the results of the statistics to try to streamline the net so that it will
benefit the majority of the readers. This, of course, has to be distressing
to people in the groups with exceptionally high volume and very few readers,
since what Brian has really done is blow away the USENET attitudes regarding
volume and utility -- there is now REAL evidence that volume and readership
are completely unconnected, and we can track down (and potentially
eliminate) the ego-based, write-mostly groups.


>I think
>   the data are quite enlightening; that so many want mod.movies,
>   for example, suggests that maybe, just maybe, there is more
>   support for moderated groups than the public discussions on the
>   subject would tend to show. But this is a very GENERAL conclusion.

Actually, I think this conclusion is incorrect. In many cases the mod groups
have so little volume that people haven't gotten around to unsubscribing
from them yet.

>   What scares ME is the possibility of axing certain groups based
>   solely on these results, like 'hey, look, net.blah has the highest
>   cost per reader, let's get rid of it'.

This may seem silly, but I think that it is logical for streamlining
of the net to be done by getting rid of the high volume/low readership
groups -- the most effect for the least netwide trauma (except to the people
who like to hear themselves type).

>    but he also 
>   claims that his survey is exempt from the 'self-select sample' effects
>   brought up by Lauren. I do not agree with that assesment. There are
>   other self-select factors that everyone is ignoring.

I didn't realize you were trained in statistics. How would you recommend
improving the data then? No offense intended, but I prefer to listen to the
people trained in the discipline...

Now, before people accuse me of being too hard on Greg, let me make a few
points. I'm not bitching directly at Greg on this, but at some attitudes
that happen to be in his posting that seem to be generic on the net:

First, Brian's stats are showing some real fallacies in the way things are
done on Usenet. One is the assumption that volume == utility, which is
being shown to be definitely not true. In many groups, a few very
vociferous users can completely overwhelm the rest of the readership.

Second, there is the implied 'it isn't good for me, so we can't do it'.
Eventually we're going to have to make decisions about what the net is
really here for, as volume and costs continue to rise. The LOGICAL thing is
to streamline that which affects the fewest users, which is difficult to do
currently because we've never before known who is reading things -- only who
is writing. We can now change that.

Third, there is a consistent problem on the net because people say things
like "I disagree with this and so it is wrong". Well, Brian knows a LOT
about statistics. He has access to some of the best statisticians in the
world at Stanford, and he's put a LOT of work into convincing himself that
these stats are valid. Unless you know stats as well as he does and know what he
has really done, I can't think of a way in the world that you could convince
me that he is wrong, especially when (as is typical on the net) you have NO
facts to back your assertions.

--
The ONLY problem I have with Brian's stats is the amount of work they take
(on a net-wide basis) to implement. I don't think they are practical to
use on a regular basis as a way of making decisions on the net. They are
definitely useful for occasionally figuring out what is going on out there,
though, and I'd love to see them run every six months or so.

I do think we need a new measure of group utility. Previously that measure
has been total volume. I suggest we consider using total volume divided by
the number of DIFFERENT posters over a given time. This could be implemented
easily as part of the newslist data at seismo, and will give us a good ratio
of total interest, assuming you believe that 1 megabyte posted by 20 people
is more useful than 1 megabyte posted by three people. There are some groups
where this breaks down (especially *.sources* and net.jokes, I would guess),
but I suspect that the total number of posters would be a good measure of
the total number of readers. Comments?
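
To make the proposed ratio concrete, here is a rough sketch of the
arithmetic in Python. It is not how the newslist data at seismo is
actually produced; the function name, the (group, poster, bytes) input
format, and the sample groups and figures are all made up for
illustration. Under the assumption stated above (1 megabyte from 20
people beats 1 megabyte from three), a LOWER number suggests broader
interest.

```python
from collections import defaultdict


def volume_per_poster(articles):
    """Return {group: total bytes / number of distinct posters}.

    `articles` is an iterable of (group, poster, size_in_bytes) tuples.
    This input format is a hypothetical one chosen for the sketch.
    """
    total_bytes = defaultdict(int)
    posters = defaultdict(set)
    for group, poster, size in articles:
        total_bytes[group] += size
        posters[group].add(poster)  # a set counts each poster only once
    return {g: total_bytes[g] / len(posters[g]) for g in total_bytes}


# Hypothetical sample: same total volume, very different poster counts.
sample = [
    ("net.blah", "alice", 600),
    ("net.blah", "alice", 400),
    ("net.wide", "alice", 250),
    ("net.wide", "bob", 250),
    ("net.wide", "carol", 250),
    ("net.wide", "dave", 250),
]

ratios = volume_per_poster(sample)
for group in sorted(ratios, key=ratios.get):
    print(group, ratios[group])
```

Both made-up groups carry 1000 bytes, but the one with four posters
scores 250 bytes per poster against 1000 for the one-poster group, so it
sorts first.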

chuq


-- 
:From catacombs of a past participle:   Chuq Von Rospach 
chuqi%plaid@sun.ARPA			FidoNet: 125/84
CompuServe: 73317,635
{decwrl,decvax,hplabs,ihnp4,pyramid,seismo,ucbvax}!sun!plaid!chuq

I used to really worry about splitting my infinitives until I realized
that most people had never heard of them.