Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!hao!ames!ucbcad!ucbvax!cartan!jiff.berkeley.edu!greg
From: greg@jiff.berkeley.edu (Greg)
Newsgroups: news.admin,news.misc
Subject: Statistics by article
Message-ID: <1393@cartan.Berkeley.EDU>
Date: Mon, 9-Nov-87 19:49:47 EST
Article-I.D.: cartan.1393
Posted: Mon Nov  9 19:49:47 1987
Date-Received: Wed, 11-Nov-87 21:36:39 EST
Sender: nobody@cartan.Berkeley.EDU
Reply-To: greg@jiff.berkeley.edu (Greg)
Organization: U. C. Berkeley
Lines: 57
Summary: Addressing technical issues.
Xref: mnetor news.admin:1369 news.misc:1115

Greg (greg@maypo.berkeley.edu) writes:
> rn would
> monitor which articles each reader actually reads, and how long each
> reader spends on those articles.

Mark Brader writes:
>The basic problem is that you can't tell, just because the image
>remains on the screen for a long time, that the reader is really
>reading that article.

A good point.  I propose a reasonable cutoff time, say five minutes:
if a reader spends two hours on one article, five minutes is averaged
in instead.  There will still be some error from readers walking away
from their terminals, performing shell escapes, and so on.  I figure
that this error will be distributed roughly at random among articles,
in proportion to their popularity anyway; to some extent it will
merely scale the "true" statistics by a constant factor.
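
To make the cutoff concrete, here is a minimal sketch in Python.  It
assumes reading times are recorded in seconds; the name capped_mean and
the 300-second constant are only my illustration, not anything rn does
today.

```python
CUTOFF_SECONDS = 5 * 60  # the five-minute cutoff suggested above (assumed value)

def capped_mean(reading_times, cutoff=CUTOFF_SECONDS):
    """Mean reading time in seconds, with each sample clipped to the cutoff."""
    if not reading_times:
        return 0.0
    # A reader who leaves an article on screen for hours counts as `cutoff`.
    capped = [min(t, cutoff) for t in reading_times]
    return sum(capped) / len(capped)

# Three ordinary readers and one who walked away for two hours.
times = [40, 65, 90, 7200]
print(capped_mean(times))  # 123.75, versus 1848.75 with no cutoff
```

The two-hour session still pulls the average up, but only as far as a
five-minute reading would.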

It may also be more appropriate to report the median reading time
rather than the mean, since the median is much less sensitive to a
few very long idle sessions.
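
A small illustration of the difference, using Python's standard
statistics module.  The sample times (in seconds) and the helper name
summarize are made up for the example.

```python
from statistics import mean, median

def summarize(reading_times):
    """Return (mean, median) of a list of reading times in seconds."""
    return mean(reading_times), median(reading_times)

# Three ordinary readers and one terminal left idle for two hours.
times = [40, 65, 90, 7200]
avg, mid = summarize(times)
print(avg)  # 1848.75
print(mid)  # 77.5
```

One idle terminal drags the mean to over half an hour, while the median
stays close to what the other three readers actually did.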

>In addition, readers may not want such monitoring [because of invasion
>of privacy].

I find it difficult to fret over the fate of a list of numbers about me,
without my name attached, that are promptly averaged in with thousands
of other such numbers.  I think most readers feel the same way.  The few
who object are free to turn off the feature.  I'd prefer the voting power
to privacy.  In any case, as with the Arbitron ratings, the Nielsens need
only sample the readership; they need not poll EVERY user on the net.

>Besides that, admins might not care for the extra system load...

Chuq Von Rospach also brought up the load issue, in the context of network
load rather than system load:

>If you figure 7,000
>people read usenet once a day (very low numbers! VERY low numbers) and the
>package is 1,000 bytes, the receiving end needs to handle seven megabytes of
>data a day. And these numbers are [ridiculous underestimates]...
>It's a very nice idea. But from a technical point of view, it is a cure much
>worse than the disease.

I don't see why such an expensive scheme is necessary.  In my scheme
the network load for trading statistics is necessarily much smaller
than the load of the news groups themselves.  A news feed in a local
network doles out articles to rn programs running on various hosts; the
rn programs would reply with the stats for their news sessions.  The news
feed would then compress the statistics on its own by taking local
averages and sums.  Every week or so the Nielsen program would collect
data from all of the news feed hosts.  The data on all of the
articles would be in the same report; there would be about 20-40 bytes
of data per article.  If the articles themselves average 1K in length,
the Nielsens would amount to only a small fraction of both the global
network load and the local system load.
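
Here is a rough sketch of the local compression step and of the size
estimate, in Python.  The report format (a dictionary from Message-ID
to seconds spent), the function names, and the example Message-IDs are
all assumptions made for illustration; I am not proposing this as the
actual protocol.

```python
from collections import defaultdict

def aggregate_sessions(sessions):
    """Combine per-session reports into (readers, total_seconds) per article.

    Each session is a dict mapping a Message-ID to the seconds one reader
    spent on that article (a hypothetical report format).
    """
    totals = defaultdict(lambda: [0, 0])
    for session in sessions:
        for message_id, seconds in session.items():
            totals[message_id][0] += 1        # one more reader
            totals[message_id][1] += seconds  # running sum of reading time
    return {mid: tuple(pair) for mid, pair in totals.items()}

def overhead_fraction(stat_bytes_per_article=40, article_bytes=1024):
    """Statistics bytes as a fraction of article bytes, per article."""
    return stat_bytes_per_article / article_bytes

sessions = [
    {"<1@siteA>": 60, "<2@siteA>": 30},
    {"<1@siteA>": 120},
]
print(aggregate_sessions(sessions))
print(overhead_fraction(20), overhead_fraction(40))
```

However many readers a site has, the feed host forwards one small
record per article.  With the figures above, a site's report comes to
roughly 2% to 4% of the size of the articles it covers.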
--
Greg