Xref: utzoo news.admin:14353 comp.sources.d:6960
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!emory!ox.com!ox.com!emv
From: emv@ox.com (Ed Vielmetti)
Newsgroups: news.admin,comp.sources.d
Subject: Re: UK Copyright libraries and Usenet
Message-ID: <EMV.91May16024531@poe.aa.ox.com>
Date: 16 May 91 06:45:36 GMT
References: <3amo22w164w@mantis.co.uk> <10508@skye.cs.ed.ac.uk>
	<4592.283113da@iccgcc.decnet.ab.com>
	<1991May16.050935.29882@newshost.anu.edu.au>
Sender: usenet@ox.com (Usenet News Administrator)
Followup-To: comp.sources.d
Organization: OTA Limited Partnership, Ann Arbor MI.
Lines: 90
In-Reply-To: cmf851@anu.oz.au's message of 16 May 91 05:09:35 GMT

In article <1991May16.050935.29882@newshost.anu.edu.au> cmf851@anu.oz.au (Albert Langer) writes:

   Some library somewhere certainly OUGHT to be preserving archives of
   all USEnet news groups. Since they are serial publications like any other,
   perhaps certain libraries have a statutory obligation to do so (apart 
   from NSA). In that case it may be worth drawing their attention to
   the ease with which their obligations could be carried out. (DAT tape
   is less than $10 per Gigabyte).

If you wait another week, we can keep this discussion going in
comp.archives.admin; send your votes in to kent@uunet.uu.net (Kent
Landfield).  For now I'm redirecting the archivist perspective to
comp.sources.d (as good a place as any under the circumstances).

I'm reasonably confident that every news article that ever got to
Toronto has been squirreled away on tape.  You're right, it's cheap to
store it, but read on....

   The current informal arrangements for volunteer ftp sites to hold
   simple dumps of certain news groups and mailing lists without any 
   indexing etc are quite inadequate. A library should do it properly 
   with guarantees of permanent access and appropriate classification 
   and indexing for retrieval.

Libraries and paper-based archivists are generally ill at ease with
providing efficient retrieval for huge amounts of full-text data.
Given a million Usenet articles, do you have a good way to sort
through them?  

   In fact the same applies to ftp archives generally. They are being
   administered on an unfunded basis by computer system administrators
   with no special library skills.
   --
   Opinions disclaimed (Authoritative answer from opinion server)
   Header reply address wrong. Use cmf851@csc2.anu.edu.au


No library school teaches you how to run an anonymous FTP site.  I
would argue that most of the things we consider "archive sites" are
really much closer to "samizdat" houses or medieval pamphlet shops,
run by a proprietor who has a certain charter in mind and a mission to
collect together like-minded objects (software, text, nudie pix).  The
materials are uneven, mutable, of uncertain value, hard to catalog,
and difficult to present effectively; you can't say "it's down on the
bottom shelf, the binding is green, and the three or four books to the
left of it are also good".

There are on the order of 1000 anonymous ftp sites in the whole world;
I'd estimate that means on the order of 1500-3000 individuals involved
in the process of organizing, collecting, and maintaining ftp
archives.  

The process is not entirely unfunded, to be fair; most of the raw
network bandwidth overhead needed (in the USA, at least) is funded by
the National Science Foundation, and many of the archive sites are run
from systems which have been purchased in part by gov't money of one
form or another.  Other systems have been supported by the expectation
of making money off the venture (uunet), software support for
customers (apple), or archive services for internal non-connected
networks (gatekeeper.dec.com).

What has not been funded quite yet is the production of
meta-information about where anonymous ftp sites are, what they carry,
and methods of searching through them.  It's patently absurd that the
best way for someone in Colorado to find out about archives in
California is to send their queries to a system in Quebec,
Canada (archie), over completely saturated "wet pieces of string" that
the Canadians have to pay for.  The NSF should find a way to fund that
effort or similar efforts.  Be glad that someone needed a master's
thesis, and that McGill had some idle equipment.

In the same fashion, there is a huge full-text index of comp.archives
postings somewhere in Canada, but it's not accessible to the world at
large because there isn't funding for network bandwidth or server
hardware to keep it running on the volume of queries we'd reasonably
expect to see from it.  For that matter, no one is funding the
production of comp.archives yet -- I'm looking at it as a venture that
has to be self-supporting (i.e. making money from user fees or
provided under contract to some funding agency) within the next two
years or I'll pack up shop and stop doing it.  

-- 
Edward Vielmetti, vice president for research, MSEN Inc.  emv@msen.com

"(6) The Plan shall identify how agencies and departments can
collaborate to ... expand efforts to improve, document, and evaluate
unclassified public-domain software developed by federally-funded
researchers and other software, including federally-funded educational
and training software; "
			"High-Performance Computing Act of 1991, S. 272"