Xref: utzoo news.admin:14353 comp.sources.d:6960
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!emory!ox.com!ox.com!emv
From: emv@ox.com (Ed Vielmetti)
Newsgroups: news.admin,comp.sources.d
Subject: Re: UK Copyright libraries and Usenet
Message-ID:
Date: 16 May 91 06:45:36 GMT
References: <3amo22w164w@mantis.co.uk> <10508@skye.cs.ed.ac.uk> <4592.283113da@iccgcc.decnet.ab.com> <1991May16.050935.29882@newshost.anu.edu.au>
Sender: usenet@ox.com (Usenet News Administrator)
Followup-To: comp.sources.d
Organization: OTA Limited Partnership, Ann Arbor MI.
Lines: 90
In-Reply-To: cmf851@anu.oz.au's message of 16 May 91 05:09:35 GMT

In article <1991May16.050935.29882@newshost.anu.edu.au> cmf851@anu.oz.au (Albert Langer) writes:

   Some library somewhere certainly OUGHT to be preserving archives of all USEnet news groups. Since they are serial publications like any other, perhaps certain libraries have a statutory obligation to do so (apart from NSA). In that case it may be worth drawing their attention to the ease with which their obligations could be carried out. (DAT tape is less than $10 per Gigabyte).

If you wait another week, we can keep this discussion going in comp.archives.admin; send your votes in to kent@uunet.uu.net (Kent Landsfield). For now I'm redirecting the archivist perspective to comp.sources.d (as good a place as any under the circumstances).

I'm reasonably confident that every news article that ever got to Toronto has been squirreled away on tape. You're right, it's cheap to store it, but read on....

   The current informal arrangements for volunteer ftp sites to hold simple dumps of certain news groups and mailing lists without any indexing etc are quite inadequate. A library should do it properly with guarantees of permanent access and appropriate classification and indexing for retrieval.

Libraries and paper-based archivists are generally ill at ease with providing efficient retrieval for huge amounts of full-text data.
Given a million usenet articles, do you have a good way to sort through them?

   In fact the same applies to ftp archives generally. They are being administered on an unfunded basis by computer system administrators with no special library skills.

   --
   Opinions disclaimed (Authoritative answer from opinion server)
   Header reply address wrong. Use cmf851@csc2.anu.edu.au

No library school teaches you how to run an anonymous FTP site. I would argue that most of the things we consider "archive sites" are really much closer to "samizdat" houses or medieval pamphlet shops, run by a proprietor who has a certain charter in mind and a mission to collect together like-minded objects (software, text, nudie pix). The materials are uneven, mutable, of uncertain value, hard to catalog, and difficult to present effectively; you can't say "it's down on the bottom shelf, the binding is green, and the three or four books to the left of it are also good".

There are on the order of 1000 anonymous ftp sites in the whole world; I'd estimate that means on the order of 1500-3000 individuals involved in the process of organizing, collecting, and maintaining ftp archives. The process is not entirely unfunded, to be fair; most of the raw network bandwidth overhead needed (in the USA, at least) is funded by the National Science Foundation, and many of the archive sites are run from systems which have been purchased in part by gov't money of one form or another. Other systems have been supported by the expectation of making money off the venture (uunet), software support for customers (apple), or archive services for internal non-connected networks (gatekeeper.dec.com).

What has not been funded quite yet is the production of meta-information about where anonymous ftp sites are, what they carry, and methods of searching through them.
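
To make "meta-information ... and methods of searching" concrete, here is a minimal sketch of one possible approach: an inverted index over site listings, answering keyword queries with AND semantics. It is not a description of how archie or any existing service works. The site names, paths, descriptions, and function names below are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical listing data: (site, path, one-line description).
# Every site name and file here is made up for the example.
LISTINGS = [
    ("ftp.example.edu", "/pub/news/cnews.tar.Z", "C News transport software"),
    ("archive.example.org", "/pub/unix/rn.tar.Z", "rn news reader source"),
    ("ftp.example.edu", "/pub/tex/latex.tar.Z", "LaTeX macro package"),
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(listings):
    """Map each word to the set of listing positions that contain it."""
    index = defaultdict(set)
    for pos, (_site, path, desc) in enumerate(listings):
        for word in tokenize(path + " " + desc):
            index[word].add(pos)
    return index

def search(index, listings, query):
    """Return the listings that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the posting sets; a word never seen yields an empty set.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [listings[pos] for pos in sorted(hits)]

INDEX = build_index(LISTINGS)
```

Calling search(INDEX, LISTINGS, "news reader") returns only the rn entry, since that is the one listing containing both words. The index is built once, and each query then touches only the posting sets for its own words, which is what keeps lookups cheap as the number of listings grows.
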
It's patently absurd that the best way for someone in Colorado to find out about archives in California is that they send their queries to a system in Quebec, Canada (archie), over completely saturated "wet pieces of string" that the Canadians have to pay for. The NSF should find a way to fund that effort or similar efforts. Be glad that someone needed a master's thesis, and that McGill had some idle equipment.

In the same fashion, there is a huge full-text index of comp.archives postings somewhere in Canada, but it's not accessible to the world at large because there isn't funding for network bandwidth or server hardware to keep it running on the volume of queries we'd reasonably expect to see from it.

For that matter, no one is funding the production of comp.archives yet -- I'm looking at it as a venture that has to be self-supporting (i.e. making money from user fees or provided under contract to some funding agency) within the next two years or I'll pack up shop and stop doing it.

--
Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com

"(6) The Plan shall identify how agencies and departments can collaborate to ... expand efforts to improve, document, and evaluate unclassified public-domain software developed by federally-funded researchers and other software, including federally-funded educational and training software;"
"High-Performance Computing Act of 1991, S. 272"