Newsgroups: comp.archives.admin
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!munnari.oz.au!manuel!cmf851
From: cmf851@anu.oz.au (Albert Langer)
Subject: Re: building an interstate (data) highway with no roadmaps
Message-ID: <1991Jun24.202941.21411@newshost.anu.edu.au>
Sender: news@newshost.anu.edu.au
Organization: Computer Services Centre, Australian National University, Canberra, Australia.
References: <2013@uqcspe.cs.uq.oz.au> <89gs5jr@Unify.Com> <11900.Jun2322.59.2491@kramden.acf.nyu.edu>
Date: Mon, 24 Jun 91 20:29:41 GMT

In article <11900.Jun2322.59.2491@kramden.acf.nyu.edu> 
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>I think the Mathematics Subject Classification model would apply quite
>well to archived files (and netnews!). 

Sounds like a useful model to start from - especially:

1. Use of more than one level.
2. Codes defined by a central authority.
3. Assignment of primary and any number of secondary codes.
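
A rough sketch of how those three features might fit together, in Python. The code table below is entirely made up for illustration (no such software scheme exists yet), and the names CODE_TABLE, parent and Classification are hypothetical:

```python
from dataclasses import dataclass, field

# Hypothetical two-level code table. A real one would be issued by the
# central authority; these codes and labels are invented for illustration.
CODE_TABLE = {
    "10": "Text processing",
    "10.20": "Text processing / Formatters",
    "30": "Networking",
    "30.10": "Networking / News transport",
}


def parent(code):
    """Return the enclosing higher-level code, or None at the top level."""
    head, sep, _ = code.rpartition(".")
    return head if sep else None


@dataclass
class Classification:
    primary: str                                   # exactly one primary code
    secondary: list = field(default_factory=list)  # any number of secondary codes

    def __post_init__(self):
        # Only codes defined by the central authority are accepted.
        for code in [self.primary, *self.secondary]:
            if code not in CODE_TABLE:
                raise ValueError(f"unknown code: {code}")

    def matches(self, code):
        """True if any assigned code equals `code` or falls beneath it."""
        for assigned in [self.primary, *self.secondary]:
            current = assigned
            while current is not None:
                if current == code:
                    return True
                current = parent(current)
        return False
```

A search on a top-level code then also finds packages filed under its sub-codes, whether the match comes from the primary or from a secondary code.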

I doubt that there will be much success with self-assignment by
authors of software packages since, unlike mathematicians, they are
not used to relying on literature searches for prior art anyway.

However there is no way to find out how viable that fourth feature of
the maths system (self-assignment by authors) is until we have the codes
assigned by a central authority. If it also turns out to be viable, fine;
otherwise I propose
the "cooperative cataloging" model used by libraries - i.e. the first
major archive site that stocks the package does the classifying and
others copy - that distributes the work among people who understand
the classification scheme, even though not as widely as by distributing
it to authors as well. (Once it has caught on, and people actually
USE the catalog classifications, one could THEN hope for some
self-cataloging by authors.)
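
As a minimal sketch of the "first site classifies, others copy" rule, assuming some shared store that every cataloging site can reach (the SharedCatalog class and its method names are hypothetical, not taken from any library system):

```python
class SharedCatalog:
    """Toy shared catalog: the first cataloging site to classify a package
    sets its codes, and every later site copies that record."""

    def __init__(self):
        self._records = {}  # package name -> (site, codes)

    def classify(self, package, site, codes):
        """Record `codes` only if nobody has classified `package` yet;
        return whichever record ends up on file."""
        if package not in self._records:
            self._records[package] = (site, tuple(codes))
        return self._records[package]

    def lookup(self, package):
        """Return (site, codes) for a classified package, else None."""
        return self._records.get(package)
```

A second site calling classify() for the same package simply gets back the first site's record, so the classifying work is done once.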

By "major archive site" I really mean "cataloging site" - i.e. one
that is willing to do far more than the typical ftp site in actually 
maintaining organized cataloging information. This need not actually
be a site that has disk space available on the internet, though
considering that disk space is now only $2 per MB I don't see why
not. Another set of possible catalogers are the moderators and indexers of
the *sources* groups. (There was some discussion re a classification
scheme in comp.sources.d recently).

>Of course, the MSC (which is available for anonymous ftp on
>e-math.ams.com as mathrev/asciiclass.new) wouldn't apply directly to
>software; we'd have to draft a whole new set of categories. But the
>model will work.

As well as new categories I think we would have to add quite a lot
of features to the model e.g.

1. Version numbers, for the whole package and for its component parts.
2. *sources* message-id/subject headings/archive names
3. File sizes for source and object code software, docs, test and other data,
abstracts (README, HISTORY etc) and various combinations, with "standard"
filenames.
4. Refinement of 3 to include PostScript/dvi and "source" forms of
documentation, compressed and uncompressed versions with various
packaging methods etc.
5. Patches and what they apply to and result in.
6. Languages used (perhaps merely one of many classifications, but
could add file sizes and numbers for each).
7. Pre-requisite software. (Not a classification but a reference to
other cataloged packages with specific version numbers).
8. Pre-requisite hardware.
9. Release status. (alpha, beta, gamma etc)
10. Copyright information. (Whether "freely available" etc)
11. Systems tested on.
12. Systems it is believed to work on.
13. Systems it is believed not to work on.
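
To make the list concrete, here is one way such a record could be laid out in Python. This is only a sketch: the field names are my own guesses at a layout, not an existing format, and the prerequisite check assumes an exact version match, which is cruder than a real catalog would want.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Prerequisite:
    package: str   # name of another cataloged package
    version: str   # the specific version required


@dataclass
class CatalogRecord:
    name: str
    version: str                                             # item 1
    component_versions: dict = field(default_factory=dict)   # item 1, per component
    sources_postings: list = field(default_factory=list)     # item 2
    file_sizes: dict = field(default_factory=dict)           # items 3-4, filename -> bytes
    patches: list = field(default_factory=list)              # item 5
    languages: list = field(default_factory=list)            # item 6
    requires_software: list = field(default_factory=list)    # item 7, Prerequisite entries
    requires_hardware: list = field(default_factory=list)    # item 8
    release_status: Optional[str] = None                     # item 9
    copyright_terms: Optional[str] = None                    # item 10
    tested_on: list = field(default_factory=list)            # item 11
    believed_working: list = field(default_factory=list)     # item 12
    believed_broken: list = field(default_factory=list)      # item 13

    def unmet_prerequisites(self, installed):
        """Return the prerequisites not satisfied by `installed`,
        a dict mapping package name -> installed version."""
        return [p for p in self.requires_software
                if installed.get(p.package) != p.version]
```

Everything except the name and version defaults to empty, so a record can start small and be filled in later.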

Only the most important information need be provided initially, but
it should be possible to add other stuff including even review comments
or pointers to discussion in newsgroups. This could be provided for
at the same time as setting up a system for cooperative cataloging, since
coop cataloging implies being able to take an existing or non-existent
catalog record and add to it and have that then available for others
to use or add to. Adding "review comments" would be particularly useful.
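
The "take an existing or non-existent record and add to it" step might look something like this in Python. It is a sketch under my own assumptions: records are plain dicts, the function name contribute is hypothetical, and each field is assumed to keep one type (list or scalar) throughout.

```python
def contribute(catalog, package, site, **additions):
    """Merge `additions` into the record for `package`, creating the record
    if it does not exist yet. List-valued fields are extended; scalar fields
    are only filled in when still empty, so earlier work is never overwritten."""
    record = catalog.setdefault(package, {"contributors": []})
    for key, value in additions.items():
        if isinstance(value, list):
            existing = record.setdefault(key, [])
            # Append only entries that are not already on file.
            existing.extend(v for v in value if v not in existing)
        elif record.get(key) is None:
            record[key] = value
    if site not in record["contributors"]:
        record["contributors"].append(site)
    return record
```

Review comments fit this shape naturally: each site passes a list of comments and they accumulate on the shared record.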

It still strikes me that libraries are the institutions that should be
doing this. One thing though, if they aren't prepared to take it on yet,
perhaps they could make available the software used at no charge? There
are some very powerful systems in use for cooperative cataloging, and
MARC records, which cover everything from audio tapes to maps, are just
as complex as anything we will need for software packages.

How about just submitting a couple of packages as "publisher" to the LC and
asking for the "Cataloging In Publication" data to be returned overnight,
as is done for book manuscripts? Should produce some discussion :-).

U.S. copyright law clearly defines computer programs as "literary works"
and I can't see anybody claiming that something like "c news" 
or X windows is "merely ephemeral" so I guess they would HAVE to
catalog it.

The Library of Congress IS on the internet (loc.gov) - but if they
won't accept submissions by email or ftp somebody could just start up
a "publisher" to issue a series of tapes and diskettes for physical
delivery to them with each volume a separate monograph (not part of a 
single serial) containing one software package.

I'm quite serious about this: proper cataloging DOES cost about $200 per item
and it IS THEIR JOB. We should just be helping with specialist advice.

P.S. For anyone wanting to follow up - I just don't have time - a
contact at the LC is:

Sally H. McCallum, Chief
Network Development and MARC Standards Office
Library of Congress
smcc@seq1.loc.gov
(202) 707-6273

--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au