Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!zaphod.mps.ohio-state.edu!math.lsa.umich.edu!math.lsa.umich.edu!emv
From: emv@math.lsa.umich.edu (Edward Vielmetti)
Newsgroups: alt.sources.d
Subject: Re: Archive-name
Message-ID:
Date: 22 Aug 90 02:39:22 GMT
References: <1990Aug13.162617.25478@cbnews.att.com> <9008211455.AA02499@talos.pm.com> <193@n4hgf.Mt-Park.GA.US>
Sender: usenet@math.lsa.umich.edu
Organization: University of Michigan Math Dept., Ann Arbor MI.
Lines: 75
In-Reply-To: wht@n4hgf.Mt-Park.GA.US's message of 21 Aug 90 22:25:04 GMT

In article <193@n4hgf.Mt-Park.GA.US> wht@n4hgf.Mt-Park.GA.US (Warren Tucker) writes:

   Arkive-Nombre: diatribe-blabber/part01

   >Also, how are users supposed to know what's a good name to put in
   >the Archive-name header?

   It is Very Handy when you are looking for a program named 'foo,'
   say, and you do not know that it was posted in Volume 4, Issues
   12-14, patched months later in Volume 6, Issue 5 and patched again
   months later in Volume 7, Issue 10. Instead, you just need look
   up 'foo' to find:

   foo/part01
   foo/part02
   foo/part03
   foo/patch01
   foo/patch02

Well...there's a problem here, one which I understand librarians refer to as "authority control". Say you are looking for a program named "shar", which I understand is a very popular name for people to give to their programs. You think that your program is the One True Shar, but other people differ. The alt.sources archivist(s) have to make that decision, one way or another.

One reasonable solution that has been used in the Gnu Emacs Lisp library collection is to prefix the name of the package with the author's name, so it would be

   wht-foo/part01
   wht-foo/part02
   wht-foo/part03
   ...

to disambiguate between authors.

If a separate posting comes around with header information, it might even be sensible to override the author's ill-advised Archive-name choice with a better one.
Even worse, you might forget or not have easy access to all of the various Archive-name headers that people have used throughout the course of the group, and thus give yourself the opportunity for accidental collisions.

Another substantial problem with alt.sources is version control. The system is explicitly designed (hm, seems to have worked out to be) to let people post multiple revisions of a package in quick succession. Not all authors are equally conscientious about keeping version information around. My hack for this for comp.archives is to use the date as the version string, so for one-part stuff it might look like

   wht-foo/21-Aug-90

which is OK unless you get two in the same day or a multipart posting comes in.

Alt.sources gets a fair amount of stuff, and it's pretty diverse; comp.archives even more so. As a result, a naive application of Archive-name: as a file name to store the article in is going to break down as soon as your directory starts to fill up with hundreds of entries, or even thousands. So you need to split the archive into volumes, one per year, quarter, or month depending on traffic. These files would then be kept in e.g.

   /usenet/alt.sources/vol.90.3Q/wht-foo/21-Aug-90.Z

which still lets you grep on foo or do

   ls /usenet/alt.sources/*/*foo*

to find things.

I don't know that Archive-name is the be-all and end-all of things. Certainly if you could extract the README and other internal documentation and make it accessible for a full-text search, you'd enable even more arbitrary and complex searches. Similarly, if the author information were easily visible, you could search for things like "all programs written by wht"; that's not easy to do in any of the indexes of usenet archives that I'm aware of.

Archive-name does have the very nice property of being a sensible way to store postings, one file per article, with reasonable grouping and meaningful file names. And popular archiving software understands it.
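For concreteness, here is a rough sketch in Python of how the naming scheme above (author prefix, date as version string, quarterly volumes) might be put together. The function name archive_path and its parameters are made up for illustration, and the ".partNN" suffix for multipart postings is purely an assumption; the scheme as described only covers the one-part case.

```python
from datetime import date
from pathlib import PurePosixPath

# Fixed English month abbreviations, so the result does not depend on locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def archive_path(root, group, author, package, posted, part=None):
    """Build a storage path such as
    /usenet/alt.sources/vol.90.3Q/wht-foo/21-Aug-90.Z

    archive_path and all of its parameters are hypothetical names,
    not part of any existing archiving software.
    """
    yy = posted.year % 100
    quarter = (posted.month - 1) // 3 + 1          # 1..4
    volume = f"vol.{yy:02d}.{quarter}Q"            # one volume per quarter
    name = f"{author}-{package}"                   # author prefix disambiguates
    version = f"{posted.day:02d}-{MONTHS[posted.month - 1]}-{yy:02d}"
    if part is not None:
        # Assumption: a part suffix for multipart postings; the original
        # scheme leaves this case open.
        version = f"{version}.part{part:02d}"
    return PurePosixPath(root) / group / volume / name / f"{version}.Z"


if __name__ == "__main__":
    print(archive_path("/usenet", "alt.sources", "wht", "foo",
                       date(1990, 8, 21)))
```

Two postings of the same package on the same day would still collide under this sketch, which is the weakness already noted above.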
Fertile field for research and standardization, perhaps. I understand that the ANSI committee Z39.67 on "computer software description" has a draft available, $25 to NISO, PO Box 1056, Bethesda MD, but that this document concerns itself mostly with descriptions and cataloging of shrink-wrapped commercial software and not the less orderly stuff that flows through usenet. (Haven't read it myself.)

--Ed

Edward Vielmetti, U of Michigan math dept
moderator, comp.archives