Xref: utzoo bionet.molbio.genbank:74 news.software.nntp:459
Path: utzoo!utstat!helios.physics.utoronto.ca!jarvis.csri.toronto.edu!utgpu!watserv1!watmath!uunet!genbank!bionet!ig!benton
From: benton@presto.IG.COM (David Benton)
Newsgroups: bionet.molbio.genbank,news.software.nntp
Subject: Re: Distributing GenBank over the Internet
Message-ID:
Date: 12 Dec 89 06:42:25 GMT
References: <1989Dec7.213027.8591@phri.nyu.edu> <1364@uvm-gen.UUCP>
Followup-To: bionet.molbio.genbank
Organization: IntelliGenetics, Inc.
Lines: 66

I won't reply to all the points made in two recent postings (from
Stephen Cavrak and Roy Smith) on this topic, since I take no exception
to most of them.  I will try to fill in some gaps as to what GenBank is
doing to distribute the database and what we have planned.

> Generally this is a good idea and makes a lot of sense --- especially
> if the database could be broken up to small pieces.  The thought of
> redistributing ALL of the database 4 times a year to ALL of the
> subscribers should cause someone's teeth to grind, however.

The main reason I see for distributing the data by network is so we can
increase the frequency from 4 times to 52 times a year.  I don't know
how many sites will actually want to FTP the entire quarterly release
(compressed), but it's not much extra work for us to provide it, so why
begrudge those who want it that way?  I might just point out that
neither IntelliGenetics nor the NIH makes any money from the
distribution of GenBank on mag tapes or floppy diskettes, so we
certainly have no objection to anyone getting the data off the net.

> The suggestion to distribute information per demand makes more sense
> in terms of lowering the network traffic, but then how would the
> individual user know her copy of a database were "up to date"?  The
> "news" model would almost be essential to this point.

In the simple system now operating, she knows by virtue of the file and
directory names she FTP'ed.
This, of course, puts the burden of asking for the data on the
recipient.

> Taking the suggestion one step further, why "distribute" the database at
> all?  Why not pursue a "server" model where queries against the
> database could be directed to one (or several) "database servers".
>
> The other alternative is to just publish the database on CD-ROM and
> distribute it that way.

EMBL has distributed at least one release on CD-ROM and we plan to
start quarterly releases of GenBank on CD-ROM in the spring of '90.
Our main reason is to furnish the data to the large number of users who
are now ordering GenBank on floppy diskettes.  GenBank Rel 61 (Sep
1989) required 98 360-kb floppies (and the floppy disk format entries
are stripped of comments and reference titles).

> Just where the crossover point is today might be very interesting to
> calculate.  Just how many installations out there receive copies of
> the data?

GenBank ships about 125 copies of each quarterly release on magnetic
tape (3 file formats, 3 media types) and about 350 copies of each
semi-annual release on floppy diskette (1 file format on XT, AT, and
Mac disks).  Some of the mag tape recipients are secondary distributors
of the data (usually reformatted in some way).  From the information
they have sent us, it appears that something like 450 additional copies
of the data are sent to individuals who do not get the data directly
from GenBank.

(Part 2 to follow.)

Sincerely,

David Benton
GenBank Manager
415-962-7360
benton@genbank.ig.com