Path: utzoo!attcan!utgpu!watserv1!watmath!uunet!genbank!bionet!ig!benton
From: benton@presto.IG.COM (David Benton)
Newsgroups: bionet.molbio.genbank
Subject: Re: Distributing GenBank over the Internet
Message-ID:
Date: 12 Dec 89 07:31:38 GMT
References: <1989Dec7.213027.8591@phri.nyu.edu> <1364@uvm-gen.UUCP> <1989Dec11.160609.5436@phri.nyu.edu>
Organization: IntelliGenetics, Inc.
Lines: 73

> CD-ROM is nice, but doesn't really solve the problems that tape has.
>You still have to get a physical object from point A to point B, and you
>still have to produce those objects. How long does it take to press CD's
>compared to the time it takes to cut tapes? Also, from what I know of CD's,
>they are much slower than magnetic hard disks. Also, I'm not sure that
>CD-ROM is really practical yet. Maybe in a couple of years, but it's still
>pretty much of a specialty item today.

No argument about getting the physical object from point A to point B, but Release 62 (in the works) is going to require 5 reels @ 1600 bpi (yes, there are sites that want GenBank and cannot receive 6250) and Rel 63 (March, the next release on floppies) will probably require something like 125 360-kb floppies. Since we are distributing more than 100 copies in the 360-kb density, I don't think we can morally stop the pain of mastering, packaging, and shipping those floppy releases until we provide a viable alternative. (And even if our morals would let us, the GenBank Project Officer would remind us of our contractual obligation to release on floppies. We're hoping the floppy release (at least on the LD disks) will die a natural death after the CD ROMs are available.)

I'll be able to answer your questions about actual production of the CD's after we've had the experience, but the CD ROM pressing operations are quoting prices for 1, 3, and 5 day turn-around times.
The big difference between CD's and mag tapes is that the time to create one release is really independent of both the amount of data in the release and the number of copies being produced. Right now, producing one copy of the mag tape release at 1600 bpi takes over an hour (even assuming the operator is waiting to remove the reel as soon as it rewinds). It is more economical to produce CD ROMs than to spin tapes now and the cost differential will only increase as the size of the database increases.

Since the GenBank contract requires that the incremental costs of distributing the data be recovered from the users, the price of a GenBank release must go up every time the size of the database requires another reel of tape for the release or more floppies (every release for floppies). Even though there is a significant set-up charge for pressing CD's, we are hoping that there will be enough demand to keep the per-unit costs of the CD below or very near the cost of the most economical release on mag tape. So we are hoping for a greater ubiquity of CD ROM's in a few months than you are anticipating.

As a distribution medium, CD ROM's transfer rates are certainly in the same ballpark as those of 1600-bpi mag tapes. In timing tests I've done on raw read rates, the CD ROM was about half the speed of a hard disk (both running on a 386 machine) if one used block reads with a large buffer (30-60 kbytes). CD ROM seek times are notoriously slow, but that simply means you must design the database file formats to reduce the number of seeks (keeping files contiguous on CD ROM is no problem). We hope the database format we've designed for the CD ROM will result in acceptable performance for those who want to use the CD as a database medium, but I won't go into the details of that format here.

I hope my digression into the arcana of the GenBank contract's requirements has not stifled all discussion on this newsgroup, but has helped explain our motivation for the CD ROM release.
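
For readers who want to repeat the raw-read comparison on their own drives, here is a minimal sketch of a sequential block-read timer. It is not the tool used for the tests described above (the post does not say what was used), and the function names `time_block_reads` and `compare_block_sizes` are made up for illustration. It reads a file from start to finish in fixed-size blocks and reports bytes per second. Operating-system caching can inflate the numbers on a second run over the same file.

```python
import time


def time_block_reads(path, block_size=32 * 1024):
    """Read `path` sequentially in blocks of `block_size` bytes.

    Returns (total_bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 opens a raw, unbuffered file object, so each read()
    # asks the operating system for up to `block_size` bytes at a time.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed


def compare_block_sizes(path, sizes_kb=(1, 30, 60)):
    """Return {block size in kbytes: bytes per second} for each size tried."""
    rates = {}
    for kb in sizes_kb:
        total, secs = time_block_reads(path, kb * 1024)
        # Guard against a zero interval on very small files.
        rates[kb] = total / secs if secs > 0 else float("inf")
    return rates
```

Calling `compare_block_sizes` on a large file on each device (one path on the CD ROM, one on the hard disk) gives figures that can be compared directly. The 30 and 60 kbyte defaults simply mirror the buffer range mentioned above.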
I also hope that through the discussion, it's become clear that, whether by FTP or mail or mag tape or floppies or CD ROM or through the interactive GenBank On-Line Service, our goal is to make the data available to the greatest number of researchers in the most useful forms we can. In that spirit, I'd just add that we are always interested to learn how we can better serve the database's users and are happy to propose (to the NIH) any reasonable request for a modification in the service we provide.

And I mean that,

Sincerely,

David Benton
GenBank Manager
415-962-7360
benton@genbank.ig.com