Path: utzoo!utgpu!watserv1!watmath!uunet!zaphod.mps.ohio-state.edu!think.com!yale!cmcl2!phri!roy
From: roy@phri.nyu.edu (Roy Smith)
Newsgroups: bionet.molbio.genbank
Subject: Re: A question for FTP users
Message-ID: <1991Jan21.145215.6851@phri.nyu.edu>
Date: 21 Jan 91 14:52:15 GMT
References: <9101190214.AA17249@alanine.phri.nyu.edu> <1991Jan19.035659.9788@news.miami.edu>
Sender: news@phri.nyu.edu (News System)
Organization: Public Health Research Institute, New York City
Lines: 61

jkramer@molbio.med.miami.edu (Jack Kramer) writes:
> I am one of those who requested that the previous release UPDATE files
> be kept on line for some overlap period after a new release. My primary
> reason for this is that I maintain two major software packages [...] Each
> of the packages uses a proprietary format for the data.

	Perhaps I misunderstood the original posting; I thought the request
was to keep the *entire* previous release on line.  Just the updates sounds
more reasonable.

	But, the real reason I'm following up to Jack's posting is to flame
the software vendors.  The idea of each vendor having a proprietary format
for GenBank is nuts.  Do vendors really think it's a good idea for people
who use two or more packages to have to keep two or more complete copies of
the database on-line?  Or do they just think that their package is so
wonderful, so complete, and so able to fulfill the needs of every user at
every site that nobody could possibly want to run any software other than
their own?

	I could see how you could make a case for reformatting the database
into some drastically better format (a relational database, for example),
but many of the reformats I've seen have been nothing more than trivial
textual changes that don't make it better, just different.

	For example, Ross Smith and I both maintain complete copies of
GenBank (and other databases) on different machines on the same LAN.
For a while, we've been talking about just having a single copy which one of
us would NFS mount from the other's disk.  A couple of days ago, I got to
look at his copy of GenBank.  It's still formatted as plain old ascii flat
files, but his software vendor decided it was important to insert lines
starting with >'s to delimit loci, instead of the "//" delimiter that the
files have coming off the tape from IG.  There were a couple of other
textual differences which I didn't study too closely, but it was obvious
that none of them were fundamental changes; they didn't make the file
substantially better than it was before, just different.  Different enough
that in order for us to share a single copy of the database, one of us
would have to rewrite a lot of our software to know about the format of the
other's database.

	If the only difference is a pure reformatting of the text, then
there is no excuse.  If there is some added information, then it seems to
me that the best thing would have been to create a parallel flat file with
the extra info; the vendor's programs could read both files, and other
programs that wanted to see a virgin GB file could see that too.  If the
vendor wanted some sort of index into the file, they could have made an
index that pointed into the original file; again, programs that wanted the
virgin file could just ignore the index.

> This is not a complaint about GenBank. The anonymous ftp service is
> a real lifesaver for me and I really appreciate all the cooperation
> and service I have received from the GenBank staff.

	I'll go along with that.  I've had some minor disagreements with
the GenBank folks, but even the closest long-term collaborators don't
always agree 100%.  By and large, the GB people (both at IG and LANL) have
gone out of their way to service every request we have made of them, even
when those requests haven't been entirely reasonable.
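
	To make the index suggestion above a bit more concrete, here is a
rough sketch in Python of a side index that points into an untouched flat
file.  It is only an illustration, not any vendor's actual scheme.  It
assumes each entry begins with a LOCUS line and ends with a "//" line; the
function names and the two sample entries (HYPOTH1, HYPOTH2) are made up
for the example.

```python
import io

def build_locus_index(stream):
    """Map each LOCUS name to the byte offset where its entry starts.

    The flat file itself is only read, never modified.
    """
    index = {}
    offset = 0
    for line in iter(stream.readline, b""):
        if line.startswith(b"LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                index[fields[1].decode("ascii")] = offset
        offset += len(line)
    return index

def read_entry(stream, offset):
    """Return one entry starting at offset, up to and including '//'."""
    stream.seek(offset)
    lines = []
    for line in iter(stream.readline, b""):
        lines.append(line)
        if line.rstrip() == b"//":
            break
    return b"".join(lines).decode("ascii")

# Hypothetical two-entry flat file, held in memory for the example.
SAMPLE = (
    b"LOCUS       HYPOTH1    12 bp\n"
    b"ORIGIN\n"
    b"        1 acgtacgtac gt\n"
    b"//\n"
    b"LOCUS       HYPOTH2    8 bp\n"
    b"ORIGIN\n"
    b"        1 ttgacctg\n"
    b"//\n"
)

flat_file = io.BytesIO(SAMPLE)
index = build_locus_index(flat_file)
entry = read_entry(flat_file, index["HYPOTH2"])
```

	The index lives entirely outside the flat file (here just a
dictionary, but it could as easily be written to a separate file), so any
program that expects the unmodified file never has to know it exists.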
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"