Path: utzoo!utgpu!watserv1!watmath!att!pacbell.com!ucsd!sdd.hp.com!zaphod.mps.ohio-state.edu!julius.cs.uiuc.edu!apple!bionet!AARDVARK.UCS.UOKNOR.EDU!BROE
From: BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe)
Newsgroups: bionet.molbio.genbank
Subject: Re: A question for FTP users
Message-ID: <9101232140.AA25920@genbank.bio.net>
Date: 23 Jan 91 14:24:00 GMT
Sender: daemon@genbank.bio.net
Lines: 94

Hi,

Obviously the problem with the databases, their formats, and the programs to access the databases continues. As with most things in life, there are no simple solutions until, of course, the solution is found, and then everyone says: "My, that solution was so simple, why didn't we think of it before?"

The solution is rather simple: a common, stable database format. Without this, vendors of software have 2 choices:

1. Reformat the databases to fit their software
2. Change their software to read the distributed databases.

Until now the choice of the vendors has been the former, mainly because the format of the databases was (and still is) in a state of change. It is more efficient to write a program to change the database format than it is to change the multitude of code for dealing with the databases.

John and the folks at GCG have provided tools for converting GenBank to GCG format and for inter-converting individual sequences from one format to another. The Staden programs read the databases stored in the PIR format but can analyze individual sequences stored in any of several formats. I do not know what IG does in their package, but I am sure they take a similar approach. Or do they use GenBank without reformatting?

David K. has written:

> As I am sure you are aware, it is not in GenBank's charter to
>supply the databank in any commercial format. Reformatting costs
>money regardless of who does it. If we were required to reformat the
>database as you suggest, we would be obligated to provide it for
>*every* commercial vendor. This is clearly impractical. Also since
>many users do not have access to FTP, they would still have to rely on
>tape or CDROM distributions. The net effect of this would be to delay
>the production of GenBank tremendously. Reformatting GenBank clearly
>belongs where it is right now, in the hands of the commercial vendors.

Give me a break. How many vendors is *every*? Do folks really search the entire GenBank from their PCs? Some search the protein databases on their PCs/Macs, but the entire GenBank? Could we at least concentrate our discussion on mainframe computer programs and the databases on them?

Maybe I'm mistaken, but I count three mainframe program sets as the vast majority used: GCG, IG, and NBRF/PIR. A few sites have the Staden programs, but most of us who use the Staden programs use them for purposes other than database searching. In reality, Bill Pearson's FASTA and companion programs probably are used the most, and they handle the GCG formatted databases.

I think what we need here is a survey of what's out there. Let us limit our discussion to mainframe programs and FTP sites, and deal with sites rather than individual users. I also do not think we should consider other forms of the databases, such as those which require pre-processing for the NLM BLAST programs or GCG's QUICKSEARCH.

The problem is time and money. If GCG supplies users with tapes for $1600 they make money, but they sure save me lots of time, and I get ALL the databases we want and need in a format we can use. I also do not have to worry about transmission errors which may corrupt an ftp-ed database. If I get the GenBank tapes I still have to pay (although less), but then I have to spend time re-formatting databases and also get additional tapes from PIR and maybe others, which could bring the cost in tapes and effort to a figure greater than the cost from GCG. No matter what, it looks like the NIH is going to pay the bills, either from individual grants or from contracts to GenBank/IG.
I'd like to hear from the funding agencies and would also like comments from those who supply databases to the rest of us.

My overall conclusions are:

(1) pay the money to GCG and get quarterly database updates on tape, as it is the least hassle for me and our system folks.
(2) encourage users to search the latest databases using FASTA-Mail, etc.
(3) continue to join with others to encourage discussions which will result in a common, stable database format.

Best to one and all,

Bruce A. Roe
Professor of Chemistry and Biochemistry

INTERNET: BROE@aardvark.ucs.uoknor.edu
BITNET:   BROE@uokucsvx
AT&TNET:  405-325-4912 or 405-325-7610
SnailNet: Department of Chemistry and Biochemistry
          University of Oklahoma
          620 Parrington Oval, Rm 208
          Norman, Oklahoma 73019
FAXnet:   405-325-6111
ICBMnet:  35 deg 14 min North, 97 deg 27 min West