Path: utzoo!utgpu!watserv1!watmath!uunet!wuarchive!bcm!bionet!vax.oxford.ac.uk!JREES
From: JREES@vax.oxford.ac.uk
Newsgroups: bionet.molbio.genbank
Subject: Re: A question for FTP users
Message-ID: <9101241737.AA27782@genbank.bio.net>
Date: 24 Jan 91 12:41:00 GMT
Sender: daemon@genbank.bio.net
Lines: 56

Time I guess to say my piece also...

It shouldn't even need questioning that reformatting the databases from one
format to another is inappropriate, and I won't reiterate the arguments made
here before, as that has been done already. But I am going to add my voice to
those who would see ALL the software packages accept the format distributed by
the databases as the one to use for access to the databases in straight ASCII
format.

Will Gilbert articulated this whole area very clearly on INFO-GCG last August,
and perhaps he can be persuaded to repost to this discussion, but in essence it
is very simple for everyone using the flat format files to interface to them in
"native" format and to provide utilities to index the access in a fashion which
facilitates access for their own package. Some programmers seem willing to do
this (Rodger Staden for one has stated a willingness to use whatever format is
chosen by PIR, and has no objection to using "native" format if they do);
others seem very determined to go their own way in the face of opposition
(perhaps rather silent opposition until now) from those who are actually on the
receiving end of the effect of this dogma.

Since there can be no programming advantage that I can see for the
reformatting, the question is: why is no one willing to standardise? It is
clear that the standard HAS to be that created by the database in question,
that software can be written to meet whatever format it is presented with, and
that all the packages COULD use whatever format the database was presented to
them in (EMBL, Genbank, PIR, Codata, whatever) by setting the appropriate
parameter to the package at the start.
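To make the "index the native file, select the format by a parameter" idea concrete, here is a minimal sketch in Python. It is an illustration only, not the method of any package named in this post. The names build_index, fetch_entry and ENTRY_START, and the "embl" / "genbank" parameter values, are hypothetical. It assumes only that EMBL entries open with an "ID" line, GenBank entries open with a "LOCUS" line, and both end with a line holding just "//".

```python
import io

# Assumed per-format rule: the keyword that opens an entry in the native file.
# The format is chosen by a parameter; the database file is never rewritten.
ENTRY_START = {"embl": b"ID", "genbank": b"LOCUS"}
ENTRY_END = b"//"  # assumed entry terminator for both formats


def build_index(handle, fmt):
    """Map entry name -> byte offset of its first line in the native file."""
    keyword = ENTRY_START[fmt] + b" "
    index = {}
    while True:
        offset = handle.tell()      # position before reading the line
        line = handle.readline()
        if not line:                # end of file
            break
        if line.startswith(keyword):
            name = line.split()[1].rstrip(b";").decode("ascii")
            index[name] = offset
    return index


def fetch_entry(handle, index, name):
    """Seek to an indexed entry and return its text through the '//' line."""
    handle.seek(index[name])
    lines = []
    for line in iter(handle.readline, b""):
        lines.append(line.decode("ascii"))
        if line.strip() == ENTRY_END:
            break
    return "".join(lines)


if __name__ == "__main__":
    # Tiny made-up EMBL-style file held in memory; a real file would be
    # opened in binary mode, e.g. open(path, "rb").
    sample = io.BytesIO(
        b"ID   AAA1  standard; DNA; 4 BP.\nSQ   Sequence 4 BP;\n     acgt\n//\n"
        b"ID   BBB2  standard; DNA; 4 BP.\nSQ   Sequence 4 BP;\n     ttga\n//\n"
    )
    idx = build_index(sample, "embl")
    print(fetch_entry(sample, idx, "BBB2"))
```

The only extra disk space this needs is the index itself (one name and one offset per entry), which is the point of the argument: the distributed file stays as the single copy.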
Perhaps if this were done then there would be less time and effort wasted
making multiple copies for everyday use. The problems the Genbank/IG have with
disk space probably apply to most of us - my own operations running software
and databases in Oxford and at MIT run 700MB on each machine; reformatting
databases generally means finding 150 MB of spare space at a minimum, more if
the active version is not deleted first, and this is getting worse as the
databases get larger each year.

Clearly the change we could have all hoped for in the construction of the
relational format Genbank and Embl has not yet gained the active interest of
the programming community as an option; it is my hope that this will be the way
forward in the long term, and that those in a position to advance this will do
so on this forum.

Finally, to avoid the point being missed, I am fully in support of the use of
total reformatting where it achieves a significant change in the response to
the user; the preprocessing required to run BLAST or GCG's Quick software is an
investment well worth making - even when the overall cost in resource has been
higher - but the problem under discussion here does NOT achieve that end.

Jasper Rees
Jrees@Vax.Oxford.ac.uk (%nsfnet-relay.ac.uk)
Seqtest@Wccf.MIT.edu

"One World, One database format ?"