Path: utzoo!attcan!uunet!mailrus!iuvax!bionet!kristoff
From: kristoff@genbank.BIO.NET (David Kristofferson)
Newsgroups: bionet.molbio.genome-program
Subject: Re: local copies of genbank
Message-ID:
Date: 20 Mar 90 17:55:20 GMT
References: <4448@mace.cc.purdue.edu>
Organization: GenBank Online Service
Lines: 50

Rick,

I'm sure that there are lots of machines which are not yet overloaded. The issue will begin to develop as the Genome Project starts to produce larger amounts of sequence data than currently comes in each day. I also was not questioning whether or not it was difficult to do the updates; the issue will be disk space and CPU power as the database grows.

You mentioned some of the problems with extracting sequences out of e-mail messages, but I should remind you that users can also directly access the IRX program on genbank.bio.net and download sequences of interest directly without mail headers. The time of less than half an hour applies just to FASTA searches. E-mail retrieval of sequences takes about 2-3 minutes based on some tests that I have run from the east coast to our machine in California.

Regarding formatting, it is true that the sequences come in a form that is not immediately usable by your commercial software, but this also holds true for the daily updates over USENET or the weekly FTP files. Reformatting for GCG must be done in any case. Of course, it is undoubtedly easier for the systems manager to process a whole block of data at a time, but it is also trivial to have a small script which users can run to do this on sequences of interest.

We are in agreement about the need for a local copy of the database if you run local analyses on the whole database other than FASTA and IRX. I should point out, however, that the functionality which you are using is also available on the GOS computer for those who get accounts on the system (the QUEST program).

Regarding overload, this will undoubtedly happen here eventually too.
As compared to a 10 MIPS machine, our system has four 22 MIPS processors. We have not been bogged down yet and can handle much more than 4 FASTA searches at a time.

Part of my point though ***which holds true even if you do have a local copy of GenBank*** is that you can save your local CPU power by offloading FASTA searches to our machines. That way your users will have better response for their other uses of your local software.

Have a good vacation!

--
Sincerely,

Dave Kristofferson
GenBank On-line Service Manager

kristoff@genbank.bio.net