Path: utzoo!attcan!uunet!bionet!kristoff
From: kristoff@genbank.BIO.NET (David Kristofferson)
Newsgroups: bionet.molbio.genome-program
Subject: Re: USENet and GenBank Updates
Message-ID:
Date: 19 Mar 90 20:55:49 GMT
References: <9003171856.AA15058@genbank.bio.net>
Organization: GenBank Online Service
Lines: 117

From Prof. Bruce Roe:

> I just put up Dr. Clark's FAMAIL shells and they are fantastic.

So I have heard from others.  The purpose of these is described further
below along with some other important observations.

> This brings up the question, why should we clutter up our disk drives
> with all the databases when GenBank has them easily accessible via the
> Internet? Is it because we want them here and do not want to depend

Please note that users on BITNET, EARN, NETNORTH, JANET, etc. can also
access the databases on the GenBank computer.

> on a network?? I guess that's why. I'd love to have daily database
> updates on our VAX and take 5x more CPU time to search the databases
> via WORDSEARCH or FASTA and bring every other user on our VAX to a
> screeching halt while I search. That gives me power. (is sarcasm
> allowed on the net Dave??).
>
> Seriously though, as the databases grow in size, maintaining
> them locally is going to be very difficult and searching them locally
> is going to be very very slow. So rather than having all of us deal
> with these databases locally, why doesn't the NIH think of funding
> various sites located nation-wide to be mini-genbanks with the appropriate
> access and searching programs? I think this is called *distributive
> computing* and it makes more sense to me than creating a local nightmare.
>

Frankly, Bruce, I am very much in agreement with you on this point.  We
(the GenBank On-line Service or GOS for short) have been providing these
alternate means of database distribution simply because the demand is
there for them and because doing so does not require much effort.
HOWEVER, each time anyone has approached me with a new distribution
scheme, I have always asked them the question: "WHY DO YOU WANT TO
WASTE YOUR DISK SPACE AND CPU POWER AND HAVE TO KEEP ON TOP OF THE
UPDATE SITUATION DAILY WHEN THE DATABASES ARE MAINTAINED ON THE GENBANK
ON-LINE SERVICE AND ACCESS TO THEM OVER THE NETWORK IS **FREE** FOR
FASTA SEARCHING AND SEQUENCE ENTRY RETRIEVAL???  The NIH has funded a
high speed computer with lots of disk space at GOS precisely for this
purpose!!!"

The only answer that seems to be valid is that someone needs the entire
database present and updated each day locally for some kind of analysis
other than FASTA or IRX searching.  However, I believe that many sites
may just latch on to local maintenance because **it is possible for the
systems manager to do**.  The usage will probably turn out to be 90%+
for FASTA searches anyway and the NIH will be continually faced with
requests for more disks on which to store the data.  When the database
assumes much larger dimensions than it has currently, this may
obviously not be the right way to proceed.

It makes more sense economically for the NIH to provide easier access
to the database by providing just enough computing power to enable
people to get their jobs done expeditiously.  Currently, the existing
80 MIPS Solbourne computer at the GenBank On-line Service (GOS) is more
than up to this task.  At some point this computer will become
inadequate and the NIH will have to expand the available power.  Again
the most economical way to do this will be to set up "mini-GenBanks,"
as Dr. Roe proposed, by using hardware that is capable of doing the job
and is already in place, if possible (I obviously am arguing *against*
my own self-interest here since I could just say "give us more money
and we'll solve the job here.").  Possibly these additional sites could
be located at the various Genome Centers under consideration for
funding.

Dr. Roe also mentioned Steve Clark's scripts for use with GOS.
Although I have not seen them myself (we use Suns/Solbournes at GOS,
not VAXen), I understand that they provide for the VAX something
similar to what we had on BIONET, namely a simple interface that
appropriately constructs the required mail message for the GOS FASTA
Server and then sends it off automatically.  This is a straightforward
program that relieves the user of having to compose the e-mail
submission in the precise server format and greatly simplifies the
process.  Mailing of the message to the appropriate server address is
also automated.  Basically all that the user need do is answer a couple
of prompts about the file containing his/her query sequence, what
database they wish to search, and what parameter settings they wish to
use.  The search is then sent off to GenBank automatically, the GOS
computer reads the message, runs the search automatically, and sends
back the scores and requested alignments.  Finally the user can then
access the entry retrieval server to return anything of interest found
during the search.

I have run tests of the GOS systems from other machines on the
Internet.  The transit time for the mail is very short (a couple of
minutes back and forth and sometimes less) and the time that it took to
search a 1000-base query against all of GenBank rel. 62 at ktup=4 was
about 21-22 minutes!  For the majority of users, this will probably be
sufficient.  It will also keep their local computer freed up for less
compute-intensive tasks!!  On the other hand, if everyone wants to tie
up their machines' CPUs and disks with FASTA searches of the GenBank
databank, we at GenBank do not have the right to refuse them this
privilege 8-)!!

If I were in charge of a local computer, I would make absolutely
certain that there is a legitimate need other than FASTA and IRX
searching *before* I took steps to maintain a constantly updated
version of GenBank locally!!  Users, consider yourselves forewarned.
--
				Sincerely,

				Dave Kristofferson
				GenBank On-line Service Manager

				kristoff@genbank.bio.net