Path: utzoo!attcan!uunet!bionet!kristoff
From: kristoff@genbank.BIO.NET (David Kristofferson)
Newsgroups: bionet.molbio.genome-program
Subject: Re: USENet and GenBank Updates
Message-ID: <Mar.19.12.55.48.1990.8917@genbank.BIO.NET>
Date: 19 Mar 90 20:55:49 GMT
References: <9003171856.AA15058@genbank.bio.net>
Organization: GenBank Online Service
Lines: 117

From Prof. Bruce Roe:

>    I just put up Dr. Clark's FAMAIL shells and they are fantastic.

So I have heard from others.  The purpose of these is described
further below along with some other important observations.

> This brings up the question, why should we clutter up our disk drives
> with all the databases when GenBank has them easily accessible via the
> Internet?  Is it because we want them here and do not want to depend

Please note that users on BITNET, EARN, NETNORTH, JANET, etc. can also
access the databases on the GenBank computer.

> on a network??  I guess that's why.  I'd love to have daily database
> updates on our VAX and take 5x more CPU time to search the databases
> via WORDSEARCH or FASTA and bring every other user on our VAX to a
> screeching halt while I search.  That gives me power. (is sarcasm
> allowed on the net Dave??).
>
>    Seriously though, as the databases grow in size, maintaining
> them locally is going to be very difficult and searching them locally
> is going to be very very slow.  So rather than having all of us deal 
> with these databases locally, why doesn't the NIH think of funding
> various sites located nation-wide to be mini-genbanks with the appropriate
> access and searching programs?  I think this is called *distributive
> computing* and it makes more sense to me than creating a local nightmare.
> 

Frankly, Bruce, I am very much in agreement with you on this point.
We (the GenBank On-line Service or GOS for short) have been providing
these alternate means of database distribution simply because the
demand is there for them and because it does not require much effort
to do.

HOWEVER, each time anyone has approached me with a new distribution
scheme, I have always asked them the question:

"WHY DO YOU WANT TO WASTE YOUR DISK SPACE AND CPU POWER AND
HAVE TO KEEP ON TOP OF THE UPDATE SITUATION DAILY WHEN THE DATABASES
ARE MAINTAINED ON THE GENBANK ON-LINE SERVICE AND ACCESS TO THEM OVER
THE NETWORK IS **FREE** FOR FASTA SEARCHING AND SEQUENCE ENTRY
RETRIEVAL???  The NIH has funded a high speed computer with lots of
disk space at GOS precisely for this purpose!!!"

The only answer that seems valid is that someone needs the entire
database present locally and updated each day for some kind of
analysis other than FASTA or IRX searching.

However, I believe that many sites may just latch on to local
maintenance because 

	  **it is possible for the systems manager to do**.

The usage will probably turn out to be 90%+ for FASTA searches anyway
and the NIH will be continually faced with requests for more disks on
which to store the data.  When the database assumes much larger
dimensions than it has currently, this may well not be the right
way to proceed.  It makes more sense economically for the NIH to
provide easier access to the database by providing just enough
computing power to enable people to get their jobs done expeditiously.

Currently, the existing 80 MIPS Solbourne computer at the GenBank
On-line Service (GOS) is more than up to this task.  At some point
this computer will become inadequate and the NIH will have to expand
the available power.

Again the most economical way to do this will be to set up
"mini-GenBanks," as Dr. Roe proposed, by using hardware that is
capable of doing the job and is already in place, if possible (I
obviously am arguing *against* my own self-interest here since I could
just say "give us more money and we'll solve the job here.").
Possibly these additional sites could be located at the various Genome
Centers under consideration for funding.

Dr. Roe also mentioned Steve Clark's scripts for use with GOS.
Although I have not seen them myself (we use Suns/Solbournes at GOS,
not VAXen), I understand that they provide on the VAX something similar
to what we had on BIONET, namely a simple interface that appropriately
constructs the required mail message for the GOS FASTA Server and then
sends it off automatically.  This is a straightforward program that
relieves the user of having to compose the e-mail submission in the
precise server format and greatly simplifies the process.  Mailing of
the message to the appropriate server address is also automated.
Basically all that the user need do is answer a couple of prompts
about the file containing his/her query sequence, what database they
wish to search, and what parameter settings they wish to use.  The
search is then sent off to GenBank automatically, the GOS computer
reads the message, runs the search automatically, and sends back the
scores and requested alignments.  Finally, the user can use the
entry retrieval server to retrieve any entries of interest found
during the search.
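
As a rough illustration of what such a front end does, here is a
minimal sketch in Python.  Everything specific in it is an assumption
made for the example: the server address, the function name, and the
body layout (DATABASE, KTUP, BEGIN/END) are hypothetical and are NOT
the actual GOS FASTA Server format, which is not reproduced in this
message.  The sketch only composes the message; it does not send it.

```python
from email.message import EmailMessage

# Hypothetical server address; the real one is not given in this post.
SERVER_ADDRESS = "fasta-server@example.org"


def build_search_request(sender, sequence, database, ktup):
    """Compose (but do not send) a mail message requesting a search.

    The body layout below is invented for illustration only.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SERVER_ADDRESS
    msg["Subject"] = "FASTA search request"
    body_lines = [
        f"DATABASE {database}",   # which database to search
        f"KTUP {ktup}",           # FASTA word-size parameter
        "BEGIN",
        sequence.strip(),         # the user's query sequence
        "END",
    ]
    msg.set_content("\n".join(body_lines) + "\n")
    return msg


if __name__ == "__main__":
    request = build_search_request("user@example.edu", "ACGTACGT", "genbank", 4)
    print(request)
```

The point of such a script is simply that the user supplies the
answers to the prompts (sequence file, database, parameters) and the
program takes care of producing a message in exactly the layout the
server expects.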

I have run tests of the GOS systems from other machines on the
Internet.  The transit time for the mail is very short (a couple of
minutes back and forth and sometimes less), and searching a 1000 base
query against all of GenBank rel. 62 at ktup=4 took about 21-22
minutes!

For the majority of users, this will probably be sufficient.  It will
also keep their local computer freed up for less compute-intensive
tasks!!  On the other hand, if everyone wants to tie up their
machines' CPUs and disks with FASTA searches of the GenBank databank,
we at GenBank do not have the right to refuse them this privilege
8-)!!

If I were in charge of a local computer, I would make absolutely
certain that there is a legitimate need other than FASTA and IRX
searching *before* I took steps to maintain a constantly updated
version of GenBank locally!!  Users, consider yourselves forewarned.
-- 
				Sincerely,

				Dave Kristofferson
				GenBank On-line Service Manager

				kristoff@genbank.bio.net