Path: utzoo!attcan!uunet!wuarchive!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!cica!iuvax!noose.ecn.purdue.edu!mentor.cc.purdue.edu!mace.cc.purdue.edu!cjv
From: cjv@mace.cc.purdue.edu (westerman)
Newsgroups: bionet.molbio.genome-program
Subject: local copies of genbank
Message-ID: <4448@mace.cc.purdue.edu>
Date: 20 Mar 90 16:17:44 GMT
Reply-To: cjv@mace.cc.purdue.edu (westerman)
Organization: Purdue University
Lines: 79

As a system manager who plans to keep the Genbank database online on
my systems and who plans *not* to use, except rarely, the fasta and
retrieval capabilities of genbank.bio.net, I'd like to respond to Dave
Kristofferson's recent posting on why he thinks local copies of the
database should be discouraged.

First, my circumstances:

1) We are running the GCG (Wisconsin) sequence analysis package on
   VAX/VMS systems.

2) My systems are not overloaded; we have spare CPU power and disk
   space.

3) I do weekly updates of the database via ftp. This takes about 5
   minutes of my time and about 1 1/2 hours of machine time (done in
   the background at very low priority, maybe 15 minutes of actual CPU
   time).

4) I have looked at/installed Clark's shells. They are very nice and
   hide the "dirty details" from the user.

My objections to using the genbank server are threefold:

1) Time.

   While it takes only a little more time to retrieve a database entry
   from the server than it does from our local database (I estimate
   twice as long, which isn't bad considering the emailing that needs
   to be done), this delay is irritating when you are sitting looking
   at a blank CRT.

   While I haven't done fasta timing tests, I suspect that the genbank
   computer is faster than mine; on the other hand, having 4 computers
   at my disposal means I can do 4 searches simultaneously.
   In any case, fasta searching is not time critical: a search of 1/2
   hour (via genbank) or 2 hours (maximum via my computers) still means
   that I must walk away from my desk and/or do something else; either
   way I am not sitting around just waiting (unlike in the retrieval
   case above).

2) Formatting

   Retrieval results from genbank come back in a form that I cannot
   immediately use for further processing; instead I must extract the
   sequence from my mail and then convert the sequence to a form the
   GCG package can use. Granted, these steps are minor, but they are
   extra steps and irritating because of that.

3) Other uses of the database

   I have other programs besides fasta that need to access the entire
   database. One of these is the GCG program "FIND", which finds short
   matches in sequences; one of my group is using this program to try
   to find various promoter sites. By having a local copy of the
   database, we can do theoretical analysis of the database.

A further comment:

4) I suspect that the reason genbank is currently a feasible option is
   that it is not overloaded, much in the same manner as my system is
   able to handle a minimum of 4 fasta searches at a time; however, if
   we started getting over 6 searches we would start bogging down, and
   if genbank started getting over XXX (60? ten times my load?)
   searches at a time, they would bog down too. (BTW: I have about a
   10 MIPS system)

I wish I could contribute further to this thread of netnews, but I am
off on vacation for a week or so.

-- Rick

-- 
Rick Westerman
AIDS Center Laboratory for Computational Biochemistry,
Biochemistry building, Purdue University, W. Lafayette, IN 47907
Internet: cjv@mace.cc.purdue.edu
(317) 494-0505