Path: utzoo!attcan!uunet!bionet!kristoff
From: kristoff@genbank.bio.net (David Kristofferson)
Newsgroups: bionet.molbio.genbank
Subject: Re: ANONYMOUS FTP FROM BITNET
Keywords: Internet
Message-ID:
Date: 2 Oct 90 20:20:11 GMT
References: <9009290610.AA23614@genbank.bio.net> <1990Sep30.043924.19653@nlm.nih.gov>
Organization: GenBank Online Service
Lines: 125

David States gave an excellent description of the advantages of the Internet over BITNET, and I would heartily second the fact that sites should get on the Internet. Unfortunately it often takes some time, money, and effort for this to occur. I suggest that if someone at your campus is not already working on getting an Internet style network connection, then they should begin ***immediately*** before the data problem reaches overwhelming proportions.

However there was one statement made in Mr. States' message which was less than accurate.

> With TCP/IP, the turn around is still interactive rather than hours or
> days as in FASTA-mail.

As many of our readers know, FASTA-MAIL is a GenBank service. As our readers who have USED THE SERVICE also know, the turnaround on FASTA-MAIL, while not "interactive," is very fast, on the order of minutes, not "hours or days." I recently did a demonstration of the service in Mr. States' back yard at the NIH and got the results of my search back in about ten minutes. During this time my terminal was freed for other aspects of the demo, instead of sitting "interactively" looking at a "Working ..." message.

Because many biologists still do not have Internet connections, FASTA-MAIL provides a needed service to them. We are also working on providing access by e-mail to the newer BLAST program which was developed at NCBI and appears to be a faster search algorithm.

Another point that needs clarification:

> You save the expense and aggravation of attempting to maintain
> an up to date local database copy, and the net saves the traffic
> of sending the whole database.

GenBank's goal *is* to allow remote sites to have their own local copies of the database in a relational database management system and to have the local copies updated over the network, not by sending megabyte size files, but instead by providing sites with an initial copy of the database and then by sending "transactions" which automatically update individual entities in the local copies every time the master copy is changed. The software to provide these transactions to remote sites is currently undergoing testing and more will be announced about this later.

While it is true that for things like FASTA searches it is a waste to maintain a local copy, I have heard enough comments from the community over the last several years that indicate that the desired set-up is to have a local copy for more specialized applications, but also to have access to a powerful remote facility for offloading routine, but CPU-intensive searches. Although I have personally managed centralized time-sharing services such as BIONET, it appears to be the case that these systems are not the wave of the future except for specialized applications. Right now remote database searching can be done for free on the GenBank On-line Service via FASTA-MAIL or interactively over the Internet or SprintNet by GOS account holders.

So much for specifics, but now for a more general and much more important statement about Mr. States' remark about FASTA-MAIL. As I mentioned in a recent posting on BIONEWS, there are many discussions going on right now in "high places" related to the future of bio-computing, particularly as it impacts the Genome Project. The National Center for Biotechnology Information where David works is a key player in these debates and will be the agency that oversees the next GenBank contract which will start in 1992. One would hope that, given NCBI's important role, public statements by its employees should be very carefully considered and based on fact, not on distortions.

If there is a better way of doing things, then it should be perfectly possible to demonstrate it by setting up and successfully running a service. NCBI has already provided us with some fine software such as IRX and BLAST, so I do not doubt their talents in software development. However, I sincerely hope that we will evolve into the future in this fashion **** rather than by attempting to put down existing systems through the spread of misinformation ****.

GenBank has unfortunately been an easy target to shoot at because the first five year contract underestimated the size of the task, and the resulting lack of funds led to a tremendous data backlog. This backlog has been largely eliminated during the second five year contract and the NIH GenBank advisors commended both LANL and IntelliGenetics for their progress at the last advisory meeting. Word of this progress is slow to get out unfortunately and complaints are always remembered much longer than compliments. One can also still find responsible people quoting outdated GenBank backlog statistics in print.

You have my solemn word that if flaws are pointed out we will OPENLY either attempt to correct them to the best of our ability or step aside if the system is so structurally flawed that an entirely new attempt is needed. However you may also rest assured that I will vigorously respond to any attempt at distortion of the facts. It is always easy to tear down through distortion, but this is not the kind of tactic that one would expect from those who are really professional and who really have better ways of doing things. Their results should be able to speak for themselves.

I also suggest that the community pay close attention to any services offered and provide their feedback ** before ** decisions are made. *** In the end, it will be the users who will be left with the results.
***

Given the amount of data projected to be generated by the Genome Project a mistake made now would make the backlog of the initial GenBank attempt appear minuscule by comparison. Unfortunately the users are often the last to react because they are not brought into the decision loop.

I have argued before, and will do so again, that electronic newsgroups can be a new element in this review process. Although the decision must ultimately be the responsibility of a single person or small group, the technology now exists to easily sample a wide range of opinion. Why not take advantage of this, particularly when so much is at stake? Why not utilize the collective experience residing on the net? Currently we have "developers meetings" where people are asked to digest a large amount of new information in the course of a day. Why not do this over the net so that people can react more intelligently than in a one day jet-lagged haze? After all, scientists are supposed to be progressive, right? ... right?

--
Sincerely,

Dave Kristofferson
GenBank Manager

kristoff@genbank.bio.net