Path: utzoo!utgpu!watserv1!watmath!uunet!bionet!lhc!lhc!hunter
From: hunter@work.nlm.nih.gov (Larry Hunter)
Newsgroups: bionet.molbio.genome-program
Subject: Re: General Reference
Message-ID: <HUNTER.90Dec10100343@work.nlm.nih.gov>
Date: 10 Dec 90 18:03:43 GMT
References: <1990Dec10.005756.2694@agate.berkeley.edu>
	<39971@ucbvax.BERKELEY.EDU>
Sender: usenet@nlm.nih.gov (usenet news poster)
Organization: National Library of Medicine
Lines: 53
In-Reply-To: aoki@postgres.Berkeley.EDU's message of 10 Dec 90 08:19:39 GMT


In response to Tzi-cker Chiueh's query about what a computer scientist
can do for/with genomic data, Paul Aoki writes:

   For some reason, AI techniques aren't very popular -- people like
   brute-force, optimal-cost methods.  Parallel programming is popular,
   since the dynamic programming computations are easily parallelized
   (one group is using a Connection Machine, another uses a Sequent, yet
   another uses the ICL DAP array processor).  Most database technology
   flies right out the window because the databases are still small
   enough that a system that goes to disk a lot will have horrible
   performance relative to more ad-hoc, main-memory-oriented search
   software.

Although Aoki's opinions are a helpful beginning, I have to take issue
with a couple of points.  There is actually quite a bit of AI being
done in genome-related areas, and the requirements of genome-related
databases (not only sequence, but also protein structure,
coarser-grained genetic maps, etc.) place significant pressure on
existing database technologies.

As for AI, I can point to more than 100 people listed in a database of
AI & molecular biology researchers that I maintain, doing work in very
diverse areas.  I have an article that surveys some of this work
(based on the talks given at the 1990 AAAI Spring Symposium on AI &
Molecular Biology), which will appear in the next issue of AI
Magazine.  You may note that the predicted secondary structure of the
principal neutralization determinant of HIV-1 on the cover of the 24
August 1990 issue of Science was generated by a neural network.

BTW, the AI/MB database, which contains information on research
interests and current projects of many people from around the world,
is publicly available.  It can be obtained by anonymous ftp from the
host lhc.nlm.nih.gov in the directory /pub/aimb-db, or by request to
the University of Houston email server.

Finally, although I am not an expert in database issues, I would
suggest contacting the National Center for Biotechnology Information
to find out about work in biosequence and other databases.  You can
download information from ncbi.nlm.nih.gov using anonymous ftp, or
send mail to federhen@ncbi.nlm.nih.gov.

Good luck.  There are many good computer science problems involved in
genome work; we need good computer scientists to attack them.
--
Lawrence Hunter, PhD.
National Library of Medicine
Bldg. 38A, MS-54
Bethesda, MD 20894
(301) 496-9300
(301) 496-0673 (fax)
hunter@nlm.nih.gov (internet)
hunter%nlm.nih.gov@nihcu (bitnet/earn)