Xref: utzoo bionet.molbio.genbank:445 sci.bio:4913
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sample.eng.ohio-state.edu!purdue!haven.umd.edu!ncifcrf!fcs260c2!toms
From: toms@fcs260c2.ncifcrf.gov (Tom Schneider)
Newsgroups: bionet.molbio.genbank,sci.bio
Subject: Re: Software for automated subseqence extraction
Message-ID: <2139@fcs280s.ncifcrf.gov>
Date: 30 Apr 91 14:28:35 GMT
References:
Sender: news@ncifcrf.gov
Followup-To: bionet.molbio.genbank
Organization: NCI Supercomputer Facility, Frederick, MD
Lines: 50

In article eesnyder@boulder.Colorado.EDU (Eric E. Snyder) writes:
>I am looking for some software that will allow me to extract subsequences
>from genbank or PIR.

The Delila system, old and senile as it is, was designed to extract large
sets of subsequences (DNA only).

>For example, I would like to be able to provide a keyword such as 'splice
>site' and have the program search genbank and return with a list of sequence
>names and the subsequence from each entry corresponding to my keyword.

Because Delila was designed before GenBank, and GenBank structure is STILL
not up to snuff, one must convert from GenBank to Delila format. This is a
simple program called dbbk (written by Matt Yarus, son of Mike Yarus, you may
be interested to know!).

The Delila viewpoint is that the database consists of a set of organisms and
their chromosomes. You must specify these, and then the piece of DNA you are
interested in. The piece corresponds roughly to a GenBank entry.

The idea is that Delila is a 'librarian' and you give 'her' instructions that
define the fragments you want. She reaches into the library and pulls out --
what else? -- a book.

Instructions might look like:

   title 'Demonstration of Delila instructions';
   (* the title is required to name the resulting book *)
   (* this is a comment, just as in the computer language Pascal *)
   organism H.sapiens;  (* define the organism *)
   chromosome 3;        (* I made this name up; unfortunately GenBank
                           hasn't stored this information consistently *)
   piece x253;          (* I made this name up also *)
   get from 536 -24 to 536 +30;

The last instruction, 'get', tells Delila that you want the fragment that
starts 24 bases before coordinate 536 and ends 30 bases after it. By having
the instructions written in a file, one can handle many of them.

There is now a program that automatically creates Delila instructions from
the GenBank features. This has allowed us to create hundreds to thousands of
fragments for statistical analysis.

Parts of the Delila system are available by anonymous ftp from ncifcrf.gov in
pub/delila. See the README files. I will place more programs in the archive
if you request them.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
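
The arithmetic behind the 'get' instruction in the post can be sketched in
Python. This is only an illustration and not part of the Delila system: the
function names get_range and make_instructions are hypothetical, and treating
both end coordinates as part of the fragment is an assumption. It also shows
how a file of many instructions could be generated from a list of coordinates.

```python
def get_range(center, before, after):
    """Coordinates for 'get from <center> -<before> to <center> +<after>'.

    Assumption: both the first and last coordinate belong to the fragment.
    """
    start = center - before
    end = center + after
    length = end - start + 1  # inclusive count (assumed)
    return start, end, length


def make_instructions(title, organism, chromosome, piece, sites, before, after):
    """Build instruction text with one 'get' line per coordinate in sites.

    The layout follows the example in the post; nothing here is checked
    against the real Delila parser.
    """
    lines = [
        f"title '{title}';",
        f"organism {organism};",
        f"chromosome {chromosome};",
        f"piece {piece};",
    ]
    for site in sites:
        lines.append(f"get from {site} -{before} to {site} +{after};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(get_range(536, 24, 30))
    print(make_instructions("Demonstration", "H.sapiens", "3", "x253",
                            [536, 1200], 24, 30))
```

For the example in the post, the fragment would run from coordinate 512 to
coordinate 566, which is 55 bases if both ends are counted.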