Path: utzoo!attcan!uunet!bionet!lhc!ncifcrf!fcs260c2!toms
From: toms@fcs260c2.ncifcrf.gov (Tom Schneider)
Newsgroups: bionet.molbio.genome-program
Subject: Re: feature table parsers - what's it all mean?
Message-ID: <1894@fcs280s.ncifcrf.gov>
Date: 3 Oct 90 18:26:47 GMT
References: <4725@lure.latrobe.edu.au>
Sender: news@ncifcrf.gov
Organization: NCI Supercomputer Facility, Frederick, MD
Lines: 28

I am happy to see a discussion on parsing GenBank after all these years.
The feature table is only part of the problem. For example, entries of
GenBank now end with a // (an idea taken from the EMBL database) so that
programs can distinguish where entries end. Before Matt Yarus suggested
this to me and I brought the suggestion to the GenBank staff, it was
difficult to tell where an entry ended. Indeed, since there is no
definition of GenBank, some programs give one hacked-up entry formats that
do not end with a //, nor do they have the same format as is on the GenBank
tapes. The authors of these programs don't understand that the output of
their programs should have a // at the end, simply because there is no
standard definition of the format.

SUMMARY: we need a FULL DEFINITION of GenBank, not just the feature tables!!!

Example: we should be able to parse out the topology of an entry (circular
or linear sequence) and the references.

The topic is wider than most people have discussed so far, and I don't
understand why GenBank has resisted creating the definition for so long.
AT LEAST 8 YEARS AGO I made the suggestion that a parsable form with an
associated DOCUMENT and DEFINITION is a requirement for the database.
How many more years will we have to wait for this?

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland 21702-1201
  toms@ncifcrf.gov
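
To make the two examples above concrete (finding where an entry ends, and reading its topology), here is a minimal Python sketch. It is not based on any official definition of the format, which is exactly what the post says is missing. It assumes that an entry ends at a line containing only `//`, and that the topology, when given, appears as the word `circular` or `linear` somewhere on the `LOCUS` line. The function names `split_entries` and `topology` are hypothetical.

```python
def split_entries(text):
    """Split flat-file text into entries.

    Assumption: an entry ends at a line that contains only '//'.
    Returns (entries, leftover), where leftover is any trailing text
    that was never closed by '//' (or None if there is none).
    """
    entries, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            entries.append("\n".join(current))
            current = []
        elif line.strip() or current:
            # Skip blank lines between entries, keep them inside an entry.
            current.append(line)
    leftover = "\n".join(current) if current else None
    return entries, leftover


def topology(entry):
    """Return 'circular', 'linear', or None if the LOCUS line does not say.

    Assumption: the topology is a separate word on the LOCUS line.
    """
    for line in entry.splitlines():
        if line.startswith("LOCUS"):
            words = line.lower().split()
            if "circular" in words:
                return "circular"
            if "linear" in words:
                return "linear"
            return None
    return None
```

With this sketch, output from a program that omits the final `//` shows up in `leftover` rather than in `entries`, so a caller can at least detect the problem instead of silently merging or dropping the entry.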