Path: utzoo!attcan!uunet!bionet!LANL.GOV!pgil%histone
From: pgil%histone@LANL.GOV (Paul Gilna)
Newsgroups: bionet.molbio.genbank
Subject: Re: Updates of GenBank via USENET: comparison with V65.0.
Message-ID: <9010301510.AA03412@histone.lanl.gov>
Date: 30 Oct 90 15:10:46 GMT
Sender: daemon@genbank.bio.net
Lines: 36

GenBank's collaboration with EMBL has reached the point where we are beginning to implement a data exchange syntax that will allow data to be updated between the two databases. The goal of this process is to make the two databases functionally equivalent: no data will exist in one database that are not represented equally in the other, and all updates will be propagated. Eventually, we hope, one will need to mount only one version of the database, knowing that all of the data from the other are represented. We hope to have these mechanisms in place by early next year.

In the meantime, we struggle with the current means of merging each other's data. One part of this process at GenBank requires that we take a tape release from EMBL, work out which entries are new or updated, convert them to GenBank format and merge them into the GenBank database. In the past this was a time-consuming process that required some manual intervention by annotators. We have refrained from making much-needed improvements to this procedure, because we believe the data exchange mechanisms will solve the problems currently inherent in it. However, to speed up release merging, we automated the conversion step to the point where the necessary intervention can occur after the merge rather than before. This change involved "parking" the converted EMBL entries in the unannotated division. They are currently the only class of entry entering that division, since GenBank has for the most part ceased creating unannotated entries, and most are removed from there by the following release.
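The release-merging step described above (take a partner's tape release, work out which entries are new or updated, convert them, and park them in the unannotated division) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GenBank's actual software: the `Entry` record, its `updated` stamp, the accession numbers, and the function names `classify_release` and `park_unannotated` are all hypothetical, and the format conversion itself is omitted.

```python
from dataclasses import dataclass, replace
from operator import attrgetter


@dataclass(frozen=True)
class Entry:
    # Hypothetical minimal record; real entries also carry sequence and annotation.
    accession: str
    updated: int  # assumed monotonically increasing stamp, e.g. a release number
    division: str = ""


def classify_release(incoming, merged):
    """Split an incoming release into (new, updated) entry lists.

    incoming: iterable of Entry read from the partner's tape release.
    merged:   dict mapping accession -> Entry already in the local database.
    """
    new, updated = [], []
    for entry in incoming:
        current = merged.get(entry.accession)
        if current is None:
            new.append(entry)        # accession not yet in the local database
        elif entry.updated > current.updated:
            updated.append(entry)    # partner holds a newer revision
        # Otherwise the entry is unchanged and needs no merging.
    by_accession = attrgetter("accession")
    return sorted(new, key=by_accession), sorted(updated, key=by_accession)


def park_unannotated(entries):
    """Place (already converted) entries in the unannotated division for later review."""
    return [replace(e, division="unannotated") for e in entries]


if __name__ == "__main__":
    merged = {"X00001": Entry("X00001", 64), "X00004": Entry("X00004", 10)}
    incoming = [Entry("X00001", 65), Entry("X00002", 65), Entry("X00004", 10)]
    new, updated = classify_release(incoming, merged)
    print([e.accession for e in new])         # ['X00002']
    print([e.accession for e in updated])     # ['X00001']
    print(park_unannotated(new)[0].division)  # unannotated
```

Keying on the accession number means a revised entry replaces its predecessor rather than being added alongside it; comparing a stamp is only one plausible way to detect updates, and a real merge would also need the manual review the post describes.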
We reasoned that there was little point in mounting these parked entries on the servers, as they are merely limited versions of data already present on the EMBL and GenBank-On-line servers. It is therefore very likely that these entries account for the discrepancy between the USENET distributions and the tape release.

Regards,
--paul