Path: utzoo!utgpu!watserv1!watmath!iuvax!bionet!LANL.GOV!pgil%histone
From: pgil%histone@LANL.GOV (Paul Gilna)
Newsgroups: bionet.molbio.genbank
Subject: Time lag for sequence appearence
Message-ID: <9001171536.AA02450@histone.lanl.gov>
Date: 17 Jan 90 15:36:17 GMT
Sender: daemon@genbank.BIO.NET
Lines: 80

Rupert de Wachter (RRNA@ccv.uia.ac.be) from the University of Antwerp (Belgium) writes:

I would like to have some more information about a few things:
- can sequences automatically be retrieved using this same e-mail number or is there another access to the file server?
- can we ask on-line help?
- how would a retrieval using the accession number of a particular sequence, for example M22441, look like?
- does a sequence appear on the server as soon as it is mentioned in a publication or is there any delay?

Dear Dr. de Wachter,

Our colleagues at IntelliGenetics will handle your inquiries regarding the on-line system. I should like to address the final question in your list:

"- does a sequence appear on the server as soon as it is mentioned in a publication or is there any delay?"

There are three principal sources of nucleotide sequence data handled by the data entry and annotation staff here at LANL: (1) the printed publication, where data are manually entered by our data entry crew; (2) direct author submission, where the sequence data and associated bibliographic and biological information are provided directly to us by the scientist; and (3) incorporation of data from EMBL and DDBJ releases.

In the first case (data extracted from a publication), the time taken for the data to appear on the on-line system depends on how long our data entry and annotation staff take to process the article. As soon as our staff here have completed an entry, it is immediately passed to the servers at IntelliGenetics and at EMBL (as well as Houston).
Currently we are averaging a six-week turnaround from the date of publication to the appearance of a fully annotated "entry" on the on-line system. This is in contrast to the 13-month average for this source of data two years ago.

In regard to the second source of data, i.e., from the author: if the data are received in computer-readable form, they should appear on the servers in fully annotated form within two weeks or less. If received in hard-copy form, they go through the process described above. Because we receive the bulk of our direct submissions AHEAD of publication, the data appear on the on-line systems and servers before or close to the date of publication. We often have the data in our hands far enough in advance of publication to have errors that we spot in our routine integrity-checking procedures corrected by the author before publication; in a sense we provide a peer-review function for the sequence data itself, a review not often carried out in the conventional editorial review process. If the data submitted to us are associated with a manuscript that has yet to be accepted by the journal editorial process, they will be classified as "unpublished" (this removes complications which might occur if the journal chose not to accept the manuscript); the entry will be updated with the correct citation once we spot or are notified of publication.

We now receive about 65-70% of our data direct from the community. About 70-80% of those data are in electronic form, whether by e-mail or on floppy disc.

While we here at Los Alamos currently incorporate data from EMBL and DDBJ releases within two weeks of receipt of the tapes, EMBL in addition supply new data to the GenBank on-line server on a similar daily basis.

Finally, for all submissions, we offer the author the choice of holding the data confidential until such time as we are given permission to release the data or they are published.
In some cases, there is a time lag before we spot the appearance of data in the literature and link this to data we are holding as confidential, but this should not normally exceed two weeks if the data appear in journals that we regularly scan.

I hope this answers your question.

Regards,

Paul Gilna Ph.D.,
Biology Domain Leader
GenBank, Los Alamos.