Path: utzoo!utgpu!watserv1!watmath!att!emory!wuarchive!usc!rutgers!shelby!hsdndev!husc6!cherry@frodo.mgh.harvard.edu
From: cherry@frodo.mgh.harvard.edu (J. Michael Cherry)
Newsgroups: bionet.molbio.genbank
Subject: Fewer new sequences in Oct and Nov
Keywords: GenBank EMBL
Message-ID: <4938@husc6.harvard.edu>
Date: 6 Dec 90 02:59:20 GMT
Sender: news@husc6.harvard.edu
Organization: Molecular Biology - Massachusetts General Hospital, Boston
Lines: 69

The number of new sequence entries in both the GenBank and EMBL nucleic
acid databases has decreased, starting around the first of October. The
table below shows the number of new entries by week for GenBank and EMBL.
The GenBank numbers were obtained by checking the weekly update files on
genbank.bio.net (directory ~ftp/pub/db/gb-newdata), and the EMBL numbers
were obtained from the weekly listing of new entries available from
NETSERVER@EMBL.BITNET (send the message: GET NUC:NEWENTRIES.NDX).

Two observations stand out from these numbers. The total number of new
sequences has decreased about threefold for the Oct/Nov period as compared
to the Aug/Sep period. EMBL is now (Oct/Nov period) releasing about three
times as many new sequences as GenBank, whereas in the Aug/Sep period the
two databases released about the same number.

I would be interested in your thoughts as to what might have caused the
decrease in new sequences. One possibility that comes to mind is that
around the first of October GenBank switched to their RDBM system.

[ If the decrease is simply a result of a decreased flow from GenBank,
assuming that the rate at which EMBL is entering new sequences is constant,
then this suggests that in the past GenBank had entered the majority of the
new sequences. The number of new sequences dropped threefold from Aug/Sep
to Oct/Nov for EMBL, and sevenfold for GenBank.
]

Another possibility is simply that the number of new sequences being
published has decreased between the two time periods, but I do not believe
it would have decreased threefold. Perhaps the weekly updates on
genbank.bio.net are not complete, but this must also apply to the mechanism
used to transfer new sequences from GenBank to EMBL, because of the large
decrease in the EMBL numbers. Perhaps EMBL has also had problems; I am not
as familiar with the EMBL setup as I am with that of GenBank (which is not
that familiar to start with), so I generally look to GenBank first.

Finally, keeping an open mind, I must suggest that my numbers may contain
an error that I do not know about. If you think this final possibility is
true, I would be very happy to learn where I am in error.

In any event, it would appear that anyone who is trusting GenBank to
contain all the known new sequences should reconsider. EMBL appears to be
currently adding more new sequences than GenBank. Not listed below is the
number of new entries in the embl-newdata files on genbank.bio.net. There
were 677 sequences in the embl-newdata files for the Oct/Nov period, as
compared with 464 sequences for GenBank.

Mike Cherry
cherry@frodo.mgh.harvard.edu

Week ending             GenBank (gb newdata)   EMBL (FileServer)
Aug-13-1990                              499                 546
Aug-20-1990                              690                 713
Aug-27-1990                              248                  68
Sep-3-1990                               298                 473
Sep-10-1990                              306                 308
Sep-17-1990                              278                 263
Sep-24-1990                              575                 481
Oct-1-1990                               231                 162
Oct-8-1990                                10                 109
Oct-15-1990                               19                  52
Oct-22-1990                               35                 135
Oct-29-1990                               70                 105
Nov-5-1990                                85                 172
Nov-12-1990                               71                 271
Nov-19-1990                               90                 158
Nov-26-1990                               12                  74
Dec-2-1990                                72                 131

Aug & Sep total                         3125                3014
Oct & Nov total                          464                1207
Grand Total (Aug-Nov)                   3589                4221
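
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the period totals and fold changes from the weekly counts in the table. It is not part of the original tally. The variable names and the split point are assumptions of this sketch: the posted totals are only reproduced if the week ending Oct-1 is counted with Aug/Sep and the week ending Dec-2 with Oct/Nov.

```python
# Weekly counts transcribed from the table: (week ending, GenBank, EMBL).
WEEKS = [
    ("Aug-13-1990", 499, 546),
    ("Aug-20-1990", 690, 713),
    ("Aug-27-1990", 248, 68),
    ("Sep-3-1990", 298, 473),
    ("Sep-10-1990", 306, 308),
    ("Sep-17-1990", 278, 263),
    ("Sep-24-1990", 575, 481),
    ("Oct-1-1990", 231, 162),
    ("Oct-8-1990", 10, 109),
    ("Oct-15-1990", 19, 52),
    ("Oct-22-1990", 35, 135),
    ("Oct-29-1990", 70, 105),
    ("Nov-5-1990", 85, 172),
    ("Nov-12-1990", 71, 271),
    ("Nov-19-1990", 90, 158),
    ("Nov-26-1990", 12, 74),
    ("Dec-2-1990", 72, 131),
]

# Assumption: the first 8 rows (through Oct-1) form the "Aug & Sep" block
# and the remaining 9 rows (through Dec-2) form the "Oct & Nov" block.
SPLIT = 8


def totals(rows):
    """Return (GenBank total, EMBL total) for a list of weekly rows."""
    return sum(r[1] for r in rows), sum(r[2] for r in rows)


early_gb, early_embl = totals(WEEKS[:SPLIT])
late_gb, late_embl = totals(WEEKS[SPLIT:])

# Fold change = earlier-period total divided by later-period total.
fold_gb = early_gb / late_gb
fold_embl = early_embl / late_embl
fold_both = (early_gb + early_embl) / (late_gb + late_embl)

print(f"Aug & Sep total: GenBank {early_gb}, EMBL {early_embl}")
print(f"Oct & Nov total: GenBank {late_gb}, EMBL {late_embl}")
print(f"Fold decrease:   GenBank {fold_gb:.1f}, EMBL {fold_embl:.1f}, "
      f"combined {fold_both:.1f}")
```

With the counts as posted, this gives a decrease of roughly 6.7-fold for GenBank, 2.5-fold for EMBL, and 3.7-fold for the two combined.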