Path: utzoo!utgpu!watserv1!watmath!uunet!wuarchive!usc!snorkelwacker!bionet!benton
From: benton@genbank.BIO.NET (David Benton)
Newsgroups: bionet.molbio.genbank
Subject: Re: GenBank Release 64 was incomplete
Message-ID:
Date: 26 Aug 90 02:03:50 GMT
References: <9008241347.AA08860@acme.med.unc.edu>
Organization: GenBank Online Service
Lines: 98

Here is a more complete reply to some of the issues raised in Dana Fowlkes' original posting.

Dana Fowlkes wrote:
> We noted that some of the GenBank release 64 files were incomplete. These
> had been downloaded about the end of July. For instance, the GBpln.seq
> files had no yeast sequences. This caused grave misgivings at our institution
> which has more than its share of yeast people. Some other files were also
> incomplete. I checked with a friend who had also done a download at about
> the same time. His files were identical to mine. I went back to GenBank
> and checked to see if new files were available. Yes, they are and are
> dated August 13. Thus, If you have downloaded Genbank 64 before August
> 13, you need to check your files for their sizes compared to those
> presently available.

Based on Rick Westerman's posting and on our own check of the files originally in ~ftp/pub/db/gb-rel64 (recovered from the July 30 backup tape), I must conclude that the files that were there between mid-July and August 13 were complete. Although it isn't clear how Rick counted the "yeast-related sequences" in the plant division, the number he reported is of the right order. So it seems that the complete files were on-line and that at least one person successfully downloaded them.

Since Dana Fowlkes reports a second incident of identically incomplete files, the cause would seem to be more serious than a transient failure of ftp. I would suggest that anyone who retrieves files check their sizes after uncompressing them. The file gbrel.txt contains a table of the file sizes both before and after decompression.
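A minimal sketch of that size check in Python, assuming you have copied the uncompressed byte counts for each file out of gbrel.txt by hand into a dictionary. The function name check_sizes and the dictionary layout are hypothetical; no parser for gbrel.txt itself is attempted, since its exact layout is not reproduced here.

```python
import os
from typing import Dict, List, Tuple


def check_sizes(directory: str, expected: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Compare uncompressed file sizes against expected byte counts.

    `expected` maps a file name to the uncompressed size listed in
    gbrel.txt (copied by hand; this dictionary layout is an assumption).
    Returns (filename, expected_bytes, actual_bytes) for every file whose
    size differs; a file that is missing entirely is reported with -1.
    """
    mismatches = []
    for name, want in sorted(expected.items()):
        path = os.path.join(directory, name)
        try:
            got = os.path.getsize(path)
        except FileNotFoundError:
            got = -1  # never retrieved, or saved under a different name
        if got != want:
            mismatches.append((name, want, got))
    return mismatches
```

An empty result means every listed file matches. Take the expected sizes from the gbrel.txt that accompanies the same version of the data files, since corrected files can legitimately differ in size from an earlier posting.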
Here is the part of the Release 64 plant division summary that reports yeast entries (the full summary is in gbrel.txt):

Organism                            Reports Entries    Bases
----------------------------------- ------- ------- --------
Zygosaccharomyces fermentati              1       1     5416
Saccharomycopsis fibuligera               3       3     9339
Candida boidinii                          2       2     1863
Candida glabrata                          3       3     2758
Candida albicans                          8       6    11668
Candida tropicalis                       15      12    20761
Saccharomyces cerevisiae                976     807  1420238
Transposable element TY1                 41      36    43719
Saccharomyces diastaticus                 4       4     4319
Candida pelliculosa                       1       1     5327
Candida maltosa                           5       4     8167
Saccharomyces carlsbergensis             22      19    36227
Hansenula wingei                          3       3      720
Saccharomyces fibuligera                  2       2     6761
Yarrowia lipolytica                       5       4    11065
Kluyveromyces lactis                     36      27    83929
Hansenula polymorpha                      3       3     8018
Kluyveromyces fragilis                    1       1     4193
Zygosaccharomyces rouxii                  5       3    15025
Schizosaccharomyces pombe               110      92   154243
Pichia pastoris                           3       3      899
Cephalosporium acremonium                 4       4     2093
Yeast sp.                                33      32    15660
Candida utilis                            4       4     7578
Saccharomyces uvarum                      1       1     2001
Kluyveromyces drosophilarum               1       1     4757
Saccharomyces rosei                       1       1      278
Saccharomyces kluyveri                    3       2     2160
Zygosaccharomyces bailii                  1       1     5415

If the data files were complete, why were new files put on-line on August 13? After the GenBank Release 64 tapes were shipped and the files posted in the ftp area, we found a number of errors in feature locations while preparing the CD-ROM release. These were corrected, and the corrected files were placed in the ftp directory. The corrections reduced the number of bytes in each of the annotated data divisions, but the number of entries and lines in those files is unchanged. Likewise, the indexes are unchanged.

Regrettably, I overlooked posting a note to this newsgroup announcing that new versions of the Release 64 data files had been posted, and why. I apologize for any inconvenience this has caused GenBank users.
While I am no longer in a position to guarantee that this will not happen in the future, I can say that our policy in the past has been to rectify our errors and alert users as soon as possible after identifying them. Dave Kristofferson assures me that the new GenBank management will be even more vigilant in the future.

> Would it be possible for those who FTP files to e-mail a note about the
> files we transfer so that Genbank could automatically let us know if there
> have been corrections?

Perhaps a better solution would be "automatic" posting to this newsgroup when corrections and changes are made to the on-line data. That way, those who forget to send the e-mail note will have the opportunity to receive the notification as well.

Sincerely,
David Benton
GenBank Staff
benton@karyon.bio.net