Path: utzoo!utgpu!watserv1!watmath!uunet!bionet!kristoff
From: kristoff@genbank.BIO.NET (David Kristofferson)
Newsgroups: bionet.molbio.genbank
Subject: Re: Quality of submitted data
Message-ID:
Date: 15 Aug 90 22:00:13 GMT
References: <1990Aug15.204400.26622@gpu.utcs.utoronto.ca>
Organization: GenBank Online Service
Lines: 36

Larry,

Thanks for several rather entertaining examples!

> There are many examples of sequences in the GenBank database which I know
> to be incorrect (see above). Is there any way that my doubts can be
> communicated to users of the database?

Christian Burks and I started the GENBANK-BB (bionet.molbio.genbank)
newsgroup over three years ago precisely for these kinds of issues.
This forum allows rapid communication with the data bank staff and is
open to as many users of the data bank as care to sign on. As I
stated in my reply to Ellen, open lines of communication/feedback are
obviously essential for any self-correcting mechanism to exist/succeed.

> The accuracy of sequences that I
> analyze ranges from 95-100% and half of the sequences have an accuracy of less
> than 99.6% or 4 errors in every 1000 nucleotides. These are sequences of genes
> in a highly conserved gene family where workers are able to compare their data
> with published sequences. Imagine what the accuracy of sequences of newly
> discovered genes must be?

Whether this is a real problem depends on where the errors are. All
scientific measurements are obviously prone to error but there is no
easy way to put "error bars" on sequence data. However, unless the
data is horrendously flawed, wouldn't you rather have data that was
reasonably accurate versus no data at all? At the very least it could
serve as a basis for further refinements.

--
Sincerely,

Dave Kristofferson
GenBank On-line Service Manager

kristoff@genbank.bio.net