Path: utzoo!utgpu!watserv1!watmath!uunet!bionet!kristoff
From: kristoff@genbank.BIO.NET (David Kristofferson)
Newsgroups: bionet.molbio.genbank
Subject: Re: Quality of submitted data
Message-ID: <Aug.15.15.00.13.1990.12645@genbank.BIO.NET>
Date: 15 Aug 90 22:00:13 GMT
References: <1990Aug15.204400.26622@gpu.utcs.utoronto.ca>
Organization: GenBank Online Service
Lines: 36

Larry,

	Thanks for several rather entertaining examples!

>     There are many examples of sequences in the GenBank database which I know
> to be incorrect (see above). Is there any way that my doubts can be 
> communicated to users of the database? 

Christian Burks and I started the GENBANK-BB (bionet.molbio.genbank)
newsgroup over three years ago precisely for these kinds of issues.
This forum allows rapid communication with the data bank staff and is
open to as many users of the data bank as care to sign on.  As I
stated in my reply to Ellen, open lines of communication and feedback
are essential if any self-correcting mechanism is to exist and
succeed.

> The accuracy of sequences that I 
> analyze ranges from 95-100% and half of the sequences have an accuracy of less
> than 99.6% or 4 errors in every 1000 nucleotides. These are sequences of genes
> in a highly conserved gene family where workers are able to compare their data
> with  published sequences. Imagine what the accuracy of sequences of newly
> discovered genes must be?

Whether this is a real problem depends on where the errors are.  All
scientific measurements are prone to error, but there is no easy way
to put "error bars" on sequence data.  However, unless the data are
horrendously flawed, wouldn't you rather have reasonably accurate
data than no data at all?  At the very least it could serve as a
basis for further refinements.
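
To put the quoted figures in perspective, here is a rough sketch of the
arithmetic behind "99.6% or 4 errors in every 1000 nucleotides".  The
function name expected_errors is made up for illustration, and the
calculation assumes errors are spread evenly along the sequence, which
real data may well not satisfy.

```python
def expected_errors(accuracy_percent, length_nt):
    """Expected number of wrong bases in a sequence of length_nt nucleotides.

    Hypothetical helper: assumes a uniform per-base error rate, which is
    a simplification (errors may cluster in particular regions).
    """
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * length_nt


# The accuracy range quoted above, applied to a 1000-nucleotide stretch.
for accuracy in (95.0, 99.6, 100.0):
    print(accuracy, round(expected_errors(accuracy, 1000), 1))
```

The count alone says nothing about where the errors fall, which is the
part that decides whether they matter.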
-- 
				Sincerely,

				Dave Kristofferson
				GenBank On-line Service Manager

				kristoff@genbank.bio.net