Path: utzoo!attcan!uunet!cs.utexas.edu!yale!cmcl2!phri!murphy
From: murphy@phri.nyu.edu (Ellen Murphy)
Newsgroups: bionet.molbio.genbank
Subject: Re: Quality of submitted data
Message-ID: <1990Aug15.175713.17794@phri.nyu.edu>
Date: 15 Aug 90 17:57:13 GMT
Sender: news@phri.nyu.edu (News System)
Organization: Public Health Research Institute, New York City
Lines: 46


In article <Aug.15.01.48.02.1990.13594@genbank.BIO.NET> kristoff@genbank.BIO.NET (David Kristofferson) writes:
>
>	The basic fact which has been brought up by journal editors
>repeatedly is that the vast majority of reviewers who get a paper
>containing sequence data in hardcopy are not going to take the time to
>enter the data into a computer.

     Surely you are not suggesting that reviewers are expected to type
sequences into their computers whenever they get a sequence paper to
review?  And to what end?  Just to verify that what the author claims
to be an ORF really is?  Are we supposed to request copies of their
films so we can re-read their gels?  Sequencing gels are raw data like
any other, most of which never makes it into manuscripts.  As reviewers
we have to assume that the data as presented accurately reflects the data
collected, even if it is several stages removed.  That doesn't mean
that I won't comment on the interpretation; most people way overinterpret
the sequence features and homologies that they find.  If somebody
presents a sequence as a promoter, I ask for the S1 data or at least
the insertion of the word "putative".  However, I haven't noticed
journal editors making much of a fuss about this.
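
     To make concrete what "verifying an ORF" would involve once a
sequence is in machine-readable form, here is a minimal sketch.  It
assumes the standard stop codons and scans only the strand as given
(not the reverse complement); the function name find_orfs and the
min_codons cutoff are illustrative choices, not part of any GenBank
or journal tool.

```python
# Minimal ORF scan: an ATG followed in-frame by a stop codon.
# Assumption: standard stop codons, forward strand only.
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG...stop run.

    start is the 0-based index of the ATG, end is the index just past
    the stop codon, and frame is 0, 1, or 2. min_codons counts codons
    from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


# Made-up example sequence, for illustration only.
print(find_orfs("CCATGAAATTTGGGTAACC"))
```

An open frame found this way says nothing about whether it is
expressed; it only confirms that the claimed start and stop are in the
same frame with no intervening stop.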

     I always request (and usually do not get) a statement of what
percentage was sequenced on both strands.  I also think that any
ambiguities should be pointed out, with an explanation of why one
reading was chosen over another.  I once did manage to correct an error
of this sort (the authors had an ambiguous base, chose to go with one
strand, ended up in the wrong frame for the C-terminal 20% of the
protein, and then chose the wrong ATG to compensate, since the size of
the protein was known).  The paper came to my attention at the galley
stage, and I noticed the error only because I had just finished the
sequence of a protein with 30% identity.  The ambiguity was not
mentioned in the paper, but was admitted on the telephone; it got
corrected before going to press, but just barely.  There's no way
anybody else could have caught this; certainly the reviewers of this
paper couldn't have been expected to, based on the data in the paper.
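
     The effect of one misread base on the downstream reading frame can
be shown with a small sketch.  It assumes the standard genetic code;
the sequences are invented for illustration and are not the ones from
the paper described above, and the helper names translate and
first_divergence are hypothetical.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate from position 0 in frame; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def first_divergence(p, q):
    """Index of the first residue where two proteins differ, or None
    if they agree over their shared length."""
    for i, (a, b) in enumerate(zip(p, q)):
        if a != b:
            return i
    return None


# Invented example: one A dropped from a run of four A's.
true_seq = "ATGGCTGAAAAACGTCTGTGGTAA"
misread = true_seq[:9] + true_seq[10:]

print(translate(true_seq))  # MAEKRLW*
print(translate(misread))   # MAENVCG
print(first_divergence(translate(true_seq), translate(misread)))  # 3
```

Every residue after the dropped base changes and the stop codon is
lost, which is why a comparison against a related protein can expose
an error that the sequence alone does not.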

    Finally, a question:  how is one supposed to refer to sequences in
Genbank that have not been, and probably never will be, published
elsewhere?  I think it's fantastic that people are willing to send
unpublished sequences to Genbank and I don't want to discourage the practice
by not giving proper credit.

Ellen Murphy
The Public Health Research Institute
murphy@phri.nyu.edu