Path: utzoo!attcan!uunet!cs.utexas.edu!yale!cmcl2!phri!murphy
From: murphy@phri.nyu.edu (Ellen Murphy)
Newsgroups: bionet.molbio.genbank
Subject: Re: Quality of submitted data
Message-ID: <1990Aug15.175713.17794@phri.nyu.edu>
Date: 15 Aug 90 17:57:13 GMT
Sender: news@phri.nyu.edu (News System)
Organization: Public Health Research Institute, New York City
Lines: 46

In article kristoff@genbank.BIO.NET (David Kristofferson) writes:
>
> The basic fact which has been brought up by journal editors
>repeatedly is that the vast majority of reviewers who get a paper
>containing sequence data in hardcopy are not going to take the time to
>enter the data into a computer.

Surely you are not suggesting that reviewers are expected to type
sequences into their computers whenever they get a sequence paper to
review? And to what end? Just to verify that what the author claims
to be an ORF really is? Are we supposed to request copies of their
films so we can re-read their gels? Sequencing gels are raw data like
any other, most of which never makes it into manuscripts. As reviewers
we have to assume that the data as presented accurately reflects the
data collected, even if it is several stages removed.

That doesn't mean that I won't comment on the interpretation; most
people way overinterpret the sequence features and homologies that they
find. If somebody presents a sequence as a promoter, I ask for the S1
data or at least the insertion of the word "putative". However I
haven't noticed journal editors making much of a fuss about this. I do
always request (and also do not usually get) a statement of what
percent was sequenced on both strands. I also think that any
ambiguities should be pointed out, with an explanation of why one
reading was chosen over another.

I once did manage to correct an error of this sort (the authors had an
ambiguous base, chose to go with one strand, ended up in the wrong
frame for the C-terminal 20% of the protein, and then chose the wrong
ATG to compensate, since the size of the protein was known). The paper
came to my attention at the galley stage, and I noticed the error only
because I had just finished the sequence of a protein with 30%
identity. The ambiguity was not mentioned in the paper, but was
admitted on the telephone; it got corrected before going to press, but
just barely. There's no way anybody else could have caught this --
certainly the reviewers of this paper couldn't have been expected to,
based on the data in the paper.

Finally, a question: how is one supposed to refer to sequences in
Genbank that have not been, and probably never will be, published
elsewhere? I think it's fantastic that people are willing to send
unpublished sequences to Genbank and I don't want to discourage the
practice by not giving proper credit.

Ellen Murphy
The Public Health Research Institute
murphy@phri.nyu.edu