Path: utzoo!utgpu!watserv1!watmath!att!pacbell.com!decwrl!uunet!bionet!lhc!ncifcrf!fcs260c2!toms
From: toms@fcs260c2.ncifcrf.gov (Tom Schneider)
Newsgroups: bionet.molbio.bio-matrix
Subject: Re: In defense of the Genome Boondoggle
Message-ID: <2054@fcs280s.ncifcrf.gov>
Date: 14 Feb 91 22:09:47 GMT
References: <12145@ur-cc.UUCP> <2050@fcs280s.ncifcrf.gov> <5714@husc6.harvard.edu>
Sender: news@ncifcrf.gov
Organization: NCI Supercomputer Facility, Frederick, MD
Lines: 178

In article <5714@husc6.harvard.edu> Ellington@Frodo.MGH.Harvard.EDU (Deaddog):
>In article <2050@fcs280s.ncifcrf.gov> toms@fcs260c2.ncifcrf.gov (Tom
>Schneider):
>> learning how to identify genes from raw
>> sequences alone. Predictions can be tested - which leads to rapid
>> discovery of new genes.
>
>As does PCR amplification or hybridization: the analogue versions of your
>digital statistical analyses.

Wrong. Those techniques only allow one to jump from previously
identified sequences in other species to the human sequence. This is a
wonderful thing, but it does not allow one to take a pure raw sequence
and identify the genetic control systems in it. The difference is that
those techniques are only techniques, not theoretical understanding.
And if you are going to pooh-pooh theoretical understanding, then I have
some papers for you to read! Start with:

@article{StormoPerceptron1982,
author = "G. D. Stormo and T. D. Schneider and L. Gold and A. Ehrenfeucht",
title = "Use of the `Perceptron' algorithm to distinguish
translational initiation sites in {E. coli.}",
year = "1982",
journal = "Nucl. Acids Res.",
volume = "10",
pages = "2997-3011"}

> The question is not whether some genes will
>be identified, the question is (a) how many could already be identified
>without the sequence of the genome, and (b) whether the (IMO paltry)
>number that remain be worth the enormous cost?

I'm sure that we can continue on the blind route we are following and
find lots of interesting things eventually.
The US road system comes to mind. Sure, we could have survived without
a network of major roads. But having started on the big project, we
were able to become much more integrated as a society, and now it is
hard to imagine not having superhighways (or are they merely PARKways?
And why is the place one parks the car in the DRIVEway?? :-).

Similar things could be said about a uniform telephone system: we have
(had??) the best in the world because people at Bell Labs thought big.
A third example is the improvement in making maps that Landsat and
other satellites have given us. And, yes, Arpanet turned into the
Internet.

In all these cases we start off ad hoc and then eventually learn to do
things systematically. Consider the cow paths you use to get to work!
(I refer to the roads of Boston.) Would you like to use muddy winding
paths? The genome project is merely a recognition that we are close to
the time when we can make our maps in a direct, logical way rather than
piecemeal.

>Statisticians drool at the mounds of data to be created.

And so might the rest of the biologists. They can use the data to
direct their experiments more effectively. If they are afraid of math
and computers (is that your problem?? :-) then there are plenty of
theoretical types whom they can team up with.

>> avoid the terrible biases that we
>> currently have in the GenBank database.
>
>I'm sorry, but this does not seem like a terribly important
>problem. GenBank is skewed. Big deal. It gets the job done.
>We find genes, we miss some stuff. Science slops along and
>we still find those self-splicing introns and centromeres and
>other cool things. Without the sequence of the human genome.
>And with many people happily employed (for now) producing
>gobs of worthwhile data.

The problem is here, and getting worse. You apparently haven't tried
to make a consistent dataset from the data in GenBank. It's a tough
job! The point about the genome project is that we don't need to miss
anything anymore.
You seem to have the idea that some genes are not important, and that
'junk' DNA exists in the genome. Consider that this is merely a way for
you to express to the rest of us how ignorant you are. (We are also,
but we admit it. Do you admit that you are ignorant?)

>I mean, what's a good example of what we have missed? We know
>the Shine/Dalgarno sequences.

Well, you missed the other statistically important features that were
discovered by looking at the sites more carefully. See:

@article{Gold1981,
author = "L. Gold and D. Pribnow and T. Schneider and S. Shinedling
and B. S. Singer and G. Stormo",
title = "Translational initiation in prokaryotes.",
year = "1981",
journal = "Annu. Rev. Microbiol.",
volume = "35",
pages = "365-403"}

@article{StormoInitiation1982,
author = "G. D. Stormo and T. D. Schneider and L. M. Gold",
title = "Characterization of translational initiation sites in
{{E. coli.}}",
year = "1982",
journal = "Nucl. Acids Res.",
volume = "10",
pages = "2971-2996"}

@article{Schneider1986,
author = "T. D. Schneider and G. D. Stormo and L. Gold and A. Ehrenfeucht",
title = "Information content of binding sites on nucleotide sequences",
year = "1986",
journal = "J. Mol. Biol.",
volume = "188",
pages = "415-431"}

> We have learned far more from
>mutation than we would by sequencing a bacterial genome (note:
>sequencing the Coli genome is indeed a cool thing to do).

This is a completely flip statement, with no foundation, since you
didn't quantitate your answer and the experiment has not been done.
(But I do agree that getting that sequence will be cool.)

Genetics is certainly a powerful way to approach biological problems.
But once one has defined a biologically interesting system, direct
methods can produce answers that would be difficult if not impossible
to get by genetics. For example, the sequence of a gene, or exactly
which bases are important for a promoter to function. See:

@article{Schneider1989,
author = "T. D. Schneider and G. D. Stormo",
title = "Excess Information at Bacteriophage {T7} Genomic Promoters
Detected by a Random Cloning Technique",
year = "1989",
journal = "Nucl. Acids Res.",
volume = "17",
pages = "659-674"}

>And will the "insides of introns" generate data for 2 PNAS papers and
>a TIBS review, yes.

The work of Andrzej Konopka is an example you seem to have missed.

>or will it actually be worth the billions of
>dollars it will take to properly correct this horrific accounting
>error?

Your mistake here is to suggest that the genome project would give only
these data. It would also give much other data.

>> The second major justification is the enormous boost to sequencing
>> technology that the project is making.
>
>Good sequencing technology stands on its own. It does not need the Genome
>Boondoggle to help it along.

You have missed the point. The project will focus more people on the
problems of sequencing, and the art will improve as a result.

>> We are eventually going to be able to sequence
>> everybody's DNA in a few minutes.
>
>Matrix-teers: Is this nuts or what? I've never seen this before, but
>if it is even remotely true, I'll eat the small plastic rats that reside
>on the top of my terminal.

Ever heard of nanotechnology? Well, bone up if you are ignorant.
I'll forgive you; you don't need to eat those rats.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland 21702-1201
  toms@ncifcrf.gov