Path: utzoo!utgpu!watserv1!watmath!uunet!wuarchive!uwm.edu!bionet!lhc!ncifcrf!fcs260c2!toms
From: toms@fcs260c2.ncifcrf.gov (Tom Schneider)
Newsgroups: bionet.molbio.bio-matrix
Subject: Re: In defense of the Genome Boondoggle
Message-ID: <2050@fcs280s.ncifcrf.gov>
Date: 12 Feb 91 16:14:04 GMT
References: <9102111942.AA08834@genbank.bio.net> <12145@ur-cc.UUCP>
Sender: news@ncifcrf.gov
Organization: NCI Supercomputer Facility, Frederick, MD
Lines: 70

In article <12145@ur-cc.UUCP> elmo@troi.cc.rochester.edu (Eric Cabot) writes:
>In article <9102111942.AA08834@genbank.bio.net> gunnell@FCRFV1.NCIFCRF.GOV ("Gunnell, Mark") writes:
>>In article <9102111731.AA00773@genbank.bio.net>
>>Ellington@frodo.mgh.harvard.edu (Deaddog) writes:
>>
>>>
>>> Make me a list of similar worth that has to do with the Genome Boondoggle.
>>
>>Catalogue all human genes! Discover the functions of mapped genes; see how
>>genes evolve; evaluate molecular evolution theories and how species originate;
>>find amazing biological phenomena never before observed by human eyes. Yes,
>>all these and more can ... etc.,etc. 8-)

>You *must* be either kidding us or yourself!
>But seriously, item 1 is hardly possible, item
>2 is probably not possible, and the remaining items are not even
>close to possible from a mere sequence determination of the (a?)
>human genome.

I think that Mark is exactly correct, and you have missed the point. Having a huge database full of human sequences opens vistas for those of us who know how to use statistical tools to analyse sequences. There are many things that can be done. Some of them include learning how to identify genes from raw sequences alone. Predictions can be tested - which leads to rapid discovery of new genes. I have been involved in two cases of this already (see Stormo et al NAR 10:2997 1982 for the first example of gene identification by computer; the second one is in preparation), and it will certainly happen more as people use neural nets more.
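
To make "identify genes from raw sequences alone" concrete, here is a minimal sketch of one simple statistical approach: build a log-odds weight matrix from a few aligned example sites, then slide it along a raw sequence and rank the windows. This is a swapped-in illustration, not the method of the paper cited above and not a neural net. The function names, the training sites and the test sequence are all invented for the example, and a uniform base composition is assumed for the background.

```python
import math

BASES = "ACGT"

def build_weight_matrix(sites, pseudocount=1.0):
    """Log2-odds weights per position from aligned, equal-length example sites."""
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite weight.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Assumption: uniform background, each base expected at frequency 0.25.
        matrix.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return matrix

def scan(sequence, matrix):
    """Score every window of the sequence; return (score, offset) pairs, best first."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((score, start))
    return sorted(hits, reverse=True)

# Hypothetical training sites and raw sequence, invented for illustration only.
example_sites = ["AGGAGG", "AGGAGA", "AAGAGG", "AGGTGG"]
weights = build_weight_matrix(example_sites)
best_score, best_start = scan("TTCATAGGAGGTTTACATGAAA", weights)[0]
print(best_start, round(best_score, 2))
```

The top-ranked windows are the predictions; the point made above is that each one can then be tested at the bench. Note that the weights are only as good as the example sites, so a biased or truncated set of database entries biases the predictions too.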

A straight sequencing of the genome will avoid the terrible biases that we currently have in the GenBank database. For example, the database is missing the insides of introns. If you think that these are not important, then you may well be in for some super surprises later. The phrase "junk DNA" is a statement of ignorance, not scientific fact. People currently chop off the bases near the 3' sides of introns and don't report them in the database. The proof is that the reported sequences often end 10, 20 or 30 bases from the splice junction. This would not happen if people reported all their data. Unfortunately, this means that people have thrown out important parts of splice junctions BECAUSE THEY THOUGHT THEY WERE UN-IMPORTANT. Do you follow? People think something is not important, so they don't report it in the database, or limit the reports, so nobody discovers that it IS important!

Another example is the reporting of only the coding sequence of a procaryotic gene, even though we KNOW that there is a region upstream (the Shine/Dalgarno) which is important for translational initiation. Any statistical analysis of human sequences must be done carefully to avoid biases from the highly over-represented immunoglobulin and MHC sequences. I'm sure you can think of other examples. A complete sequence, without any bias, is the best way to get around this. I think that that alone justifies the project.

The second major justification is the enormous boost to sequencing technology that the project is making. We are eventually going to be able to sequence everybody's DNA in a few minutes. This will have enormous medical implications, since it will remove much guess work from medicine. I also used to think that the project was foolish, but these reasons have convinced me that it is worthwhile.

There is also the spirit of adventure. Fred Blattner once pointed out that it would be really neat (my words, not his) to have the entire sequence of E.
coli - simply because it would be the first time that we knew the entire specification of a living organism. (Viruses don't count since they are dependent on the host.)

>Eric Cabot elmo@uhura.cc.rochester.edu elmo@uordbv.bitnet

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland 21702-1201
  toms@ncifcrf.gov