Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: MJB1@PHX.CAM.AC.UK
Newsgroups: bionet.molbio.evolution
Subject: Sequence similarity: statistical analysis of
Message-ID: <5774@ig.ig.com>
Date: 6 Apr 88 12:58:37 GMT
Sender: daemon@presto.ig.com
Lines: 28

From: MJB1%PHX.CAM.AC.UK@CUNYVM.CUNY.EDU

Surely the point here is that there are an infinite number of statistical models of sequence similarity. There is no problem in assigning significance under a particular model, though there may well be a problem in assessing its biological relevance. I think the questions being asked should be:

(1) What is a good model for the similarity of molecular sequences?
(2) How can one assess the biological relevance of statistical significance in relation to a particular model?

Put this way, one soon realises that the original problem has been framed too broadly. What are the conditions relating to the comparison? Surely not just that we have sequenced two bits of DNA and want to know how similar they are (though it could be that, if you insist). People should worry more about the conditions relating to the particular problem and try to get experimental evidence about biologically relevant parameters.

To emphasise the point about conditions, consider the old coin-tossing problem. We all know that a tossed coin comes up heads half the time and tails half the time. But does it...? The coin rolled down the drain and the result was indeterminate. My friend has made a ballistic machine which tosses the coin so that the way it lands depends on which way up it was placed on the machine before tossing. How much more complex, then, are the conditions under which DNA evolves! Trying to improve our knowledge of those conditions for specific gene families would be a good thing to attempt. A completely general model is too broad and naive to be useful, I suspect.
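As a minimal sketch of the point that significance is always relative to a chosen model, the Python below scores the same number of identities between two aligned DNA sequences under two different null models. It assumes an ungapped alignment, independent positions, and both sequences drawn from the same base composition, so the match count is binomial with per-site match probability equal to the sum of squared base frequencies. The GC-rich composition and the counts (40 matches in 100 positions) are hypothetical values chosen only for illustration, not data from any real comparison.

```python
from math import comb


def count_matches(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two equal-length, ungapped sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length (no gaps assumed)")
    return sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))


def match_probability(composition: dict[str, float]) -> float:
    """Chance that two independent bases drawn from `composition` are identical."""
    return sum(f * f for f in composition.values())


def upper_tail_p_value(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more matches under the null."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Null model 1: equiprobable bases (per-site match probability 0.25).
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# Null model 2: a hypothetical GC-rich composition (per-site match probability 0.34).
GC_RICH = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}

if __name__ == "__main__":
    n, k = 100, 40  # hypothetical: 40 identities over 100 aligned positions
    for name, comp in [("uniform", UNIFORM), ("GC-rich", GC_RICH)]:
        p = match_probability(comp)
        print(f"{name}: match prob {p:.2f}, P(X >= {k}) = {upper_tail_p_value(n, k, p):.3g}")
```

The identical observation is markedly less surprising under the GC-rich model, because chance matches are more frequent there. Neither model says anything about whether the similarity is biologically meaningful; that still depends on the conditions of the particular comparison.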