Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: MJB1@PHX.CAM.AC.UK
Newsgroups: bionet.molbio.evolution
Subject: Sequence similarity: statistical analysis of
Message-ID: <5774@ig.ig.com>
Date: 6 Apr 88 12:58:37 GMT
Sender: daemon@presto.ig.com
Lines: 28

From: MJB1%PHX.CAM.AC.UK@CUNYVM.CUNY.EDU

Surely the point here is that there are an infinite number of statistical
models of sequence similarity.  There is no problem in assigning significance
under a particular model, though there may well be a problem in assessing
its biological relevance.  I think the questions being asked should be
(1) What is a good model for the similarity of molecular sequences?
(2) How can one assess the biological relevance of statistical significance
in relation to a particular model?
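
To make the first point concrete, here is a minimal sketch (not anyone's
published method) that scores the same pair of sequences under two different
null models.  The sequences, the identity-count score and the function names
are all invented for illustration; the two models are simply "every base
equally likely" and "shuffle the second sequence, keeping its composition".

```python
import random

def identity(a, b):
    """Count positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))

def p_value(a, b, null_sampler, trials=2000, seed=0):
    """Estimate P(score >= observed) by drawing stand-ins for b from a null model."""
    rng = random.Random(seed)
    observed = identity(a, b)
    hits = sum(identity(a, null_sampler(b, rng)) >= observed
               for _ in range(trials))
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

def uniform_null(seq, rng):
    # Model 1 (assumption): each base is A, C, G or T with equal probability.
    return "".join(rng.choice("ACGT") for _ in seq)

def shuffle_null(seq, rng):
    # Model 2 (assumption): same base composition as seq, random order.
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

# Two made-up, A-rich sequences of length 20.
a = "AAAATAAAATAAAAGAAAAT"
b = "AATAAAAATAAAAAAAGAAA"

p_uniform = p_value(a, b, uniform_null)
p_shuffle = p_value(a, b, shuffle_null)
print("uniform-base model:", p_uniform)
print("shuffle model:     ", p_shuffle)
```

Both numbers are perfectly well defined, yet the uniform-base model calls
the match highly significant while the shuffle model does not, because two
A-rich sequences agree at most positions however they are arranged.  Which
of the two, if either, says anything biological is exactly question (2).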

Put in this way, one soon realises that the original problem has been framed
in too broad a way.  What are the conditions relating to the comparison?
Surely not just that we have sequenced two bits of DNA and want to know
how similar they are (though it could be that, if you insist).

People should worry more about the conditions relating to the particular
problem and try to get experimental evidence about biologically relevant
parameters.  To emphasise the point about conditions, consider the old
coin-tossing problem.  We all know that a coin comes up heads half the time
and tails half the time.  But does it?  The coin rolled down the drain and the
result was indeterminate.  My friend has made a ballistic machine which
tosses the coin so that the way it lands depends on which way it was placed
on the machine before tossing.

How much more complex, then, are the conditions under which DNA evolves!
Trying to improve our knowledge about that for specific gene families
would be a good thing to attempt.  A completely general model is
too broad and naive to be useful, I suspect.