Path: utzoo!utgpu!watserv1!watmath!uunet!spool.mu.edu!uwm.edu!bionet!GENETICS.WASHINGTON.EDU!joe
From: joe@GENETICS.WASHINGTON.EDU (Joe Felsenstein)
Newsgroups: bionet.molbio.evolution
Subject: Re: Some thoughts on what to do
Message-ID: <9102051318.AA00877@evolution.genetics.washington.edu>
Date: 5 Feb 91 12:18:07 GMT
References: <9101260036.AA14419@genbank.bio.net>
Sender: daemon@genbank.bio.net
Lines: 40

John Gillespie wrote (relative to what to do next in molecular phylogeny):

> Here, here!! We have known since '71 (Ohta and Kimura) that rates of
> substitution vary. We also know that the frequencies of the four nucleotides vary
> through time. It is hard to imagine a characterization of the substitution
> process that is farther from those assumed by most tree-construction
> algorithms.

Well, I can imagine LOTS of models that are even farther! Seriously, though,

(1) variation of rate of evolution with time (lack of clockness) is
    definitely allowed in most methods of inferring phylogenies (distance
    methods, ML, parsimony, invariants/evolutionary parsimony),

(2) variation of frequencies of nucleotides is not allowed in most programs
    but
    (a) if one is willing to accept the admittedly questionable independence
        of different sites, resampling methods such as the bootstrap allow
        one to investigate the empirical variability of inferences made
        with imperfect models,
    (b) check out Barry and Hartigan's 1987 paper in Statistical Science,
        which puts forward (among others) a model where the transition
        probability matrix varies arbitrarily from branch to branch, and
        they can do maximum likelihood for it (in fact, it's easier than
        my ML). This would allow varying base frequencies in different
        parts of the tree.
    (c) we've got to do _something_, so we do what we know how. If John
        uses his considerable powers to formulate a model that is more
        realistic and continues to be computationally tractable, we will
        all be quite interested in it.
        A better model would have some specification of the distribution
        of possible equilibrium base frequencies and how quickly they can
        change as one moves along the tree.

(3) Variation of nucleotide composition is real, but I think a much more
    serious departure from reality in the models used for ML and distance
    methods is the assumption of equal rates of substitution at all sites.
    I have some ways one can specify unequal rates in my current ML
    programs and am working on ways the method can infer them instead of
    you having to specify rates.

-----
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
 Internet:    joe@genetics.washington.edu  (IP No. 128.95.12.41)
 Bitnet/EARN: felsenst@uwavm
 UUCP:        ... uw-beaver!evolution.genetics!joe