Path: utzoo!mnetor!uunet!husc6!bloom-beacon!tut.cis.ohio-state.edu!mailrus!ames!umd5!purdue!i.cc.purdue.edu!k.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: sci.bio
Subject: Re: similarity searching; statistical significance
Message-ID: <759@l.cc.purdue.edu>
Date: 20 Apr 88 11:10:55 GMT
References: <18202@beta.UUCP>
Organization: Purdue University Statistics Department
Lines: 22
Keywords: DNA, RNA, protein, statistical significance
Summary: Statistical significance has little to do with practical significance


Statistical significance seems to be misunderstood by almost everyone in the
sciences.  A test at a given level, say 5%, is merely a filter such that the
probability that a _total fraud_ (no real effect at all) would pass it is at
most 5%.  It says nothing about the probability that an important effect would
show up, nor about the probability that a very unimportant effect would
trigger rejection.
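
To make the point concrete, here is a rough sketch, not anything from the
original discussion.  It assumes the simplest textbook setting: a two-sided
z-test of "mean = 0" on n independent normal observations with known standard
deviation 1.  The function name, effect sizes, and sample sizes are all
illustrative choices of mine.

```python
from math import sqrt
from statistics import NormalDist


def rejection_probability(effect, n, alpha=0.05):
    """Chance that a two-sided z-test at level alpha rejects 'mean = 0'.

    Assumes n independent N(effect, 1) observations (an illustrative setup).
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = effect * sqrt(n)  # mean of the z statistic under the true effect
    # Reject when the z statistic falls beyond +critical or below -critical.
    return std_normal.cdf(-critical + shift) + std_normal.cdf(-critical - shift)


# No effect at all: the filter passes it with probability alpha, whatever n is.
no_effect = rejection_probability(0.0, 100)

# A large effect (one standard deviation) with only 4 observations.
large_effect_small_sample = rejection_probability(1.0, 4)

# A negligible effect (1% of a standard deviation) with a million observations.
tiny_effect_huge_sample = rejection_probability(0.01, 1_000_000)

print(no_effect, large_effect_small_sample, tiny_effect_huge_sample)
```

Under these assumptions the null case is rejected 5% of the time, the large
effect is detected only about half the time, and the negligible effect is
rejected almost always.  The 5% level alone tells you none of this.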

If one thinks carefully about the problem, the idea that two DNA sequences
are totally independent is ridiculous.  Thus the problem of testing whether
there is total independence is nonsense.  The problem is "when to accept a
hypothesis which must be false."  This problem is quite difficult; I am among
the few who have made some progress.  In a given situation, the scientist's
assumptions, together with the data, may be enough to indicate a clear
course of action.  However, simple rules do not exist.

The biologist wants a simple, unambiguous method which does not require the
making of assumptions.  This not only does not exist, but cannot exist.

-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet