Path: utzoo!mnetor!uunet!husc6!bloom-beacon!tut.cis.ohio-state.edu!mailrus!ames!umd5!purdue!i.cc.purdue.edu!k.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: sci.bio
Subject: Re: similarity searching; statistical significance
Message-ID: <759@l.cc.purdue.edu>
Date: 20 Apr 88 11:10:55 GMT
References: <18202@beta.UUCP>
Organization: Purdue University Statistics Department
Lines: 22
Keywords: DNA, RNA, protein, statistical significance
Summary: Statistical significance has little to do with practical significance

Statistical significance seems to be misunderstood by almost everyone in
the sciences. A test at a given level, say 5%, is merely a filter such
that the probability that a _total fraud_ would pass it is at most 5%.
It says nothing about the probability that an important effect would
show up, or that a very unimportant effect would trigger rejection.

If one thinks carefully about the problem, the idea that two DNA
sequences are totally independent is ridiculous. Thus the problem of
testing whether there is total independence is nonsense. The real
problem is "when to accept a hypothesis which must be false." This
problem is quite difficult; I am among the few who have made some
progress on it.

In a given situation, the scientist's assumptions, together with the
data, may be enough to indicate a clear course of action. However,
simple rules do not exist. The biologist wants a simple, unambiguous
method which does not require the making of assumptions. Such a method
not only does not exist, but cannot exist.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet
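
A rough numerical illustration of the "filter" point in the post, offered as a sketch and not as the poster's own method. It assumes a hypothetical setup: two aligned sequences are compared position by position, the null hypothesis of total independence with uniform base frequencies gives a match probability of 0.25, and a two-sided 5% z-test (normal approximation, crude for small n) is applied to the observed match fraction. The function names and the example values of n and the true match probability are illustrative assumptions only.

```python
import math

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rejection_probability(n: int, p_null: float, p_true: float,
                          z_crit: float = Z_CRIT) -> float:
    """Approximate probability that the 5% z-test rejects p = p_null
    when the match fraction over n positions really has mean p_true."""
    se_null = math.sqrt(p_null * (1.0 - p_null) / n)  # spread assumed by the test
    se_true = math.sqrt(p_true * (1.0 - p_true) / n)  # actual spread of the data
    upper = p_null + z_crit * se_null  # reject if observed fraction is above this
    lower = p_null - z_crit * se_null  # ... or below this
    return (1.0 - normal_cdf((upper - p_true) / se_true)
            + normal_cdf((lower - p_true) / se_true))


if __name__ == "__main__":
    cases = [
        ("null exactly true, n = 100", 100, 0.25),
        ("tiny effect, n = 10,000,000", 10_000_000, 0.251),
        ("large effect, n = 20", 20, 0.40),
    ]
    for label, n, p_true in cases:
        prob = rejection_probability(n, 0.25, p_true)
        print(f"{label}: P(reject) = {prob:.3f}")
```

Under these assumptions the test rejects about 5% of the time when the null is exactly true, almost always for the negligible 0.001 departure once n is very large, and less than half the time for the much larger departure at n = 20. The significance level controls only the first of these three numbers.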