Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!bellcore!decvax!decwrl!amdcad!lll-crg!seismo!mcvax!ukc!eeb
From: eeb@ukc.UUCP (E.E.Bassett)
Newsgroups: net.math.stat
Subject: Re: Normal distribution probability problem.
Message-ID: <587@eagle.ukc.UUCP>
Date: Wed, 15-Jan-86 07:11:45 EST
Article-I.D.: eagle.587
Posted: Wed Jan 15 07:11:45 1986
Date-Received: Thu, 23-Jan-86 20:52:17 EST
References: <2792@ut-ngp.UUCP>
Reply-To: eeb@ukc.UUCP (E.E.Bassett)
Distribution: net
Organization: U of Kent at Canterbury, Canterbury, UK
Lines: 55

> Two questions from the same set of facts. Assume you have two
> populations, A and B, for which you have full information (i.e., the
> value of every event in each population).
>
> 1) If you draw a sample of size n = 1 from each population, what is
> the probability that the sample from population A is larger than the
> sample from population B?
>
> 2) Now assume the only information you have about the two populations
> is based on samples of, say, size n = 10. Thus you have the mean and
> standard deviation of a sample from each population, but know nothing
> about the individual events within the populations. Now what is the
> probability that a single sample drawn from population A will be
> larger than one drawn from population B?
>
> I think the answer to (1) is simply a z-value and the probability is
> the area under the normal curve. But I haven't a clue how to work it
> out if you only have sample data. I'm interested in theory as well as
> an algorithm. Any help will be appreciated. Thanks.

The answer to (1) is easy: yes, the probability you want is the area
under the normal curve corresponding to what you term a z-value. To be
specific, let A and B represent the random variables from the two
populations; let A be distributed N(meana, vara) and B N(meanb, varb).
(Note that the second parameter shown is the variance rather than its
square root, the s.d.)
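Question (1) can also be checked numerically by doing exactly what it
describes: draw one value from each population, many times over, and
count how often A comes out larger. The Python sketch below assumes
independent draws; the function name and the example populations
A ~ N(10, 4) and B ~ N(8, 5) are hypothetical choices for illustration.
Python's random.gauss takes the standard deviation, not the variance,
so the code takes square roots, in line with the N(mean, variance)
convention just noted.

```python
import math
import random


def simulate_prob_a_exceeds_b(mean_a, var_a, mean_b, var_b,
                              trials=200_000, seed=1):
    """Monte Carlo estimate of Pr(A > B), one draw from each population per trial.

    Parameters follow the N(mean, variance) convention; random.gauss expects
    a standard deviation, hence the square roots. Draws are assumed independent.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sd_a, sd_b = math.sqrt(var_a), math.sqrt(var_b)
    wins = sum(rng.gauss(mean_a, sd_a) > rng.gauss(mean_b, sd_b)
               for _ in range(trials))
    return wins / trials


# Hypothetical populations: A ~ N(10, 4), B ~ N(8, 5).
print(simulate_prob_a_exceeds_b(10, 4, 8, 5))
```

With these made-up parameters the estimate should land near 0.75; the
exact value Pr(A - B > 0) for them is about 0.7475.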
Assuming the two draws are independent, A - B is then
N(meana - meanb, vara + varb), since linear combinations of independent
normals are themselves normal, and the probability you require is
simply Pr(A - B > 0). Standardizing gives

    Pr(A - B > 0) = Phi( (meana - meanb) / sqrt(vara + varb) )

where Phi is the standard normal distribution function, so substituting
the (known) values for the means and variances of A and B is then easy.

The answer to (2) is trickier. Since you have sample information about
the distributions, you can estimate the means and standard deviations
in the usual way, i.e. by the sample means and sample standard
deviations. Substituting these into the formula obtained for (1) will
then give an estimate of the probability you require.

Of course, many statisticians, particularly of the Bayesian variety,
will argue that one should be able to express odds on A exceeding B in
the situation described in (2). So stick in the odd prior distribution
or two (actually four, if you can take the means and variances as
independent), turn a handle or four, and out comes the posterior
probability. Haven't tried it, but it looks rather Fisher-Behrens-ish
to me.

Eryl Bassett
Univ. of Kent
Canterbury, U.K.
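The closed-form answer to (1), and the plug-in estimate for (2), fit in
a few lines of Python using the standard library's statistics.NormalDist
(Python 3.8 or later). This is a sketch assuming independent draws; the
function names and the two samples of size 10 are made up for
illustration. statistics.variance uses the n - 1 divisor, which is one
common choice of "usual" estimate.

```python
from statistics import NormalDist, mean, variance


def prob_a_exceeds_b(mean_a, var_a, mean_b, var_b):
    """Pr(A > B) for independent A ~ N(mean_a, var_a) and B ~ N(mean_b, var_b).

    A - B ~ N(mean_a - mean_b, var_a + var_b); NormalDist expects a
    standard deviation, hence the square root.
    """
    diff = NormalDist(mean_a - mean_b, (var_a + var_b) ** 0.5)
    return 1.0 - diff.cdf(0.0)  # Pr(A - B > 0)


def plug_in_estimate(sample_a, sample_b):
    """Question (2): substitute sample means and sample variances (n - 1 divisor)."""
    return prob_a_exceeds_b(mean(sample_a), variance(sample_a),
                            mean(sample_b), variance(sample_b))


# Known parameters, as in question (1): hypothetical A ~ N(10, 4), B ~ N(8, 5).
print(round(prob_a_exceeds_b(10, 4, 8, 5), 4))

# Hypothetical samples of size n = 10 from each population, as in question (2).
sample_a = [10.2, 9.1, 11.5, 8.7, 10.9, 12.0, 9.8, 10.4, 11.1, 9.3]
sample_b = [8.1, 7.4, 9.9, 6.8, 8.6, 10.2, 7.7, 8.9, 9.0, 7.1]
print(round(plug_in_estimate(sample_a, sample_b), 4))
```

The plug-in figure is only a point estimate: it makes no allowance for
the sampling error in the four estimated parameters, which is the gap
the Bayesian treatment tries to fill.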