Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ecsvax.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!unc!mcnc!ecsvax!hes
From: hes@ecsvax.UUCP
Newsgroups: net.math.stat
Subject: Re: Normal distribution probability problem.
Message-ID: <1070@ecsvax.UUCP>
Date: Sun, 12-Jan-86 14:11:52 EST
Article-I.D.: ecsvax.1070
Posted: Sun Jan 12 14:11:52 1986
Date-Received: Mon, 13-Jan-86 08:01:19 EST
References: <2792@ut-ngp.UUCP>
Distribution: net
Organization: NC State Univ.
Lines: 59

> Two questions from the same set of facts.  Assume you have two
> populations, A and B, for which you have full information (i.e.,
> the value of every event in each population).

Since the Subject: line says that this is for the normal distribution,
full information means knowing the mean and variance of each population.
(If the question was about two finite populations, then we've got a
different subject.)

>
> 1) If you draw a sample of size n = 1 from each population,
> what is the probability that the sample from population A is larger
> than the sample from population B?
>

It's a double integral, based on the conditional probability:

   Prob{Obs from A > Obs from B} =
       Int of distn of Oa [for each Oa find prob that Ob < Oa]

where Oa is an obs(ervation) from pop A, and Ob from pop B.  Each
distn (distribution) is normal with appropriate mean and var.  The
prob inside the bracket is the left tail of B integrated from neg inf
to Oa (and so is a cumulative normal).  The first integral is taken
over the neg inf to pos inf range of Oa.  So we have:

       pos inf         Oa
   Int          Na  Int          Nb  dOb dOa
       neg inf         neg inf

as the desired probability.  Na and Nb are the two normal density
functions, where the random variables are called Oa and Ob.  In the
language you use below, the left Na is an x value, and the integral
of Nb is an area under the normal curve.  This may simplify.
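
The double integral can be checked numerically.  What follows is a
minimal sketch in Python, assuming the two draws are independent (the
product form of the integrand implies this, though it is not stated
outright).  The function names, the step count and the integration
width are illustrative choices, not part of the original problem.  The
inner integral is the cumulative normal, evaluated with math.erf; the
outer integral uses the trapezoid rule.  Under the same independence
assumption, Oa - Ob is itself normal with mean (mean_a - mean_b) and
variance (var_a + var_b), so the whole thing reduces to a single
cumulative normal, which the last function uses as a cross-check.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sd):
    """Cumulative normal: the inner integral of Nb from neg inf to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prob_a_exceeds_b(mean_a, sd_a, mean_b, sd_b, steps=4000, width=10.0):
    """Trapezoid-rule value of Int Na(Oa) * Prob{Ob < Oa} dOa.

    Assumption: the infinite range of Oa is truncated to
    mean_a +/- width * sd_a, beyond which Na is negligible.
    """
    lo = mean_a - width * sd_a
    hi = mean_a + width * sd_a
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        oa = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end weights
        total += weight * normal_pdf(oa, mean_a, sd_a) * normal_cdf(oa, mean_b, sd_b)
    return total * h


def prob_a_exceeds_b_closed_form(mean_a, sd_a, mean_b, sd_b):
    """Simplified form, valid for independent normal draws."""
    return normal_cdf(mean_a - mean_b, 0.0, math.sqrt(sd_a ** 2 + sd_b ** 2))
```

Calling both functions with the same means and standard deviations
should give values that agree to several decimal places; when the two
populations are identical the answer is one half.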

> 2) Now assume the only information you have about the two
> populations is based on samples of, say, size n = 10.  Thus you have
> the mean and standard deviation of a sample from each population, but
> know nothing about the individual events within the populations.  Now
> what is the probability that a single sample drawn from population A
> will be larger than one drawn from population B?
>

One certainly could use the same formula above, with estimates of the
means and variances of populations A and B replacing the parameters in
Na, Nb.  That would yield an estimate of the desired probability.  (One
can't know the actual probability without knowing the parameters of the
two populations.)  I'd rather not conjecture as to the properties of
the estimator of the probability.

> I think the answer to (1) is simply a z-value and the
> probability is the area under the normal curve.  But I haven't a clue
> on how to work it out if you only have sample data.  I'm interested in
> theory as well as an algorithm.  Any help will be appreciated.
>

The theory is the double integral; the algorithm is left as an exercise
for the reader. :-)  (This integration should be reasonably easy to do
numerically.)

> Thanks.

--henry schaffer
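
For question 2, the plug-in estimate described above (sample means and
standard deviations substituted for the population parameters) might be
sketched as follows.  This is an illustration only: it assumes
independent normal draws, uses the single-cumulative-normal form of the
double integral that holds under that assumption, and says nothing
about the sampling properties of the resulting estimator.  The function
names are hypothetical.

```python
import math
import statistics


def normal_cdf(x, mean, sd):
    """Cumulative normal with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def plug_in_prob(mean_a, sd_a, mean_b, sd_b):
    """Estimate Prob{Oa > Ob} with sample statistics in place of parameters.

    Assumption: independent normal draws, so Oa - Ob is normal with
    mean (mean_a - mean_b) and variance (sd_a**2 + sd_b**2).
    """
    return normal_cdf(mean_a - mean_b, 0.0, math.sqrt(sd_a ** 2 + sd_b ** 2))


def plug_in_prob_from_samples(sample_a, sample_b):
    """Same estimate, starting from the raw samples (e.g. n = 10 each)."""
    return plug_in_prob(
        statistics.mean(sample_a), statistics.stdev(sample_a),  # n - 1 divisor
        statistics.mean(sample_b), statistics.stdev(sample_b),
    )
```

The result is an estimate of the probability, not the probability
itself; with samples as small as n = 10 it can be expected to vary
noticeably from one pair of samples to the next.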