Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!rice!uw-beaver!milton!lisbon!almond
From: almond@lisbon.stat.washington.edu (Russell Almond)
Newsgroups: comp.ai
Subject: Re: info wanted on learning probabilities
Message-ID: <15562@milton.u.washington.edu>
Date: 31 Jan 91 23:53:41 GMT
References: <1991Jan30.213055.25485@cs.ucla.edu>
Sender: news@milton.u.washington.edu
Organization: U.W. Department of Statistics
Lines: 77

I would also be interested in information on learning probabilities, and I am willing to compile and post a reference list.  Let me recap the problem as I understand it.

Sehyeong Cho asks about learning a conditional probability $P(A|B)$.  The standard Bayesian model calls $P(A|B)$ some parameter $\theta$.  {\it A priori\/}, that is, before any data are available, a probability distribution is used to express our state of knowledge (or ignorance) about the parameter $\theta$.  This is called a "prior distribution."

There is some disagreement in the statistical community about what the correct prior distribution for representing ignorance in this case is.  Bayes and Laplace advocated using a uniform distribution, which is also a beta distribution with parameters (1,1).  Jeffreys advocates using a beta(1/2,1/2).  Others have advocated a beta(0,0), which is not really a probability distribution, but is the limit of a series of probability distributions.  There are other possibilities which are not beta distributions, but they lead to greater complexity.

Assume for the sake of simplicity that we have chosen as our prior distribution a beta distribution with (hyper)parameters $\alpha,\beta$.  We then observe $n$ cases in which $B$ occurs and find that in $x$ of them $A$ occurs as well.  Note that we must make an additional assumption here that our observation is unbiased; that is, we have no reason to believe that if $(A,B)$ occurs we are more likely to have it brought to our attention than if $(\neg A,B)$ occurs.
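As a minimal sketch of the conjugate update just described: with a beta$(\alpha,\beta)$ prior and $x$ occurrences of $A$ among $n$ observations of $B$, the posterior is beta$(\alpha+x,\beta+n-x)$.  The counts below are made-up illustration values, not data from this discussion; note also that the beta(0,0) prior is improper, though its posterior is proper whenever $0 < x < n$.

```python
# Beta-binomial conjugate update for theta = P(A|B).
# The counts n and x are hypothetical illustration values.

def posterior_params(alpha, beta, n, x):
    """Update a beta(alpha, beta) prior after x successes in n trials."""
    return alpha + x, beta + n - x

def posterior_mean(alpha, beta):
    """Mean of a beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

n, x = 20, 7  # hypothetical: B observed 20 times, A co-occurred 7 times

priors = {"Bayes/Laplace beta(1,1)": (1.0, 1.0),
          "Jeffreys beta(1/2,1/2)": (0.5, 0.5),
          "improper beta(0,0)": (0.0, 0.0)}

for name, (a, b) in priors.items():
    a_post, b_post = posterior_params(a, b, n, x)
    print(f"{name}: posterior beta({a_post}, {b_post}), "
          f"mean {posterior_mean(a_post, b_post):.3f}")
```

With these counts the three priors give posterior means of 8/22, 7.5/21, and 7/20 respectively; the choice of "ignorance" prior matters less as $n$ grows.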
This might not be the case in Sehyeong's original example if, for instance, newspapers were more likely to omit reporting on SCUD launches when no deaths occur.  Assuming the observation is unbiased, we are led to believe that {\it a posteriori\/} our knowledge about the parameter has a beta distribution with (hyper)parameters $\alpha+x, \beta+n-x$.

There is a slightly more complex belief function formulation of this problem (based loosely on Fisher's fiducial arguments) which results, instead of in an exact probability distribution for $\theta$, in upper and lower bounds for $\theta$ in the form of a "bivariate beta" belief function.  This is developed in Dempster[1966] and recapped in Almond[1989,1991].  The bounds capture the posterior distributions corresponding to all three of the prior distributions cited above.  Using these bounds has the advantage of simpler assumptions, but the disadvantages of greater computational complexity and weaker decision-making power.

Robert (Goldman) brings up the next logical question, which is the one I am currently working on.  Suppose we have built a probabilistic graphical model of the kind developed in Pearl[1988] or Lauritzen and Spiegelhalter[1988].  We jointly elicit the probabilities $\theta_1=P(A|B)$ and $\theta_2=P(A|\neg B)$, but there is uncertainty about those values.  By the Bayesian paradigm, we should express that uncertainty by a (joint) probability distribution over the two parameters.  This is the upshot of the 1990 Lauritzen and Spiegelhalter paper which Robert cites.

There are some non-trivial technical problems here of which L&S only scratch the surface.  For example, L&S build a number of models which assume the independence of $\theta_1$ and $\theta_2$.  This assumption is particularly suspect, even if only made for the sake of convenience.  In the case of many graphical models, it may be that we know with some certainty that $P(A|B) > P(A|\neg B)$, or vice versa.
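One simple way to encode such an ordering constraint is to start from independent beta priors and condition on the event $\theta_1 > \theta_2$; the resulting joint prior is no longer independent.  The rejection-sampling sketch below (with made-up uniform priors and an arbitrary sample size, not anything from L&S) illustrates this: on the retained pairs the two parameters are positively correlated.

```python
# Rejection-sampling sketch: draw (theta1, theta2) from independent
# uniform (beta(1,1)) priors, keep only pairs satisfying the ordering
# constraint theta1 > theta2, and estimate the covariance of the kept
# pairs.  Priors and sample size are illustrative choices only.
import random

random.seed(0)

kept = []
while len(kept) < 10_000:
    theta1, theta2 = random.random(), random.random()
    if theta1 > theta2:          # enforce the ordering constraint
        kept.append((theta1, theta2))

n = len(kept)
m1 = sum(t1 for t1, _ in kept) / n
m2 = sum(t2 for _, t2 in kept) / n
cov = sum((t1 - m1) * (t2 - m2) for t1, t2 in kept) / n
# Means should be near 2/3 and 1/3, covariance near 1/36 > 0:
# conditioning on the ordering has induced positive dependence.
print(m1, m2, cov)
```

Exact calculation on the triangle $\{\theta_1 > \theta_2\}$ gives $E[\theta_1] = 2/3$, $E[\theta_2] = 1/3$, and covariance $1/4 - 2/9 = 1/36$, so even this crude constraint destroys prior independence.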
L&S also note that when they observe incomplete data (that is, observe $A$ but not $B$), $\theta_1$ and $\theta_2$ will be dependent {\it a posteriori\/}, even if they are independent {\it a priori\/}.

David Madigan, Jeremy York and I have been noodling around with some alternative models, but we have not yet written anything up.  I would be eager to talk with anybody else who is working on the problem.

--Russell Almond
University of Washington, Department of Statistics, GN--22
Seattle, WA 98195
(206) 543-4302
almond@stat.washington.edu