Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!rice!uw-beaver!milton!lisbon!almond
From: almond@lisbon.stat.washington.edu (Russell Almond)
Newsgroups: comp.ai
Subject: Re: info wanted on learning probabilities
Message-ID: <15562@milton.u.washington.edu>
Date: 31 Jan 91 23:53:41 GMT
References: <1991Jan30.213055.25485@cs.ucla.edu>
Sender: news@milton.u.washington.edu
Organization: U.W. Department of Statistics
Lines: 77

I would also be interested in information on learning probabilities, and I am willing to compile and post a reference list.  Let me recap the problem as I understand it.

Sehyeong Cho asks about learning a conditional probability $P(A|B)$.  The standard Bayesian model calls $P(A|B)$ some parameter $\theta$.  {\it A priori\/}, that is, before any data are available, a probability distribution is used to express our state of knowledge (or ignorance) about the parameter $\theta$.  This is called a "prior distribution."

There is some disagreement in the statistical community about what the correct prior distribution for representing ignorance in this case is.  Bayes and Laplace advocated using a uniform distribution, which is also a beta distribution with parameters (1,1).  Jeffreys advocates using a beta(1/2,1/2).  Others have advocated a beta(0,0), which is not really a probability distribution, but is the limit of a series of probability distributions.  There are other possibilities which are not beta distributions, but they lead to greater complexity.

Assume for the sake of simplicity that we have chosen as our prior distribution a beta distribution with (hyper)parameters $\alpha,\beta$.  We then observe $n$ cases in which $B$ occurs and find that in $x$ of them $A$ occurs as well.  Note that we must make an additional assumption here that our observation is unbiased; that is, we have no reason to believe that if $(A,B)$ occurs we are more likely to have it brought to our attention than if $(\neg A,B)$ occurs.
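As a minimal sketch of the conjugate update just described: with a beta$(\alpha,\beta)$ prior and $x$ occurrences of $A$ among $n$ observations of $B$, the posterior is beta$(\alpha+x,\beta+n-x)$.  The counts below are made-up illustration values, not data from this discussion; note also that the beta(0,0) prior is improper, though its posterior is proper whenever $0 < x < n$.

```python
# Beta-binomial conjugate update for theta = P(A|B).
# The counts n and x are hypothetical illustration values.

def posterior_params(alpha, beta, n, x):
    """Update a beta(alpha, beta) prior after x successes in n trials."""
    return alpha + x, beta + n - x

def posterior_mean(alpha, beta):
    """Mean of a beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

n, x = 20, 7  # hypothetical: B observed 20 times, A co-occurred 7 times

priors = {"Bayes/Laplace beta(1,1)": (1.0, 1.0),
          "Jeffreys beta(1/2,1/2)": (0.5, 0.5),
          "improper beta(0,0)": (0.0, 0.0)}

for name, (a, b) in priors.items():
    a_post, b_post = posterior_params(a, b, n, x)
    print(f"{name}: posterior beta({a_post}, {b_post}), "
          f"mean {posterior_mean(a_post, b_post):.3f}")
```

With these counts the three priors give posterior means of 8/22, 7.5/21, and 7/20 respectively; the choice of "ignorance" prior matters less as $n$ grows.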
This might not be the case in Sehyeong's original example if, for instance, newspapers were more likely to omit reporting on SCUD launches when no deaths occur.  Assuming the observation is unbiased, we are led to believe that {\it a posteriori\/} our knowledge about the parameter has a beta distribution with (hyper)parameters $\alpha+x, \beta+n-x$.

There is a slightly more complex belief function formulation of this problem (based loosely on Fisher's fiducial arguments) which results, instead of in an exact probability distribution for $\theta$, in upper and lower bounds for $\theta$ in the form of a "bivariate beta" belief function.  This is developed in Dempster[1966] and recapped in Almond[1989,1991].  The bounds capture the posterior distributions corresponding to all three of the prior distributions cited above.  Using these bounds has the advantage of simpler assumptions, but the disadvantages of greater computational complexity and weaker decision-making power.

Robert (Goldman) brings up the next logical question, which is the one I am currently working on.  Suppose we have built a probabilistic graphical model of the kind developed in Pearl[1988] or Lauritzen and Spiegelhalter[1988].  We jointly elicit the probabilities $\theta_1=P(A|B)$ and $\theta_2=P(A|\neg B)$, but there is uncertainty about those values.  By the Bayesian paradigm, we should express that uncertainty by a (joint) probability distribution over the two parameters.  This is the upshot of the 1990 Lauritzen and Spiegelhalter paper which Robert cites.

There are some non-trivial technical problems here of which L&S only scratch the surface.  For example, L&S build a number of models which assume the independence of $\theta_1$ and $\theta_2$.  This assumption is particularly suspect, even if only made for the sake of convenience.  In the case of many graphical models, it may be that we know with some certainty that $P(A|B) > P(A|\neg B)$, or vice versa.
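One simple way to encode such an ordering constraint is to start from independent beta priors and condition on the event $\theta_1 > \theta_2$; the resulting joint prior is no longer independent.  The rejection-sampling sketch below (with made-up uniform priors and an arbitrary sample size, not anything from L&S) illustrates this: on the retained pairs the two parameters are positively correlated.

```python
# Rejection-sampling sketch: draw (theta1, theta2) from independent
# uniform (beta(1,1)) priors, keep only pairs satisfying the ordering
# constraint theta1 > theta2, and estimate the covariance of the kept
# pairs.  Priors and sample size are illustrative choices only.
import random

random.seed(0)

kept = []
while len(kept) < 10_000:
    theta1, theta2 = random.random(), random.random()
    if theta1 > theta2:          # enforce the ordering constraint
        kept.append((theta1, theta2))

n = len(kept)
m1 = sum(t1 for t1, _ in kept) / n
m2 = sum(t2 for _, t2 in kept) / n
cov = sum((t1 - m1) * (t2 - m2) for t1, t2 in kept) / n
# Means should be near 2/3 and 1/3, covariance near 1/36 > 0:
# conditioning on the ordering has induced positive dependence.
print(m1, m2, cov)
```

Exact calculation on the triangle $\{\theta_1 > \theta_2\}$ gives $E[\theta_1] = 2/3$, $E[\theta_2] = 1/3$, and covariance $1/4 - 2/9 = 1/36$, so even this crude constraint destroys prior independence.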
L&S also note that when they observe incomplete data (that is, observe $A$ but not $B$), $\theta_1$ and $\theta_2$ will be dependent {\it a posteriori\/}, even if they are independent {\it a priori\/}.

David Madigan, Jeremy York and I have been noodling around with some alternative models, but we have not yet written anything up.  I would be eager to talk with anybody else who is working on the problem.

--Russell Almond
University of Washington, Department of Statistics, GN--22
Seattle, WA 98195
(206) 543-4302
almond@stat.washington.edu