Path: utzoo!mnetor!uunet!husc6!bbn!oberon!ll-xn!vanhove From: vanhove@XN.LL.MIT.EDU (Patrick Van Hove) Newsgroups: comp.ai Subject: Re: Probability Bounds from Bayes Theory: (A Problem). Message-ID: <838@xn.LL.MIT.EDU> Date: 24 Dec 87 16:32:56 GMT References: <1578@brahma.cs.hw.ac.uk> Organization: MIT Lincoln Laboratory, Lexington, MA Lines: 133 In-reply-to: tom@cs.hw.ac.uk's message of 11 Dec 87 14:10:21 GMT Here are my 2 pennies' worth LITTLE FLAME ------------ First, a few words about the problem statement. The data is well explained, but very few words are written about what one is looking for. In addition, the data "p(~l, ~t) = 0.95" which appears in the "PRIORS" doesn't seem to appear anywhere in the "PROBLEM". Now, this piece of data is crucial to the solution, because it tells us that, although 3% have tb and 4% have cancer, these overlap in such a way that the total is only 5%. MY SOLUTION ----------- One can approach this problem in the following way: there are three events: t, l, pv. The combination of these events results in 8 possible outcomes for every person. These can be represented, for example, by a Karnaugh map. The complete statistical description of these events is represented by the probabilities of these 8 events. The following constraints apply: The probabilities sum to 1. (1 =ity constraint) There are 3 known priors (3 =ity constraints) There are 3 known conditional (3 =ity constraints) All probabilities are non-negative (8 ~=ity constraints) Since there is no redundancy in the equality constraints (proof left to reader), there is a set of solutions depending on one parameter, and this set is not empty, as the inequalities can all be satisfied simultaneously. In a problem of this type, it may always be necessary to determine all the probabilities to find the desired solution, but in this particular case, it is. The solution is underdetermined, but the following conclusions can be obtained by grinding through the algebra (algebra given at the end of this message): The upper bound of p(t,l|pv) is attained when all the cases of t which are undetected correspond to patients with t and ~l. I found that upper bound [ p(t,l|pv) ] = .1802.. In that case, by the way, p(pv|t,l) is 1.0! The lower bound of p(t,l|pv) is attained when all the cases where t is undetected correspond to t and l , whereas all cases of t and ~l are properly detected. I found lower bound [ p(t,l|pv) ] = .1644.. In that case, p(pv|t,l) is 0.925! It is interesting to see that the extreme bounds for the requested probability are pretty tight. This is only possible because the prior knowledge that p(~l, ~t) = 0.95 implies a serious overlap between t and l. As both are detected "most of the time", their combination must be detected "most of the time" too. If there was no overlap between t and l, then the answer would be undetermined. As a consequence, if the data did not guarantee an overlap between t and l, the bounds on the desired probability would be very weak. An other interesting aspect of this solution is that it was done based only on probability theory; it is liekly that, knowing the physics of the real problem , one can formulate valid additional constraints. For example, it might be reasonable to assume that p(pv|l and t) is at least as large as p(pv|l and ~t) and as p(pv|~l and t); with this assumption, you would get tighter bounds. But thus assumption was not part of the problem statement. Finally, the algebraic statement of the problems has 8 variables and 7 equality constraints; if you give any additional linear equality constraint which is not redundant with the other 7, then you reduce to set of possibilities to a single solution, or none if the inequalities can't be met. THE SOLUTION BY ALGEBRA Notation: x1 = p( l, ~t, ~pv) x2 = p( l, ~t, pv) x3 = p( l, t, ~pv) x4 = p( l, t, pv) x5 = p(~l, t, ~pv) x6 = p(~l, t, pv) x7 = p(~l, ~t, pv) x8 = p(~l, ~t, ~pv) Constraints: (1): all xi >= 0 (2): Sum all xi = 1 (3): x1+x2+x3+x4 = 0.04 (4): x3+x4+x5+x6 = 0.03 (5): x7+x8 = 0.95 (6): [x2+x4] / [x1+x2+x3+x4] = 0.9 (7): [x4+x6] / [x3+x4+x5+x6] = 0.95 (8): x7 / [x7+x8] = 0.07 From all the equalities, one can express all the probabilities as a function of x4. x1 = -0.016 + x4 x2 = 0.036 - x4 x3 = 0.02 - x4 x5 = -0.0185 + x4 x6 = 0.0285 - x4 x7 = 0.0665 x8 = 0.8835 Since x1, x2, x3, x5, x6 must be non-negative, x4 must comply with the following bounds lower bounds: 0.016 , 0.0185 upper bounds: 0.036, 0.02, 0.0285 As a consequence, we must have that 0.0185 < x4 < 0.02 p(t and l | pv) = p(t and l and pv) / p ( pv ) = x4 / [ x2+x4+x6+x7 ] = x4 / ( 0.131 - x4 ) The bounds are: 0.1644.. < x4 < 0.1802.. Patrick Van Hove