Path: utzoo!attcan!uunet!cs.utexas.edu!news-server.csri.toronto.edu!neuron.ai.toronto.edu!ai.toronto.edu!tap
From: tap@ai.toronto.edu (Tony Plate)
Newsgroups: comp.ai.neural-nets
Subject: Re: NN solution of non-deterministic problems. Doable or stupid?
Message-ID: <90Aug2.153402edt.322@neuron.ai.toronto.edu>
Date: 2 Aug 90 19:34:46 GMT
References: <14121@shlump.nac.dec.com> <spoffojj.649522001@lgn> <6910@ptolemy.arc.nasa.gov>
Organization: Department of Computer Science, University of Toronto
Lines: 54

In article <6910@ptolemy.arc.nasa.gov> mehra@ptolemy.arc.nasa.gov (Pankaj Mehra) writes:
>In article <spoffojj.649522001@lgn> spoffojj@hq.af.mil (Jason Spofford) writes:
>The original query was:
>
>> Message-ID: <14121@shlump.nac.dec.com>
>> I'd like to know if it's possible to use neural nets to solve problems
>> that aren't fully deterministic, that is, similar inputs produce two or
>> more different outputs in different training cases.
>
>Look at Ivakhnenko and Lapa's book on Forecasting and Prediction
>Techniques. [I don't have the complete reference here.] Sometimes,
>you can model the deterministic part and the stochastic parts
>separately. At other times, you might want to start from random
>initial behavior and bias it towards deterministic behavior.
>You will most definitely need stochastic units in the network(s) you
>use.
>
>
>Pankaj Mehra
>University of Illinois

Just a short comment on the ``most definitely'' part:

It is quite possible to use deterministic nets to ``solve'' problems
that aren't fully deterministic (depending upon what is meant by
``solve'').  For example, suppose we want a net to output the probability
of a coin turning up heads when tossed.  A network with one output
unit and no inputs whatsoever (so its only adjustable parameter is the
unit's bias) will perform this task, and can be trained by gradient descent.

The set of training examples can be either a single example, i.e., the
observed probability of turning up heads, e.g., {0.5}, or the unprocessed
results of a number of trials, e.g., {1,0,0,0,1,1,0,1}.
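A minimal sketch of that no-input net, assuming a single sigmoid output unit whose bias `b` is its only parameter, trained by plain gradient descent on mean binary cross-entropy. The function name `train_bias_only`, the learning rate, step count, starting bias of -2.0, and the extra "skewed" data set {1,1,1,0} are all hypothetical choices for illustration. It shows that the summary form {0.5} and the raw trials {1,0,0,0,1,1,0,1} (4 heads in 8) both drive the output to the observed probability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_bias_only(targets, lr=0.5, steps=2000, b=-2.0):
    """A 'network' with no inputs: one sigmoid output unit whose only
    parameter is its bias b (hypothetical starting value -2.0). Trained by
    gradient descent on the mean binary cross-entropy between
    p = sigmoid(b) and targets in [0, 1]."""
    for _ in range(steps):
        p = sigmoid(b)
        # d/db of mean(-[t*log(p) + (1-t)*log(1-p)]) simplifies to mean(p - t)
        grad = sum(p - t for t in targets) / len(targets)
        b -= lr * grad
    return sigmoid(b)

# The two kinds of training set described in the post.
summary = train_bias_only([0.5])                  # observed probability only
raw = train_bias_only([1, 0, 0, 0, 1, 1, 0, 1])   # unprocessed trials, 4 heads in 8
# Hypothetical biased coin, for contrast: 3 heads in 4 trials.
skewed = train_bias_only([1, 1, 1, 0])
print(round(summary, 3), round(raw, 3), round(skewed, 3))  # 0.5 0.5 0.75
```

Because the gradient is just the mean of (p - t), a single target of 0.5 and eight 0/1 targets averaging 0.5 pull the bias toward the same fixed point.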

In this case either the sum-of-squares or the asymmetric cross entropy is a
suitable error function: the minimum for both occurs when the
output unit gives the observed probability.  However, for more
complex problems, such as those with more than two mutually exclusive
outcomes, the softmax output function together with the
asymmetric cross entropy objective function is better in both
theory and practice.
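A rough numerical check of both points, as a sketch only. It assumes the post's "asymmetric cross entropy" corresponds to ordinary (binary or categorical) cross-entropy. The first part scans candidate outputs p on a grid and confirms that sum-of-squares and cross-entropy are both minimized at the observed frequency of heads. The second part extends the no-input net to k softmax output units (one bias each) trained on one-hot targets; the function names, learning rate, step count, and the three-outcome data [0, 2, 1, 0, 0, 2] are hypothetical.

```python
import math

# Raw coin-toss outcomes from the post (1 = heads).
trials = [1, 0, 0, 0, 1, 1, 0, 1]

def sum_of_squares(p, ts):
    return sum((p - t) ** 2 for t in ts)

def cross_entropy(p, ts):
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p) for t in ts)

# Scan candidate outputs p in (0, 1) and locate each error function's minimum.
grid = [i / 1000 for i in range(1, 1000)]
p_sse = min(grid, key=lambda p: sum_of_squares(p, trials))
p_ce = min(grid, key=lambda p: cross_entropy(p, trials))
print(p_sse, p_ce)  # 0.5 0.5 -- both equal the observed frequency, 4/8

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def train_softmax_bias_only(outcomes, k, lr=0.5, steps=5000):
    """No-input net with k softmax output units (one bias each), trained by
    gradient descent on mean cross-entropy against one-hot targets."""
    z = [0.0] * k
    n = len(outcomes)
    freq = [sum(1 for o in outcomes if o == j) / n for j in range(k)]
    for _ in range(steps):
        q = softmax(z)
        # Gradient of mean cross-entropy w.r.t. bias j is q_j - freq_j.
        z = [zj - lr * (qj - fj) for zj, qj, fj in zip(z, q, freq)]
    return softmax(z)

# Hypothetical three-outcome data with frequencies 3/6, 1/6, 2/6.
probs = train_softmax_bias_only([0, 2, 1, 0, 0, 2], k=3)
print([round(x, 3) for x in probs])  # [0.5, 0.167, 0.333]
```

The trained softmax outputs settle on the observed outcome frequencies, the multi-outcome analogue of the single unit settling on the probability of heads.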

John Bridle has quite a nice paper in NIPS 2 on using neural nets for
stochastic problems.  He shows that, for a particular type of network, when
the objective function is at its minimum value, the mutual information
between the outputs of the network and the training data is at its maximum.
(Btw, this gives better discrimination than maximum likelihood model
estimation methods.)

Tony Plate
-- 
---------------- Tony Plate ----------------------  tap@ai.utoronto.ca -----
Department of Computer Science, University of Toronto, 
10 Kings College Road, Toronto, 
Ontario, CANADA M5S 1A4
----------------------------------------------------------------------------