Path: utzoo!attcan!uunet!cs.utexas.edu!news-server.csri.toronto.edu!neuron.ai.toronto.edu!ai.toronto.edu!tap
From: tap@ai.toronto.edu (Tony Plate)
Newsgroups: comp.ai.neural-nets
Subject: Re: NN solution of non-deterministic problems. Doable or stupid?
Message-ID: <90Aug2.153402edt.322@neuron.ai.toronto.edu>
Date: 2 Aug 90 19:34:46 GMT
References: <14121@shlump.nac.dec.com> <6910@ptolemy.arc.nasa.gov>
Organization: Department of Computer Science, University of Toronto
Lines: 54

In article <6910@ptolemy.arc.nasa.gov> mehra@ptolemy.arc.nasa.gov (Pankaj Mehra) writes:
>In article spoffojj@hq.af.mil (Jason Spofford) writes:
>The original query was:
>
>> Message-ID: <14121@shlump.nac.dec.com>
>> I'd like to know if it's possible to use neural nets to solve problems
>> that aren't fully deterministic, that is, similar inputs produce two or
>> more different outputs in different training cases.
>
>Look at Ivakhnenko and Lapa's book on Forecasting and Prediction
>Techniques. [I don't have the complete reference here.] Sometimes,
>you can model the deterministic part and the stochastic parts
>separately. At other times, you might want to start from random
>initial behavior and bias it towards deterministic behavior.
>You will most definitely need stochastic units in the network(s) you
>use.
>
>
>Pankaj Mehra
>University of Illinois

Just a short comment on the ``most definitely'' part:

It is quite possible to use deterministic nets to ``solve'' problems
that aren't fully deterministic (depending upon what is meant by
``solve'').

For example, suppose we want a net to output the probability of a coin
turning up heads when tossed. A network with one output unit and no
inputs whatsoever will perform this task, and can be trained by
gradient descent.
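To make the coin example concrete, here is a minimal sketch of such a
net. It rests on assumptions of mine rather than anything fixed by the
discussion: the single output unit is a sigmoid of one trainable bias,
``cross entropy'' is taken to mean -[t log p + (1-t) log(1-p)], and the
names train_bias_unit, lr, steps and the starting bias of 2.0 are
purely illustrative.

```python
import math


def sigmoid(x):
    """Logistic squashing function for the single output unit."""
    return 1.0 / (1.0 + math.exp(-x))


def train_bias_unit(targets, loss="cross_entropy", lr=0.5, steps=5000, bias=2.0):
    """Gradient descent on the only parameter of an input-free unit.

    targets: observed outcomes or probabilities in [0, 1] (1 = heads).
    Returns the unit's output after training.
    """
    n = len(targets)
    for _ in range(steps):
        p = sigmoid(bias)
        if loss == "cross_entropy":
            # For a sigmoid unit, d/d(bias) of
            # -[t log p + (1 - t) log(1 - p)] simplifies to (p - t).
            grad = sum(p - t for t in targets) / n
        elif loss == "squares":
            # d/d(bias) of (p - t)^2 is 2 (p - t) p (1 - p).
            grad = sum(2.0 * (p - t) * p * (1.0 - p) for t in targets) / n
        else:
            raise ValueError("loss must be 'cross_entropy' or 'squares'")
        bias -= lr * grad
    return sigmoid(bias)


if __name__ == "__main__":
    tosses = [1, 0, 0, 0, 1, 1, 0, 1]  # illustrative raw outcomes, 1 = heads
    for loss in ("cross_entropy", "squares"):
        print(loss, train_bias_unit(tosses, loss=loss))
```

With either error function the output should settle at the fraction of
heads in the list, since both gradients vanish only when the output
equals the mean target. A list that is all heads or all tails would
push the bias off towards infinity, so this sketch is only meant for
mixed data.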
The set of training examples can be either a single example, namely
the observed probability of turning up heads, e.g., {0.5}, or the
unprocessed results of a number of trials, e.g., {1,0,0,0,1,1,0,1}.

In this case either the sum-of-squares or the asymmetric cross entropy
is a suitable error function: the minimum of both occurs when the
output unit gives the observed probability. However, for more complex
problems, the softmax output function together with the asymmetric
cross entropy objective function is better in both theory and
practice.

John Bridle has quite a nice paper in NIPS 2 on using neural nets for
stochastic problems. He shows that for a particular type of network,
when the objective function is at its minimum value, the Mutual
Information between the outputs of the network and the training data
is at its maximum. (Btw, this gives better discrimination than
Maximum Likelihood model estimation methods.)

Tony Plate
--
---------------- Tony Plate ----------------------  tap@ai.utoronto.ca -----
Department of Computer Science, University of Toronto,
10 Kings College Road, Toronto,
Ontario, CANADA M5S 1A4
----------------------------------------------------------------------------