Xref: utzoo comp.ai.neural-nets:692 sci.chem:243
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!njin!princeton!phoenix!mbkennel
From: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel)
Newsgroups: comp.ai.neural-nets,sci.chem
Subject: Re: Neural Net Applications in Chemistry
Message-ID: <8382@phoenix.Princeton.EDU>
Date: 11 May 89 20:34:22 GMT
References: <1989May10.095408.5836@gpu.utcs.utoronto.ca> <TED.89May10130744@kythera.nmsu.edu> <201@bach.nsc.com> <39817@bbn.COM>
Reply-To: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel)
Followup-To: comp.ai.neural-nets
Distribution: na
Organization: Princeton University, NJ
Lines: 130

In article <39817@bbn.COM> aboulanger@bbn.com writes:
>In article  <201@bach.nsc.com> Andrew Palfreyman asks:
>
>  In article <TED.89May10130744@kythera.nmsu.edu>, ted@nmsu.edu (Ted Dunning) writes:
>  > much of the work at lanl using neural net methods has been supplanted
>  > by doyne farmers local approximation method which (for many problems)
>  > is several orders of magnitude more computationally efficient.  the
>  > use of radial basis functions improves the value of this method
>  > considerably (in addition to making the link to nn techniques even
>  > stronger). 
>
>  This is very interesting, Ted. Could you post some references to the net
>  on Doyne Farmer's method, and perhaps a brief resume of his approach?
>  -- 
>
>I should mention that besides Doyne Farmer's method, James Crutchfield
>and Bruce S. McNamara have a method for recovering the equations of
>motion from a time series. The reference is:
>
>"Equations of Motion from a Data Series", James Crutchfield & Bruce
>McNamara, Complex Systems, 1(1987) 417-452.
>
>Unfortunately, I never did get a reference to Doyne's method.
>
>
>Chaotically yours,
>Albert Boulanger
>BBN Systems & Technologies Corp.
>aboulanger@bbn.com

The best reference for Farmers' local approximation method is in the
LANL preprint:  "Exploiting Chaos to Predict the Future and Reduce Noise"
by Farmer and Sidrowich.

Note that this paper only deals with predicting chaotic dynamical systems.

Having just done a thesis on the same sort of prediction using neural
networks, I might be able to give an outline of various methods.  There
are basically two major categories:

1)  Global methods.  In this scheme, one tries to fit some complex function
to all the data.  There are various functional forms.
    A)  Linear methods.  By "linear" I mean the output of the function
        depends linearly on the _free parameters_; all of these functions
        can represent nonlinear transformations from input to output, of
        course.  If the output is linear in the free parameters (weights),
        there is a well-known guaranteed, optimal & deterministic
	learning algorithm for the least squared error measure.

	1)  Global polynomials.  Basically a taylor expansion of degree
            d in m unknowns.  This is what Crutchfield & MacNamara use.
            Advantages: easy to compute.  Disadvantages, for complicated
            functions, you need large values of d and the number of
            free weights increases very rapidly, approx = d^m.
	    High-degree polynomials tend to blow up away from the domains
            on which they are fitted.  (trained)
	2)  Rational quotients of polynomials.  Similar to above, but if
            the degrees of num. and denom. are same, they don't blow up.

        3)  Radial basis functions, with fixed centers.  Here you choose
            the centers of your radial functions, and then learn the
            weights of the output layer using linear least squares, for
            example.  This can put put in terms of a network with a single
            hidden layer of neurons, where the first layer of weights is
            fixed, and the second layer of weights leads to a linear
            output unit.  See a recent paper by Broomhead & (somebody-or
            other) in Complex Systems.  There's also a pre-print
            by Martin Casdagli, using this to predict chaotic systems.
            ED note:  this method looks promising and should probably
            be investigated further, especially to see whether it works
            in applications other than chaotic systems prediction.

     B)  Nonlinear methods.  Now, you have to use an iterative
	 gradient-descent type of method.

	1)  Standard sigmoidal back-prop net.  Well known learning algorithm.
	2)  Radial basis function, with adjustable centers.  (This is what I
            used).  Lets you represent the mapping more accurately using
            the same size network (=free parameters) compared to sigmoidal
            net, I found.  Learn with conjugate gradient.

II)  Local methods.

     Now, instead of trying to fit a global function to all examples,
you make a simple individual fit _every_ time you need an output.
Here's how Farmer's method works:
        Given some input vector, find the N closest points in the
        data base to this input vector.  For each of these stored points,
        you know what the output was.  With these input-output pairs,
        make a linear or quadratic approximation, and fit the free parameters
        using linear least squares.  This should be fast because there aren't
        that many examples (10-30 say) and free parameters.

	"Learning" simply consists of putting the input data base into
a k-d tree that permits you to retrieve nearest neighbors is
O(log N) time, as needed to make predictions.

	This method has a nice intuitive interpretation:  given some new
input, you look around to see what things like that input did, and you
do something like that.  

Advantages:  Much faster computationally than an iterative gradient-descent
optimization, especially for large data bases of examples.  Doesn't
require any hard crunching to learn some function.  Probably more accurate
for large data bases, too, because most people wouldn't have the patience
or computer power to learn a large network to high accuracy.

Disadvantages:  The mapping isn't in a nice analytic functional form.
You can't realize it in silicon.  You need to carry around the data base
of examples, and making predictions is slower (a search & small fit vs.
a simple functional evaluation).

=================================
Personal opinion:
    If you have a large data base (say over 1K examples) of noise-free
continuous-valued examples, local linear and local quadratic methods
will probably be the best.  I don't know what the effect input noise
would have on this method, compared to neural networks, but I don't
think it would be that bad.  For binary values, it may not be as good,
for the whole method is rooted in the field of dynamical systems.
But I don't think anybody's tried yet, so I have no real evidence one
way or the other.
==================

Get the preprint (it's very good) from Doyne Farmer at LANL.  I believe
his e-mail address is "jdf%heretic@lanl.gov".  An earlier version of
this appeared in Physical Review Letters, so it's definitely real.

Matt Kennel
mbkennel@phoenix.princeton.edu