Xref: utzoo comp.ai.neural-nets:692 sci.chem:243 Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!njin!princeton!phoenix!mbkennel From: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel) Newsgroups: comp.ai.neural-nets,sci.chem Subject: Re: Neural Net Applications in Chemistry Message-ID: <8382@phoenix.Princeton.EDU> Date: 11 May 89 20:34:22 GMT References: <1989May10.095408.5836@gpu.utcs.utoronto.ca> <201@bach.nsc.com> <39817@bbn.COM> Reply-To: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel) Followup-To: comp.ai.neural-nets Distribution: na Organization: Princeton University, NJ Lines: 130 In article <39817@bbn.COM> aboulanger@bbn.com writes: >In article <201@bach.nsc.com> Andrew Palfreyman asks: > > In article , ted@nmsu.edu (Ted Dunning) writes: > > much of the work at lanl using neural net methods has been supplanted > > by doyne farmers local approximation method which (for many problems) > > is several orders of magnitude more computationally efficient. the > > use of radial basis functions improves the value of this method > > considerably (in addition to making the link to nn techniques even > > stronger). > > This is very interesting, Ted. Could you post some references to the net > on Doyne Farmer's method, and perhaps a brief resume of his approach? > -- > >I should mention that besides Doyne Farmer's method, James Crutchfield >and Bruce S. McNamara have a method for recovering the equations of >motion from a time series. The reference is: > >"Equations of Motion from a Data Series", James Crutchfield & Bruce >McNamara, Complex Systems, 1(1987) 417-452. > >Unfortunately, I never did get a reference to Doyne's method. > > >Chaotically yours, >Albert Boulanger >BBN Systems & Technologies Corp. >aboulanger@bbn.com The best reference for Farmers' local approximation method is in the LANL preprint: "Exploiting Chaos to Predict the Future and Reduce Noise" by Farmer and Sidrowich. Note that this paper only deals with predicting chaotic dynamical systems. Having just done a thesis on the same sort of prediction using neural networks, I might be able to give an outline of various methods. There are basically two major categories: 1) Global methods. In this scheme, one tries to fit some complex function to all the data. There are various functional forms. A) Linear methods. By "linear" I mean the output of the function depends linearly on the _free parameters_; all of these functions can represent nonlinear transformations from input to output, of course. If the output is linear in the free parameters (weights), there is a well-known guaranteed, optimal & deterministic learning algorithm for the least squared error measure. 1) Global polynomials. Basically a taylor expansion of degree d in m unknowns. This is what Crutchfield & MacNamara use. Advantages: easy to compute. Disadvantages, for complicated functions, you need large values of d and the number of free weights increases very rapidly, approx = d^m. High-degree polynomials tend to blow up away from the domains on which they are fitted. (trained) 2) Rational quotients of polynomials. Similar to above, but if the degrees of num. and denom. are same, they don't blow up. 3) Radial basis functions, with fixed centers. Here you choose the centers of your radial functions, and then learn the weights of the output layer using linear least squares, for example. This can put put in terms of a network with a single hidden layer of neurons, where the first layer of weights is fixed, and the second layer of weights leads to a linear output unit. See a recent paper by Broomhead & (somebody-or other) in Complex Systems. There's also a pre-print by Martin Casdagli, using this to predict chaotic systems. ED note: this method looks promising and should probably be investigated further, especially to see whether it works in applications other than chaotic systems prediction. B) Nonlinear methods. Now, you have to use an iterative gradient-descent type of method. 1) Standard sigmoidal back-prop net. Well known learning algorithm. 2) Radial basis function, with adjustable centers. (This is what I used). Lets you represent the mapping more accurately using the same size network (=free parameters) compared to sigmoidal net, I found. Learn with conjugate gradient. II) Local methods. Now, instead of trying to fit a global function to all examples, you make a simple individual fit _every_ time you need an output. Here's how Farmer's method works: Given some input vector, find the N closest points in the data base to this input vector. For each of these stored points, you know what the output was. With these input-output pairs, make a linear or quadratic approximation, and fit the free parameters using linear least squares. This should be fast because there aren't that many examples (10-30 say) and free parameters. "Learning" simply consists of putting the input data base into a k-d tree that permits you to retrieve nearest neighbors is O(log N) time, as needed to make predictions. This method has a nice intuitive interpretation: given some new input, you look around to see what things like that input did, and you do something like that. Advantages: Much faster computationally than an iterative gradient-descent optimization, especially for large data bases of examples. Doesn't require any hard crunching to learn some function. Probably more accurate for large data bases, too, because most people wouldn't have the patience or computer power to learn a large network to high accuracy. Disadvantages: The mapping isn't in a nice analytic functional form. You can't realize it in silicon. You need to carry around the data base of examples, and making predictions is slower (a search & small fit vs. a simple functional evaluation). ================================= Personal opinion: If you have a large data base (say over 1K examples) of noise-free continuous-valued examples, local linear and local quadratic methods will probably be the best. I don't know what the effect input noise would have on this method, compared to neural networks, but I don't think it would be that bad. For binary values, it may not be as good, for the whole method is rooted in the field of dynamical systems. But I don't think anybody's tried yet, so I have no real evidence one way or the other. ================== Get the preprint (it's very good) from Doyne Farmer at LANL. I believe his e-mail address is "jdf%heretic@lanl.gov". An earlier version of this appeared in Physical Review Letters, so it's definitely real. Matt Kennel mbkennel@phoenix.princeton.edu