Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!zaphod.mps.ohio-state.edu!sdd.hp.com!elroy.jpl.nasa.gov!usc!ucla-cs!math.ucla.edu!pico!barry From: barry@pico.math.ucla.edu (Barry Merriman) Newsgroups: comp.ai.neural-nets Subject: Re: Searching for 13th Hilbert Problem article Message-ID: <718@kaos.MATH.UCLA.EDU> Date: 9 Nov 90 21:18:29 GMT References: <39365@ut-emx.uucp> Sender: news@MATH.UCLA.EDU Organization: UCLA Dept. of Math, UCLA Inst. for Fusion and Plasma Research Lines: 58 In article pluto@zaius.ucsd.edu (Mark Plutowski) writes: >phbd641@ccwf.cc.utexas.edu (David Chao) writes: > >>In the article on Hilbert's 13th problem, reference is made to >>Kolmogorov's proof concerning the approximation of any continous function of >>N variables using only linear summations and nonlinear but >>continously increasing functions of only one variable. > >(BTW: the version I heard required on the order of 2N+1 functions -- is this >correct? This would be equivalent to requiring 2N+1 hidden units, and a single >linear output unit. The ``best version'' requires 2n+1 monotonic, (Lipschitz 1-) continuos functions of one variable (``transfer functions''), n constants (``weights''), which are independent of the continuous function of n variables being represented, and one continuous function of one variable,g, which has no special form, and is thus not representable by a single standard transfer function. This extra function, g, would have to be represented by a net of its own for this to a standard network. So, its like having n inputs, 2n+1 hidden units, one special g-unit, and a summation unit as an output, with a special O(n) connectivity pattern. >However, as I understand it, in Kolmogorov's theorem >each hidden unit may be required to have a different, and unknown, transfer >function. Hence, the theorem proves the existence of such a neural network, >but not constructively.) No---the proof of K. _is_ fully constructive (though he does a version that requires about 2n^2 neurons). Of course, he using limiting procedures in the constructions. (Amazingly, it only takes K about 4 pages---though it is very dense.) Still, note that this thereom is essentially useless for practical purposes, because the function g, while continuos, can be extremely rough, and should typically require about const^n values on its domain to be well-represented. (i.e., to represent g via a standard network would require exponentially---in n---many neurons. To see this, note that a general smooth function of n variables requires values on a fine grid in n-space to be well approximated. If the grid has spacing h, this is like (1/h)^n values. In the above representation, all this info gets shoved into g, so you would expect g to have substantial variations on a length scale of h^n, if the original function varied on a scale h in n dimesions (and depended strongly on all its variable!) Thus the complecity of computing g becomes overwhelming.) The point is NN are useful for representing very special functions of n-variables, namely those that can actualy be specified by little information, say O(n), or O(n^2). Kolmogorov's theorem makes no attempt to single out such functions, and just lumps them in with general functions. Thus, it has nothing special to tell us about the utility of neural nets. -- Barry Merriman UCLA Dept. of Math UCLA Inst. for Fusion and Plasma Research barry@math.ucla.edu (Internet)