Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!zaphod.mps.ohio-state.edu!sdd.hp.com!elroy.jpl.nasa.gov!usc!ucla-cs!math.ucla.edu!pico!barry
From: barry@pico.math.ucla.edu (Barry Merriman)
Newsgroups: comp.ai.neural-nets
Subject: Re: Searching for 13th Hilbert Problem article
Message-ID: <718@kaos.MATH.UCLA.EDU>
Date: 9 Nov 90 21:18:29 GMT
References: <39365@ut-emx.uucp> <pluto.658100520@zaius>
Sender: news@MATH.UCLA.EDU
Organization: UCLA Dept. of Math, UCLA Inst. for Fusion and Plasma Research
Lines: 58

In article <pluto.658100520@zaius> pluto@zaius.ucsd.edu (Mark Plutowski) writes:
>phbd641@ccwf.cc.utexas.edu (David Chao) writes:
>
>>In the article on Hilbert's 13th problem, reference is made to 
>>Kolmogorov's proof concerning the approximation of any continous function of
>>N variables using only linear summations and nonlinear but
>>continously increasing functions of only one variable. 
>
>(BTW: the version I heard required on the order of 2N+1 functions -- is this
>correct? This would be equivalent to requiring 2N+1 hidden units, and a single 
>linear output unit.  

The ``best version'' requires 2n+1 monotonic, (Lipschitz 1-) continuos functions
of one variable (``transfer functions''), n constants (``weights''), which are independent
of the continuous function of n variables being represented, and one continuous function
of one variable,g, which has no special form, and is thus not representable by a single
standard transfer function. This extra function, g, would have to be represented 
by a net of its own for this to a standard network. So, its like
having n inputs, 2n+1 hidden units, one special g-unit, and a summation
unit as an output, with a special O(n) connectivity pattern.

>However, as I understand it, in Kolmogorov's theorem 
>each hidden unit may be required to have a different, and unknown, transfer 
>function.  Hence, the theorem proves the existence of such a neural network, 
>but not constructively.)

No---the proof of K. _is_ fully constructive (though he does a version that
requires about 2n^2 neurons). Of course, he using limiting procedures in
the constructions. (Amazingly, it only takes K about 4 pages---though
it is very dense.)

Still, note that this thereom is essentially useless for practical
purposes, because the function g, while continuos, can be
extremely rough, and should typically require about const^n values on its
domain to be well-represented. (i.e., to represent g via a standard
network would require exponentially---in n---many neurons. To see
this, note that a general smooth function of n variables requires values
on a fine grid in n-space to be well approximated. If the grid has spacing
h, this is like (1/h)^n values. In the above representation, all this
info gets shoved into g, so you would expect g to have substantial
variations on a length scale of h^n, if the original function varied
on a scale h in n dimesions (and depended strongly on all its variable!)
Thus the complecity of computing g becomes overwhelming.) The point is
NN are useful for representing very special functions of n-variables,
namely those that can actualy be specified by little information, say O(n),
or O(n^2). Kolmogorov's theorem makes no attempt to single out such functions,
and just lumps them in with general functions. Thus, it has nothing
special to tell us about the utility of neural nets.


--
Barry Merriman
UCLA Dept. of Math
UCLA Inst. for Fusion and Plasma Research
barry@math.ucla.edu (Internet)