Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!ucsd!sdcsvax!beowulf!demers
From: demers@beowulf.ucsd.edu (David E Demers)
Newsgroups: comp.ai.neural-nets
Subject: Re: 3-Layer versus Multi-Layer
Message-ID: <6730@sdcsvax.UCSD.Edu>
Date: 28 Jun 89 18:57:45 GMT
References: <3417@cosmo.UUCP>
Sender: nobody@sdcsvax.UCSD.Edu
Reply-To: demers@beowulf.UCSD.EDU (David E Demers)
Organization: EE/CS Dept. U.C. San Diego
Lines: 33

In article <3417@cosmo.UUCP> jochenru@cosmo.UUCP (Jochen Ruhland) writes:
>During a local meeting here in Germany I heard somebody talking
>about a theorem that a three layer perceptron is capable to
>perform any given In/Out function with an maximum number of hidden
>units in the network.

For "perceptrons", there is no such theorem, since a multilayer
network of linear units can easily be collapsed into two layers.  See, e.g.,
Minsky & Papert, "Perceptrons" (1969).  If, however, units can
take on non-linear activations, then it can be shown that a
three-layer network can approximate any Borel-measurable
function to any desired degree of accuracy (the number of units
required may grow exponentially, however!).  Hal White et al. have
shown this, and have also shown that the mapping is learnable.
Their paper is due to appear this year in Neural Networks, the
journal of the INNS.
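
The collapse of stacked linear layers mentioned above is easy to
check numerically.  What follows is only a minimal sketch in plain
Python: the weight values and the helper names (matvec, matmul,
two_layer, one_layer) are made up for illustration, and biases are
left out to keep it short.

```python
# Illustrative sketch, not taken from any of the cited papers:
# two stacked *linear* layers compute the same map as a single
# layer whose weight matrix is the product of the two.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A (p x q) by matrix B (q x r)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]

# Hypothetical weights: 2 inputs -> 3 hidden linear units -> 1 output.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
W2 = [[2.0, 1.0, -1.0]]

# The collapsed network is a single 1 x 2 weight matrix.
W_collapsed = matmul(W2, W1)

def two_layer(x):
    """Output of the network with a linear hidden layer."""
    return matvec(W2, matvec(W1, x))

def one_layer(x):
    """Output of the collapsed network with no hidden layer."""
    return matvec(W_collapsed, x)
```

Calling two_layer and one_layer on the same input should give the
same output, so the hidden layer buys nothing.  Put a non-linear
function between the two multiplications and the product trick no
longer applies, which is where the approximation results come in.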

The source of this result is frequently cited as the Kolmogorov
superposition theorem.  Robert Hecht-Nielsen has a paper about
this theorem in the 1987 Proceedings of the First IEEE
International Conference on Neural Networks.  The theorem is not
constructive, however.  It shows that a function from
R^m to R^n can be represented by the superposition of
(some number linear in m and n) bounded, monotonic, non-linear
functions of the m inputs, but it gives no way of
determining these functions...
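
For reference, the form in which the superposition theorem is
usually stated, for a continuous scalar-valued f on the unit cube,
looks roughly like this.  The symbols \Phi_q and \psi_{q,p} are just
labels picked for this sketch, and the count 2m+1 is the commonly
quoted figure from the standard statement, not something taken from
the papers above; a map into R^n would use one such expression per
output component.

```latex
% Sketch of the usual statement, for continuous f : [0,1]^m -> R.
% The inner functions \psi_{q,p} do not depend on f;
% the outer functions \Phi_q do.
f(x_1, \dots, x_m)
  = \sum_{q=0}^{2m} \Phi_q\!\left( \sum_{p=1}^{m} \psi_{q,p}(x_p) \right)
```

The theorem only asserts that suitable \Phi_q exist for a given f,
which is the non-constructive part.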

I am writing all of this from memory; all of my papers are
elsewhere right now...  but I know that others have similar
results.

Dave