Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ukma!rex!ck
From: ck@rex.cs.tulane.edu (Cris Koutsougeras)
Newsgroups: comp.ai.neural-nets
Subject: Original STEP discussion continued
Message-ID: <1082@rex.cs.tulane.edu>
Date: 8 Sep 89 07:38:10 GMT
Reply-To: ck@rex.cs.tulane.edu (Cris Koutsougeras)
Organization: Computer Science Dept., Tulane Univ., New Orleans, LA
Lines: 70

I am tempted to present my thoughts about the issues raised in the
"step function" discussion.

Let me try to lead through the concepts of "correctness" and "generalization".
Given a training set, suppose that a net learns a function F1 which "contains"
(or agrees if you wish) with the training set (TS). On the basis of the 
info in TS only, one has to accept F1 as correct. The issue of a proper
generalization has to do with the question whether the inference of F1 on
the basis of TS is "reasonable". Reasonable is not something easy to define.
What it intuitively means though is whether F1 reflects a relation 
characteristic of TS through which TS can be reconstructed. Also, of many
such relations characterizing TS (and each of which is such that it alone
can reconstruct TS) the most "reasonable" one would be (matter of definition)
the most simplest one. 

Lets take an example because otherwise I sound too arbitrary. Suppose I want
to teach someone the Greek word "louloudi" and I present as positive instances
some red flowers and as negatives some trees some houses and some non-flower
plants. Suppose also that none of the negative instances are red. Would 
anyone who would think that louloudi means a red flower or something red in
general be considered stupid ??? Well louloudi means flower and it is my fault
that I chose a TS in which the color plays a determinant role, because I should
expect that the "student" would try to identify the "simplest" discriminant 
rule which in this case hapens to be the color alone.

So I would relate the concept of a learnable function with the TS in use.
I would suggest that a given function is learnable with respect to a given
TS if it is the "simplest" function which is correct under TS and that the
"simplest" correct function is unique.

In the case of Back Prop as a measure for simplicity we can adopt the degree
of nonlinearity of the function. Lets see what happens in the BP operation.
Components (primitive features) of the input which are irelevant to the I/O
relation will have random values that make no sense (in other words would 
result in a very complicated description of the I/O relation) if we assume
that we have a good TS. On the other hand those  which are relevant would
have recurent properties (or values to keep it simple). The changes of the
weights implied by non-relevant components will be random and in the long run
will cancel each other simply causing an oscilation which will be smaller
and smaller with the annealing schedule. Those which are relevant though
will cause a steady and directed change. That is how BP identifies the 
relevant features and (easy to understant) by extension how they synthesize
a description. If the net does not have excessive non-linearity the relevant
features (and their recurrent values) will dominate the changes and the end
result. If there is, then any small bias of the values of irelevant features
will be identified and will remain in the net's internal representations.

It thus should become apparent that we should be talking about a function
learnable with respect to a TS and that learnability should be connected
with an optimality measure of some sort.

Most of what I expressed here are thoughts that puzzled me when I was 
writing my dissertation. My escape then was to develop another model
which was based on an effort to alleviate the problem of fixed non-linearity
of Rumelhart's net. But I haven't yet addressed the problem of learnability
in the general case as it was put earlier here. Which means that whoever
feels that s/he wants to seriously contribute on this, is welcome to
cooperate and lets get a paper out.


Cris Koutsougeras

==============================================================================
,___,   _  _   _              _   _   ___,               ___         ,__
  |     |  |   |       /\     |\  |   |                 |   `       /
  |     |  |   |      /--\    | \ |   |-        *       |            \
  |     |__|   |__,  /    \   |  \|   |___,             |___,  *    __/  *

==============================================================================