Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ukma!rex!ck From: ck@rex.cs.tulane.edu (Cris Koutsougeras) Newsgroups: comp.ai.neural-nets Subject: Original STEP discussion continued Message-ID: <1082@rex.cs.tulane.edu> Date: 8 Sep 89 07:38:10 GMT Reply-To: ck@rex.cs.tulane.edu (Cris Koutsougeras) Organization: Computer Science Dept., Tulane Univ., New Orleans, LA Lines: 70 I am tempted to present my thoughts about the issues raised in the "step function" discussion. Let me try to lead through the concepts of "correctness" and "generalization". Given a training set, suppose that a net learns a function F1 which "contains" (or agrees if you wish) with the training set (TS). On the basis of the info in TS only, one has to accept F1 as correct. The issue of a proper generalization has to do with the question whether the inference of F1 on the basis of TS is "reasonable". Reasonable is not something easy to define. What it intuitively means though is whether F1 reflects a relation characteristic of TS through which TS can be reconstructed. Also, of many such relations characterizing TS (and each of which is such that it alone can reconstruct TS) the most "reasonable" one would be (matter of definition) the most simplest one. Lets take an example because otherwise I sound too arbitrary. Suppose I want to teach someone the Greek word "louloudi" and I present as positive instances some red flowers and as negatives some trees some houses and some non-flower plants. Suppose also that none of the negative instances are red. Would anyone who would think that louloudi means a red flower or something red in general be considered stupid ??? Well louloudi means flower and it is my fault that I chose a TS in which the color plays a determinant role, because I should expect that the "student" would try to identify the "simplest" discriminant rule which in this case hapens to be the color alone. So I would relate the concept of a learnable function with the TS in use. I would suggest that a given function is learnable with respect to a given TS if it is the "simplest" function which is correct under TS and that the "simplest" correct function is unique. In the case of Back Prop as a measure for simplicity we can adopt the degree of nonlinearity of the function. Lets see what happens in the BP operation. Components (primitive features) of the input which are irelevant to the I/O relation will have random values that make no sense (in other words would result in a very complicated description of the I/O relation) if we assume that we have a good TS. On the other hand those which are relevant would have recurent properties (or values to keep it simple). The changes of the weights implied by non-relevant components will be random and in the long run will cancel each other simply causing an oscilation which will be smaller and smaller with the annealing schedule. Those which are relevant though will cause a steady and directed change. That is how BP identifies the relevant features and (easy to understant) by extension how they synthesize a description. If the net does not have excessive non-linearity the relevant features (and their recurrent values) will dominate the changes and the end result. If there is, then any small bias of the values of irelevant features will be identified and will remain in the net's internal representations. It thus should become apparent that we should be talking about a function learnable with respect to a TS and that learnability should be connected with an optimality measure of some sort. Most of what I expressed here are thoughts that puzzled me when I was writing my dissertation. My escape then was to develop another model which was based on an effort to alleviate the problem of fixed non-linearity of Rumelhart's net. But I haven't yet addressed the problem of learnability in the general case as it was put earlier here. Which means that whoever feels that s/he wants to seriously contribute on this, is welcome to cooperate and lets get a paper out. Cris Koutsougeras ============================================================================== ,___, _ _ _ _ _ ___, ___ ,__ | | | | /\ |\ | | | ` / | | | | /--\ | \ | |- * | \ | |__| |__, / \ | \| |___, |___, * __/ * ==============================================================================