Path: utzoo!utgpu!watserv1!watmath!att!att!dptg!ulysses!andante!mit-eddie!media-lab!minsky
From: minsky@media-lab.MEDIA.MIT.EDU (Marvin Minsky)
Newsgroups: comp.ai
Subject: Re: What Has Traditional AI Accomplished?
Message-ID: <3740@media-lab.MEDIA.MIT.EDU>
Date: 19 Oct 90 14:51:13 GMT
References: <69609@lll-winken.LLNL.GOV> <1990Oct15.143325.26044@unislc.uucp> <1990Oct16.135631.6444@cbnewsj.att.com> <69929@lll-winken.LLNL.GOV>
Reply-To: minsky@media-lab.media.mit.edu (Marvin Minsky)
Organization: MIT Media Lab, Cambridge MA
Lines: 133

In article <69929@lll-winken.LLNL.GOV> loren@tristan.llnl.gov (Loren Petrich) writes:

>	The simplicity of the basic algorithms keep making me wonder
>why NN's did not take off earlier -- the basic code for one takes up
>only a couple pages of Fortran or C. Try writing one yourself. I guess
>that (in)famous book by Minsky and Papert, _Perceptrons_, with its
>seemingly airtight theoretical arguments, is what had squelched the
>field for so long.

DAMMIT.  Try reading the book. What happened was that the field had
already flattened out, because, although Perceptrons could learn to
recognize certain patterns, they seemed unable to learn some other
kinds of patterns.  The book explicitly analyzes "three layer nets" --
input layer / coefficients / hidden layer / coefficients / and single
neuron output.  But, in fact, most theorems apply to unrestricted multilayer,
loop-free nets.  This does not seem to be well-known.  I assumed it
was obvious.
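
For readers who want the "three layer net" spelled out, here is a minimal
sketch of the layout described above: input layer, coefficients, hidden
layer, coefficients, single output neuron. The threshold units, the function
names, and the hand-picked exclusive-or coefficients are assumptions chosen
for illustration; none of them is taken from the book.

```python
def step(x):
    """Threshold unit: outputs 1 when its net input is positive, else 0."""
    return 1 if x > 0 else 0


def three_layer_net(inputs, hidden_weights, hidden_biases, out_weights, out_bias):
    """Input layer -> coefficients -> hidden layer -> coefficients -> one output neuron."""
    hidden = [
        step(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return step(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)


# Hypothetical hand-set coefficients: hidden unit 0 acts as OR, hidden unit 1
# as AND, and the output fires for "OR but not AND" (exclusive-or of two bits).
HIDDEN_WEIGHTS = [[1, 1], [1, 1]]
HIDDEN_BIASES = [-0.5, -1.5]
OUT_WEIGHTS = [1, -2]
OUT_BIAS = -0.5

results = [
    three_layer_net([a, b], HIDDEN_WEIGHTS, HIDDEN_BIASES, OUT_WEIGHTS, OUT_BIAS)
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]
]
```

The net is loop-free: each layer reads only from the layer before it.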

Since no one has found any errors in those "seemingly airtight
theoretical arguments", you should try to understand what point you're
missing!  It seems strange that I should have to explain this in
comp.ai, at this late date.  "Perceptrons" explained that it will be
hard for such nets to perform, for example, certain kinds of
group-invariant recognition, without duplicating hardware for every
element of the group.

     EXAMPLE: in a simple 100 x 100 square retina, recognize all the
     images that could be reasonably described as depicting "A SQUARE
     INSIDE A CIRCLE".
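
To make "duplicating hardware for every element of the group" concrete,
here is a rough sketch for the simplest group, translations of the retina.
It is only an illustration of the hardware cost, not a construction from
the book: the function names, the ring-shaped template, and the choice to
match only the template's "on" cells are all assumptions.

```python
def make_translated_detectors(template, retina_size):
    """Build one threshold unit per translation of the template.

    Each unit is stored as the list of retina cells it looks at; `need` is
    the threshold, i.e. the number of "on" cells in the template.
    """
    height, width = len(template), len(template[0])
    need = sum(sum(row) for row in template)
    detectors = []
    for dy in range(retina_size - height + 1):
        for dx in range(retina_size - width + 1):
            cells = [
                (dy + r, dx + c)
                for r in range(height)
                for c in range(width)
                if template[r][c]
            ]
            detectors.append(cells)
    return detectors, need


def recognizes(image, detectors, need):
    """Output unit: fires if any translated copy has all of its cells on."""
    return any(sum(image[r][c] for r, c in cells) >= need for cells in detectors)


# Hypothetical 3 x 3 ring template on a small 10 x 10 retina.
RING = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
detectors, need = make_translated_detectors(RING, 10)
unit_count = len(detectors)  # (10 - 3 + 1) ** 2 translated copies
```

On the 100 x 100 retina the same 3 x 3 template needs 98 * 98 = 9604
units for translation alone; adding changes of size or rotation multiplies
the count again.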

Loren and others are absolutely right, in that the 80's showed that ML
(multilayer) nets could be made to learn many useful patterns.
"Perceptrons" was concerned with patterns that MLs couldn't learn, not
ones they could!!!!!!!!!!

So no collection of exciting stories of MLs learning things counters
the problems with what they can't learn -- like those distance
invariant relationships between parts of images.

In many cases, "successful" applications of MLs depend on
pre-processing a picture image, by first normalizing it in size, and
then centering it, before presenting it to the ML.  Fine - but don't
tell people that this refutes the Minsky-Papert theorems.  Instead,
now try to do that "circle-inside-square" problem!  And then realize
that many real-world problems require multiple normalizations, which
cannot be pre-computed until you have picked out the sub-patterns.
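
The kind of pre-processing meant here can be sketched as follows. This is
a minimal illustration, assuming a binary retina stored as a list of rows
and a single figure on it; the names `normalize` and `paint`, the
nearest-neighbour rescaling, and the 8 x 8 output size are assumptions, not
a description of any particular published system.

```python
def normalize(image, size=8):
    """Crop a binary image to the bounding box of its "on" cells, then
    rescale that box to size x size by nearest-neighbour sampling.

    Cropping removes position; rescaling removes overall size.
    """
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not rows:  # blank retina: nothing to normalize
        return [[0] * size for _ in range(size)]
    top, left = rows[0], cols[0]
    height = rows[-1] - top + 1
    width = cols[-1] - left + 1
    return [
        [image[top + (r * height) // size][left + (c * width) // size]
         for c in range(size)]
        for r in range(size)
    ]


def paint(n, cells):
    """Hypothetical helper: an n x n retina with the given (row, col) cells on."""
    image = [[0] * n for _ in range(n)]
    for r, c in cells:
        image[r][c] = 1
    return image


corner = [(0, 0), (1, 0), (1, 1)]  # a small L-shaped figure
a = paint(20, [(2 + r, 3 + c) for r, c in corner])
b = paint(20, [(11 + r, 14 + c) for r, c in corner])
same_after_normalizing = normalize(a) == normalize(b)
```

Note that this computes one global normalization for the whole retina. It
has nothing to say about a scene with several parts that each need their
own normalization, which is the case raised above.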

In that connection, there is wisdom in Thomas G Edwards' remarks in
<6664@jhunix.HCF.JHU.EDU>:

  ... Cascade-Correlation is a NN algorithm which is able to solve
  many problems which were difficult for homogeneous NNs to solve. ...
  I see a future where inductive learning by small homogeneous NNs
  is used in combination with more traditional AI type goal building.
  Cascade-Correlation is a step in that direction.  Divide-and-conquer
  of traditional AI is combined with the easy inductive learning of
  traditional NNs.  Of course, the trick is to couch this in a
  connectionist framework to continue to allow for fast parallel
  computation.

Divide-and-conquer is surely needed for circle-inside-square.  Note
that we still don't know how the brain does it.

Get with it, guys!  Of course there are many exciting things that can
be done with ML networks.  A good deal of the brain is made of them.
And there is a lot that requires non-ML networks, and a lot of the
brain is non-ML.  Instead of bashing "Perceptrons", you should use it
as a model, and try to find more general statements about what ML and
other networks can do, and what their limitations are.

What we don't need are intemperate remarks like those in
<POLLACK.90Oct18014110@dendrite.cis.ohio-state.edu>, whose author seems to
deliberately misinterpret everything I have said in this group and
other places.  I don't know why he's so angry at me.

For example, in one message to this group I said:

   "... Where is the "traditional, symbolic, AI in the brain"?  The
   answer seems to have escaped almost everyone on both sides of this
   great and spurious controversy!  The 'traditional AI' lies in the
   genetic specifications of those functional interconnections: the bus
   layout of the rel A large, perhaps messy software is there before your
   eyes, hiding in the gross anatomy.  Some 3000 "rules" about which
   sub-NN's should do what, and under which conditions, as dictated by
   the results of computations done in other NNs...."

Pollack replied with this weird objection:

   "I have to admit this is definitely a novel version of the
   homunculus fallacy: If we can't find him in the brain, he must be
   in the DNA! Of all the data and theories on cellular division and
   specialization and on the wiring of neural pathways I have come
   across, none have indicated that DNA is using means-ends analysis."

And then he proceeded to make the same points that I have been
making, as though they were different from what I was saying:

   "Certainly, connectionist models are very easy to decimate when
   offered up as STRONG models of children learning language, of real
   brains, of spin glasses, quantum calculators, or whatever.  That is
   why I view them as working systems which just illuminate the
   representation and search processes (and computational theories) which
   COULD arise in natural systems.  There is plenty of evidence of
   convergence between representations found in the brain and backprop or
   similar procedures despite the lack of any strong hardware equivalence
   (Anderson, Linsker); constrain the mapping task correctly, and local
   optimization techniques will find quite similar solutions."

It is the same thing again.  Yes, you can find things nets do, but
it's like bad statistics in which you don't describe what you're
testing for until after the experiment is done.  Let's see an ML solve
circle-in-square.  Let's see one of Pollack's massively parallel
parsers solve circle-in-square.  Without any "strong hardware"
pre-figuring of the network.  In fact, Pollack's next paragraph begins
with

   "Furthermore, the representations and processes discovered by
   connectionist models may have interesting scaling properties and can
   be given plausible adaptive accounts."

Is he angry at me because the required scaling properties for human
visual perception are not among those possessed by the NN models he
advocates?  I don't know, but there must be some reason for his rage.
He finishes with,
 
   "On the other hand, I take it as a weakness of a theory of
   intelligence, mind or language if, when pressed to reveal its
   origin, shows me a homunculus, unbounded nativism, or some
   evolutionary accident with the same probability of occurrence as God."

Is this a paraphrase of the beginning of "Society of Mind", or does
Pollack think it is opposing it?  Come on, Jordan.  We're on the same
side.  Yet you have been writing the most hostile and savage reviews
of my work.  What's the deal here?