Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!mit-eddie!media-lab!minsky
From: minsky@media-lab.MEDIA.MIT.EDU (Marvin Minsky)
Newsgroups: comp.ai.philosophy
Subject: Re: Reasoning Paradigms
Message-ID: <3593@media-lab.MEDIA.MIT.EDU>
Date: 6 Oct 90 04:55:54 GMT
References: <9963@ccncsu.ColoState.EDU> <3586@media-lab.MEDIA.MIT.EDU> <69347@lll-winken.LLNL.GOV>
Reply-To: minsky@media-lab.media.mit.edu (Marvin Minsky)
Organization: MIT Media Lab, Cambridge MA
Lines: 128


I agree with most of what loren@tristan.llnl.gov (Loren Petrich) said
in article 62.  The only problem I have is with his assertion:

> I feel that there is much more promise in NN's than in traditional
> AI, which has been dependent on working out decision rules explicitly.

It is not an either-or thing, in my view.  NN's are strong in learning
to recognize (some) patterns in which something depends on many other
things in relatively weak dependencies.  NN's can represent such
relationships when they have good linear approximations -- but,
probably, only in those domains.  We don't know a lot about how to
characterize such domains.  But lots of human pattern recognition
machinery probably uses this.
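
To make the "many weak dependencies" case concrete, here is a minimal
sketch in Python. It is not from the original post. It trains a single
linear threshold unit with the classic perceptron rule on a made-up
pattern, "most of the 5 inputs are on", where every input matters a
little and none decides the answer alone. The names predict and
train_perceptron, and the pattern itself, are assumptions chosen for
illustration.

```python
from itertools import product

def predict(weights, bias, bits):
    """Linear threshold unit: fire when the weighted sum exceeds zero."""
    total = sum(w * x for w, x in zip(weights, bits)) + bias
    return 1 if total > 0 else 0

def train_perceptron(examples, n_inputs, max_epochs=1000):
    """Classic perceptron rule; stops once an epoch makes no mistakes."""
    weights, bias = [0] * n_inputs, 0
    for _ in range(max_epochs):
        mistakes = 0
        for bits, target in examples:
            error = target - predict(weights, bias, bits)
            if error:
                mistakes += 1
                # Nudge each weight toward the correct answer.
                weights = [w + error * x for w, x in zip(weights, bits)]
                bias += error
        if mistakes == 0:
            break
    return weights, bias

# Hypothetical pattern: 1 when a majority of the N inputs are on.
N = 5
examples = [(bits, int(sum(bits) > N // 2))
            for bits in product((0, 1), repeat=N)]

weights, bias = train_perceptron(examples, N)
accuracy = sum(predict(weights, bias, b) == t
               for b, t in examples) / len(examples)
print(weights, bias, accuracy)
```

Because this pattern is linearly separable, the perceptron rule is
guaranteed to settle on weights that classify all 32 cases. What comes
out is only a list of numbers plus a bias.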

On the other side, the PROCEDURES that can be represented in NN's are
very limited, certainly in the non-cyclic nets that dominate the work
of the 80s.  This means that, without a lot of external script-like
control, it will be hard for them to reason about what they have
recognized.  A careful re-reading of "Perceptrons" will show that
virtually all the negative results therein still hold for multi-layer
noncyclic networks -- especially theorems like the AND-OR theorem
which show why an NN that recognizes parts may not be able to (learn
to) recognize when those parts have particular relationships, etc.  

I could go on about this, but the point is this:

  1.  Yes: systems with compact rules with very few input terms are not
good at recognizing patterns which need many inputs.  So AI systems
restricted to compact rules must be supplemented by NN-like
structures.  
  2.  No: the NN-like structures cannot replace the "reasoning
systems" of "traditional AI", unless we supply architectures that
embody those goal-oriented processes.  For example, "annealing" does
not replace all other kinds of intelligent heuristic search.  

A tricky fallacy is to think, "Golly, I have now seen NN's solve a
hundred problems in the last five years that 'old AI' couldn't solve."
What's wrong with that is (i) you can look at it the other way: let's
see NNs learn to solve formal integration problems, or similar
problems that involve dissection of descriptions and (ii) many of
those problems NNs can solve can also be solved by other kinds of
analysis -- and, sometimes in ways that lend themselves to being
usable in OTHER situations.  NN solutions, by contrast, tend to be
dead ends, simply because what you end up with, after your 100,000
steps of hill-climbing, is an opaque vector of coefficients.  You have
solved the problem, all right.  You
have even _learned_ the solution!  But you don't end up with anything
you can THINK about!

Is that bad?  Your locomotion system "learns" to walk, all right.  (It
begins with an architecture of NN's that wonderfully work to adjust
your reflexes.)  But "you" don't know anything of how it's done.  Even
Professors of Locomotion Science are still working out theories about
such things.

So maybe you can make a pretty good dog with NNs.  And note that I put
NNs in the plural!  A dog, or a human, learns by using a brain that
consists of (I estimate) some 400 clearly distinct NN
architectures and perhaps 3000 distinct busses or bundles of
specialized interconnections.  What does that mean?

Answer: some of the job is done by NNs.  And some of the job is done
by compactly-describable procedural specifications.  Where is the
"traditional, symbolic, AI in the brain"?  The answer seems to have
escaped almost everyone on both sides of this great and spurious
controversy!  The 'traditional AI' lies in the genetic specifications
of those functional interconnections: the bus layout of the relations
between the low-level networks.  A large, perhaps messy body of software is
there before your eyes, hiding in the gross anatomy.  Some 3000
"rules" about which sub-NN's should do what, and under which
conditions, as dictated by the results of computations done in other
NNs (see the idea of "B-brain" in my book).

Someone might object that this may be an accident.  In a few years,
perhaps, someone will find a new learning algorithm through which a
single, homogeneous NN (highly cyclic, of course) can start from
nothing and learn to become very smart, without any of that
higher-level stuff encoded into its anatomy -- and all in some
reasonable amount of time.  That is the question, and I see no reason
to think that present-day results are very encouraging.

-----

Here is a simple, if abstract, example of what I mean.  Consider one
of the most powerful ideas in traditional AI -- the concept of
achieving a goal by detecting differences between the present
situation ("what you have") and a target situation ("what you want").
The Newell and Simon 'GPS' system did such things (and worked in many
cases, but not all) by trying various experiments and comparing the
results, and then applying strategies designed (or learned) for
'reducing' those differences.
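
The difference-reduction idea can be sketched in a few lines of
Python. This is a toy illustration under stated assumptions, not a
reconstruction of the actual GPS program. The Operator record, the
achieve function, the depth limit, and the little "get the son to
school" domain are all hypothetical names and data chosen for the
example.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Tuple

class Operator(NamedTuple):
    name: str
    needs: FrozenSet[str]    # facts that must hold before it applies
    adds: FrozenSet[str]     # facts it makes true
    removes: FrozenSet[str]  # facts it makes false

def op(name, needs, adds, removes=()):
    return Operator(name, frozenset(needs), frozenset(adds),
                    frozenset(removes))

def achieve(state: FrozenSet[str], goal: FrozenSet[str],
            operators: List[Operator], depth: int = 6
            ) -> Optional[Tuple[FrozenSet[str], List[str]]]:
    """Return (final_state, plan) satisfying goal, or None on failure."""
    difference = goal - state        # "what you want" minus "what you have"
    if not difference:
        return state, []
    if depth == 0:                   # crude guard against endless regress
        return None
    target = sorted(difference)[0]   # work on one difference at a time
    for candidate in operators:
        if target not in candidate.adds:
            continue                 # this operator cannot reduce it
        # Subgoal: first make the operator's preconditions true.
        sub = achieve(state, candidate.needs, operators, depth - 1)
        if sub is None:
            continue
        ready_state, prefix = sub
        after = (ready_state - candidate.removes) | candidate.adds
        # Then deal with whatever differences remain.
        rest = achieve(after, goal, operators, depth - 1)
        if rest is None:
            continue
        final_state, suffix = rest
        return final_state, prefix + [candidate.name] + suffix
    return None

OPERATORS = [
    op("drive son to school", ["son at home", "car works"],
       ["son at school"], ["son at home"]),
    op("install battery",
       ["car needs battery", "shop knows problem", "shop has money"],
       ["car works"], ["car needs battery"]),
    op("phone shop", ["know phone number"], ["shop knows problem"]),
    op("give shop money", ["have money"], ["shop has money"],
       ["have money"]),
]
START = frozenset(["son at home", "car needs battery",
                   "have money", "know phone number"])
GOAL = frozenset(["son at school"])

result = achieve(START, GOAL, OPERATORS)
final_state, plan = result
print(plan)
```

For this toy domain the plan comes out as: give shop money, phone
shop, install battery, drive son to school. Note that the sketch only
chains preconditions and gives up at a fixed depth, so it will fail on
problems that need cleverer strategies for choosing which difference
to reduce.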

In order to do this, common sense would suggest, you need resources
for storing away the various recent results, and then pulling them out
for comparisons.  This is easily done with the equivalent of
registers, or short-term memories -- and it seems -- from a behavioral
viewpoint -- that human brains are equipped with modest numbers of such
structures.  Now, in fact, no one knows the physiology of this.  In
"Society of Mind" I conjecture that many of our brain NN's are
especially equipped with what I call "temporary K-lines" or "pronomes"
that are used for such purposes.  (Their activities are controlled by
other NN's that somehow learn new control-scripts for managing those
short-term memories.)  
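
As a rough picture of what "registers" buy you, here is a small
Python sketch. It makes no claim about brain physiology, K-lines, or
pronomes. The Trial record, the run_experiments function, the
two-register limit, and the numeric toy problem are all assumptions
made up for illustration. Each experiment's result is stored in a
small bank of registers so that results can be compared, and whatever
no longer fits is forgotten.

```python
from collections import namedtuple

# One stored result: how far from the goal, which action, what it gave.
Trial = namedtuple("Trial", "difference name result")

def run_experiments(have, want, experiments, n_registers=2):
    """Try each action on `have`; keep only the best few results."""
    registers = []                   # short-term memory, deliberately small
    for name, action in experiments:
        result = action(have)
        registers.append(Trial(abs(want - result), name, result))
        registers.sort()             # smallest remaining difference first
        del registers[n_registers:]  # forget what no longer fits
    return registers

experiments = [
    ("double", lambda x: 2 * x),
    ("add ten", lambda x: x + 10),
    ("square", lambda x: x * x),
]
kept = run_experiments(have=7, want=20, experiments=experiments)
print([t.name for t in kept])  # ['add ten', 'double']
```

With only two registers, the result of "square" (49, far from 20) is
dropped, and the comparison that matters, "add ten" versus "double",
is still available to whatever process chooses the next step.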

Well, if you design NNs with such facilities, then it will not be very
hard to get them to solve symbolic, analytic problems.  If you don't
provide them with that sort of hardware, everything will get too
muddled, and (I predict) they'll "never" get very far.  It will be
like trying to teach your dog to do calculus.  An alternative will be
to design a fiendishly clever pre-training scheme which "teaches" your
NN, first, to build inside itself some registers.  This might indeed
be feasible, with a homogeneous NN, under certain conditions.  But it
wouldn't be exactly a refutation of what I said before, because it
would involve, not the NN itself "discovering" an adequate
architecture, but an external teacher's deliberately imposing that
architecture on the NN's future development.  (Even this is not
all-or-none, because there is clearly some such trade-off in human
development which, according to all accounts, will fail in the absence
of any attentive adult caretaker.)

Oh well.

----------

In any case, I want to thank Loren for endless thoughtful observations
about many other topics.  I intend to think more about what he said here.