Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!samsung!noose.ecn.purdue.edu!iuvax!cogsci!dave
From: dave@cogsci.indiana.edu (David Chalmers)
Newsgroups: comp.ai.neural-nets
Subject: Re: A neural net who plays chess
Message-ID: <62519@iuvax.cs.indiana.edu>
Date: 6 Oct 90 08:20:21 GMT
References: <6758@uwm.edu>
Sender: news@iuvax.cs.indiana.edu
Reply-To: dave@cogsci.indiana.edu (David Chalmers)
Organization: Indiana University, Bloomington
Lines: 98

In article <6758@uwm.edu> markh@csd4.csd.uwm.edu (Mark William Hopkins) writes:

>   I've been sitting on this extremely simple but potentially very powerful
>design which would enable a neural net to learn to play chess effectively,
>starting even from scratch.  Haven't gotten around to coding it yet, or
>testing it, but if you'd like to try it yourself, I've posted a specification
>below.
>
>the search program (the "left hemisphere") acts as a 'predictor'.  It searches
>out the best configurations N moves ahead of the current configuration.  The
>neural net (the "right hemisphere") acts as an 'evaluator'.  It's purpose is
>to fill in the role formerly held by the hard-coded heuristics you usually
>would have otherwise found in a chess-playing program.  These two components
>reenforce each other in such a way that the evaluator gradually learns to
>"squash" more and more of the N-move search tree into a static global
>evaluation function.  This, of course, serves to increase the effectiveness of
>the predictor.

There is the seed of a very nice idea here.  i.e., the best evaluation 
function in chess is a fixed point of the "minimax search" operator in
function space.  (Spelled out, that means that for an optimal evaluation
function E from positions to values, evaluating the position via minimax
search, applying E at the end of the search tree, should provide exactly the
same result as applying E directly to the position.)  Combined with certain
constraints on E, e.g. E(checkmate position) = 1, E(checkmated position) = -1,
E(2 kings) = 0, this characterization should provide a very tightly
constrained evaluation function that would play a mean game of chess.  So,
is there any way to take advantage of this by using methods of fixed-point
analysis?  I confess to know almost nothing about computer chess, so I
don't know if the idea is commonplace there.

Your suggestion of bootstrapping the evaluator with the predictor is a nice
way of taking advantage of this property.  There are a couple of problems,
though...

>   (b) Learning
>   Upon completion of a move, the evaluator is updated with the training pair
>(value, configuration).

>(4) EXPECTED PERFORMANCE
>   Provided the network was designed with care, I expect to see impressive
>gains in playing ability over time.
>   (a) Static evaluation and search-tree squashing.
>   (b) Spontaneous emergence of openings.
>   (c) Spontaneous emergence of endgames.

You may be overestimating the capabilities of networks just a tad.  Network
training isn't magic... networks are ultimately pretty simple
structures, not great for computing incredibly complex mappings like the ones
required here (yes, I know that you can get universal approximation, but only
at the cost of generalization).  After all, if you expect the above to work,
why not just try: train the network directly on (position, move) pairs directly
derived from the games of Kasparov and other grandmasters.  We'll get a
grandmaster network, easy!  And once we've done that, why not try: train a
network to simulate the total performance of a person in all domains by
training it from data provided over the lifetime of a person: (sensory
input at time t, motor output at time t).  Human intelligence, just like
that!  Of course, you'll need to use a recurrent network, but that's no
problem, we can just train it up with back-propagation through time or the
Williams/Zipser algorithm.  I think you see the problem by now...

>(2) INITIALIZATION
>   The evaluator MAY be initialized by training it using an already available
>evaluation function as a supervisor to incorporate already developed expertise
>(why re-invent the wheel?), or can simply be set to a random configuration to
>simulate the process of learning the game from scratch.

One thing you don't want to do is to start completely from scratch.  Even
bootstrapping needs to start from somewhere.  All this would guarantee is
that you would arrive at *some* fixed point of the minimax operator -- but
there are a lot of these -- the constant zero function for instance.  You
have to make your initial evaluator minimally competent (incorporating at
minimum the constraints mentioned in the first paragraph above, and preferably
more) if you want bootstrapping to get anywhere.

Actually, as things stand there's no need for the network to play actual games
at all.  You just generate positions randomly, calculate the minimax+evaluator
prediction, and train the evaluator on that.  (Only one step of minimax search
is required, nothing deep.)  In fact, your proposal doesn't incorporate any
feedback from wins and losses, so playing games doesn't serve any purpose
at all (except perhaps to confine the training of evaluation to a "reasonable"
portion of position space).  And by turning it into a fully-supervised learning
problem, you've made it much easier than it would be with the kind of
reinforcement learning that would be required from playing actual games, with
all the accompanying credit-assignment problems.

Nice idea, and I'm not sure that it couldn't be taken advantage of
in various interesting ways.  Nothing about this need be specific to neural
networks, incidentally -- any kind of learning system would do.  Just one
that has an architecture that guarantees both complete learning and
perfect generalization for the problem at hand.  Which sadly doesn't exist in
1990, but hey, that's only a minor point...

--
Dave Chalmers     (dave@cogsci.indiana.edu)      
Concepts and Cognition, Indiana University.

"It is not the least charm of a theory that it is refutable."