Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!uwm.edu!csd4.csd.uwm.edu!markh
From: markh@csd4.csd.uwm.edu (Mark William Hopkins)
Newsgroups: comp.ai.neural-nets
Subject: A neural net who plays chess
Summary: Hybrid architectures
Keywords: Search, Neural Net
Message-ID: <6758@uwm.edu>
Date: 4 Oct 90 07:17:59 GMT
Sender: news@uwm.edu
Reply-To: markh@csd4.csd.uwm.edu (Mark William Hopkins)
Organization: University of Wisconsin-Milwaukee
Lines: 88


   I've been sitting on this extremely simple but potentially very powerful
design which would enable a neural net to learn to play chess effectively,
starting even from scratch.  Haven't gotten around to coding it yet, or
testing it, but if you'd like to try it yourself, I've posted a specification
below.

(1) ARCHITECTURE
   The design is actually of a hybrid architecture: embodying both your typical
game-playing/search AI program and a typical neural net.  Simply put, the
search program (the "left hemisphere") acts as a 'predictor'.  It searches
out the best configurations N moves ahead of the current configuration.  The
neural net (the "right hemisphere") acts as an 'evaluator'.  It's purpose is
to fill in the role formerly held by the hard-coded heuristics you usually
would have otherwise found in a chess-playing program.  These two components
reenforce each other in such a way that the evaluator gradually learns to
"squash" more and more of the N-move search tree into a static global
evaluation function.  This, of course, serves to increase the effectiveness of
the predictor.

   (a) Evaluator
   The input to the evaluator is a board configuration which contains
information to allow you to completely determine the position of any piece
on the board (or whether it has been captured or not).  It can be as simple
as a matrix, or as complex as a net with 'diagonal detectors', 'knight-L
detectors', etc. built directly into the input representation.  Alternatively,
it might just be a list of board locations indexed by piece.  The output is a
number, say, between -1 and 1, indicating the value of the board.  A 1
indicates that the board is a winning board from the program's point of view,
and -1 ... a losing configuration.

   (b) Predictor
   The predictor is a program to implement search, such as minimax with a
cutoff.  At the cutoff, the evaluator is used to estimate the configuration in
question.  This allows the predictor to return with a value in short finite
time.
   It also contains the body of rules that determine what comprises a valid
move, and a winning, losing, or stalemate configuration.

(2) INITIALIZATION
   The evaluator MAY be initialized by training it using an already available
evaluation function as a supervisor to incorporate already developed expertise
(why re-invent the wheel?), or can simply be set to a random configuration to
simulate the process of learning the game from scratch.

(3) THE PLAY CYCLE
   (a) Move generation
   As mentioned above, to generate a move, the predictor performs a search
and uses the evaluator at the cut-off point.  It comes back with a best
possible move AND an estimate as to the value of the current configuration.

   (b) Learning
   Upon completion of a move, the evaluator is updated with the training pair
(value, configuration).

   After the opponent makes a move, the play cycle is then resumed at (1).
Play continues, of course, until the program or its opponent wins (or, using
standard rules, until a draw or stalemate happens).

(4) EXPECTED PERFORMANCE
   Provided the network was designed with care, I expect to see impressive
gains in playing ability over time.

   (a) Static evaluation and search-tree squashing.
      As time goes on, because of the feedback the predictor is providing to
      the evaluator, the evaluator will actually learn to make static
      evaluations of the current board, effectively squashing the N-move
      search tree into a single global evaluation.  This kind of static
      evaluation in lieu of searching is precisely what good chess players
      claim to be able to do.

   (b) Spontaneous emergence of openings.
      After getting its butt severely kicked the first few hundred thousand
      games (again, we're talking about the case where the evaluator starts
      from zero-knowledge), natural pathways will develop that correspond to
      the openings typically taught to a novice.

   (c) Spontaneous emergence of endgames.
      If the input network architecture has been designed properly, the
      evaluator may be able enough to make generalizations (say, of
      tranalation-invariance) to embody knowledge of the more-or-less
      standard endgame configurations.

(5) EMBEDDING THE PREDICTOR
   An interesting extension to the design specified above would be to embed
the search control structure of the predictor itself in a neural net.  If this
is done, then the hybrid "crutch" (that's all it really is: a crutch) can
finally be discarded.