Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!uwm.edu!csd4.csd.uwm.edu!markh
From: markh@csd4.csd.uwm.edu (Mark William Hopkins)
Newsgroups: comp.ai.neural-nets
Subject: Re: A neural net who plays chess
Message-ID: <6892@uwm.edu>
Date: 11 Oct 90 06:50:22 GMT
References: <6758@uwm.edu> <62519@iuvax.cs.indiana.edu> <8954@latcs1.oz.au> <8966@latcs1.oz.au>
Sender: news@uwm.edu
Reply-To: markh@csd4.csd.uwm.edu (Mark William Hopkins)
Organization: University of Wisconsin-Milwaukee
Lines: 67

In article <8966@latcs1.oz.au> mahler@latcs1.oz.au (Daniel Mahler) writes:

On the emergence of goal-directed behavior:
>	Several people have pointed out
>that my claim about trivial fixpoints
>is not true as your program should recognise
>terminal game positions and give absolute
>value to these.  You of course have to do this to give
>bp some goal directed bias, but for a game of any complexity
>there will be only few terminal nodes in your search horizon.
>There will usually be some in chess but NONE
>in go or othello until the endgame.

The evaluator is always learning based on inputs from positions N moves
ahead.  Initially it will only have substantial input within N moves
of the end, but this learning propagates as it takes THIS input in later games
and evaluates N moves from *it*: now 2N moves from the end.  And so on...  The
evaluator's trying to converge onto a non-trivial fixed point.

The goal directed behavior should propagate from bottom up, as it were.

If it really learns, the first thing you'll notice is that it'll progressively
avoid bad openings one by one because they lead to quick endings.  Eventually,
by exclusion, you'll have the good openings left.

In that sense, the bottom-up learning also leads to top-down learning.

I think the relevant issues are probably whether the network will be big
enough to handle the knowledge a typical chess master has of the state space,
and whether it can learn quick enough without having to beating it against a
wall a hundred thousand times just to even get it to do something as easy as
picking up a simple boolean function?

On using hard-coded feature detectors to expedite learning:
>	However, a slight modification of Samuel's idea seems good.
>Use feature detectors designed by experts as inputs to a small net.
>This will then be Samuels method extended to allow nonlinear combinations
>of these features as evaluation functions.

What's to keep the evaluator neural net from spontaneously generating nodes
that emulate feature detection in its hidden layer(s) anyhow?

As a concrete example on a simpler game try this design on tic-tac-toe:

LAYER 3: 1 node (output).
LAYER 2: N nodes (try N = 12).
LAYER 1: 9 nodes (input .. connected to tic-tac-toe grid)

For input, a square occupied by the program's piece evaluates to +1, a square
occupied by an opponent's piece to -1, and an empty square to 0.

Every cell is connected to every other cell in an adjacent layer.  Use
back-propagation *with thresholds*.  Take your learning input from a minimax
search program that looks merely 1 move ahead using the output node of the
evaluator function to resolve all non-winning/non-losing/non-draw positions.

When using +1, and -1 as limmiting activation values the sigmoid function
1/(1 + e^(-X)) becomes tanh(X), by the way, and the error correction factor
Y(1 - Y) becomes 1 - Y*Y.

I predict two things will happen:
   (1) The program, when playing X, will open in the middle square
       with increasing frequency.  When playing O, it will move into a
       corner in response to an X placed in the middle more and more
       frequently.
   (2) Some of the hidden-layer nodes will evolve spontaneously into
       3-in-a-row detectors.