Path: utzoo!attcan!uunet!munnari.oz.au!mel.dit.csiro.au!latcs1!mahler
From: mahler@latcs1.oz.au (Daniel Mahler)
Newsgroups: comp.ai.neural-nets
Subject: Re: A neural net who plays chess
Message-ID: <8954@latcs1.oz.au>
Date: 9 Oct 90 13:22:08 GMT
References: <6758@uwm.edu> <62519@iuvax.cs.indiana.edu>
Reply-To: mahler@latcs1.oz.au (Daniel Mahler)
Organization: Comp Sci, La Trobe Uni, Australia
Lines: 59

I did my honours thesis on using NN's learn to play games.
I used a program based on the principle you describe
to learn noughts and crosses.
There are 2 points worth making about this aproach.
1) It is essentially an extension of the learning technique
in Samuel's checkers program.
Samuel used an evaluation function that was a linear combination
of given feature functions.
These weights were optimised by getting the minimax value for a position
using the current weights, and using it as a training signal.
Samuel worked before the perceptron era, so he did not call it
a perceptron.
His technique worked because the feature selectors were designed by
a human expert.
I tried the natural extension: multilevel perceptron with bp,
working from a simple representation of the board and trying to
see if the net can develop tactical and strategic concepts.
I started with random weights, because my interests were
mainly theoretical (that's what I always say when something
doesn't work 8>) )
Part of the reason for my techniques failure is the second point.

2) The minimax operator does not have a unique fix-point
(I like this terminology, shame it's to late for my thesis).
In fact it has infinitely many, as any constant value function
is a fix-point.
This means, in NN terms, lots and lots of local minima.
Supplying known strategic features as inputs simplifies the energy space.
This kind of learning is probably affected by game tree pathology,
especially in the early stages.

I think the solution to this problem is to incorporate a genetic
techniques with large populations and/or use reinforcement learning
(Barto & Sutton's Arp element).
Samuel (and I) used a sort of genetic technique with population
size 2.
There are 2 nets that play each other called champion and challenger,
or master and apprentice. The weights in champion are fixed.
Challenger learns by the method under discussion.
When challenger outperforms champion by some criterion,
it becomes champion and a new random challenger is generated.
	This leads to peculiarities which i ascribe to 'inbreeding'.
The graph of the number of games taken by each
successive challenger to become champion is a saw-tooth.
Each new challenger takes longer than the last,
(which I interpret as each champion being better)
until suddenly there is a champion that lasts a very short time,
and then the cycle repeats, though the peaks seem to get higher.
(fractals fanatics may wonder if there are peaks within peaks).
I take this to mean that eventually there evolve nets
designed to beat the champion though they can be beaten by
a random player.
	Daniel Mahler

beating


D
A