Path: utzoo!attcan!uunet!munnari.oz.au!mel.dit.csiro.au!latcs1!mahler From: mahler@latcs1.oz.au (Daniel Mahler) Newsgroups: comp.ai.neural-nets Subject: Re: A neural net who plays chess Message-ID: <8954@latcs1.oz.au> Date: 9 Oct 90 13:22:08 GMT References: <6758@uwm.edu> <62519@iuvax.cs.indiana.edu> Reply-To: mahler@latcs1.oz.au (Daniel Mahler) Organization: Comp Sci, La Trobe Uni, Australia Lines: 59 I did my honours thesis on using NN's learn to play games. I used a program based on the principle you describe to learn noughts and crosses. There are 2 points worth making about this aproach. 1) It is essentially an extension of the learning technique in Samuel's checkers program. Samuel used an evaluation function that was a linear combination of given feature functions. These weights were optimised by getting the minimax value for a position using the current weights, and using it as a training signal. Samuel worked before the perceptron era, so he did not call it a perceptron. His technique worked because the feature selectors were designed by a human expert. I tried the natural extension: multilevel perceptron with bp, working from a simple representation of the board and trying to see if the net can develop tactical and strategic concepts. I started with random weights, because my interests were mainly theoretical (that's what I always say when something doesn't work 8>) ) Part of the reason for my techniques failure is the second point. 2) The minimax operator does not have a unique fix-point (I like this terminology, shame it's to late for my thesis). In fact it has infinitely many, as any constant value function is a fix-point. This means, in NN terms, lots and lots of local minima. Supplying known strategic features as inputs simplifies the energy space. This kind of learning is probably affected by game tree pathology, especially in the early stages. I think the solution to this problem is to incorporate a genetic techniques with large populations and/or use reinforcement learning (Barto & Sutton's Arp element). Samuel (and I) used a sort of genetic technique with population size 2. There are 2 nets that play each other called champion and challenger, or master and apprentice. The weights in champion are fixed. Challenger learns by the method under discussion. When challenger outperforms champion by some criterion, it becomes champion and a new random challenger is generated. This leads to peculiarities which i ascribe to 'inbreeding'. The graph of the number of games taken by each successive challenger to become champion is a saw-tooth. Each new challenger takes longer than the last, (which I interpret as each champion being better) until suddenly there is a champion that lasts a very short time, and then the cycle repeats, though the peaks seem to get higher. (fractals fanatics may wonder if there are peaks within peaks). I take this to mean that eventually there evolve nets designed to beat the champion though they can be beaten by a random player. Daniel Mahler beating D A