Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!uwm.edu!csd4.csd.uwm.edu!markh From: markh@csd4.csd.uwm.edu (Mark William Hopkins) Newsgroups: comp.ai.neural-nets Subject: Re: A neural net who plays chess Message-ID: <6892@uwm.edu> Date: 11 Oct 90 06:50:22 GMT References: <6758@uwm.edu> <62519@iuvax.cs.indiana.edu> <8954@latcs1.oz.au> <8966@latcs1.oz.au> Sender: news@uwm.edu Reply-To: markh@csd4.csd.uwm.edu (Mark William Hopkins) Organization: University of Wisconsin-Milwaukee Lines: 67 In article <8966@latcs1.oz.au> mahler@latcs1.oz.au (Daniel Mahler) writes: On the emergence of goal-directed behavior: > Several people have pointed out >that my claim about trivial fixpoints >is not true as your program should recognise >terminal game positions and give absolute >value to these. You of course have to do this to give >bp some goal directed bias, but for a game of any complexity >there will be only few terminal nodes in your search horizon. >There will usually be some in chess but NONE >in go or othello until the endgame. The evaluator is always learning based on inputs from positions N moves ahead. Initially it will only have substantial input within N moves of the end, but this learning propagates as it takes THIS input in later games and evaluates N moves from *it*: now 2N moves from the end. And so on... The evaluator's trying to converge onto a non-trivial fixed point. The goal directed behavior should propagate from bottom up, as it were. If it really learns, the first thing you'll notice is that it'll progressively avoid bad openings one by one because they lead to quick endings. Eventually, by exclusion, you'll have the good openings left. In that sense, the bottom-up learning also leads to top-down learning. I think the relevant issues are probably whether the network will be big enough to handle the knowledge a typical chess master has of the state space, and whether it can learn quick enough without having to beating it against a wall a hundred thousand times just to even get it to do something as easy as picking up a simple boolean function? On using hard-coded feature detectors to expedite learning: > However, a slight modification of Samuel's idea seems good. >Use feature detectors designed by experts as inputs to a small net. >This will then be Samuels method extended to allow nonlinear combinations >of these features as evaluation functions. What's to keep the evaluator neural net from spontaneously generating nodes that emulate feature detection in its hidden layer(s) anyhow? As a concrete example on a simpler game try this design on tic-tac-toe: LAYER 3: 1 node (output). LAYER 2: N nodes (try N = 12). LAYER 1: 9 nodes (input .. connected to tic-tac-toe grid) For input, a square occupied by the program's piece evaluates to +1, a square occupied by an opponent's piece to -1, and an empty square to 0. Every cell is connected to every other cell in an adjacent layer. Use back-propagation *with thresholds*. Take your learning input from a minimax search program that looks merely 1 move ahead using the output node of the evaluator function to resolve all non-winning/non-losing/non-draw positions. When using +1, and -1 as limmiting activation values the sigmoid function 1/(1 + e^(-X)) becomes tanh(X), by the way, and the error correction factor Y(1 - Y) becomes 1 - Y*Y. I predict two things will happen: (1) The program, when playing X, will open in the middle square with increasing frequency. When playing O, it will move into a corner in response to an X placed in the middle more and more frequently. (2) Some of the hidden-layer nodes will evolve spontaneously into 3-in-a-row detectors.