Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!uwm.edu!csd4.csd.uwm.edu!markh From: markh@csd4.csd.uwm.edu (Mark William Hopkins) Newsgroups: comp.ai.neural-nets Subject: A neural net who plays chess Summary: Hybrid architectures Keywords: Search, Neural Net Message-ID: <6758@uwm.edu> Date: 4 Oct 90 07:17:59 GMT Sender: news@uwm.edu Reply-To: markh@csd4.csd.uwm.edu (Mark William Hopkins) Organization: University of Wisconsin-Milwaukee Lines: 88 I've been sitting on this extremely simple but potentially very powerful design which would enable a neural net to learn to play chess effectively, starting even from scratch. Haven't gotten around to coding it yet, or testing it, but if you'd like to try it yourself, I've posted a specification below. (1) ARCHITECTURE The design is actually of a hybrid architecture: embodying both your typical game-playing/search AI program and a typical neural net. Simply put, the search program (the "left hemisphere") acts as a 'predictor'. It searches out the best configurations N moves ahead of the current configuration. The neural net (the "right hemisphere") acts as an 'evaluator'. It's purpose is to fill in the role formerly held by the hard-coded heuristics you usually would have otherwise found in a chess-playing program. These two components reenforce each other in such a way that the evaluator gradually learns to "squash" more and more of the N-move search tree into a static global evaluation function. This, of course, serves to increase the effectiveness of the predictor. (a) Evaluator The input to the evaluator is a board configuration which contains information to allow you to completely determine the position of any piece on the board (or whether it has been captured or not). It can be as simple as a matrix, or as complex as a net with 'diagonal detectors', 'knight-L detectors', etc. built directly into the input representation. Alternatively, it might just be a list of board locations indexed by piece. The output is a number, say, between -1 and 1, indicating the value of the board. A 1 indicates that the board is a winning board from the program's point of view, and -1 ... a losing configuration. (b) Predictor The predictor is a program to implement search, such as minimax with a cutoff. At the cutoff, the evaluator is used to estimate the configuration in question. This allows the predictor to return with a value in short finite time. It also contains the body of rules that determine what comprises a valid move, and a winning, losing, or stalemate configuration. (2) INITIALIZATION The evaluator MAY be initialized by training it using an already available evaluation function as a supervisor to incorporate already developed expertise (why re-invent the wheel?), or can simply be set to a random configuration to simulate the process of learning the game from scratch. (3) THE PLAY CYCLE (a) Move generation As mentioned above, to generate a move, the predictor performs a search and uses the evaluator at the cut-off point. It comes back with a best possible move AND an estimate as to the value of the current configuration. (b) Learning Upon completion of a move, the evaluator is updated with the training pair (value, configuration). After the opponent makes a move, the play cycle is then resumed at (1). Play continues, of course, until the program or its opponent wins (or, using standard rules, until a draw or stalemate happens). (4) EXPECTED PERFORMANCE Provided the network was designed with care, I expect to see impressive gains in playing ability over time. (a) Static evaluation and search-tree squashing. As time goes on, because of the feedback the predictor is providing to the evaluator, the evaluator will actually learn to make static evaluations of the current board, effectively squashing the N-move search tree into a single global evaluation. This kind of static evaluation in lieu of searching is precisely what good chess players claim to be able to do. (b) Spontaneous emergence of openings. After getting its butt severely kicked the first few hundred thousand games (again, we're talking about the case where the evaluator starts from zero-knowledge), natural pathways will develop that correspond to the openings typically taught to a novice. (c) Spontaneous emergence of endgames. If the input network architecture has been designed properly, the evaluator may be able enough to make generalizations (say, of tranalation-invariance) to embody knowledge of the more-or-less standard endgame configurations. (5) EMBEDDING THE PREDICTOR An interesting extension to the design specified above would be to embed the search control structure of the predictor itself in a neural net. If this is done, then the hybrid "crutch" (that's all it really is: a crutch) can finally be discarded.