Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!samsung!noose.ecn.purdue.edu!iuvax!cogsci!dave From: dave@cogsci.indiana.edu (David Chalmers) Newsgroups: comp.ai.neural-nets Subject: Re: A neural net who plays chess Message-ID: <62519@iuvax.cs.indiana.edu> Date: 6 Oct 90 08:20:21 GMT References: <6758@uwm.edu> Sender: news@iuvax.cs.indiana.edu Reply-To: dave@cogsci.indiana.edu (David Chalmers) Organization: Indiana University, Bloomington Lines: 98 In article <6758@uwm.edu> markh@csd4.csd.uwm.edu (Mark William Hopkins) writes: > I've been sitting on this extremely simple but potentially very powerful >design which would enable a neural net to learn to play chess effectively, >starting even from scratch. Haven't gotten around to coding it yet, or >testing it, but if you'd like to try it yourself, I've posted a specification >below. > >the search program (the "left hemisphere") acts as a 'predictor'. It searches >out the best configurations N moves ahead of the current configuration. The >neural net (the "right hemisphere") acts as an 'evaluator'. It's purpose is >to fill in the role formerly held by the hard-coded heuristics you usually >would have otherwise found in a chess-playing program. These two components >reenforce each other in such a way that the evaluator gradually learns to >"squash" more and more of the N-move search tree into a static global >evaluation function. This, of course, serves to increase the effectiveness of >the predictor. There is the seed of a very nice idea here. i.e., the best evaluation function in chess is a fixed point of the "minimax search" operator in function space. (Spelled out, that means that for an optimal evaluation function E from positions to values, evaluating the position via minimax search, applying E at the end of the search tree, should provide exactly the same result as applying E directly to the position.) Combined with certain constraints on E, e.g. E(checkmate position) = 1, E(checkmated position) = -1, E(2 kings) = 0, this characterization should provide a very tightly constrained evaluation function that would play a mean game of chess. So, is there any way to take advantage of this by using methods of fixed-point analysis? I confess to know almost nothing about computer chess, so I don't know if the idea is commonplace there. Your suggestion of bootstrapping the evaluator with the predictor is a nice way of taking advantage of this property. There are a couple of problems, though... > (b) Learning > Upon completion of a move, the evaluator is updated with the training pair >(value, configuration). >(4) EXPECTED PERFORMANCE > Provided the network was designed with care, I expect to see impressive >gains in playing ability over time. > (a) Static evaluation and search-tree squashing. > (b) Spontaneous emergence of openings. > (c) Spontaneous emergence of endgames. You may be overestimating the capabilities of networks just a tad. Network training isn't magic... networks are ultimately pretty simple structures, not great for computing incredibly complex mappings like the ones required here (yes, I know that you can get universal approximation, but only at the cost of generalization). After all, if you expect the above to work, why not just try: train the network directly on (position, move) pairs directly derived from the games of Kasparov and other grandmasters. We'll get a grandmaster network, easy! And once we've done that, why not try: train a network to simulate the total performance of a person in all domains by training it from data provided over the lifetime of a person: (sensory input at time t, motor output at time t). Human intelligence, just like that! Of course, you'll need to use a recurrent network, but that's no problem, we can just train it up with back-propagation through time or the Williams/Zipser algorithm. I think you see the problem by now... >(2) INITIALIZATION > The evaluator MAY be initialized by training it using an already available >evaluation function as a supervisor to incorporate already developed expertise >(why re-invent the wheel?), or can simply be set to a random configuration to >simulate the process of learning the game from scratch. One thing you don't want to do is to start completely from scratch. Even bootstrapping needs to start from somewhere. All this would guarantee is that you would arrive at *some* fixed point of the minimax operator -- but there are a lot of these -- the constant zero function for instance. You have to make your initial evaluator minimally competent (incorporating at minimum the constraints mentioned in the first paragraph above, and preferably more) if you want bootstrapping to get anywhere. Actually, as things stand there's no need for the network to play actual games at all. You just generate positions randomly, calculate the minimax+evaluator prediction, and train the evaluator on that. (Only one step of minimax search is required, nothing deep.) In fact, your proposal doesn't incorporate any feedback from wins and losses, so playing games doesn't serve any purpose at all (except perhaps to confine the training of evaluation to a "reasonable" portion of position space). And by turning it into a fully-supervised learning problem, you've made it much easier than it would be with the kind of reinforcement learning that would be required from playing actual games, with all the accompanying credit-assignment problems. Nice idea, and I'm not sure that it couldn't be taken advantage of in various interesting ways. Nothing about this need be specific to neural networks, incidentally -- any kind of learning system would do. Just one that has an architecture that guarantees both complete learning and perfect generalization for the problem at hand. Which sadly doesn't exist in 1990, but hey, that's only a minor point... -- Dave Chalmers (dave@cogsci.indiana.edu) Concepts and Cognition, Indiana University. "It is not the least charm of a theory that it is refutable."