I tried to program the AI for tic tac toe using a multilayer perceptron and backpropagation. My idea was to train the neural network in the exact evaluation function for board states, but even after analyzing thousands of games, the problem is that the network does not give accurate estimates.
I use 27 input neurons; each square on the 3x3 board is connected to three input neurons, which take the values 0 or 1 depending on whether the square has x, o or is empty. These 27 input neurons send signals to 10 hidden neurons (I chose 10 randomly, but I also tried 5 and 15).
For training, I had a program that generated a series of games, playing against it, using the current rating function to select what is considered optimal for each side. After creating the game, NN collects training examples (which contain the state of the board and the correct conclusion), considering the correct conclusion for the given state of the board as the value (using the evaluation function) of the state of the board that follows it in the game sequence. I think this is what Gerald Tesauro did when programming TD-Gammon, but I could misinterpret the article. (note: I put a specific mechanism for updating weights at the bottom of this post).
I tried different values for learning speed, as well as different numbers of hidden neurons, but nothing works. Even after “learning” there is no noticeable improvement in strategy, and the evaluation function does not approach accuracy.
I understand that there are much simpler ways to program tic tac toe, but I want to do this with a multi-layer perceptron so that I can use it to connect 4 later. Is it possible? I'm starting to think that there is no reliable rating function for a tic tac toe board with a reasonable amount of hidden neurons.
I assure you that I am not looking for some quick code to include in my homework. I have been working unsuccessfully for some time and just wanted to know what I am doing wrong. All tips are welcome.
This is the specific mechanism that I used for NN:
27 0 1, 1/(1 + e ^ (- x)). (i.output), (i.weights [h]) h. h (h.input), , (h.output). lastInput (h.output * h.weight) . (lastInput).
, err - . dSigmoid (x) x.
h : ( * err * dSigmoid (lastInput) * h.output) h : ( * err * dSigmoid (lastInput) * h.weight * dSigmoid (h.input) * i.output).
backpropagation: http://www.youtube.com/watch?v=UnWL2w7Fuo8.