Tic Tac Toe Neural Network as an Evaluation Function

I tried to program the AI ​​for tic tac toe using a multilayer perceptron and backpropagation. My idea was to train the neural network in the exact evaluation function for board states, but even after analyzing thousands of games, the problem is that the network does not give accurate estimates.

I use 27 input neurons; each square on the 3x3 board is connected to three input neurons, which take the values ​​0 or 1 depending on whether the square has x, o or is empty. These 27 input neurons send signals to 10 hidden neurons (I chose 10 randomly, but I also tried 5 and 15).

For training, I had a program that generated a series of games, playing against it, using the current rating function to select what is considered optimal for each side. After creating the game, NN collects training examples (which contain the state of the board and the correct conclusion), considering the correct conclusion for the given state of the board as the value (using the evaluation function) of the state of the board that follows it in the game sequence. I think this is what Gerald Tesauro did when programming TD-Gammon, but I could misinterpret the article. (note: I put a specific mechanism for updating weights at the bottom of this post).

I tried different values ​​for learning speed, as well as different numbers of hidden neurons, but nothing works. Even after “learning” there is no noticeable improvement in strategy, and the evaluation function does not approach accuracy.

I understand that there are much simpler ways to program tic tac toe, but I want to do this with a multi-layer perceptron so that I can use it to connect 4 later. Is it possible? I'm starting to think that there is no reliable rating function for a tic tac toe board with a reasonable amount of hidden neurons.

I assure you that I am not looking for some quick code to include in my homework. I have been working unsuccessfully for some time and just wanted to know what I am doing wrong. All tips are welcome.


This is the specific mechanism that I used for NN:

27 0 1, 1/(1 + e ^ (- x)). (i.output), (i.weights [h]) h. h (h.input), , (h.output). lastInput (h.output * h.weight) . (lastInput).

, err - . dSigmoid (x) x.

h : ( * err * dSigmoid (lastInput) * h.output) h : ( * err * dSigmoid (lastInput) * h.weight * dSigmoid (h.input) * i.output).

backpropagation: http://www.youtube.com/watch?v=UnWL2w7Fuo8.

+3
5

Tic tac toe 3 ^ 9 = 19683 (, , ). , , backpropagation "rote learning" .

, 10 , 20k , . "" , , , , , .

+5

NN, ...

  • 0 1? 0.5 ? tanh (-1 1 X O 0 ), ?

  • ?

  • , , , , 1 , .

  • , , 9 , ?

  • ( , -, )?

  • ?

0

, , . , 40 10 , , - , -, , .

, , .

, 100%, :

After generating a game, the NN compiles training examples (which comprise a board state and the correct output) by taking the correct output for a given board state to be the value (using the evaluation function) of the board state that follows it in the game sequence.

, , - (, ), "" . ? , , , , , , . , , . ( , .)

0

Just to give one more thought, think about how to use reinforcements for this task? It would be much easier to implement and much more efficient. For example, you can use Q-training, which is often used for games.

0
source

Here you can find an implementation for training a neural network in Tik Tak Toe (with resizable boards) using a standalone game. The gradient spreads back throughout the game using a simple gradient copy trick.

0
source

All Articles