Implementing a Python Tic-Tac-Toe game

Let's build a basic implementation of Tic-Tac-Toe so we can see what an implementation of the min-max algorithm looks like. If you do not feel like copying all of this, you can find the full code in the GitHub repository https://github.com/DanielSlater/PythonDeepLearningSamples in the tic_tac_toe.py file.

In the game, the board will be represented by a 3 x 3 tuple of integers. Tuples are used instead of lists so that, later on, we can test equality between matching board states. In this case, 0 represents a square that has not been played in. The two players will be marked 1 and -1. If a player makes a move in a square, that square will be marked with their number. So here we go:

def new_board():
    return ((0, 0, 0),
            (0, 0, 0),
            (0, 0, 0))

The new_board method will be called before play begins to get a fresh board, ready for the players to make their moves on:

def apply_move(board_state, move, side):
    move_x, move_y = move
    state_list = list(list(s) for s in board_state)
    state_list[move_x][move_y] = side
    return tuple(tuple(s) for s in state_list)

The apply_move method takes one of the 3 x 3 tuples as board_state and returns a new board_state with the move by the given side applied. A move will be a tuple of length 2, containing the coordinates of the space that we want to move to as two integers. side will be an integer representing the player who is playing the move, either 1 or -1:

import itertools

def available_moves(board_state):
    for x, y in itertools.product(range(3), range(3)):
        if board_state[x][y] == 0:
            yield (x, y)

This method generates the legal moves for a given 3 x 3 board_state, which are simply all the squares that are still 0. Now we just need a method to determine whether a player has the three winning marks in a row:

def has_3_in_a_line(line):
    return all(x == -1 for x in line) or all(x == 1 for x in line)

The has_3_in_a_line method takes a sequence of three squares from the board. If all are either 1 or -1, it means one of the players has gotten three in a row and has won. We then need to run this method against each possible line on the Tic-Tac-Toe board to determine whether a player has won:

def has_winner(board_state):
    # check rows
    for x in range(3):
        if has_3_in_a_line(board_state[x]):
            return board_state[x][0]
    # check columns
    for y in range(3):
        if has_3_in_a_line([i[y] for i in board_state]):
            return board_state[0][y]
    # check diagonals
    if has_3_in_a_line([board_state[i][i] for i in range(3)]):
        return board_state[0][0]
    if has_3_in_a_line([board_state[2 - i][i] for i in range(3)]):
        return board_state[0][2]
    return 0 # no one has won
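
As a quick check (this snippet is my own illustration, not part of the original listing), we can drive these helpers by hand and confirm that a completed diagonal is detected:

# Illustrative only: player 1 fills the main diagonal.
board = new_board()
for move in ((0, 0), (1, 1), (2, 2)):
    board = apply_move(board, move, 1)
print(has_winner(board))  # prints 1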

With just these few functions, you can now play a game of Tic-Tac-Toe. Simply start by getting a new board, then have the players successively choose moves and apply those moves to board_state. If we find that there are no available moves left, the game is a draw. Otherwise, if has_winner returns either 1 or -1, it means one of the players has won. Let's write a simple function for running a Tic-Tac-Toe game with the moves decided by methods that we pass in, which will be the control policies of the different AI players that we will try out:

def play_game(plus_player_func, minus_player_func):
    board_state = new_board()
    player_turn = 1

We declare the method, which takes as parameters the functions that will choose the action for each player. Each player_func will take two arguments: the first being the current board_state and the second being the side that the player is playing, 1 or -1. The player_turn variable will keep track of whose turn it is for us:

    while True:
        _available_moves = list(available_moves(board_state))
        if len(_available_moves) == 0:
            print("no moves left, game ended a draw")
            return 0.

This is the main loop of the game. First, we check whether there are any available moves left on board_state; if there are none, the game is over and it is a draw:

        if player_turn > 0:
            move = plus_player_func(board_state, 1)
        else:
            move = minus_player_func(board_state, -1)

Run the function associated with whichever player's turn it is to decide a move:

        if move not in _available_moves:
            # if a player makes an invalid move the other player wins
            print("illegal move ", move)
            return -player_turn

If either player makes an illegal move, that is an automatic loss. Agents should know better:

        board_state = apply_move(board_state, move, player_turn)
        print(board_state)

        winner = has_winner(board_state)
        if winner != 0:
            print("we have a winner, side: %s" % player_turn)
            return winner
        player_turn = -player_turn

Apply the move to board_state and check whether we have a winner. If we do, end the game; if we don't, switch player_turn to the other player and loop back around.

Here is how we could write a method for a control policy that would choose actions completely at random out of the available legal moves:

import random

def random_player(board_state, side):
    moves = list(available_moves(board_state))
    return random.choice(moves)

Let's run two random players against each other; the output might look something like this:

play_game(random_player, random_player)

((0, 0, 0), (0, 0, 0), (1, 0, 0))
((0, -1, 0), (0, 0, 0), (1, 0, 0))
((0, -1, 0), (0, 1, 0), (1, 0, 0))
((0, -1, 0), (0, 1, 0), (1, -1, 0))
((0, -1, 0), (0, 1, 1), (1, -1, 0))
((0, -1, 0), (0, 1, 1), (1, -1, -1))
((0, -1, 1), (0, 1, 1), (1, -1, -1))
we have a winner, side: 1

Now we have a good way of trying out different control policies on a board game, so let's go about writing something a bit better. We can start with a min-max function that should play at a much higher standard than our current random players. The full code for the min-max function is also available in the GitHub repo in the min_max.py file.

Tic-Tac-Toe is a game with a small space of possibilities, so we could simply run min-max for the whole game from the board's starting position until we have gone through every possible move for every player. But it is good practice to use an evaluation function anyway, because for most other games we might play, exhaustive search will not be feasible. The evaluation function here will give us one point for getting two in a line when the third space is empty, and minus one point if our opponent achieves this. First, we will need a method for scoring each individual line that we might make. The score_line method will take sequences of length 3 and score them:

def score_line(line):
    minus_count = line.count(-1)
    plus_count = line.count(1)
    if plus_count == 2 and minus_count == 0:
        return 1
    elif minus_count == 2 and plus_count == 0:
        return -1
    return 0
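
For instance (a hypothetical check rather than part of the original code), two marks in a line with an empty third square score a point for that side:

# Hypothetical examples of score_line in action.
print(score_line((1, 1, 0)))    # 1: player 1 threatens this line
print(score_line((-1, -1, 0)))  # -1: player -1 threatens this line
print(score_line((1, -1, 0)))   # 0: the line is blocked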

Then the evaluate method simply runs through each possible line on the tic-tac-toe board and sums them up:

def evaluate(board_state):
    score = 0
    for x in range(3):
        score += score_line(board_state[x])
    for y in range(3):
        score += score_line([i[y] for i in board_state])
    #diagonals
    score += score_line([board_state[i][i] for i in range(3)])
    score += score_line([board_state[2-i][i] for i in range(3)])

    return score
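
As an illustration (this board is a made-up example, not from the original text), consider a position where player 1 threatens the top row while no other line scores; the position evaluates to +1:

# Hypothetical position: only the top row scores a point.
board = ((1, 1, 0),
         (0, -1, 0),
         (0, 0, -1))
print(evaluate(board))  # prints 1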

Then, we come to the actual min_max algorithm method:

def min_max(board_state, side, max_depth):
    best_score = None
    best_score_move = None

The first two arguments to the method, which we are already familiar with, are board_state and side; however, max_depth is new. Min-max is a recursive algorithm, and max_depth will be the maximum number of recursive calls we will make before we stop going down the tree and just evaluate the resulting position to get a score. Each time we call min_max recursively, we will reduce max_depth by 1, stopping to evaluate when it runs out:

    moves = list(available_moves(board_state))
    if not moves:
        return 0, None

If there are no moves to make, then there is no need to evaluate anything; it is a draw, so let's return with a score of 0:

    for move in moves:
        new_board_state = apply_move(board_state, move, side)

Now we will run through each legal move and create a new_board_state with that move applied:

        winner = has_winner(new_board_state)
        if winner != 0:
            return winner * 10000, move

Check whether the game is already won in this new_board_state; there is no need to do any more recursive calling if the game is already won. Here, we are multiplying the winner's score by 10,000. This is just an arbitrary large number, chosen so that an actual win or loss is always considered better/worse than the most extreme result we might get from a call to evaluate (which, with only eight scorable lines, can never exceed 8 in magnitude):

        else:
            if max_depth <= 1:
                score = evaluate(new_board_state)
            else:
                score, _ = min_max(new_board_state, -side, max_depth - 1)

If you don't already have a winning position, then the real meat of the algorithm starts. If you have reached max_depth, then now is the time to evaluate the current board_state to get our heuristic for how favorable the current position is to the first player. If you haven't reached max_depth, then recursively call min_max with a lower max_depth until you hit the bottom:

            if side > 0:
                if best_score is None or score > best_score:
                    best_score = score
                    best_score_move = move
            else:
                if best_score is None or score < best_score:
                    best_score = score
                    best_score_move = move
    return best_score, best_score_move

Now that we have our evaluation for the score in new_board_state, we keep either the best or the worst scoring position, depending on which side we are. We track which move leads to this score in the best_score_move variable, which we return along with the score at the end of the method.

A min_max_player method can now be created to pass to our earlier play_game method:

def min_max_player(board_state, side):
    return min_max(board_state, side, 5)[1]

Now if we run a series of games with random_player against a min_max player, we will find that the min_max player wins almost every time.
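
A minimal sketch of such an experiment might look like this (the tallying code is my own addition, and play_game will print each board as it goes):

# Tally results over a series of games; min_max_player plays as -1 here.
import collections

results = collections.Counter(
    play_game(random_player, min_max_player) for _ in range(100))
print(results)  # expect the -1 (min-max) side to win almost every game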

The min-max algorithm, though important to understand, is rarely used in its plain form in practice because there is a better version of it: min-max with alpha-beta pruning. This takes advantage of the fact that certain branches of the tree can be ignored, or pruned, without needing to be fully evaluated. Alpha-beta pruning will produce the same result as min-max while, on average, searching far fewer positions; in the best case, it roughly doubles the depth that can be searched in the same time.

To explain the idea behind alpha-beta pruning, let's consider that while building our min-max tree, half of the nodes are trying to make decisions that maximize the score and the other half are trying to minimize it. As we start evaluating some of the leaves, we get results that constrain both the min and max decisions. Say a certain path through the tree scores -6; a min node knows it can get this score by following that branch. The only thing that stops it from securing this score is that the max nodes above it also get to make decisions, and they will not steer play toward a leaf that favors the min node.

But as more leaves are evaluated, another might be good for the max node, with a score of +5; the max node will never choose a worse outcome than this. Now that we have a bound for both min and max, we know that if we start going down a branch in which min can do no better than the -6 it already has, and max can do no better than the +5 it already has, then neither min nor max will ever choose this branch, and we can save ourselves the evaluation of it entirely.

The alpha in alpha-beta pruning stores the best result (highest score) that the max decisions can achieve so far. The beta stores the best result (lowest score) that the min decisions can achieve. If alpha is ever greater than or equal to beta, we know we can skip further evaluation of the current branch, because one of the two players already has a better option elsewhere.

Figure 4 gives an example of this. Here, we can see that from the very first leaf, we can set an alpha value of 0. This is because once the max player has found a score of 0 in a branch, they need never choose a lower score. Next, at the third leaf across, the score is again 0, so the min player can set their beta value to 0. The branch labeled branch ignored no longer needs to be evaluated because both alpha and beta are 0.

Figure 4: Min-max method with alpha-beta pruning

To understand this, consider all the possible results we could get from evaluating the ignored branch. If it were to result in a score of +1, the min player would simply choose the already existing branch where it had scored 0, in this case the branch to the left of the ignored one. If it were to result in a score of -1, the max player would simply choose the leftmost branch in the image, where they can get 0. Finally, if it were to result in a score of 0, no one has improved, so the evaluation of the position remains unchanged. No possible result of evaluating the branch can change the overall evaluation of the position. Here is an example of the min-max method modified to use alpha-beta pruning:

import sys

def min_max_alpha_beta(board_state, side, max_depth,
                       alpha=-sys.float_info.max,
                       beta=sys.float_info.max):

We now pass in both alpha and beta as parameters; we stop searching a branch as soon as we know its score cannot fall between alpha and beta:

    best_score_move = None
    moves = list(available_moves(board_state))
    if not moves:
        return 0, None

    for move in moves:
        new_board_state = apply_move(board_state, move, side)
        winner = has_winner(new_board_state)
        if winner != 0:
            return winner * 10000, move
        else:
            if max_depth <= 1:
                score = evaluate(new_board_state)
            else:
                score, _ = min_max_alpha_beta(new_board_state, -side,
                                              max_depth - 1, alpha, beta)

Now when we recursively call min_max_alpha_beta, we pass in our new alpha and beta values that may have been updated as part of the search:

        if side > 0:
            if score > alpha:
                alpha = score
                best_score_move = move

The side > 0 expression means that we are looking to maximize our score, so we will store the score in the alpha variable if it's better than our current alpha:

        else:
            if score < beta:
                beta = score
                best_score_move = move

If side is < 0, we are minimizing, so we store the lowest scores in the beta variable:

        if alpha >= beta:
            break

If alpha is greater than or equal to beta, then this branch cannot improve on the scores we already have, so we stop searching it:

    return alpha if side > 0 else beta, best_score_move
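
As with plain min-max, we can wrap this in a player function to pass to play_game (the wrapper name and the depth of 5 are my own choices, mirroring min_max_player above):

# Hypothetical wrapper, analogous to min_max_player.
def min_max_alpha_beta_player(board_state, side):
    return min_max_alpha_beta(board_state, side, 5)[1]

play_game(min_max_alpha_beta_player, random_player)

Because pruning never changes the final value of a position, this player is just as strong as min_max_player, but it typically examines far fewer positions, which in larger games lets us search deeper in the same amount of time.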

In 1997, IBM created a chess program called Deep Blue. It was the first computer to defeat a reigning world chess champion, Garry Kasparov, in a match. While an amazing achievement, it would be hard to call Deep Blue intelligent: though it had huge computational power, its underlying algorithm was just the same min-max algorithm from the 1950s. The only major difference was that Deep Blue also took advantage of opening theory in chess.

Opening theory comprises sequences of moves from the starting position that are known to lead to favorable or unfavorable positions. For example, if white starts with the move pawn e4 (the pawn in front of the king moved forward by two squares) and black responds with pawn c5, we have the Sicilian Defense, and many books have been written on the sequences of play that can follow from this position. Deep Blue was programmed to simply follow the best moves recommended by these opening books and only start calculating the best min-max move once the opening line of play reached its end. In this way, it saved computational time, and it also took advantage of the vast human research that has gone into working out the best positions in the opening stages of chess.
