Chapter 7. Deep Learning for Board Games

You may have read sci-fi novels from the '50s and '60s; they are full of visions of what life in the 21st century would look like. They imagined a world of personal jet packs, underwater cities, intergalactic travel, flying cars, and truly intelligent robots capable of independent thought. The 21st century has now arrived; sadly, we are not going to get those flying cars, but thanks to deep learning, we may get that robot.

What does this have to do with deep learning for board games? In this chapter and the next, we will look at how to build Artificial Intelligence (AI) agents that can learn game environments. Reality has a vast space of possibilities: even simple human tasks, such as getting a robot arm to pick up objects, require analyzing huge amounts of sensory data and controlling many continuous response variables for the movement of the arm.

Games make a great testing ground for general-purpose learning algorithms. They give you an environment of large, but manageable, possibilities. Also, when it comes to computer games, we know that humans can learn to play a game just from the pixels visible on the screen and the most minimal of instructions. If we input the same pixels, plus an objective, into a computer agent, we know we have a solvable problem, given the right algorithm. In fact, for the computer, the problem is easier, because a human being must first identify that the things they are seeing in their field of vision are the game's pixels, as opposed to the area around the screen. This is why so many researchers are looking at games as a great place to start developing true AIs: self-learning machines that can operate independently from us. Also, if you like games, it's lots of fun.

In this chapter, we will cover the different tools used for solving board games, such as checkers and chess. Eventually, we'll build up enough knowledge to be able to understand and implement the kind of deep learning solution that was used to build AlphaGo, the AI that defeated the greatest human Go player. We'll use a variety of deep learning techniques to accomplish this. The next chapter will build on this knowledge and cover how deep learning can be used to learn how to play computer games, such as Pong and Breakout.

The full list of concepts that we will cover across both chapters is as follows:

  • The min-max algorithm
  • Monte-Carlo Tree Search
  • Reinforcement learning
  • Policy gradients
  • Q-learning
  • Actor-Critic
  • Model-based approaches

We will use a few different terms to describe tasks and their solutions. The following are some of the definitions, all using the example of a basic maze game, as it is a good, simple example of a reinforcement learning environment; a short code sketch illustrating these terms follows the list. In a maze game, there is a set of locations with paths between them. An agent in this maze can use the paths to move between the different locations. Some locations have a reward associated with them. The agent's objective is to navigate its way through the maze to get the best possible reward.

Figure 1: A simple maze game environment

  • Agent is the entity for which we are trying to learn actions. In the game, this is the player that will try to find its way through the maze.
  • Environment is a world/level/game in which the agent operates, that is, the maze itself.
  • Reward is the feedback that the agent gets within the environment. In the case of this example maze game, it might be the exit square or the carrots in the image that the agent is trying to collect. Some mazes may also have traps that give a negative reward, which the agent should try to avoid.
  • State refers to all of the information available to the agent about its current environment. In a maze, the state is simply the agent's position.
  • Action is a possible response, or set of responses, that an agent can make. In a maze, this is a potential path that an agent can take from one state to another.
  • Control policy determines what actions the agent will take. In the context of deep learning, this is the neural network that we will train. Other policies might be selecting actions at random or selecting actions based on the code that the programmer has written.
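
To make these terms concrete, the following is a minimal sketch of a maze environment in Python. The locations, paths, and rewards here are hypothetical, invented purely for illustration; they are not part of the book's code samples:

    import random

    # Hypothetical maze: locations with paths between them (the environment).
    PATHS = {
        'start': ['left', 'right'],
        'left': ['carrot'],   # the carrot square holds a positive reward
        'right': ['trap'],    # the trap square holds a negative reward
        'carrot': [],
        'trap': [],
    }
    REWARDS = {'carrot': 1.0, 'trap': -1.0}

    def random_policy(state):
        # A control policy that selects one of the available actions at random.
        return random.choice(PATHS[state])

    state = 'start'        # the state is simply the agent's current position
    total_reward = 0.0
    while PATHS[state]:    # keep going while the agent still has actions to take
        action = random_policy(state)            # the policy chooses an action
        state = action                           # the environment transitions
        total_reward += REWARDS.get(state, 0.0)  # the agent receives any reward
    print('Total reward:', total_reward)

The random policy here is exactly the piece that a trained neural network would replace: instead of choosing a path at random, the network would choose the action it predicts will lead to the best reward.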

Much of this chapter is code-heavy, so as an alternative to copying all the samples from the book, you can find the full code in a GitHub repository at https://github.com/DanielSlater/PythonDeepLearningSamples. All the examples in these chapters are presented using TensorFlow, but the concepts could be translated into other deep learning frameworks.

Early game playing AI

Building AIs to play games started in the 1950s, with researchers writing programs that played checkers and chess. These two games have a few properties in common:

  • They are zero-sum games. Any reward that one player receives is a corresponding loss to the other player, and vice versa. When one player wins, the other loses. There is no possibility of cooperation. Contrast this with a game such as the prisoner's dilemma, where the two players can agree to cooperate and both receive a smaller reward.
  • They are both games of perfect information. The entire state of the game is always known to both players, unlike a game such as poker, where the exact cards your opponents are holding are unknown. This reduces the complexity that the AI must handle. It also means that a decision about the best move can be made based on just the current state. In poker, the hypothetically optimal decision would require information not just about your current hand and how much money is available to each player, but also about the playing styles of the opponents and how they have bet in the positions they were in previously.
  • Both games are deterministic. If a given move is made by either player, then it results in one exact next state. In some games, play may depend on a dice roll or a card drawn at random from a deck; in those cases, there would be many possible next states to consider.

The combination of perfect information and determinism in chess and checkers means that, given the current state, we know exactly what state we will be in once the current player takes an action. This property also chains: having taken an action in one state to reach a new state, we can take another action in the new state, and so on, playing as far into the future as we want.
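
Below is a minimal sketch of this chaining property, assuming a hypothetical Tic-Tac-Toe-style representation: a flat board of nine cells plus the player to move. The function and representation are illustrative assumptions, not the implementation shown later in the chapter:

    def apply_move(state, move):
        # Deterministic transition: the same (state, move) pair always
        # produces exactly the same next state.
        board, player = state
        new_board = list(board)
        new_board[move] = player           # place the current player's mark
        return tuple(new_board), -player   # play passes to the other player

    # An empty 3 by 3 board, stored flat; player 1 plays crosses, player -1 noughts.
    state = (tuple([0] * 9), 1)

    # Because each transition is exact, we can chain transitions to look as far
    # into the future as we want:
    for move in (4, 0, 8):                 # a hypothetical sequence of moves
        state = apply_move(state, move)
    print(state)                           # the exact position three moves ahead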

To experiment with some of the approaches to mastering board games, we will give examples using a Python implementation of the game Tic-Tac-Toe. Also known as noughts and crosses, this is a simple game in which players take turns making marks on a 3 by 3 grid. The first player to get three of their marks in a row wins. Tic-Tac-Toe is another deterministic, zero-sum, perfect information game and is chosen here because a Python implementation is a lot simpler than one for chess. In fact, the whole game can be written in less than a page of code, which will be shown later in this chapter.
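
As a taste of how compact the game is, here is a minimal sketch of a winner check, assuming the board is stored as a flat tuple of nine cells (1 for a cross, -1 for a nought, 0 for empty). This representation is an assumption made for illustration; the full implementation appears later in the chapter:

    # The eight lines that win the game.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def winner(board):
        # Return 1 if crosses have three in a row, -1 if noughts do, 0 otherwise.
        for a, b, c in LINES:
            if board[a] != 0 and board[a] == board[b] == board[c]:
                return board[a]
        return 0

    print(winner((1, 1, 1,
                  -1, -1, 0,
                  0, 0, 0)))   # 1: crosses win along the top row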
