The Unity Obstacle Tower Challenge

The Unity Obstacle Tower Challenge was introduced in February 2019 as a discrete-action visual learning problem. As we have seen before, this class of problem is the holy grail of learning for games, robotics, and other simulations. What makes it more interesting is that this challenge was introduced outside of ML-Agents and requires challengers to write their own Python code from scratch to control the game—something we have come close to learning how to do in this book, although we omitted the technical details. Instead, we focused on the fundamentals of tuning hyperparameters, understanding rewards, and managing the agent's state. All of these fundamentals will come in handy if you decide to tackle the tower challenge.

At the time this book was written, the ML-Agents version used for development was 0.6. If you have run all the exercises to completion, you will have noticed that all of the visual learning environments using a discrete action space suffer from a vanishing or exploding gradient problem. What you will see is the agent essentially learning nothing and performing random actions; this often takes several hundred thousand iterations to become apparent. We don't see this problem in environments with a smaller state space that use vector observations, but in visual environments with a large input state, the problem appears quite regularly. This means that, at the time of writing anyway, you would not want to use the stock Unity code for the challenge; it is currently a poor visual learner of discrete actions.

At the time of writing, the Unity Obstacle Tower Challenge has just started, and early metrics are already being reported. Not surprisingly, the current leading algorithm is Rainbow, from Google's DeepMind. In short, Rainbow is the culmination of many different DRL algorithms and techniques, all combined to better learn the discrete-action visual learning space that the tower so well defines.

Now that we have established that you will likely want to write your own code, let's look at the high-level critical pieces your agent needs to address. It would likely take another book to explain the coding and the other technical aspects, so instead we will talk about the overall challenges and the critical elements you need to address. Also, the winners will more than likely need to use more probabilistic methods in order to address the problem, and that is currently not covered very well anywhere.

Let's set up the challenge and get it running in the next exercise:

  1. Download the Obstacle Tower Environment as a binary from https://github.com/Unity-Technologies/obstacle-tower-env.
  2. Follow the instructions and download the zip file for your environment as directed. On most systems, this just requires downloading and unzipping the file into a folder you will execute from later.
  3. Unzip the file into a well-known folder.
  4. Launch the program by double-clicking on it (Windows) or by entering its name in a console. After you launch the challenge, you can actually play it as a human. Play the game and see how many floors you can climb (we will drive the same environment from Python shortly). An example of the running challenge is shown in the following screenshot:

The Obstacle Tower Challenge in player mode

One of the first things you will learn as you progress through the game is that the game starts out quite simply, but on the later floors, it gets quite difficult, even for a human.
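Beyond playing it by hand, the challenge expects your agent to drive the environment from Python. The following is a minimal sketch for a smoke test, assuming the obstacle_tower_env package from the repository linked above and its gym-style interface; the binary path is a placeholder for wherever you unzipped the build, and the exact constructor arguments may vary between releases.

```python
# Minimal smoke test: drive the Obstacle Tower binary with random actions.
# Assumes the obstacle_tower_env package from the challenge repository;
# class and argument names may differ slightly between releases.
from obstacle_tower_env import ObstacleTowerEnv

# The path to the unzipped binary is a placeholder -- point it at your copy.
env = ObstacleTowerEnv('./ObstacleTower/obstacletower',
                       retro=True, realtime_mode=False)

obs = env.reset()
done = False
total_reward = 0.0

while not done:
    # Swap this random choice for your agent's action selection.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    total_reward += reward

print('Episode reward:', total_reward)
env.close()
```

Once a random agent runs end to end like this, you can replace the random action with the output of your own policy.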

Now, as we mentioned, solving this challenge is well beyond the scope of this book, but hopefully you can now appreciate some of the complexities that currently stifle the field of deep reinforcement learning. We have summarized the major challenges you will face when undertaking this challenge in the following table, and a few of the ideas it mentions are sketched in code right after it:

Problem: Visual observation state—you will need to build a complex enough CNN, and possibly recurrent networks, to encode enough detail in the visual state.
Chapter: Chapter 7, Agent and the Environment
Current Status: The current Unity visual encoder is far from acceptable.
Future: Fortunately, there is plenty of work being done with CNNs and recurrent networks for the analysis of video. Remember, you don't just want to capture static images; you also want to encode the sequence of images.

Problem: DQN, DDQN, or Rainbow
Chapter: Chapter 5, Introducing DRL
Current Status: Rainbow is currently the best, and it is available on the GCP.
Future: As we have seen in this book, PPO only performs well on continuous action spaces. In order to tackle the discrete action space, we look back to more fundamental methods such as DQN or the newcomer Rainbow, which is the summation of all the base methods. Further use of deep probabilistic methods may also prove to be the answer.

Problem: Intrinsic rewards
Chapter: Chapter 9, Rewards and Reinforcement Learning
Current Status: The use of an intrinsic reward system shows promise for exploration.
Future: Being able to introduce intrinsic reward systems such as Curiosity Learning allows the agent to explore new environments based on some expectation of state. This method will be essential for any algorithm that plans to reach the higher levels of the tower.

Problem: Understanding the environment
Chapter: Chapter 6, Unity ML-Agents
Current Status: Unity provides an excellent sample environment to build and test models on.
Future: You can easily build and test a similar environment in Unity quite quickly and on your own. It is no wonder Unity never released the raw Unity environment as a project; this was more than likely because it would have attracted many novices thinking they could overcome the problem with training alone. Sometimes, training is just not the answer.

Problem: Sparse rewards
Chapter: Chapter 9, Rewards and Reinforcement Learning, and Chapter 10, Imitation and Transfer Learning
Current Status: Could implement Curriculum or Imitation Learning.
Future: We have already covered many examples of ways to manage the sparse rewards problem. It will be interesting to see how much the winners depend on one of these methods, such as IL, to win.

Problem: Discrete actions
Chapter: Chapter 8, Understanding PPO
Current Status: We learned how PPO allows continuous action problems to be learned using stochastic methods.
Future: As we alluded to before, it will likely take new work on deep probabilistic methods and techniques to work around some of the current problems. This will likely require the development of new algorithms, and how long that takes remains to be seen.
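To make the visual observation row more concrete, here is a rough Keras sketch of one way to encode a short sequence of frames: a small convolutional stack applied to each frame, followed by an LSTM that captures how the frames change over time. The 84x84 frame size, the stack length of 4, the layer widths, and the action count are illustrative assumptions, not the encoder any particular entry uses.

```python
# Illustrative CNN + LSTM encoder for a short sequence of frames.
# Frame size (84x84x3), stack length (4), layer widths, and the number of
# actions are assumptions for the sketch, not values from the challenge.
from tensorflow.keras import layers, models

def build_visual_encoder(seq_len=4, height=84, width=84, channels=3,
                         num_actions=54):
    frames = layers.Input(shape=(seq_len, height, width, channels))
    # Apply the same convolutional stack to every frame in the sequence.
    x = layers.TimeDistributed(
        layers.Conv2D(32, 8, strides=4, activation='relu'))(frames)
    x = layers.TimeDistributed(
        layers.Conv2D(64, 4, strides=2, activation='relu'))(x)
    x = layers.TimeDistributed(
        layers.Conv2D(64, 3, strides=1, activation='relu'))(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    # The LSTM encodes how the frames change over time.
    x = layers.LSTM(256)(x)
    # Example head: one output (Q-value or logit) per discrete action.
    outputs = layers.Dense(num_actions)(x)
    return models.Model(frames, outputs)

model = build_visual_encoder()
model.summary()
```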

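The DQN, DDQN, or Rainbow row is easier to reason about with one of Rainbow's ingredients written out. Below is a small NumPy sketch of the double-DQN target, where the online network selects the next action and the target network evaluates it; the Q-value arrays are hypothetical network outputs, and this is only one of the pieces Rainbow combines (alongside prioritized replay, dueling heads, n-step returns, distributional values, and noisy networks).

```python
# Double-DQN target: the online network selects the next action, the target
# network evaluates it. The Q-value arrays here are hypothetical network
# outputs of shape (batch, num_actions).
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next,
                       gamma=0.99):
    # Action selection with the online network...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...evaluation of those actions with the target network.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions contribute only their immediate reward.
    return rewards + gamma * (1.0 - dones) * next_values

# Toy usage with random numbers standing in for real network outputs.
batch, n_actions = 4, 6
targets = double_dqn_targets(np.random.rand(batch), np.zeros(batch),
                             np.random.rand(batch, n_actions),
                             np.random.rand(batch, n_actions))
print(targets)
```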
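For the intrinsic rewards row, the core idea behind Curiosity Learning can be sketched in a few lines: the agent keeps a forward model that predicts the next encoded state, and the prediction error becomes a bonus reward added to the extrinsic one. The forward model and scaling factor below are stand-ins, not the exact formulation ML-Agents uses.

```python
# Curiosity-style intrinsic reward: reward the agent for transitions its
# forward model cannot yet predict. predict_next_state is a stand-in for a
# learned forward model over encoded observations.
import numpy as np

def intrinsic_reward(state_embedding, next_state_embedding,
                     predict_next_state, scale=0.01):
    predicted = predict_next_state(state_embedding)
    # Squared prediction error: large when the transition is surprising.
    error = np.mean((predicted - next_state_embedding) ** 2)
    return scale * error

# Toy usage with an untrained "forward model" that simply echoes the state.
state = np.random.rand(64)
next_state = np.random.rand(64)
bonus = intrinsic_reward(state, next_state, predict_next_state=lambda s: s)
print('Intrinsic bonus:', bonus)
```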

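Finally, the stochastic methods mentioned in the discrete actions row come down to the policy producing a probability for every discrete action and sampling from that distribution rather than always taking the greedy choice. The following NumPy sketch shows just that sampling step, with random logits standing in for the output of a real policy network.

```python
# Sampling from a stochastic discrete policy: softmax over the logits, then
# draw an action according to the resulting probabilities.
import numpy as np

def sample_discrete_action(logits):
    # Subtract the max for numerical stability before exponentiating.
    shifted = np.exp(logits - np.max(logits))
    probs = shifted / shifted.sum()
    return np.random.choice(len(probs), p=probs), probs

# Toy usage: random logits stand in for a policy network's output.
action, probs = sample_discrete_action(np.random.randn(6))
print('Chosen action:', action, 'with probability', probs[action])
```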
Each of the problems highlighted in the preceding table will likely need to be solved, in part or in whole, in order to get an agent from floor 1 to floor 100 and complete the entire challenge. It remains to be seen how this will play out for Unity, the winner, and DRL as a whole. In the next section, we discuss the practical applications of DL and DRL, and how they can be used for your game.
