Transfer learning

Transferring knowledge between two environments is a difficult task, even when the environments are similar to each other. Transfer learning strategies aim to bridge this knowledge gap so that the transition from an initial environment to a new one is as easy and smooth as possible. Specifically, transfer learning is the task of efficiently transferring knowledge from a source environment (or multiple source environments) to a target environment. The more experience that has been acquired from a set of source tasks and transferred to a new target task, the faster the agent will learn and the better it will perform on the target task.

Generally speaking, an agent that hasn't been trained yet is a system without any prior information in it. When you play a game, by contrast, you bring a lot of prior knowledge with you. For example, you may guess the meaning of the enemies from their shapes and colors, as well as their dynamics: you recognize the enemies when they shoot at you, as in the Space Invaders game shown in the following screenshot, and you can easily guess the general dynamics of the game. An RL agent, on the other hand, knows nothing at the start of training. This comparison is important because it provides valuable insight into the importance of transferring knowledge between multiple environments. An agent that is able to use the experience acquired from a source task can learn much faster on the target environment. For example, if the source environment is Pong and the target environment is Breakout, then many of the visual components could be reused, saving a lot of computation time (a minimal sketch of this idea follows the figure below). To get an accurate sense of its overall importance, imagine the efficiency gained in much more complex environments:

Figure 13.5. A screenshot of Space Invaders. Are you able to infer the role of the sprites?
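
To make the Pong-to-Breakout example concrete, the following is a minimal sketch (not a definitive implementation) of how the convolutional layers learned on the source game might be reused on the target game. The network layout follows the standard Atari encoder; the checkpoint file name and the four-action Breakout head are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class AtariFeatures(nn.Module):
    """Convolutional encoder that can be shared across visually similar Atari games."""
    def __init__(self):
        super().__init__()
        # Standard Atari encoder: 4 stacked 84x84 grayscale frames in, 3136 features out.
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.conv(x)

features = AtariFeatures()
# Load the weights learned on the source task (Pong); the file name is hypothetical.
features.load_state_dict(torch.load("pong_features.pt"))

# Freeze the transferred layers so that only the new head is trained on the target task.
for param in features.parameters():
    param.requires_grad = False

# New task-specific head for the target task's action space (4 actions in Breakout).
head = nn.Linear(64 * 7 * 7, 4)
policy = nn.Sequential(features, head)
```

Under this scheme, only the small linear head is trained on the target game, which is where the savings come from: the expensive visual features arrive pre-trained from the source task.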

When speaking about transfer learning, we refer to zero-shot learning, one-shot learning, and so on, according to the number of attempts allowed in the target domain. For example, zero-shot learning means that a policy trained on the source domain is deployed directly on the target domain, without any further training. In this case, the agent must develop strong generalization capabilities in order to adapt to the new task.
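
As an illustration, here is a minimal sketch of what zero-shot deployment could look like in code. The evaluate_zero_shot helper is hypothetical, and it assumes the target environment's observations already match the input the source-trained policy expects (for example, because the same preprocessing wrappers are applied on both tasks):

```python
import gym
import numpy as np
import torch

def evaluate_zero_shot(policy, env_name, episodes=10):
    """Run a source-trained policy on a target environment without further training."""
    env = gym.make(env_name)
    returns = []
    for _ in range(episodes):
        obs, done, episode_return = env.reset(), False, 0.0
        while not done:
            with torch.no_grad():
                # Greedy action from the frozen, source-trained policy.
                scores = policy(torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0))
            action = scores.argmax(dim=1).item()
            obs, reward, done, _ = env.step(action)
            episode_return += reward
        returns.append(episode_return)
    env.close()
    return float(np.mean(returns))

# Example: deploy a Pong-trained policy directly on Breakout (names are assumptions).
# mean_return = evaluate_zero_shot(pong_policy, "Breakout-v4")
```

The average return obtained this way measures how well the source policy generalizes; any training step on the target domain would move us from zero-shot to one-shot (or few-shot) territory.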
