1-task learning

1-task learning, or simply transfer learning, is the task of training a policy on one domain and transferring it to a new one. Three major techniques can be employed to do this:

  • Fine-tuning: This involves refining the learned model on the target task. If you work in machine learning, and especially in computer vision or natural language processing, you have probably used this technique already. Unfortunately, in reinforcement learning, fine-tuning is not as easy as it is in those fields: it requires more careful engineering and generally yields smaller benefits. The reason is that the gap between two RL tasks is usually bigger than the gap between two image domains. For example, the differences between classifying a cat and a dog are minor compared to the differences between Pong and Breakout. Nonetheless, fine-tuning can still be used in RL, and tuning just the last few layers (or substituting them if the action space is totally different) can give better generalization properties.
  • Domain randomization: This is based on the idea that diversifying the dynamics of a source domain increases the robustness of the policy in a new environment. Domain randomization works by manipulating the source domain, for example, by varying the physics of the simulator, so that a policy trained on multiple randomly modified source domains is robust enough to perform well on the target domain. This strategy is especially effective for training agents that need to be deployed in the real world: the policy is more robust, and the simulation doesn't have to match the physical world exactly to provide the required level of performance.
  • Domain adaptation: This is another process that is used especially to transfer a policy from a simulation-based source domain to a physical target domain. Domain adaptation consists of changing the data distribution of the source domain to match that of the target. It is mainly used in image-based tasks, and the models usually make use of generative adversarial networks (GANs) to turn synthetic images into realistic ones.
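The fine-tuning bullet above can be made concrete with a minimal sketch. The network shapes, parameter names, and the `fine_tune_setup` helper are all illustrative assumptions, not part of any specific library: the point is simply that the shared layers are kept frozen while the action head is replaced when the target task has a different action space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained policy: two shared hidden layers plus an
# action head sized for the source task (4 actions here).
params = {
    "hidden1": rng.normal(size=(8, 16)),
    "hidden2": rng.normal(size=(16, 16)),
    "head": rng.normal(size=(16, 4)),
}

def fine_tune_setup(params, n_target_actions, rng=None):
    """Freeze the shared layers and replace the action head when the
    target task's action space differs from the source task's."""
    if rng is None:
        rng = np.random.default_rng()
    trainable = {"head"}  # only the last layer will be updated
    new_params = dict(params)
    if params["head"].shape[1] != n_target_actions:
        # Re-initialize the head with small weights for the new actions.
        new_params["head"] = rng.normal(
            scale=0.01, size=(params["hidden2"].shape[1], n_target_actions)
        )
    return new_params, trainable

# Target task has 6 actions instead of 4.
new_params, trainable = fine_tune_setup(params, n_target_actions=6)
```

In a real training loop, only the parameters listed in `trainable` would receive gradient updates, while the frozen layers carry over the representations learned on the source task.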
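Domain randomization can be sketched as follows. The physics parameters and their ranges are made-up examples; a real setup would pass each sampled configuration to the simulator at the start of an episode.

```python
import random

def sample_domain(rng):
    """Sample a randomized variant of the source domain's physics.
    Parameter names and ranges are illustrative."""
    return {
        "gravity": rng.uniform(8.0, 12.0),     # vary around 9.81 m/s^2
        "friction": rng.uniform(0.5, 1.5),
        "mass_scale": rng.uniform(0.8, 1.2),
    }

def train(n_episodes, seed=0):
    rng = random.Random(seed)
    domains = [sample_domain(rng) for _ in range(n_episodes)]
    # A real loop would reset the simulator with each configuration and
    # run a policy update per episode; here we only return the sampled
    # domains to show the randomization itself.
    return domains

domains = train(5)
```

Because the policy never sees the same dynamics twice, it cannot overfit to one particular simulator configuration, which is what makes it more likely to transfer to the unrandomized (or real-world) target domain.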
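A GAN-based image mapper is too large for a short sketch, so the snippet below illustrates the same idea of matching the source distribution to the target with a much simpler stand-in: first- and second-moment matching of pixel statistics. All arrays and values are illustrative.

```python
import numpy as np

def adapt(synthetic, target_mean, target_std):
    """Rescale synthetic pixels so their mean and standard deviation
    match the target domain's statistics (a crude, non-GAN stand-in
    for learned domain adaptation)."""
    s_mean, s_std = synthetic.mean(), synthetic.std()
    return (synthetic - s_mean) / (s_std + 1e-8) * target_std + target_mean

rng = np.random.default_rng(1)
# Dim, low-contrast synthetic frames from the simulator.
synthetic = rng.uniform(0.0, 0.5, size=(16, 16))
# Map them toward brighter, higher-contrast "real" statistics.
adapted = adapt(synthetic, target_mean=0.6, target_std=0.2)
```

A GAN replaces this fixed transformation with a learned generator that is trained, via a discriminator, until its outputs are indistinguishable from real target-domain images.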