Facing real-world challenges

Besides the big problems of sample efficiency and generalization, when dealing with the real world we also need to face problems such as safety and domain constraints. In fact, the agent is often not free to interact with the world because of safety and cost constraints. A partial solution may come from constrained algorithms such as TRPO and PPO, which build into their update rule a mechanism that limits how much the policy can change at each training step. This can prevent the agent from drastically changing its behavior. Unfortunately, in highly sensitive domains, this is not enough. For example, today you cannot start training a self-driving car on the road straight away: the policy may take hundreds or thousands of episodes to understand that falling off a cliff leads to a bad outcome and to learn to avoid it. Training the policy in a simulation first is a viable alternative. Nevertheless, once the car is deployed in cities, further safety-related decisions still have to be made.
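To make the idea of "limiting the policy change" concrete, the following is a minimal sketch of PPO's clipped surrogate objective in plain Python. TRPO achieves a similar effect differently, with a KL-divergence trust-region constraint, which is not shown here. The function names `clipped_term` and `ppo_surrogate` are hypothetical, `ratio` stands for pi_new(a|s) / pi_old(a|s) for a sampled state-action pair, and `epsilon=0.2` is only an illustrative clipping range.

```python
def clipped_term(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate term (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) for one sampled (s, a) pair
    advantage: estimated advantage of that action
    epsilon: clipping range (illustrative default)
    """
    # Keep the probability ratio inside [1 - epsilon, 1 + epsilon]
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # The pessimistic minimum removes any incentive to move the ratio
    # outside the clipping range, which keeps each update small
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_surrogate(ratios, advantages, epsilon=0.2):
    """Average clipped surrogate over a batch of samples."""
    terms = [clipped_term(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # With a positive advantage, the objective stops growing once the
    # ratio exceeds 1 + epsilon, so larger policy changes earn nothing extra
    for ratio in (1.0, 1.1, 1.2, 1.5, 3.0):
        print(ratio, clipped_term(ratio, advantage=1.0))
```

Running the loop shows the term rising with the ratio up to 1 + epsilon and then staying flat, which is exactly why the optimizer has no reason to push the new policy far from the old one in a single update.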

As we just hinted, a simulation-first approach is feasible and, depending on the complexity of the real task, it may lead to good performance. However, the simulator has to mimic the real-world environment as closely as possible. For example, the simulator on the left-hand side of the following image cannot be used if the real world resembles the right-hand side of the same image. This gap between the real and the simulated world is known as the reality gap:

Figure 13.6. Comparison between an artificial world and the physical world

On the other hand, using a highly accurate and realistic simulator may not be feasible either, because the bottleneck then becomes the computational power the simulator requires. This limitation can be partially overcome by starting with a faster, less accurate simulator and then progressively increasing its fidelity to reduce the reality gap. This comes at the cost of speed, but by that point the agent should already have learned most of the task and may need only a few iterations to fine-tune itself. However, it is very difficult to develop highly accurate simulators that mimic the physical world. Thus, in practice, the reality gap will remain, and techniques that improve generalization will be responsible for handling it.
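The speed trade-off of a progressive-fidelity schedule can be illustrated with simple arithmetic. The sketch below compares the simulated wall-clock time of a hypothetical low-to-high fidelity curriculum against training only in the high-fidelity simulator. All stage names, per-step costs, and step budgets are invented for illustration, and the comparison assumes, purely for the sake of the example, that both approaches use the same total number of environment steps.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    seconds_per_step: float  # hypothetical simulator cost per environment step
    steps: int               # hypothetical number of steps spent in this stage


def total_hours(stages):
    """Total simulated wall-clock time of a schedule, in hours."""
    return sum(s.seconds_per_step * s.steps for s in stages) / 3600.0


# Hypothetical curriculum: most learning happens in cheap, coarse simulators,
# and only a small budget is spent fine-tuning in the expensive one.
curriculum = [
    Stage("low fidelity", 0.001, 9_000_000),
    Stage("medium fidelity", 0.01, 900_000),
    Stage("high fidelity", 0.1, 100_000),
]

# Same total step budget, spent entirely in the high-fidelity simulator.
high_fidelity_only = [Stage("high fidelity", 0.1, 10_000_000)]

if __name__ == "__main__":
    print(f"curriculum:         {total_hours(curriculum):.1f} h")
    print(f"high fidelity only: {total_hours(high_fidelity_only):.1f} h")
```

Under these made-up numbers the curriculum needs well under a tenth of the compute of high-fidelity-only training, while the final stage still exposes the agent to the most realistic simulator available. In practice, the step budget each stage needs depends on the task and on how well skills transfer between fidelity levels.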
