Comparing RL and supervised learning

RL and supervised learning are similar, yet different, paradigms to learn from data. Many problems can be tackled with both supervised learning and RL; however, in most cases, they are suited to solve different tasks.

Supervised learning learns to generalize from a fixed, limited dataset of examples. Each example is composed of an input and the desired output (or label), which provides immediate learning feedback.
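To make this concrete, here is a minimal sketch of supervised learning; the dataset and the linear model are illustrative assumptions, not part of the original text. Each example pairs an input with its label, and the label provides direct, per-example feedback that the learner minimizes.

```python
import numpy as np

# Hypothetical toy dataset: each example pairs an input with its desired output.
X = np.array([[0.0], [1.0], [2.0], [3.0]])   # inputs
y = np.array([1.0, 3.0, 5.0, 7.0])           # labels, generated by y = 2x + 1

# Fit a linear model by least squares: the labels give immediate feedback
# in the form of a per-example error that training drives toward zero.
X_b = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
w, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(w)  # recovers the slope and intercept, approximately [2.0, 1.0]
```

Note how nothing here is sequential: every example carries its own correct answer, in contrast to the reward-only feedback described next.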

In comparison, RL focuses on the sequential actions that you can take in a particular situation. Here, the only supervision provided is the reward signal; there is no correct action for each circumstance, as there is in the supervised setting.

RL can be viewed as a more general and complete framework for learning. The major characteristics that are unique to RL are as follows:

  • The reward could be dense, sparse, or very delayed. In many cases, the reward is obtained only at the end of the task (for example, in the game of chess).
  • The problem is sequential and time-dependent; actions will affect the next actions, which, in turn, influence the possible rewards and states.
  • An agent has to take the actions with the highest potential to achieve its goal (exploitation), but it should also try different actions to ensure that other parts of the environment are explored (exploration). This problem is called the exploration-exploitation dilemma (or exploration-exploitation trade-off), and it involves the difficult task of balancing exploration and exploitation of the environment. This balance also matters because, unlike supervised learning, RL can influence the environment: the agent is free to collect new data for as long as it deems it useful.
  • The environment can be stochastic and nondeterministic, and the agent has to take this into consideration when learning and predicting the next action. In fact, we'll see that many RL components can be designed to output either a single deterministic value or a range of values along with their probabilities.
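The exploration-exploitation dilemma and the stochastic reward signal from the list above can be sketched with an epsilon-greedy strategy on a two-armed bandit. All names, the reward probabilities, and the epsilon value here are illustrative assumptions.

```python
import random

random.seed(0)
TRUE_P = [0.3, 0.7]      # unknown success probability of each arm
EPSILON = 0.1            # fraction of steps spent exploring

counts = [0, 0]          # pulls per arm
values = [0.0, 0.0]      # running mean reward per arm (the agent's estimate)

for step in range(5000):
    if random.random() < EPSILON:
        arm = random.randrange(2)        # explore: try a random arm
    else:
        arm = values.index(max(values))  # exploit: pick the best-looking arm
    # Stochastic feedback: a noisy reward, not a correct label.
    reward = 1.0 if random.random() < TRUE_P[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(values)  # estimates should approach [0.3, 0.7]
print(counts)  # most pulls go to arm 1, but arm 0 is still sampled
```

Pure exploitation would risk locking onto the wrong arm after a few lucky pulls; the small epsilon keeps a trickle of exploration so the estimates of both arms stay informed.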

A third type of learning is unsupervised learning, which is used to identify patterns in data without any supervised information. Data compression, clustering, and generative models are examples of unsupervised learning. It can also be adopted in RL settings in order to explore and learn about the environment. The combination of unsupervised learning and RL is called unsupervised RL. In this case, no reward is given, and the agent can generate an intrinsic motivation to favor new situations where it can explore the environment.
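One simple way to realize such intrinsic motivation is a count-based novelty preference: with no external reward, the agent favors the states it has visited least. The tiny 1D environment and the tie-breaking rule below are assumptions made for illustration.

```python
from collections import Counter

visits = Counter()  # how many times each state has been seen
state = 0           # start at the left end of a 5-state line

for _ in range(200):
    visits[state] += 1
    # Candidate next states: move one step left or right along the line.
    candidates = [max(state - 1, 0), min(state + 1, 4)]
    # Intrinsic motivation: prefer the less-visited neighbor (novelty bonus).
    state = min(candidates, key=lambda s: visits[s])

print(dict(visits))  # visits spread across all 5 states despite zero reward
```

Even this greedy rule covers the whole state space, which is the point of intrinsic motivation: exploration happens because novelty itself is treated as rewarding.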

It's worth noting that self-driving cars have also been addressed as a supervised learning problem, but with poor results. The main problem stems from distribution shift: the data that the agent encounters during its lifetime differs from the distribution it was trained on.