Unsupervised RL

Unsupervised RL is related to standard unsupervised learning in that neither uses any source of supervision. Whereas in unsupervised learning the data is unlabeled, in the reinforcement learning counterpart, the reward is not given. That is, given an action, the environment returns only the next state: both the reward and the done signal are removed.
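To make this concrete, here is a minimal sketch of how such an interaction loop could look, using the Gymnasium API as an assumed Gym-style interface; the wrapper name UnsupervisedWrapper is hypothetical, chosen only for illustration:

import gymnasium as gym

class UnsupervisedWrapper(gym.Wrapper):
    """Hypothetical wrapper that hides the reward and done signals,
    so the agent observes only state transitions."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Return only the next state: no reward, no done status.
        return obs

env = UnsupervisedWrapper(gym.make("MountainCar-v0"))
obs, _ = env.reset()
for _ in range(100):
    # The agent receives nothing but the next observation.
    obs = env.step(env.action_space.sample())

Note that, without the done signal, the agent has no built-in notion of episode boundaries either; it can only observe how its actions change the state.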

Unsupervised RL can be helpful in many situations: for example, when annotating the environment with hand-designed rewards is not scalable, or when a single environment has to serve multiple tasks. In the latter case, unsupervised learning can be employed to learn the dynamics of the environment. Methods that learn without supervision can also provide an additional source of information in environments with very sparse rewards.

How can we design an algorithm that learns about the environment without any source of supervision? Can't we just employ model-based learning? Unfortunately, model-based RL still needs the reward signal to plan or infer the next actions, so a different solution is required.
