Summary 

In this chapter, we looked into the background of RL and what a DQN is, including the Q-learning algorithm. We saw how DQNs offer an approach that is unique relative to the other architectures we've discussed so far: we are not supplying output labels in the traditional sense as with, say, the CNN we built in an earlier chapter to process CIFAR image data. Instead, our output label was the cumulative reward for a given action relative to an environment's state, so, as you may now see, we dynamically created our output labels. Rather than being an end goal for our network, these labels help a virtual agent make intelligent decisions within a discrete space of possibilities. We also looked at the types of predictions we can make around rewards or actions.
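To make that idea concrete, here is a minimal sketch in Go of how such a label can be computed for a single transition: the immediate reward plus the discounted value of the best action available in the next state. The qTarget helper, gamma, and done are hypothetical names used for illustration only, not code from this chapter:

```go
package main

import "fmt"

// qTarget computes the Q-learning target (our dynamically created "label")
// for one transition: the immediate reward plus the discounted estimate of
// the best action in the next state. Hypothetical helper for illustration.
func qTarget(reward float64, nextQ []float64, gamma float64, done bool) float64 {
	if done {
		// Terminal state: there is no future reward to bootstrap from.
		return reward
	}
	best := nextQ[0]
	for _, q := range nextQ[1:] {
		if q > best {
			best = q
		}
	}
	return reward + gamma*best
}

func main() {
	// Q-value estimates for the possible moves in the next state.
	nextQ := []float64{0.1, 0.5, -0.2, 0.3}
	target := qTarget(1.0, nextQ, 0.99, false)
	fmt.Printf("training target: %.3f\n", target) // 1.0 + 0.99*0.5 = 1.495
}
```

It is this target, rather than a hand-labeled ground truth, that the network is trained to predict for the action it took.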

Now you can think about other possible applications for a DQN and, more generally, for problems where you have a simple reward of some kind but no labels for your data—the canonical example being an agent in some sort of environment. The agent and environment should be defined in the most general way possible, as you are not limited to a bit of math playing Atari games or solving a maze. For example, a user of your website can be considered an agent, and the environment is a space in which you have some kind of feature-based representation of your content. You could use this approach to build a recommendation engine for news. You can refer to the Further reading section for a link to a paper that you may want to implement as an exercise.
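If you want to experiment with this idea, the following rough sketch shows one way the pieces of a news-recommendation problem might map onto agent/environment terms. The State, Action, and Reward names here are purely illustrative assumptions, not code from this book or from the paper in the Further reading section:

```go
package main

import "fmt"

// State is a feature-based representation of the user and the candidate content.
type State struct {
	UserFeatures    []float64
	ArticleFeatures [][]float64 // one feature vector per candidate article
}

// Action is the index of the article we choose to recommend.
type Action int

// Reward derives a scalar reward from user feedback, for example a click.
func Reward(clicked bool) float64 {
	if clicked {
		return 1.0
	}
	return 0.0
}

func main() {
	s := State{
		UserFeatures:    []float64{0.2, 0.7},
		ArticleFeatures: [][]float64{{0.1, 0.9}, {0.8, 0.3}},
	}
	a := Action(1) // the agent (our DQN) would pick this via its Q-values
	fmt.Println("recommend article", a, "of", len(s.ArticleFeatures),
		"candidates; reward on click:", Reward(true))
}
```

The structure is the same as in the maze: the DQN estimates a Q-value per candidate action (article), and the user's feedback supplies the reward used to build the training target.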

In the next chapter, we will look into building a Variational Autoencoder (VAE) and learn about the advantages that a VAE has over a standard autoencoder.
