Elements of RL

As we know, an agent interacts with its environment by means of actions. Each action changes the environment, which feeds back to the agent a new state and a reward that reflects the quality of the action. Through trial and error, the agent incrementally learns the best action to take in every situation so that, in the long run, it achieves a larger cumulative reward. In the RL framework, the choice of action in a particular state is made by a policy, and the cumulative reward that is achievable from that state is given by the value function. In brief, if an agent wants to behave optimally, then in every situation the policy has to select the action that brings it to the next state with the highest value. Now, let's take a deeper look at these fundamental concepts.
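
As a rough illustration of how a policy and a value function fit together, here is a minimal sketch on a made-up toy problem. Everything in it is an assumption for the sake of the example: the five-state corridor, the `step` function, the discount factor `GAMMA`, and the use of value iteration to estimate the values. The policy picks the action with the best one-step lookahead, that is, the immediate reward plus the discounted value of the next state.

```python
# Hypothetical toy problem: a corridor of 5 states (0..4); state 4 is the goal.
GAMMA = 0.9           # discount factor (assumed for this sketch)
N_STATES = 5
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Deterministic toy environment: return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def lookahead(values, state, action):
    """Immediate reward plus discounted value of the resulting state."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(sweeps=100):
    """Estimate the value of each state; the terminal state keeps value 0."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):  # skip the terminal goal state
            values[s] = max(lookahead(values, s, a) for a in ACTIONS)
    return values


def greedy_policy(values, state):
    """Select the action with the highest one-step lookahead value."""
    return max(ACTIONS, key=lambda a: lookahead(values, state, a))


values = value_iteration()
policy = [greedy_policy(values, s) for s in range(N_STATES - 1)]
print("values:", [round(v, 3) for v in values])
print("policy:", policy)
```

In this sketch the values grow as the agent gets closer to the goal, so the greedy policy moves right in every non-terminal state. Note that the reward is included in the lookahead because the terminal state itself is assigned a value of 0 here.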
