Policy

The policy chooses the actions to be taken in a given situation and can be categorized as deterministic or stochastic.

A deterministic policy is denoted as a_t = µ(s_t), while a stochastic policy is denoted as a_t ~ π(·|s_t), where the tilde symbol (~) means "is distributed according to." Stochastic policies are used when it is better to work with a distribution over actions; for example, when it is preferable to inject noise into the action applied to the system.
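
As a rough illustration of the difference, the following minimal sketch contrasts the two kinds of policy. The decision rule inside `mu`, the scalar state, and the `noise_scale` parameter are hypothetical choices made only for this example; here the stochastic policy is built by adding zero-mean Gaussian noise to the deterministic action, which is just one possible way to obtain an action distribution.

```python
import random


def mu(state):
    """Deterministic policy: a_t = mu(s_t). Same state, same action."""
    # Hypothetical rule: push in the direction opposite to the state's sign.
    return -1.0 if state > 0 else 1.0


def pi(state, rng, noise_scale=0.1):
    """Stochastic policy: a_t ~ pi(.|s_t).

    Assumption for this sketch: the distribution is a Gaussian centred
    on mu(s_t), i.e. the deterministic action plus injected noise.
    """
    return mu(state) + rng.gauss(0.0, noise_scale)


rng = random.Random(0)  # seeded only to make runs repeatable
state = 0.5
print(mu(state), mu(state))            # the same action on every call
print(pi(state, rng), pi(state, rng))  # a fresh sample on every call
```

Calling `mu` twice on the same state returns the same action, whereas each call to `pi` draws a new action around it.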

Generally, stochastic policies can be categorical or Gaussian. The former case, used for discrete actions, is similar to a classification problem: the probability of each action is computed with a softmax function across the categories. In the latter case, used for continuous actions, the actions are sampled from a Gaussian distribution described by a mean and a standard deviation (or variance). These parameters can also be functions of the state.
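
The two cases can be sketched as follows, using only the Python standard library. The logits passed to the categorical policy and the formulas for the state-dependent mean and standard deviation are hypothetical; in practice they would typically be produced by a learned model.

```python
import math
import random


def softmax(logits):
    """Turn one score per category into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(logits, rng):
    """Categorical policy: sample a discrete action index."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_gaussian(state, rng):
    """Gaussian policy: sample a continuous action.

    Assumption for this sketch: the mean and the standard deviation are
    simple hand-written functions of a scalar state.
    """
    mean = 0.5 * state
    std = 0.1 + 0.05 * abs(state)
    return rng.gauss(mean, std)


rng = random.Random(0)  # seeded only to make runs repeatable
print(softmax([1.0, 2.0, 3.0]))                   # probabilities summing to 1
print(sample_categorical([1.0, 2.0, 3.0], rng))   # an index in {0, 1, 2}
print(sample_gaussian(2.0, rng))                  # a real-valued action
```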

When using parameterized policies, we'll denote their parameters with the letter θ. For example, a parameterized deterministic policy would be written as µ_θ(s_t).
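
To make the role of θ concrete, here is a minimal sketch of a parameterized deterministic policy. The linear form (a dot product between θ and the state vector) is an assumption chosen for brevity, not a form prescribed by the text; any function whose output depends on θ would qualify.

```python
def mu_theta(theta, state):
    """Parameterized deterministic policy: a_t = mu_theta(s_t).

    Assumption for this sketch: the policy is linear, so the action is
    the dot product of the parameter vector theta and the state vector.
    """
    return sum(w * x for w, x in zip(theta, state))


state = [2.0, 1.0]
print(mu_theta([0.5, -1.0], state))  # 0.5*2.0 + (-1.0)*1.0 = 0.0
print(mu_theta([1.0, 1.0], state))   # 1.0*2.0 + 1.0*1.0 = 3.0
```

The same state yields a different action once θ changes, which is what lets a learning algorithm improve the policy by adjusting θ.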

Policy, decision-maker, and agent are three terms that express the same concept, so, in this book, we'll use these terms interchangeably. 