Policy

The policy chooses the actions to be taken in a given situation and can be categorized as deterministic or stochastic.

A deterministic policy is denoted as a_t = µ(s_t), while a stochastic policy is denoted as a_t ~ π(·|s_t), where the tilde symbol (~) means "is distributed according to": the action is sampled from the distribution π conditioned on the state s_t. Stochastic policies are used when it is better to consider a distribution over actions; for example, when it is preferable to inject noisy actions into the system.
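
To make the distinction concrete, here is a minimal Python sketch. The one-dimensional float state, the function names `deterministic_policy` and `stochastic_policy`, the linear mapping, and the Gaussian noise with standard deviation 0.1 are all illustrative assumptions, not taken from the text.

```python
import random


def deterministic_policy(state: float) -> float:
    """a_t = mu(s_t): the same state always maps to the same action."""
    # Hypothetical mapping chosen only for illustration.
    return -0.5 * state


def stochastic_policy(state: float, rng: random.Random) -> float:
    """a_t ~ pi(.|s_t): the action is sampled from a distribution conditioned on the state."""
    # Assumed: a Gaussian centred on the deterministic action with std 0.1 (the injected noise).
    return rng.gauss(deterministic_policy(state), 0.1)


rng = random.Random(0)
s = 2.0
print(deterministic_policy(s), deterministic_policy(s))      # -1.0 -1.0
print(stochastic_policy(s, rng), stochastic_policy(s, rng))  # two different samples near -1.0
```

Calling the deterministic policy twice on the same state returns the same action, whereas the stochastic policy returns a fresh sample on each call.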

Generally, stochastic policies are either categorical or Gaussian. A categorical policy is used with a discrete set of actions; it is similar to a classification problem, and the action probabilities are computed with a softmax function across the categories. In a Gaussian policy, used with continuous actions, the actions are sampled from a Gaussian distribution described by a mean and a standard deviation (or variance). These parameters can also be functions of the state.
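
The two kinds of stochastic policy can be sketched with the Python standard library alone. The three-action setup, the way the logits depend on the state, and the state-dependent mean and standard deviation of the Gaussian are hypothetical choices made only for illustration; in practice these quantities are usually produced by a learned model.

```python
import math
import random


def softmax(logits):
    """Turn unnormalised scores into a probability distribution over categories."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_policy(state, rng):
    """Categorical policy: sample one of a finite set of actions from softmax probabilities."""
    # Hypothetical logits as a function of the state (one score per action).
    logits = [state, 0.0, -state]
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs


def gaussian_policy(state, rng):
    """Gaussian policy: sample a continuous action from N(mean, std^2)."""
    # Hypothetical mean and standard deviation, both functions of the state.
    mean = 0.5 * state
    std = 0.1 * (1.0 + abs(state))  # always strictly positive
    return rng.gauss(mean, std), mean, std


rng = random.Random(42)
action, probs = categorical_policy(1.0, rng)
print(action, [round(p, 3) for p in probs])
sample, mean, std = gaussian_policy(1.0, rng)
print(round(sample, 3), mean, std)
```

Because both the logits and the Gaussian parameters are computed from `state`, the action distribution changes with the state, which is what it means for these parameters to be functions of the state.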

When using parameterized policies, we'll denote their parameters with the letter θ, written as a subscript. For example, a parameterized deterministic policy is written as µ_θ(s_t).
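
As a minimal sketch of what the subscript θ expresses, assume a deterministic policy that is linear in the state features, so that µ_θ(s_t) is the dot product of a parameter vector θ with s_t. The linear form and the name `mu_theta` are assumptions for illustration; the same idea applies when θ holds the weights of a neural network.

```python
from typing import Sequence


def mu_theta(theta: Sequence[float], state: Sequence[float]) -> float:
    """Deterministic policy mu_theta(s_t), assumed here to be linear: a = theta . s."""
    if len(theta) != len(state):
        raise ValueError("theta and state must have the same length")
    return sum(w * x for w, x in zip(theta, state))


s_t = [1.0, -2.0, 0.5]
theta_a = [0.5, 0.25, 2.0]
theta_b = [0.0, 1.0, 0.0]
print(mu_theta(theta_a, s_t))  # 1.0
print(mu_theta(theta_b, s_t))  # -2.0
```

The same state yields different actions under different parameter vectors, so changing θ changes the policy itself.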

Policy, decision-maker, and agent are three terms that express the same concept, so in this book we'll use them interchangeably.
