The policy chooses the actions to be taken in a given situation and can be categorized as deterministic or stochastic.
A deterministic policy is denoted as a_t = µ(s_t), while a stochastic policy is denoted as a_t ~ π(·|s_t), where the tilde symbol (~) means "is distributed according to". Stochastic policies are used when it is better to work with a distribution over actions; for example, when it is preferable to inject noise into the chosen actions.
Generally, stochastic policies are either categorical or Gaussian. A categorical policy is similar to a classification problem: the probability of each discrete action is computed with a softmax function across the categories. In a Gaussian policy, the actions are sampled from a Gaussian distribution described by a mean and a standard deviation (or variance). These parameters can also be functions of the state.
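To make the two cases concrete, here is a minimal sketch that uses only Python's standard library. The scores (logits), the mean, and the standard deviation are hypothetical values chosen for illustration; in practice they would typically be produced by a function of the state.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(logits, rng):
    """Sample a discrete action index with probabilities softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_gaussian(mean, std, rng):
    """Sample a continuous action from a Gaussian with the given mean and std."""
    return rng.gauss(mean, std)


rng = random.Random(0)  # seeded only so the sketch is repeatable

# Hypothetical scores for three discrete actions in some state s_t.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
discrete_action = sample_categorical(logits, rng)

# Hypothetical mean and standard deviation for one continuous action.
continuous_action = sample_gaussian(mean=0.5, std=0.2, rng=rng)

print(probs, discrete_action, continuous_action)
```

The categorical policy returns an action index, with higher-scored actions sampled more often, while the Gaussian policy returns a real number scattered around the mean.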
When using parameterized policies, we'll denote their parameters with the letter θ. For example, a parameterized deterministic policy is written as µ_θ(s_t).
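As an illustration of what a parameterized policy might look like, the sketch below assumes the simplest possible form: a linear deterministic policy where the action is the dot product of θ and the state. The linear form, the function name mu_theta, and the numeric values are assumptions made for this example only; the text does not prescribe a particular parameterization.

```python
def mu_theta(theta, state):
    """Hypothetical linear deterministic policy: a_t = theta . s_t."""
    return sum(w * s for w, s in zip(theta, state))


theta = [0.5, -1.0]   # hypothetical policy parameters
state = [2.0, 0.25]   # hypothetical state s_t

# Same state, same parameters: always the same action.
action = mu_theta(theta, state)  # 0.5 * 2.0 + (-1.0) * 0.25

# Changing the parameters changes the action chosen in the same state.
other_action = mu_theta([1.0, 0.0], state)

print(action, other_action)
```

Learning a policy then amounts to adjusting θ so that the actions it produces become better.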