MARKOV CHAINS AND APPROXIMATE SIMULATION 15
In other words, a chain is a Harris chain if there is a subset of states A that the chain returns to with positive probability after a finite number of steps from any starting state. Moreover, if the state is somewhere in A, then with probability ε the next state does not depend on exactly which element of A the state is at.
The notions of recurrence and aperiodicity are needed to make the stationary dis-
tribution limiting as well. A Harris chain always has positive probability of returning
to A. When the chance of returning is 1, then the chain is recurrent.
Definition 1.30. Let R = inf{n > 0 : X_n ∈ A}. A Harris chain is recurrent if for all x ∈ A, P(R < ∞ | X_0 = x) = 1. A Harris chain that is not recurrent is transient.
Given a recurrent Harris chain, suppose the state space can be partitioned into P_1, P_2, ..., P_k so that for a state in P_i, the chance of moving to P_{i+1} is 1, and for a state in P_k, the next state is back in P_1. Then if k is the smallest number for which there is such a partition, call k the period of the chain. Intuitively, when k is 1, the chain is aperiodic. The following definition of aperiodicity is equivalent to this notion.
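As a concrete illustration (an invented example, not one from the text), a chain that cycles deterministically through three states has period 3: a return to the starting state is possible only after a multiple of 3 steps. A minimal sketch in Python:

```python
# Deterministic cycle on states {0, 1, 2}: from state i, move to (i + 1) % 3.
# The partition P_1 = {0}, P_2 = {1}, P_3 = {2} satisfies the definition
# above, so the period of this chain is k = 3.

def step(state):
    """One transition of the cyclic chain."""
    return (state + 1) % 3

def returns_to_start(n, start=0):
    """True if the chain is back at `start` after exactly n steps."""
    state = start
    for _ in range(n):
        state = step(state)
    return state == start

# Return to the start is possible exactly when n is a multiple of 3.
print([n for n in range(1, 10) if returns_to_start(n)])  # [3, 6, 9]
```

Making the chain lazy (staying put with some positive probability) would destroy this partition and yield period 1, i.e., an aperiodic chain.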
Definition 1.31. A recurrent Harris chain is aperiodic if for all x ∈ Ω, there exists n₀ such that for all n ≥ n₀,

P(X_n ∈ A | X_0 = x) > 0.    (1.10)
Theorem 1.3 (Ergodic Theorem for Harris chains). Let X_n be an aperiodic recurrent Harris chain with stationary distribution π. If P(R < ∞ | X_0 = x) = 1 for all x, then as t → ∞, for all measurable sets C and for all x:

|P(X_t ∈ C | X_0 = x) − π(C)| → 0.    (1.11)
This theorem is the heart of Markov chain Monte Carlo: for a given target distribution, build a recurrent, aperiodic Harris chain whose stationary distribution equals the target distribution. Then the limiting distribution of the Harris chain will be the target distribution, so by running the chain an infinite number of steps, a draw from the target distribution is made.
Since very few practitioners actually have an infinite amount of time to spare,
instead a finite number of steps are taken, and the user hopes that it is sufficient to
make the state close to the limiting distribution. However, while it is often known
that the distribution of the state of the chain converges to the stationary distribution,
it is rare to be able to calculate how quickly that convergence occurs.
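To make the finite-run idea concrete, here is a small sketch (an illustration invented for this passage, not the text's own example) of a Metropolis chain on the states {0, ..., 9} whose target distribution has unnormalized weights w(i) = i + 1; the chain is run for a fixed finite number of steps and the empirical frequencies are compared to the target.

```python
import random

# Illustrative Metropolis chain on {0, ..., 9} targeting the unnormalized
# weights w(i) = i + 1.  The proposal moves to a uniformly random neighbor
# (with wraparound), which is symmetric, so accepting with probability
# min(1, w(y)/w(x)) gives stationary distribution pi(i) = (i + 1)/55.
# The chain is recurrent and aperiodic, so by the ergodic theorem the
# distribution of X_t converges to pi; in practice we take finitely many
# steps and hope t is large enough.

random.seed(1)

def w(i):
    return i + 1

def metropolis_step(x):
    y = (x + random.choice([-1, 1])) % 10      # symmetric proposal
    if random.random() < min(1.0, w(y) / w(x)):  # accept or reject
        return y
    return x

steps = 100_000
x, counts = 0, [0] * 10
for _ in range(steps):
    x = metropolis_step(x)
    counts[x] += 1

freqs = [c / steps for c in counts]
target = [w(i) / 55 for i in range(10)]  # normalizing constant 1+2+...+10 = 55
# The empirical frequencies should be close to (but not exactly equal to)
# the target probabilities after this finite run.
print(max(abs(f - p) for f, p in zip(freqs, target)))
```

How close "close" is after a finite run is exactly the question the next paragraphs address: bounding the speed of convergence is hard, and coupling is one of the few general tools for it.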
One of the few methods that exist for doing so is known as coupling.
Definition 1.32. Suppose {X_t} ∼ ν_X and {Y_t} ∼ ν_Y. Then a coupling of {X_t} and {Y_t} is a bivariate process {(X_t, Y_t)} such that {X_t} ∼ ν_X and {Y_t} ∼ ν_Y.
The idea here is that X_t and Y_t both have a specified distribution; for instance, both might be Markov chains with a specified transition kernel. Then a coupling is a way of advancing both X_t and Y_t forward into the future at the same time while each, when viewed individually, follows the transition probabilities of the appropriate Markov chain. Say that the two Markov chains are coupled.
The word couple comes from the Latin for “to fasten together,” and this fastening remains the essential part of using coupled processes. In particular, it is important to recognize when the two processes cross paths, that is, when they become equal to one another.
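A minimal sketch of such a crossing (again an invented example, under the assumption of a simple walk with clamped boundaries): couple two lazy random walks on {0, ..., 10} by driving both with the same random input each step. Each chain, viewed alone, follows the walk's transition probabilities, but the shared randomness forces the pair to eventually meet, and once they meet they stay together.

```python
import random

# A coupling of two random walks on {0, ..., 10} with reflecting (clamped)
# boundaries.  Both chains consume the SAME uniform u each step, so each is
# marginally an ordinary copy of the walk, but the pair can only get closer:
# the gap shrinks whenever one chain is clamped at a boundary and the other
# is not.  Once X_t = Y_t, the shared input keeps them equal forever after.

random.seed(2)

def step(state, u):
    """One step of the walk, driven by the shared uniform u in [0, 1)."""
    move = 1 if u < 0.5 else -1
    return min(10, max(0, state + move))

x, y, t = 0, 10, 0          # start the two chains as far apart as possible
while x != y:
    u = random.random()     # one shared source of randomness per step
    x, y = step(x, u), step(y, u)
    t += 1

print(f"chains coupled after {t} steps, at state {x}")
```

Tracking the time at which coupled chains meet is what turns a coupling into a quantitative tool: bounds on that meeting time translate into bounds on the distance from the stationary distribution.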