Throughout this book, ν will be used to denote a measure. (While the Greek letter μ is often used in mathematics for measure, in this text μ is reserved for constants such as the mean of a normal or the magnetization of the Ising model.)
Definition 1.7. For a measurable space (Ω, F), a function ν : F → [0,∞) ∪ {∞} is a measure if

1. [The measure of nothing is nothing] ν(∅) = 0.
2. [Countable additivity] For a sequence A_1, A_2, ... ∈ F such that for all i ≠ j, A_i ∩ A_j = ∅,
   ν(∪_i A_i) = ∑_i ν(A_i).
Definition 1.8. If ν(Ω) < ∞, then ν is a finite measure. If ν(Ω) = 1, then ν is a probability measure or distribution. For a finite measure ν, call ν(Ω) the normalizing constant.
Usually, the symbols τ or π will be used to denote a probability measure. A simple way to turn a finite measure into a probability measure is to divide by its normalizing constant.
Lemma 1.1. If ν is a finite measure for (Ω, F), then for all C ∈ F, τ(C) = ν(C)/ν(Ω) is a probability measure.
Probability measures are also often written using P. So P(C) = τ(C) indicates the probability that the outcome falls in C. For unnormalized (but finite) measures τ and π, if τ(C)/τ(Ω) = π(C)/π(Ω) for all measurable sets C, write τ ∝ π.
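To see Lemma 1.1 in action, the following minimal Python sketch (not from the text; the three-point space and its weights are invented for illustration) stores a finite measure as a dictionary of nonnegative weights and divides by the normalizing constant to obtain a probability measure.

```python
# A finite measure on a three-point space, stored as a dictionary mapping
# outcomes to nonnegative weights (illustrative values only).
nu = {"a": 2.0, "b": 3.0, "c": 5.0}

# The normalizing constant is nu(Omega), the total mass.
Z = sum(nu.values())

# Lemma 1.1: dividing by the normalizing constant yields a probability measure.
tau = {omega: weight / Z for omega, weight in nu.items()}

print(Z)    # 10.0
print(tau)  # {'a': 0.2, 'b': 0.3, 'c': 0.5}, which sums to 1
```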
Definition 1.9. Consider a probability space (Ω_1, F_1, ν) and a measurable space (Ω_2, F_2). An Ω_2-valued random variable is a function X from Ω_1 to Ω_2 such that for all C ∈ F_2, {ω ∈ Ω_1 : X(ω) ∈ C} ∈ F_1.
The set {ω ∈ Ω_1 : X(ω) ∈ C} is usually abbreviated more simply as {X ∈ C}. The definition of a random variable ensures that ν({X ∈ C}) exists, and this number is also written P(X ∈ C). The measure ν_X(C) = P(X ∈ C) is called the distribution of X, or the law of X. When X ∈ R, then F_X(a) = P(X ≤ a) is the cumulative distribution function, or cdf, of X. If ν_X = ν for some measure ν, write X ∼ ν.
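As a small concrete illustration (my own example, not one from the text), the sketch below takes Ω_1 = {1,...,6} with the uniform probability measure, lets X be the indicator that the outcome is even, and tabulates the law ν_X by adding up the probability of each preimage.

```python
from fractions import Fraction

# Probability space: a fair six-sided die, so nu({omega}) = 1/6 for each outcome.
outcomes = range(1, 7)
nu = {omega: Fraction(1, 6) for omega in outcomes}

# A {0,1}-valued random variable: X(omega) = 1 if the roll is even, else 0.
def X(omega):
    return 1 if omega % 2 == 0 else 0

# The distribution (law) of X: nu_X({c}) = P(X = c), built outcome by outcome.
nu_X = {}
for omega in outcomes:
    nu_X[X(omega)] = nu_X.get(X(omega), Fraction(0)) + nu[omega]

print(nu_X)  # {0: Fraction(1, 2), 1: Fraction(1, 2)}
```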
1.4.1 Integrals
Two of the most common measures are Lebesgue measure and counting measure. The counting measure of a finite set A is just the number of elements in set A. When ν is Lebesgue measure, then (roughly speaking) in one dimension ν corresponds to length, so ν([a,b]) = b - a. In two dimensions, ν corresponds to area, in three dimensions to volume, and so on.
Integration with respect to Lebesgue measure over a closed bounded interval reduces to Riemann integration when the function is continuous over the region of integration, so

  ∫_{x ∈ [a,b]} f(x) ν(dx) = ∫_a^b f(x) dx.    (1.3)
In counting measure, for a finite set A, ν(A) denotes the number of elements in the set A. Integrals with respect to counting measure just become a summation over the set:

  ∫_{x ∈ C} w(x) ν(dx) = ∑_{x ∈ C} w(x).    (1.4)
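The sketch below checks (1.3) and (1.4) numerically; the integrand, the weight function, and the use of SciPy's quad routine are all illustrative choices of mine rather than anything prescribed by the text.

```python
import math
from scipy.integrate import quad

# Equation (1.3): for continuous f on [a, b], the integral against Lebesgue
# measure is the ordinary Riemann integral.  Here f(x) = exp(-x) on [0, 1].
f = lambda x: math.exp(-x)
lebesgue_value, _ = quad(f, 0.0, 1.0)
print(lebesgue_value, 1 - math.exp(-1))  # both approximately 0.6321

# Equation (1.4): against counting measure on a finite set C, the integral of
# a weight function w is simply the sum of w over C.
C = {1, 2, 3, 4}
w = lambda x: x ** 2
counting_value = sum(w(x) for x in C)
print(counting_value)  # 30
```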
1.4.2 Densities
Another notion that comes in handy is the idea of a density.
Definition 1.10. Say that a probability distribution τ has a density f(x) with respect to a measure ν if for all measurable sets C,

  τ(C) = ∫_{x ∈ C} f(x) ν(dx).
Suppose that Z = ∫_Ω f(x) ν(dx), where Z > 0 and is finite, but not necessarily 1. Then say that f is an unnormalized density for the probability distribution τ defined as

  τ(C) = Z^{-1} ∫_{x ∈ C} f(x) ν(dx).

Call Z the normalizing constant.
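For example, the following sketch (the unnormalized density f(x) = exp(-x^4) is a made-up choice) computes the normalizing constant Z numerically and then evaluates τ(C) for the interval C = [0,1].

```python
import math
from scipy.integrate import quad

# A made-up unnormalized density with respect to Lebesgue measure on R.
f = lambda x: math.exp(-x ** 4)

# Normalizing constant: Z = integral of f over the whole space (positive, finite).
Z, _ = quad(f, -math.inf, math.inf)

# tau(C) = Z^{-1} * integral of f over C, here with C = [0, 1].
numerator, _ = quad(f, 0.0, 1.0)
tau_C = numerator / Z

print(Z)      # approximately 1.813
print(tau_C)  # approximately 0.466; dividing by Z makes tau a probability measure
```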
Usually densities are given with respect to Lebesgue measure or counting measure. A density with respect to counting measure is also called a probability mass function. When the density is with respect to Lebesgue measure over R^1, then this is the usual notion of density for continuous distributions over the real line.
Definition 1.11. Suppose X ∼ τ. If τ has a density f_X, say that f_X is the density of X.
Integrals with respect to measures then give probabilities and expectations:

  ∫_{x ∈ A} f_X(x) ν(dx) = P(X ∈ A),    ∫_{x ∈ Ω} g(x) f_X(x) ν(dx) = E[g(X)].
Definition 1.12. The indicator function 1(expression) evaluates to 1 when the expression inside is true, and 0 otherwise. For example, the function 1(x ≥ 3) is 1 when x ≥ 3, and 0 when x < 3.
Using the indicator function, the probability of an event can be written as an expectation:

  P(X ∈ A) = E[1(X ∈ A)].    (1.5)
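Equation (1.5) is the basis of the simplest Monte Carlo estimate of a probability: draw many copies of X and average the indicator. A minimal sketch, with the target N(0,1), the event {X ≤ 1}, and the sample size chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate P(X <= 1) for X ~ N(0, 1) via (1.5): the probability equals the
# expectation of the indicator, so average 1(X <= 1) over many draws.
n = 100_000
samples = rng.standard_normal(n)
estimate = np.mean(samples <= 1.0)

print(estimate)  # close to 0.8413, the standard normal cdf evaluated at 1
```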
In most of the applications of Monte Carlo methods, only an unnormalized density of a random variable X is known, and the goal of the Monte Carlo approach is either to approximate the normalizing constant, or find E[g(X)] for some function g without knowing the normalizing constant of the density of X.
Table 1.1 Initialisms.
AR Acceptance/rejection
cdf Cumulative distribution function
CNF Conjunctive normal form
CFTP Coupling from the past
DNF Disjunctive normal form
FMMR Fill, Machida, Murdoch, and Rosenthal
iid Independent, identically distributed
MA1 Move ahead 1
MAVM Multiple auxiliary variable method
MCMC Markov chain Monte Carlo
mgf Moment-generating function
MLE Maximum likelihood estimator
MRF Markov random field
MTF Move to front
pdf Probability density function
PRAR Partially recursive acceptance/rejection
ras Randomized approximation scheme
RAR Recursive acceptance/rejection
SAR Sequential acceptance/rejection
SAVM Single auxiliary variable method
SDE Stochastic differential equation
1.5 Notation
First, some basic notation to be used.
Throughout this work, π will denote a target distribution, and the goal of a perfect simulation algorithm is to draw a random variate from π.
For a finite set A, #A or #(A) will denote the number of elements in A.
G = (V,E) will always refer to an undirected graph where V is the set of nodes
and E is a collection of sets of two nodes. The degree of node i, written deg(i),
is the number of nodes j such that {i, j} is an edge. The maximum degree of any
node in the graph will be denoted Δ. For example, in the square lattice Δ = 4.
When A and B are sets, A × B = {(a,b) : a ∈ A and b ∈ B} denotes the Cartesian product of the sets.
Probability has built up a surprising number of initialisms. This text continues
the trend by using initialisms for most of the major perfect simulation protocols, as
well as for many models. A summary of the initialisms used in this text is given
in Table 1.1. Some of these abbreviations are quite common, others pertain only to
perfect simulation methods.
Table 1.2 Discrete distributions.

Name        Notation   Density f(i)
Bernoulli   Bern(p)    p 1(i = 1) + (1 - p) 1(i = 0)
Geometric   Geo(p)     p(1 - p)^{i-1} 1(i ∈ {1, 2, ...})
Poisson     Pois(μ)    exp(-μ) μ^i (i!)^{-1} 1(i ∈ {0, 1, 2, ...})
Uniform     Unif(A)    (#A)^{-1} 1(i ∈ A)
Table 1.3 Continuous distributions.

Name                Notation       Density f(x)
Uniform             Unif([a,b])    (b - a)^{-1} 1(x ∈ [a,b])
Exponential         Exp(λ)         λ exp(-λx) 1(x > 0)
Gamma               Gamma(r, λ)    λ^r x^{r-1} exp(-λx) Γ(r)^{-1} 1(x > 0)
Normal (Gaussian)   N(μ, σ)        (2πσ^2)^{-1/2} exp(-(x - μ)^2/[2σ^2])
Inverse Gaussian    InvGau(μ, λ)   (2πx^3/λ)^{-1/2} exp(-λ(x - μ)^2/(2μ^2 x))
1.5.1 Distributions
Several common distributions will be used throughout this work. Discrete distributions have a density with respect to counting measure, while continuous distributions have a density with respect to Lebesgue measure. See Table 1.2 and Table 1.3 for a summary of univariate distributions. For higher-dimensional spaces, say that X is uniform over Ω (write X ∼ Unif(Ω)) if, for ν counting measure or Lebesgue measure, ν(Ω) is finite and X has density f_X(x) = 1(x ∈ Ω)/ν(Ω).
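All of the distributions in Tables 1.2 and 1.3 can be sampled with standard libraries; the sketch below uses NumPy. The parameter correspondences noted in the comments (scale = 1/λ for the exponential and gamma, and wald for the inverse Gaussian) are my reading of NumPy's conventions rather than something stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
p, mu, lam, r, a, b, sigma = 0.3, 2.0, 1.5, 2.0, 0.0, 1.0, 1.0

bern   = rng.binomial(1, p)        # Bern(p)
geo    = rng.geometric(p)          # Geo(p), support {1, 2, ...}
pois   = rng.poisson(mu)           # Pois(mu)
unif_A = rng.choice([1, 2, 3])     # Unif(A) for a finite set A
unif   = rng.uniform(a, b)         # Unif([a, b])
expo   = rng.exponential(1 / lam)  # Exp(lambda); NumPy takes scale = 1/lambda
gam    = rng.gamma(r, 1 / lam)     # Gamma(r, lambda); shape r, scale 1/lambda
norm   = rng.normal(mu, sigma)     # N(mu, sigma)
invgau = rng.wald(mu, lam)         # InvGau(mu, lambda)

print(bern, geo, pois, unif_A, unif, expo, gam, norm, invgau)
```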
1.5.2 Conditional distributions
Often, the distribution of a random variable depends upon one or more other random
variables. For instance, if (X,Y) is chosen uniformly from the triangular region with vertices (0,0), (1,1), and (1,0), then given the value of Y, X is uniform over the interval [Y,1]. This information is denoted [X | Y] ∼ Unif([Y,1]). Similarly, [Y | X] ∼ Unif([0,X]).
Formally, conditional expectations of the form E[X | Y] are defined using the Radon-Nikodym derivative between measures (see for instance [31]) and then conditional probabilities are defined using P(X ∈ C | Y) = E[1(X ∈ C) | Y]. The key properties of conditional expectation that will be used here are that E[X | Y] = f(Y) for some function f, and that E[E[X | Y]] = E[X] for random variables X and Y with finite mean.
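A quick numerical sanity check of both properties, using the triangle example above (the rejection sampler and the sample size are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw (X, Y) uniformly from the triangle with vertices (0,0), (1,1), (1,0),
# i.e. {(x, y) : 0 <= y <= x <= 1}, by rejection from the unit square.
n = 200_000
x = rng.uniform(0, 1, n)
y = rng.uniform(0, 1, n)
keep = y <= x
x, y = x[keep], y[keep]

# [X | Y] ~ Unif([Y, 1]) gives E[X | Y] = (Y + 1)/2, a function of Y alone.
# The tower property E[E[X | Y]] = E[X] says the two sample averages agree.
print(np.mean(x))            # approximately 2/3
print(np.mean((y + 1) / 2))  # also approximately 2/3
```

Both averages come out near 2/3, the x-coordinate of the triangle's centroid, as the tower property predicts.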
1.6 A few applications
Perfect simulation methods have been successfully applied to a variety of problems
coming from many different areas. The list of applications includes:
Markov random fields (specifically auto models). These are labellings of a graph where the density is a product of functions of the labels on each node, and a function of the labels on each edge. Examples of this type of problem include the
Ising model, the Potts model, sink-free orientations of a graph, and the autonormal
model.
Permutations. A permutation can be thought of as an ordering of n objects. The
set of permutations is easy to sample from, but once the goal is to sample from a
density over permutations, the problem can become very difficult.
Stochastic differential equations. These are used to model continuous functions
that fluctuate randomly. They are used often in finance and as models of physical
systems.
Spatial point processes. These are used as models for data that is in the form of
points in a space. Examples include the Strauss process, marked Hawkes pro-
cesses, and Matérn processes.
Bayesian posteriors. In Bayesian statistics, a prior is a probability distribution on
parameters of interest. Conditioned on data, the prior is then updated using Bayes’
rule to give an unnormalized density that is called the posterior. The normalizing
constant for the posterior is often very difficult to compute exactly. Hence the
ability to sample from the posterior without calculating the normalizing constant
is essential to Bayesian inference. While it is not known how to perfectly sample
from every Bayesian posterior, many do have perfect simulation algorithms.
Combinatorial objects. Given a graph G, consider the number of proper colorings
of the graph or the number of sink-free orientations. Sets of combinatorial objects
such as these tend to grow in size exponentially fast in the size of the underlying
graph. For many of these problems, the ability to draw samples from the set (and
restrictions on the set) leads to approximation algorithms for counting the number
of objects in the set. These types of problems can often be linked to problems
coming from physics or statistics.
Markov processes. Markov chains are powerful modeling tools that describe how
a stochastic process is updated. Under simple conditions, the distribution of the
state of the process will converge to what is known as a stationary distribution.
Sampling from this stationary distribution directly can often be difficult. Often
the stationary distribution cannot be described exactly. Examples of these kinds
of problems dealt with here include the antivoter model, self-organizing lists, and
perpetuities.
Each of these applications will be described more fully as we are ready to present algorithms for them. One thing all of these applications have in common is that they can be presented as an unnormalized density for which it is far too expensive computationally to calculate the normalizing constant directly.