Throughout this book, ν will be used to denote a measure. (While the Greek letter μ is often used in mathematics for measure, in this text μ is reserved for constants such as the mean of a normal or the magnetization of the Ising model.)
Definition 1.7. For a measurable space (Ω, F), a function ν : F → [0,∞) ∪ {∞} is a measure if

1. [The measure of nothing is nothing] ν(∅) = 0.
2. [Countable additivity] For a sequence A_1, A_2, ... ∈ F such that for all i ≠ j, A_i ∩ A_j = ∅,
   ν(∪_i A_i) = ∑_i ν(A_i).
Definition 1.8. If ν(Ω) < ∞, then ν is a finite measure. If ν(Ω) = 1, then ν is a probability measure or distribution. For a finite measure ν, call ν(Ω) the normalizing constant.
Usually, the symbols τ or π will be used to denote a probability measure. A simple way to turn a finite measure into a probability measure is to divide by its normalizing constant.
Lemma 1.1. If ν is a finite measure for (Ω, F), then for all C ∈ F, τ(C) = ν(C)/ν(Ω) is a probability measure.
Probability measures are also often written using P. So P(C) = τ(C) indicates the probability that the outcome falls in C. For unnormalized (but finite) measures τ and π, if τ(C)/τ(Ω) = π(C)/π(Ω) for all measurable sets C, write τ ∝ π.
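To see Lemma 1.1 in action, the following minimal Python sketch (not from the text; the three-point space and its weights are invented for illustration) stores a finite measure as a dictionary of nonnegative weights and divides by the normalizing constant to obtain a probability measure.

```python
# A finite measure on a three-point space, stored as a dictionary mapping
# outcomes to nonnegative weights (illustrative values only).
nu = {"a": 2.0, "b": 3.0, "c": 5.0}

# The normalizing constant is nu(Omega), the total mass.
Z = sum(nu.values())

# Lemma 1.1: dividing by the normalizing constant yields a probability measure.
tau = {omega: weight / Z for omega, weight in nu.items()}

print(Z)    # 10.0
print(tau)  # {'a': 0.2, 'b': 0.3, 'c': 0.5}, which sums to 1
```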
Definition 1.9. Consider a probability space (Ω_1, F_1, ν) and a measurable space (Ω_2, F_2). An Ω_2-valued random variable is a function X from Ω_1 to Ω_2 such that for all C ∈ F_2, {ω ∈ Ω_1 : X(ω) ∈ C} ∈ F_1.
The set {ω ∈ Ω_1 : X(ω) ∈ C} is usually abbreviated more simply as {X ∈ C}. The definition of a random variable ensures that ν({X ∈ C}) exists, and this number is also written P(X ∈ C). The measure ν_X(C) = P(X ∈ C) is called the distribution of X, or the law of X. When X ∈ R, then F_X(a) = P(X ≤ a) is the cumulative distribution function, or cdf, of X. If ν_X = ν for some measure ν, write X ∼ ν.
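As a small concrete illustration (my own example, not one from the text), the sketch below takes Ω_1 = {1,...,6} with the uniform probability measure, lets X be the indicator that the outcome is even, and tabulates the law ν_X by adding up the probability of each preimage.

```python
from fractions import Fraction

# Probability space: a fair six-sided die, so nu({omega}) = 1/6 for each outcome.
outcomes = range(1, 7)
nu = {omega: Fraction(1, 6) for omega in outcomes}

# A {0,1}-valued random variable: X(omega) = 1 if the roll is even, else 0.
def X(omega):
    return 1 if omega % 2 == 0 else 0

# The distribution (law) of X: nu_X({c}) = P(X = c), built outcome by outcome.
nu_X = {}
for omega in outcomes:
    nu_X[X(omega)] = nu_X.get(X(omega), Fraction(0)) + nu[omega]

print(nu_X)  # {0: Fraction(1, 2), 1: Fraction(1, 2)}
```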
1.4.1 Integrals
Two of the most common measures are Lebesgue measure and counting measure. The counting measure of a finite set A is just the number of elements in set A. When ν is Lebesgue measure, then (roughly speaking) in one dimension ν corresponds to length, so ν([a,b]) = b - a. In two dimensions, ν corresponds to area, in three dimensions to volume, and so on.
Integration with respect to Lebesgue measure over a closed bounded interval reduces to Riemann integration when the function is continuous over the region of integration, so

  ∫_{x ∈ [a,b]} f(x) ν(dx) = ∫_a^b f(x) dx.    (1.3)
In counting measure, for a finite set A, ν(A) denotes the number of elements in the set A. Integrals with respect to counting measure just become a summation over the set:

  ∫_{x ∈ C} w(x) ν(dx) = ∑_{x ∈ C} w(x).    (1.4)
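The sketch below checks (1.3) and (1.4) numerically; the integrand, the weight function, and the use of SciPy's quad routine are all illustrative choices of mine rather than anything prescribed by the text.

```python
import math
from scipy.integrate import quad

# Equation (1.3): for continuous f on [a, b], the integral against Lebesgue
# measure is the ordinary Riemann integral.  Here f(x) = exp(-x) on [0, 1].
f = lambda x: math.exp(-x)
lebesgue_value, _ = quad(f, 0.0, 1.0)
print(lebesgue_value, 1 - math.exp(-1))  # both approximately 0.6321

# Equation (1.4): against counting measure on a finite set C, the integral of
# a weight function w is simply the sum of w over C.
C = {1, 2, 3, 4}
w = lambda x: x ** 2
counting_value = sum(w(x) for x in C)
print(counting_value)  # 30
```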
1.4.2 Densities
Another notion that comes in handy is the idea of a density.
Definition 1.10. Say that a probability distribution τ has a density f(x) with respect to a measure ν if for all measurable sets C,

  τ(C) = ∫_{x ∈ C} f(x) ν(dx).
Suppose that Z = ∫_Ω f(x) ν(dx), where Z > 0 and is finite, but not necessarily 1. Then say that f is an unnormalized density for the probability distribution τ defined as

  τ(C) = Z^{-1} ∫_{x ∈ C} f(x) ν(dx).

Call Z the normalizing constant.
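For example, the following sketch (the unnormalized density f(x) = exp(-x^4) is a made-up choice) computes the normalizing constant Z numerically and then evaluates τ(C) for the interval C = [0,1].

```python
import math
from scipy.integrate import quad

# A made-up unnormalized density with respect to Lebesgue measure on R.
f = lambda x: math.exp(-x ** 4)

# Normalizing constant: Z = integral of f over the whole space (positive, finite).
Z, _ = quad(f, -math.inf, math.inf)

# tau(C) = Z^{-1} * integral of f over C, here with C = [0, 1].
numerator, _ = quad(f, 0.0, 1.0)
tau_C = numerator / Z

print(Z)      # approximately 1.813
print(tau_C)  # approximately 0.466; dividing by Z makes tau a probability measure
```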
Usually densities are given with respect to Lebesgue measure or counting measure. A density with respect to counting measure is also called a probability mass function. When the density is with respect to Lebesgue measure over R^1, then this is the usual notion of density for continuous distributions over the real line.
Definition 1.11. Suppose X ∼ τ. If τ has a density f_X, say that f_X is the density of X.
Integrals with respect to measures then give probabilities and expectations:

  ∫_{x ∈ A} f_X(x) ν(dx) = P(X ∈ A),    ∫_{x ∈ Ω} g(x) f_X(x) ν(dx) = E[g(X)].
Definition 1.12. The indicator function 1(expression) evaluates to 1 when the expression inside is true, and 0 otherwise. For example, the function 1(x ≥ 3) is 1 when x ≥ 3, and 0 when x < 3.
Using the indicator function, the probability of an event can be written as an expectation:

  P(X ∈ A) = E[1(X ∈ A)].    (1.5)
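Equation (1.5) is the basis of the simplest Monte Carlo estimate of a probability: draw many copies of X and average the indicator. A minimal sketch, with the target N(0,1), the event {X ≤ 1}, and the sample size chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate P(X <= 1) for X ~ N(0, 1) via (1.5): the probability equals the
# expectation of the indicator, so average 1(X <= 1) over many draws.
n = 100_000
samples = rng.standard_normal(n)
estimate = np.mean(samples <= 1.0)

print(estimate)  # close to 0.8413, the standard normal cdf evaluated at 1
```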
In most of the applications of Monte Carlo methods, only an unnormalized density of a random variable X is known, and the goal of the Monte Carlo approach is either to approximate the normalizing constant, or find E[g(X)] for some function g without knowing the normalizing constant of the density of X.
Table 1.1 Initialisms.
AR Acceptance/rejection
cdf Cumulative distribution function
CNF Conjunctive normal form
CFTP Coupling from the past
DNF Disjunctive normal form
FMMR Fill, Machida, Murdoch, and Rosenthal
iid Independent, identically distributed
MA1 Move ahead 1
MAVM Multiple auxiliary variable method
MCMC Markov chain Monte Carlo
mgf Moment-generating function
MLE Maximum likelihood estimator
MRF Markov random field
MTF Move to front
pdf Probability density function
PRAR Partially recursive acceptance/rejection
ras Randomized approximation scheme
RAR Recursive acceptance/rejection
SAR Sequential acceptance/rejection
SAVM Single auxiliary variable method
SDE Stochastic differential equation
1.5 Notation
First, some basic notation to be used.
Throughout this work, π will denote a target distribution, and the goal of a perfect simulation algorithm is to draw a random variate from π.
For a finite set A, #A or #(A) will denote the number of elements in A.
G = (V,E) will always refer to an undirected graph where V is the set of nodes
and E is a collection of sets of two nodes. The degree of node i, written deg(i),
is the number of nodes j such that {i, j} is an edge. The maximum degree of any
node in the graph will be denoted Δ. For example, in the square lattice Δ = 4.
When A and B are sets, A × B = {(a,b) : a ∈ A and b ∈ B} denotes the Cartesian product of the sets.
Probability has built up a surprising number of initialisms. This text continues
the trend by using initialisms for most of the major perfect simulation protocols, as
well as for many models. A summary of the initialisms used in this text is given
in Table 1.1. Some of these abbreviations are quite common, others pertain only to
perfect simulation methods.
Table 1.2 Discrete distributions.

Name        Notation   Density f(i)
Bernoulli   Bern(p)    p 1(i = 1) + (1 - p) 1(i = 0)
Geometric   Geo(p)     p(1 - p)^{i-1} 1(i ∈ {1, 2, ...})
Poisson     Pois(μ)    exp(-μ) μ^i (i!)^{-1} 1(i ∈ {0, 1, 2, ...})
Uniform     Unif(A)    (#A)^{-1} 1(i ∈ A)
Table 1.3 Continuous distributions.

Name                Notation       Density f(x)
Uniform             Unif([a,b])    (b - a)^{-1} 1(x ∈ [a,b])
Exponential         Exp(λ)         λ exp(-λx) 1(x > 0)
Gamma               Gamma(r, λ)    λ^r x^{r-1} exp(-λx) Γ(r)^{-1} 1(x > 0)
Normal (Gaussian)   N(μ, σ)        (2πσ^2)^{-1/2} exp(-(x - μ)^2/[2σ^2])
Inverse Gaussian    InvGau(μ, λ)   (2πx^3/λ)^{-1/2} exp(-λ(x - μ)^2/(2μ^2 x))
1.5.1 Distributions
Several common distributions will be used throughout this work. Discrete distributions have a density with respect to counting measure, while continuous distributions have a density with respect to Lebesgue measure. See Table 1.2 and Table 1.3 for a summary of univariate distributions. For higher-dimensional spaces, say that X is uniform over Ω (write X ∼ Unif(Ω)) if, for ν counting measure or Lebesgue measure, ν(Ω) is finite and X has density f_X(x) = 1(x ∈ Ω)/ν(Ω).
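All of the distributions in Tables 1.2 and 1.3 can be sampled with standard libraries; the sketch below uses NumPy. The parameter correspondences noted in the comments (scale = 1/λ for the exponential and gamma, and wald for the inverse Gaussian) are my reading of NumPy's conventions rather than something stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
p, mu, lam, r, a, b, sigma = 0.3, 2.0, 1.5, 2.0, 0.0, 1.0, 1.0

bern   = rng.binomial(1, p)        # Bern(p)
geo    = rng.geometric(p)          # Geo(p), support {1, 2, ...}
pois   = rng.poisson(mu)           # Pois(mu)
unif_A = rng.choice([1, 2, 3])     # Unif(A) for a finite set A
unif   = rng.uniform(a, b)         # Unif([a, b])
expo   = rng.exponential(1 / lam)  # Exp(lambda); NumPy takes scale = 1/lambda
gam    = rng.gamma(r, 1 / lam)     # Gamma(r, lambda); shape r, scale 1/lambda
norm   = rng.normal(mu, sigma)     # N(mu, sigma)
invgau = rng.wald(mu, lam)         # InvGau(mu, lambda)

print(bern, geo, pois, unif_A, unif, expo, gam, norm, invgau)
```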
1.5.2 Conditional distributions
Often, the distribution of a random variable depends upon one or more other random
variables. For instance, if (X,Y) is chosen uniformly from the triangular region with vertices (0,0), (1,1), and (1,0), then given the value of Y, X is uniform over the interval [Y,1]. This information is denoted [X | Y] ∼ Unif([Y,1]). Similarly, [Y | X] ∼ Unif([0,X]).
Formally, conditional expectations of the form E[X | Y] are defined using the Radon-Nikodym derivative between measures (see for instance [31]) and then conditional probabilities are defined using P(X ∈ C | Y) = E[1(X ∈ C) | Y]. The key properties of conditional expectation that will be used here are that E[X | Y] = f(Y) for some function f, and that E[E[X | Y]] = E[X] for random variables X and Y with finite mean.
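A quick numerical sanity check of both properties, using the triangle example above (the rejection sampler and the sample size are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw (X, Y) uniformly from the triangle with vertices (0,0), (1,1), (1,0),
# i.e. {(x, y) : 0 <= y <= x <= 1}, by rejection from the unit square.
n = 200_000
x = rng.uniform(0, 1, n)
y = rng.uniform(0, 1, n)
keep = y <= x
x, y = x[keep], y[keep]

# [X | Y] ~ Unif([Y, 1]) gives E[X | Y] = (Y + 1)/2, a function of Y alone.
# The tower property E[E[X | Y]] = E[X] says the two sample averages agree.
print(np.mean(x))            # approximately 2/3
print(np.mean((y + 1) / 2))  # also approximately 2/3
```

Both averages come out near 2/3, the x-coordinate of the triangle's centroid, as the tower property predicts.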
1.6 A few applications
Perfect simulation methods have been successfully applied to a variety of problems
coming from many different areas. The list of applications includes:
Markov random fields (specifically auto models). These are labellings of a graph where the density is a product of functions of the labels on each node, and a function of the labels on each edge. Examples of this type of problem include the
Ising model, the Potts model, sink-free orientations of a graph, and the autonormal
model.
Permutations. A permutation can be thought of as an ordering of n objects. The
set of permutations is easy to sample from, but once the goal is to sample from a
density over permutations, the problem can become very difficult.
Stochastic differential equations. These are used to model continuous functions
that fluctuate randomly. They are used often in finance and as models of physical
systems.
Spatial point processes. These are used as models for data that is in the form of
points in a space. Examples include the Strauss process, marked Hawkes pro-
cesses, and Matérn processes.
Bayesian posteriors. In Bayesian statistics, a prior is a probability distribution on
parameters of interest. Conditioned on data, the prior is then updated using Bayes’
rule to give an unnormalized density that is called the posterior. The normalizing
constant for the posterior is often very difficult to compute exactly. Hence the
ability to sample from the posterior without calculating the normalizing constant
is essential to Bayesian inference. While it is not known how to perfectly sample
from every Bayesian posterior, many do have perfect simulation algorithms.
Combinatorial objects. Given a graph G, consider the number of proper colorings
of the graph or the number of sink-free orientations. Sets of combinatorial objects
such as these tend to grow in size exponentially fast in the size of the underlying
graph. For many of these problems, the ability to draw samples from the set (and
restrictions on the set) leads to approximation algorithms for counting the number
of objects in the set. These types of problems can often be linked to problems
coming from physics or statistics.
Markov processes. Markov chains are powerful modeling tools that describe how
a stochastic process is updated. Under simple conditions, the distribution of the
state of the process will converge to what is known as a stationary distribution.
Sampling from this stationary distribution directly can often be difficult. Often
the stationary distribution cannot be described exactly. Examples of these kinds
of problems dealt with here include the antivoter model, self-organizing lists, and
perpetuities.
Each of these applications will be described more fully as we are ready to present algorithms for them. One thing all of these applications have in common is that they can be presented as an unnormalized density for which it is far too expensive computationally to calculate the normalizing constant directly.