102 COALESCENCE ON CONTINUOUS AND UNBOUNDED STATE SPACES
6.3.1 Random walk with normals
Now examine a different random walk, one where φ₂(x,Z) = x + Z, where Z is a standard normal random variable. At first glance it does not appear that the same technique can be applied, but in fact adding Z can be transformed into a process where a uniform is added.
The basic idea is to write the normal distribution as a mixture of uniform random variables. That is, build random variables A and B so that if [X|A,B] ∼ Unif([A,B]), then X ∼ N(0,1).
The easiest way to do this is to begin by generating Z ∼ N(0,1). Then given Z, create the auxiliary variable [Y|Z] ∼ Unif([0, exp(−Z²/2)]). At this point (Z,Y) is uniform over the region under the unnormalized density f_X(x) = exp(−x²/2). Now turn it around and generate [W|Y] ∼ Unif({x : 0 ≤ y ≤ exp(−x²/2)}). Then W ∼ N(0,1). But before that happens, [W|Y] ∼ Unif([−√(−2 ln(Y)), √(−2 ln(Y))]). See Figure 6.2 for an illustration.
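This two-step construction is easy to sanity-check numerically. Below is a minimal Python sketch (the function name is mine, not the book's) that draws W through the uniform mixture; its output should match a standard normal.

```python
import math
import random

def normal_via_uniform_mixture(rng):
    """Draw W ~ N(0,1) by the mixture-of-uniforms construction:
    Z ~ N(0,1), then Y|Z ~ Unif([0, exp(-Z^2/2)]), then
    W|Y ~ Unif([-sqrt(-2 ln Y), sqrt(-2 ln Y)])."""
    z = rng.gauss(0.0, 1.0)
    # (1 - random()) lies in (0, 1], keeping Y strictly positive for the log
    y = (1.0 - rng.random()) * math.exp(-z * z / 2.0)
    half = math.sqrt(-2.0 * math.log(y))
    return rng.uniform(-half, half)
```

Averaging many draws gives a sample mean near 0 and a sample variance near 1, as the construction promises.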
Figure 6.2 First generate Z ∼ f, then [Y|Z] ∼ Unif([0, f(Z)]). Finally, draw W ∼ Unif({x : f(x) ≥ Y}). Then W ∼ Z. In the figure, the state space for the draw W is the dashed line.
So the W that is being added to the current state is uniform, and so multishift
coupling can be used. Pseudocode for this procedure is as follows.
Multishift update normal
Input: current state x, Output: next state y
1) Draw Z ∼ N(0,1)
2) Draw Y ∼ Unif([0, exp(−Z²/2)])
3) w ← 2√(−2 ln(Y))
4) Draw U ∼ Unif([0,1])
5) W ← wU − w/2
6) y ← w⌈x/w − U⌉ + W
Although this was written for a standard normal being added, to deal with a normal with mean μ and variance σ² being added, simply multiply W by σ and add μ at step 5.
When this update is applied to an interval [x_min, x_max], the result is (x_max − x_min)/w different points returned. This can be a problem, as w can be arbitrarily close to 0.
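A runnable sketch of the update may help make this concrete. It assumes the rounding step takes the standard multishift form y = w⌈x/w − U⌉ + W (my reading of step 6 above; the names are mine). Nearby states driven by the same randomness usually land on the same point, while the marginal of y − x remains standard normal.

```python
import math
import random

def multishift_update_normal(x, rng):
    """One multishift step: marginally y - x ~ N(0,1), but states in the
    same cell of a randomly offset width-w grid map to the same output."""
    z = rng.gauss(0.0, 1.0)
    # Y ~ Unif((0, exp(-z^2/2)]); (1 - random()) avoids log(0)
    y_height = (1.0 - rng.random()) * math.exp(-z * z / 2.0)
    w = 2.0 * math.sqrt(-2.0 * math.log(y_height))  # slice width
    u = rng.random()
    shift = w * u - w / 2.0                          # W ~ Unif([-w/2, w/2])
    # Snap x to the randomly offset grid, then add the shared shift
    return w * math.ceil(x / w - u) + shift
```

Running the update from starting points 0 and 0.01 with identical random draws returns the exact same next state in the vast majority of trials, while samples from a single start have standard normal increments.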
Wilson [130] solved this issue by taking the normal distribution density and "flipping over" the right-hand side. That is, instead of generating (X,Y) uniformly over {(x,y) : 0 ≤ y ≤ exp(−x²/2)}, generate (X,Y) uniformly over

{(x,y) : x ≤ 0, 0 ≤ y ≤ exp(−x²/2)} ∪ {(x,y) : x > 0, 1 − exp(−x²/2) ≤ y ≤ 1}.   (6.7)

See Figure 6.3 for an illustration.
Given x, y is still uniform over an interval of length exp(−x²/2), and so X is still normally distributed. This leads to the following algorithm.
Multishift update normal flipped
Input: x, Output: y
1) Draw X ∼ N(0,1)
2) If X ≤ 0 then draw Y ∼ Unif([0, exp(−X²/2)])
3) Else draw Y ∼ Unif([1 − exp(−X²/2), 1])
4) a ← −√(−2 ln(Y)), b ← √(−2 ln(1 − Y)), w ← b − a
5) Draw U ∼ Unif([0,1])
6) X′ ← wU + a
7) y ← w⌈x/w − U⌉ + X′
Unlike the first method, here w ≥ 2√(ln 4) > 2.35. Therefore, when applied to the interval [x_min, x_max], this multishift coupler results in moving to at most (x_max − x_min)/2.35 different points.
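A sketch of the flipped construction (with the same caveat as before: the rounding step is my reading of the algorithm, and the names are mine) makes the width bound easy to verify numerically: across many draws, the slice width w never falls below 2√(ln 4) ≈ 2.3548.

```python
import math
import random

def multishift_update_normal_flipped(x, rng):
    """Multishift step using Wilson's flipped normal density.
    Returns (y, w); the slice [a, b] always has width w >= 2*sqrt(ln 4)."""
    z = rng.gauss(0.0, 1.0)
    height = math.exp(-z * z / 2.0)
    if z <= 0:
        yv = (1.0 - rng.random()) * height          # Unif over (0, e^{-z^2/2}]
    else:
        yv = 1.0 - (1.0 - rng.random()) * height    # Unif over [1 - e^{-z^2/2}, 1)
    a = -math.sqrt(-2.0 * math.log(yv))             # left end of the slice
    b = math.sqrt(-2.0 * math.log(1.0 - yv))        # right end of the slice
    w = b - a
    u = rng.random()
    x_new = w * u + a                               # X' ~ Unif([a, b])
    return w * math.ceil(x / w - u) + x_new, w
```

The returned increment y − x is still standard normal in distribution, but now the minimum observed width stays above 2.3548, so the image of any bounded interval is a bounded number of points.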
Figure 6.3 Inverting the right half of the normal density. Z is still drawn from f; now [Y|Z] is uniform over the area between the two lines with horizontal coordinate Z. Then W is uniform over the shaded area with vertical coordinate Y. Now the state space of W is an interval of length at least 2.35.
6.4 Auto models
Multishift coupling is especially useful in auto models. Recall that an auto model is a Markov random field where the density can be written as a product of functions on the nodes and functions over labels on endpoints of edges.
In particular, in an auto model the distribution of X(v) for a single node v ∈ V only depends on the immediate neighbors, and not on more distant nodes. That is,

[X(v)|X(V \ {v})] ∼ [X(v)|X({w : {v,w} ∈ E})].
When the distribution of X(v) conditioned on its neighbors always comes from the same distribution family, the model is named "auto" followed by the name of the distribution family.
For instance, suppose x ∈ {0,1}^V is a labeling of the nodes with either 0 or 1. Then trivially the distribution of x(v) conditioned on its neighbors is Bernoulli, since it must be either 0 or 1. Hence this is an autobernoulli model.
In the hierarchical Poisson/gamma model from Section 6.2.1, consider the star graph with node v_β at the center, and nodes v_1, ..., v_10 each connected only to v_β. Label node v_β with the value β, and v_i with λ_i.
Then this forms an autogamma model, since conditioned on the neighbors of each node, the distribution of the node will come from the gamma family of distributions.
6.4.1 Example: autonormal model
In [9], Besag considered a statistical model for the output from a camera. Each pixel of the camera returns numerical values based on the color of light that hits it, with larger numbers denoting brighter values. In a standard photo, the color space is three dimensional, with one dimension giving the red, one the blue, and one the green. In a grayscale photo, the color space is one dimensional. For simplicity, only the one-dimensional case will be considered here, but this easily extends to higher dimensions.
The model is that at each pixel of the camera, the returned value is normally distributed with mean equal to the actual entering value, and some fixed variance. Then the prior distribution on the picture entering the camera (the data) assigns a penalty for neighboring pixels that goes down exponentially in the square of the difference between the pixel values.
If the observed picture is y, then the posterior on the true picture X has density

f_X(x) = Z⁻¹ exp(−H(x,y)) 1(x ∈ [0,1]^V)   (6.8)

H(x,y) = (1/(2σ²)) Σ_i (x(i) − y(i))² + (γ²/2) Σ_{{i,j}∈E} (x(i) − x(j))²,   (6.9)

where σ and γ are parameters of the model.
Now suppose a Gibbs sampler is implemented for this problem. A random scan Gibbs sampler chooses a node uniformly at random, then updates the value of the node conditioned on the values at the rest of the nodes.
Fact 6.2. For the autonormal Gibbs sampler,

[X(i)|X(neighbors of i)] ∝ exp(−(x − a(i))²/[2b(i)]),

b(i) = (σ⁻² + deg(i)γ²)⁻¹,

a(i) = (σ⁻² y(i) + γ² Σ_{j:{i,j}∈E} X(j)) · b(i),

where deg(i) is the number of neighbors of i in the graph. That is, X(i) conditioned on the values of its neighbors is normally distributed with mean a(i) and variance b(i).
So the target distribution is a normal whose variance is independent of the neighbors, but whose mean is an increasing function of the neighboring values.
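As a quick concrete check of Fact 6.2, the helper below (the function name is hypothetical, not from the text) computes a(i) and b(i) for one node from σ, γ, the observation y(i), and the neighboring values.

```python
def autonormal_conditional(sigma, gamma, y_i, neighbor_values):
    """Mean a(i) and variance b(i) of X(i) given its neighbors,
    following Fact 6.2: the precision is 1/sigma^2 + deg(i)*gamma^2."""
    deg = len(neighbor_values)
    b = 1.0 / (sigma ** -2 + deg * gamma ** 2)
    a = (y_i / sigma ** 2 + gamma ** 2 * sum(neighbor_values)) * b
    return a, b
```

For example, with σ = γ = 1, y(i) = 0.5, and neighbor values 0.2 and 0.4, this gives b(i) = 1/3 and a(i) = 1.1/3 ≈ 0.367, a mean that indeed increases with the neighboring values.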
This is a situation where multishift coupling could be applied, since the goal is to couple normally distributed random variables with means over an interval from the smallest value of a(i) to the largest value of a(i).
6.5 Metropolis-Hastings for continuous state spaces
Gibbs sampler Markov chains can often be made monotonic, but require special approaches such as multigamma coupling or multishift coupling. Metropolis-Hastings Markov chains are usually not monotonic, but can make continuous spaces coalesce down to a single point. To get the best from these two approaches, create an update function that consists of several steps in a Gibbs chain to bring the upper and lower states close together, and then use a Metropolis-Hastings step to completely reduce the image down to a single state.
6.5.1 Example: truncated autonormal
The multishift coupler was used previously to deal with the unrestricted autonormal model, but in fact for image processing the values are usually conditioned to lie in the real interval from 0 (solid black) to 1 (solid white).
This is easy to deal with for the Gibbs sampler: when generating the normal with mean a(i) and variance b(i), just condition on the normal lying between 0 and 1.
Allison Gibbs [41] showed that this Markov chain was rapidly mixing. In [41] the Wasserstein metric was used to assess convergence because it was easier to work with; in [54] the method was improved to handle total variation distance. Therefore the details of the Wasserstein metric are omitted here.
The truncation complicates matters: the distributions at each node are no longer
shifts from the same family of distributions. For example, a normal with mean 0.5
and variance 1 conditioned to lie in [0,1] is not a shifted version of a normal of mean
0.7 and variance 1 conditioned to lie in [0,1]. So the multishift coupling used in the
previous section no longer works here. On the other hand, using the inverse transform
method to generate the two normals does give a monotonic update function.
A solution lies in the Metropolis-Hastings approach from Section 1.8.2. In this approach a state is proposed, an acceptance ratio is formed, and if a Unif([0,1]) draw falls below the acceptance ratio, the state is moved to the proposed state. Otherwise it stays where it is.
Suppose our proposed state is found by taking each node in turn, then independently adding U_i, which is uniform over [−ε, ε], to the value at that node, where ε > 0 is a small value. When ε is very small, it is likely that such a proposed move will be accepted, regardless of the initial state. Let U = (U_1, ..., U_#V). Then
f_X(x + U)/f_X(x) ≥ exp( −#V(2ε + ε²)/(2σ²) − (γ²/2)·2#E(2ε + ε²) ).
Making ε = Θ(min{σ²/#V, 1/[γ²#E]}) gives f_X(x + U)/f_X(x) ≥ 1/2, and so there is at least a 1/2 chance that the Metropolis-Hastings move will be accepted starting from any state.
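The sketch below checks this bound empirically on a small hypothetical example (a 4-node cycle standing in for a 2×2 grid, with parameters chosen for illustration): with ε a small constant times min{σ²/#V, 1/(γ²#E)}, the ratio of the exp(−H) factors never drops below 1/2.

```python
import math
import random

def H(x, y, edges, sigma, gamma):
    """Energy from equation (6.9)."""
    node = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * sigma ** 2)
    edge = (gamma ** 2 / 2) * sum((x[i] - x[j]) ** 2 for i, j in edges)
    return node + edge

rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 2x2 grid of pixels, as a cycle
nV, nE = 4, len(edges)
sigma, gamma = 0.5, 2.0                    # hypothetical model parameters
y = [rng.random() for _ in range(nV)]      # hypothetical observed picture
eps = min(sigma ** 2 / nV, 1.0 / (gamma ** 2 * nE)) / 8.0

worst = 1.0
for _ in range(10000):
    x = [rng.random() for _ in range(nV)]
    u = [rng.uniform(-eps, eps) for _ in range(nV)]
    xu = [xi + ui for xi, ui in zip(x, u)]
    # ratio of the exp(-H) factors in f_X(x+U)/f_X(x)
    ratio = math.exp(H(x, y, edges, sigma, gamma) - H(xu, y, edges, sigma, gamma))
    worst = min(worst, ratio)
```

Here `worst` stays above 1/2 across all random starting states and perturbations, matching the bound; the constant 1/8 is one choice of the Θ(·) constant, not a prescription from the text.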
Since a uniform is being added to each node value, multishift coupling could be used to couple the proposed state for all initial states. However, there is another way of coupling that is simpler to implement.
This method works as follows. Suppose the initial value x_i is in the interval [a,b], and the goal is to couple the draws x_i + U_i together, where U_i ∼ Unif([−ε, ε]). Then from a, the goal is to generate from [a − ε, a + ε]; from b, the goal is to generate from [b − ε, b + ε]; and from x_i ∈ (a,b), the goal is to generate from [x_i − ε, x_i + ε].
Note that all of these intervals lie in the interval [a − ε, b + ε]. So AR can be used to sample as follows: generate A_i ∼ Unif([a − ε, b + ε]). If A_i ∈ [x_i − ε, x_i + ε] then φ(x_i, U) = A_i. Otherwise start over. Repeat until all x_i ∈ [a,b] have been assigned.
Here is why this is good: if the first A_i falls in [b − ε, a + ε], then for all x_i ∈ [a,b], A_i ∈ [x_i − ε, x_i + ε]. Hence the proposed move is the same for all states.
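The accept/reject scheme above can be sketched directly (names are mine): each state scans the same shared stream of A draws and takes the first one valid for it, so all states coalesce whenever the first draw lands in the overlap [b − ε, a + ε].

```python
import random

def coupled_proposal(x_i, a, b, eps, seed):
    """First draw from a shared Unif([a - eps, b + eps]) stream that lands
    in [x_i - eps, x_i + eps]; marginally Unif([x_i - eps, x_i + eps])."""
    rng = random.Random(seed)   # shared stream: same seed => same A sequence
    while True:
        A = rng.uniform(a - eps, b + eps)
        if x_i - eps <= A <= x_i + eps:
            return A
```

With a = 0.4, b = 0.5, and ε = 0.2, for instance, the two extreme states a and b receive the identical proposal whenever the first shared draw falls in [b − ε, a + ε] = [0.3, 0.6], which happens most of the time.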
Inside the Gibbs sampler, it is necessary to generate a normal random variable with mean a(i) and variance b(i) truncated to lie in [0,1]. One way to generate a N(a(i), b(i)) is to take a standard normal, multiply by √(b(i)), and add a(i). In order for the result to lie in [0,1], the initial standard normal must have been in [−a(i)/√(b(i)), (1 − a(i))/√(b(i))]. To obtain a standard normal in [a′, b′], use the inverse transform method.
If Φ is the cdf of a standard normal, and U ∼ Unif([0,1]), then Φ⁻¹(U) will be a standard normal. Furthermore, (Φ(b′) − Φ(a′))U + Φ(a′) will be uniform over [Φ(a′), Φ(b′)], so then Φ⁻¹((Φ(b′) − Φ(a′))U + Φ(a′)) is a standard normal conditioned to lie in [a′, b′].
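In Python this inverse transform step can be written with `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of Φ and Φ⁻¹. The wrapper below (a sketch with my own naming) produces the truncated normal needed by the Gibbs update.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}

def truncated_normal_01(mean, var, u):
    """N(mean, var) conditioned to lie in [0, 1], driven by u in (0, 1).
    The output is monotone in u, which makes the Gibbs update monotonic."""
    s = math.sqrt(var)
    a_p = (0.0 - mean) / s                 # a' in the text
    b_p = (1.0 - mean) / s                 # b' in the text
    p = (PHI.cdf(b_p) - PHI.cdf(a_p)) * u + PHI.cdf(a_p)
    return mean + s * PHI.inv_cdf(p)
```

Because a single uniform u drives the draw, the output is increasing both in u and in the conditional mean, which is exactly the monotonicity the coupling argument relies on.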
This gives the following Gibbs sampler update. As earlier, let deg(i) denote the
number of neighbors of node i in the graph.
Truncated autonormal Gibbs update
Input: x, U, i, Output: x(i)
1) b(i) ← (σ⁻² + deg(i)γ²)⁻¹, a(i) ← (σ⁻² y(i) + γ² Σ_{j:{i,j}∈E} x(j)) b(i)
2) a′ ← −a(i)/√(b(i)), b′ ← (1 − a(i))/√(b(i))
3) Z ← Φ⁻¹((Φ(b′) − Φ(a′))U + Φ(a′))
4) x(i) ← a(i) + √(b(i)) Z
Next is the Metropolis-Hastings update.
Truncated autonormal MH update
Input: x_min, x_max, x, Output: y, cflag
1) cflag ← TRUE
2) For all i ∈ V
3)   Draw U_i ← Unif([x_min(i) − ε, x_max(i) + ε])
4)   If U_i ∉ [x_max(i) − ε, x_min(i) + ε] then cflag ← FALSE
5)   If U_i ∉ [x(i) − ε, x(i) + ε] then
6)     draw U_i ← Unif([x(i) − ε, x(i) + ε])
7) U ← (U_1, ..., U_#V)
8) r ← f_X(U)/f_X(x)
9) Draw U_r uniformly from [0,1]
10) y ← U · 1(U_r ≤ r) + x · 1(U_r > r)
11) If U_r > 1/2 then cflag ← FALSE