6 Iterative reconstruction

Previous chapters dealt with analytic image reconstruction algorithms. This chapter, on the other hand, introduces iterative image reconstruction algorithms. Thanks to high-speed computers, iterative algorithms are receiving more and more attention in medical image reconstruction. This chapter describes the imaging problem as a system of linear equations and reconstructs an image by minimizing an objective function. Many algorithms are available to solve the system of linear equations or to minimize an objective function. The objective function can be set up by using the likelihood function and can also include prior knowledge about the image. The likelihood function models the noise distribution in the projection measurements. The maximum-likelihood expectation-maximization (ML-EM) algorithm and its ordered-subset version (OS-EM) are the most popular iterative image reconstruction algorithms in emission tomography, and this chapter devotes significant effort to them. Many strategies for noise control are discussed. This chapter also presents a hot research topic – image reconstruction with highly undersampled data, which is often referred to as compressed sensing and is, in fact, nothing but another application of Bayesian image reconstruction.

6.1 Solving a system of linear equations

Instead of using an analytical algorithm, an image can also be reconstructed by solving a system of linear equations. In doing so, the image is first discretized into pixels or voxels (volumetric pixels), as illustrated in Figure 6.1.

Here, the image pixels x_j (j = 1, 2, ...) are labeled in a 1D sequential order, as are all projections p_i (i = 1, 2, ...). For the simple example in Figure 6.1, we can relate the image pixels and the projections using a system of linear equations:

x_1 + x_2 + x_3 = p_1,
x_4 + x_5 + x_6 = p_2,
x_7 + x_8 + x_9 = p_3,
x_3 + x_6 + x_9 = p_4,
x_2 + x_5 + x_8 = p_5,
x_1 + x_4 + x_7 = p_6,
2(√2 − 1)x_4 + √2(√2 − 1)x_7 + 2(√2 − 1)x_8 = p_7,
√2 x_1 + √2 x_5 + √2 x_9 = p_8,
2(√2 − 1)x_2 + √2(√2 − 1)x_3 + 2(√2 − 1)x_6 = p_9.


Fig. 6.1: An example with nine unknowns and nine measurements.

This system can be rewritten in the matrix form as

AX=P,

where X = [x_1, x_2, ..., x_9]^T, P = [p_1, p_2, ..., p_9]^T, and A is the coefficient matrix of the system. The element a_ij of A represents the weight of the contribution of the jth pixel x_j to the ith projection p_i. In this example, the contribution is the segment length of the projection ray within the pixel. If the inverse matrix A^{-1} of A exists, the reconstructed image is given as

X = A^{-1} P.

Line length is not the only way to model the “contribution.” Other imaging physics (e.g., attenuation and the point spread function) can be included as well.

For a practical imaging problem, the matrix A is not square. In this case, a generalized inverse of the matrix can be used. For example, we can find a least-squares solution:

X = (A^T A)^{-1} A^T P,   if the system is overdetermined;
X = A^T (A A^T)^{-1} P,   if the system is under-determined.

A generalized inverse can be obtained via a least-squares minimization. In the case that the system is overdetermined (i.e., the number of projection rays is greater than the number of image pixels), we let

χ² = ||AX − P||² = (AX − P)^T (AX − P) = X^T A^T A X − 2X^T A^T P + P^T P

and set the partial derivatives (i.e., gradient) to zero:

0 = ∇χ² = 2A^T A X − 2A^T P.

Rearranging the terms, we have

A^T A X = A^T P,

which is a set of normal equations, because (AX − P) is orthogonal (i.e., normal) to the columns of A, that is, A^T (AX − P) = 0. Solving the normal equations immediately yields a generalized solution

X = (A^T A)^{-1} A^T P.
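
To make the normal-equation route concrete, here is a minimal Python/NumPy sketch (the matrix A and data P are random stand-ins, not a real projector) that solves an overdetermined system through A^T A X = A^T P and checks the result against NumPy's built-in least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 12, 9                     # more rays (m) than pixels (n): overdetermined
A = rng.random((m, n))           # stand-in system matrix
P = rng.random(m)                # stand-in (inconsistent) projection data

# Least-squares solution from the normal equations A^T A X = A^T P
X_normal = np.linalg.solve(A.T @ A, A.T @ P)

# Reference: NumPy's built-in least-squares solver
X_lstsq, *_ = np.linalg.lstsq(A, P, rcond=None)

print(np.allclose(X_normal, X_lstsq))   # True when A^T A is well conditioned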

On the other hand, in the case that the system is under-determined (i.e., the number of image pixels is greater than the number of projection rays), the system AX = P has an infinite number of solutions, assuming that it is consistent. In this case, we would like to choose the minimum-norm solution. Therefore, we use the method of Lagrange multipliers to minimize ||X||² subject to AX = P. We thus set up a Lagrange function

L = ||X||² + Λ^T (AX − P),

with a row matrix Λ^T = [λ_1, λ_2, ..., λ_m] containing the Lagrange multipliers λ_1, λ_2, ..., λ_m, and with m being the number of projection rays.

Setting the partial derivatives (i.e., gradient) of the Lagrange function to zero yields

0 = 2X + A^T Λ   and   AX = P.

Pre-multiplying with the matrix A, 0 = 2X + A^T Λ becomes

0 = 2AX + A A^T Λ.

Solving for Λ and using AX = P, we have

Λ = −2(A A^T)^{-1} A X = −2(A A^T)^{-1} P.

Finally, solving 0 = 2X + A^T Λ for X gives

X = −(1/2) A^T Λ = A^T (A A^T)^{-1} P.
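
The minimum-norm formula can be verified numerically in the same way. The sketch below, again with stand-in data, computes X = A^T (A A^T)^{-1} P and compares it with the Moore-Penrose pseudo-inverse solution.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 9                      # fewer rays (m) than pixels (n): under-determined
A = rng.random((m, n))           # stand-in system matrix
P = rng.random(m)                # stand-in projection data

# Minimum-norm solution X = A^T (A A^T)^{-1} P
X_min_norm = A.T @ np.linalg.solve(A @ A.T, P)

# The Moore-Penrose pseudo-inverse picks the same minimum-norm solution
X_pinv = np.linalg.pinv(A) @ P

print(np.allclose(A @ X_min_norm, P))   # the constraints AX = P are satisfied
print(np.allclose(X_min_norm, X_pinv))  # both are the minimum-norm solution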

Even if the matrix A is square, its inverse A^{-1} may not exist: when A is not of full rank, A^{-1} does not exist. In fact, the matrix A for the example in Figure 6.1 is not of full rank. One can easily check that the sum of rows 1, 2, and 3 is the same as the sum of rows 4, 5, and 6. If the matrix A is not of full rank, we cannot even calculate (A A^T)^{-1} or (A^T A)^{-1}. In real applications, the matrix A is neither full rank nor square, and the projections are not consistent due to noise. If the matrix A is rank deficient, one can use the singular value decomposition (SVD) to find a pseudo-inverse.

The SVD technique is a powerful and stable method to find a generalized inverse and diagnose the system condition. Now we use SVD to find the least-squares solution for AX = P as follows.

Assume that the matrix A has m rows and n columns and is denoted A_{m×n}. Using the SVD, the matrix A_{m×n} can be decomposed into

A_{m×n} = U_{m×m} Σ_{m×n} V_{n×n}^T,

where

V^T V = I_{n×n},   U^T U = I_{m×m},   and   Σ_{m×n} = [diag{σ_i}  0; 0  0]

with the singular values arranged in descending order:

σ_1 ≥ σ_2 ≥ ... ≥ σ_i ≥ ... ≥ 0.

A generalized inverse (or pseudo-inverse) is defined as

A^+ = V Σ^+ U^T,

where

Σ^+_{n×m} = [D_r  0; 0  0],

and the diagonal matrix D_r with a cutoff index r is defined as

D_r = diag{1/σ_1, 1/σ_2, ..., 1/σ_r, 0, ..., 0}.

The reconstructed image is given as

X = A^+ P = V Σ^+ U^T P.

In the SVD method, the user selects the cutoff index r. If a very small r is chosen, the resulting reconstructed image contains only low-frequency components. If a very large r is chosen, the reconstructed image will also contain high-frequency components, and the image will be noisy as well.
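
A minimal sketch of this truncated-SVD reconstruction is given below; the cutoff index r is the user's choice, and the 9 × 9 matrix is only a stand-in for a real system matrix.

```python
import numpy as np

def svd_reconstruct(A, P, r):
    """Pseudo-inverse reconstruction X = V Sigma^+ U^T P, keeping r singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    s_inv[:r] = 1.0 / s[:r]               # 1/sigma_i up to the cutoff index r
    return Vt.T @ (s_inv * (U.T @ P))

rng = np.random.default_rng(2)
A, P = rng.random((9, 9)), rng.random(9)  # stand-in system and data
X_smooth = svd_reconstruct(A, P, r=3)     # small r: low-frequency, heavily regularized
X_noisy = svd_reconstruct(A, P, r=9)      # large r: sharper but more sensitive to noise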

More often than not, the matrix A is too large to store in the computer; it can only be generated one row at a time, when that row is needed in solving the system of equations. Not every method that can solve a system of linear equations is usable here. For example, methods based on diagonalizing the matrix A or transforming A into an upper triangular matrix are not applicable. Any method that modifies the matrix A cannot be used. We can only use methods that use the matrix A and its transpose A^T. Therefore, iterative methods that only use A and A^T (but do not modify them) make sense for finding an approximate solution to our imaging problem.

An analytic reconstruction can be thought of as an “open-loop” system, while an iterative algorithm can be thought of as a “closed-loop” system. Each loop, referred to as an iteration, usually consists of a projection operation, a comparison of the projected data with the measured data, and a backprojection operation. The backprojection maps the data discrepancies from the projection space to the image space. The backprojected discrepancies will be used to modify the currently estimated image at each iteration (see Figure 6.2).


Fig. 6.2: A general procedure of an iterative image reconstruction algorithm.

6.2 Algebraic reconstruction technique

The main idea of the algebraic reconstruction technique (ART) algorithm (also known as the Kaczmarz method) is to make the estimated image satisfy one equation at a time, as illustrated in Figure 6.3, where three lines – L1, L2, and L3 – represent three equations and their intersection is the solution. In this example, the image consists of only two pixels.

In Figure 6.3, x0 is the initial guess of the solution. The first step is to project this point perpendicularly onto L1, obtaining x1. Next, project perpendicularly onto L2 to obtain x2, and so on, projecting each point onto a line (which is one equation) one at a time. Eventually, the algorithm will converge to the solution of the system of equations (see Figure 6.3, top). If the equations are not consistent, the algorithm will bounce around and never converge (see Figure 6.3, bottom). One iteration is defined as the procedure of going through all the equations once.


Fig. 6.3: The ART algorithm tries to satisfy each equation at each update.

The ART algorithm processes one projection ray at a time, and the image is updated after each ray is considered. Symbolically, the algorithm can be written as

χ_next = χ_current − Backproject_ray{ [Project_ray(χ_current) − Measurement_ray] / Normalization factor }.
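
Written out for ray i, this update is x ← x + a_i (p_i − a_i·x)/(a_i·a_i), where a_i is the ith row of A. A minimal Python sketch of this Kaczmarz update, with stand-in data and no physics modeling, might look as follows.

```python
import numpy as np

def art(A, P, n_iter=10, x0=None):
    """Kaczmarz / ART: enforce one ray equation at a time."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    for _ in range(n_iter):              # one iteration = one pass over all rays
        for i in range(m):
            a_i = A[i]
            # project the current estimate onto the hyperplane a_i . x = p_i
            x += a_i * (P[i] - a_i @ x) / (a_i @ a_i)
    return x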

6.3 Gradient descent algorithms

First, an objective function χ² is formed based on the system of imaging equations:

χ² = ||AX − P||²,

which is a quadratic function (see Figure 6.4). Due to the noise, the equations are inconsistent. Thus the minimum of the objective function χ² is usually a nonzero, positive value.

6.3.1 The gradient descent algorithm

The strategy of gradient descent algorithms is to evaluate the gradient of the objective function χ² and use the gradient information to find the minimum of the objective function. The gradient in 1D is the derivative of the function. A positive gradient means an upward direction, and a negative gradient means a downward direction. A gradient descent algorithm moves in the direction opposite to the gradient, with a step size small enough that the algorithm can find the minimum (see Figure 6.5). The general form of a gradient descent algorithm looks like

χ_next = χ_current − α_current Δ(χ_current),


Fig. 6.4: A quadratic objective function.


Fig. 6.5: The opposite direction of the gradient is the downhill direction.

where Δ is the gradient of the objective function χ² evaluated at χ_current; it involves both projection and backprojection over all rays. In fact,

Δ = ∇||AX − P||² = 2A^T (AX − P),

where ∇ is the notation for the gradient operator, the projection AX is the multiplication of X by the matrix A, and the backprojection is multiplication by the matrix A^T. The data discrepancy is (AX − P). The algorithm converges when AX = P and X does not change any more. If the system is inconsistent, the algorithm converges when A^T (AX − P) = 0. In our notation, X and χ denote the same thing.

In the gradient descent algorithm, the step size α_current is calculated for each descent direction so that the quadratic objective function reaches its minimum along the chosen path. Most likely this minimum is not the global minimum of the objective function. At this point, the algorithm finds the current negative gradient direction and travels downhill to reach the next minimum along the new path. This procedure repeats over and over again until the algorithm terminates.
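
For the quadratic objective χ² = ||AX − P||², the step size that minimizes the objective along the negative gradient g = 2A^T(AX − P) has the closed form α = (g^T g)/(2 g^T A^T A g). A hedged sketch of steepest descent with this exact line search, using stand-in names and data:

```python
import numpy as np

def gradient_descent(A, P, n_iter=50, x0=None):
    """Steepest descent on chi^2 = ||AX - P||^2 with an exact line search."""
    x = np.zeros(A.shape[1]) if x0 is None else x0.astype(float).copy()
    for _ in range(n_iter):
        r = A @ x - P                     # data discrepancy
        g = 2.0 * (A.T @ r)               # gradient: backprojection of the discrepancy
        Ag = A @ g
        alpha = (g @ g) / (2.0 * (Ag @ Ag) + 1e-12)  # exact minimizer along -g
        x -= alpha * g
    return x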

If the system is under-determined, this least-squares problem does not have a unique solution, and the objective function χ² has a long valley (see Figure 6.6). The solution then depends on the initial estimate. If the initial estimate is zero (i.e., X_0 = 0), the algorithm will converge to a minimum-norm solution.

Due to noise, we seldom have AX = P at convergence. Instead, we get a very noisy image when the iteration number is large. We apply an iterative reconstruction algorithm to noisy data generated with the phantom in Figure 2.10. As shown in Figure 6.7, at early iterations the images contain only low-frequency components; at higher iterations, high-frequency components are recovered and noise comes into effect as well.


Fig. 6.6: For a degenerate system, the solution of the iterative algorithm depends on the initial value.

6.3.2 The Landweber algorithm

The Landweber algorithm is a simplified gradient descent algorithm in which the step size is a fixed value determined by the user. Unlike the gradient descent algorithm, the Landweber algorithm does not necessarily reach the minimum along the descent direction at each iteration. However, the algorithm converges if the step size is chosen small enough. How small is small enough? The step size α should satisfy

0 < α < 2/σ_1²,

where σ_1 is the largest singular value of the matrix A. Sometimes it is advantageous to make the step size α variable, for example as a function of the iteration number, the noise weighting, and/or regions of interest.
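
A minimal Landweber sketch follows; the step size is set just inside the bound above (the 0.9 safety factor is an arbitrary illustrative choice), and the largest singular value is computed directly, which is only feasible when A is small enough to handle explicitly.

```python
import numpy as np

def landweber(A, P, n_iter=100, x0=None):
    """Landweber iteration: fixed step size alpha with 0 < alpha < 2 / sigma_1^2."""
    sigma1 = np.linalg.norm(A, 2)         # largest singular value of A
    alpha = 0.9 * 2.0 / sigma1**2         # stay safely inside the convergence bound
    x = np.zeros(A.shape[1]) if x0 is None else x0.astype(float).copy()
    for _ in range(n_iter):
        x += alpha * (A.T @ (P - A @ x))  # backproject the data discrepancy
    return x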

6.3.3 The conjugate gradient algorithm

The gradient direction is easy to compute in a practical imaging problem using Δ = ∇||AX − P||² = 2A^T (AX − P), but the negative gradient direction may not be the optimal direction for finding the optimum image. Let us look at the contour lines of a typical objective function χ² in Figure 6.8, where the contour lines are ellipses. The gradient direction at any point is perpendicular to the tangent of the ellipse. The search directions of two consecutive steps, u_current and u_next, are orthogonal to each other (that is, u_current^T u_next = 0). The search path therefore zigzags and is not optimal.


Fig. 6.7: The image gets noisier as the iteration number gets larger.


Fig. 6.8: The negative gradient direction may not be the most efficient way to find the function’s minimum.

A better strategy is to search along conjugate directions, defined by u_current^T (A^T A) u_next = 0 (see Figure 6.9). The shape of the objective function χ² is characterized by A^T A. When we use conjugate directions, we in effect first deform the contour lines into circles and then find the orthogonal directions. The conjugate directions make the algorithm converge faster.
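
One standard way to exploit conjugate directions in this setting is to apply the conjugate gradient method to the normal equations A^T A X = A^T P (often called CGLS or CGNR). The sketch below, with stand-in names, uses only multiplications by A and A^T, as required for large systems.

```python
import numpy as np

def cgls(A, P, n_iter=20):
    """Conjugate gradient applied to the normal equations A^T A X = A^T P (CGLS)."""
    x = np.zeros(A.shape[1])
    r = P - A @ x                          # residual in projection space
    s = A.T @ r                            # residual of the normal equations
    p = s.copy()                           # first search direction = steepest descent
    gamma = s @ s
    for _ in range(n_iter):
        q = A @ p
        alpha = gamma / (q @ q + 1e-12)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / (gamma + 1e-12)) * p   # next A^T A-conjugate direction
        gamma = gamma_new
    return x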


Fig. 6.9: The conjugate gradient direction is more effective than the gradient direction.

6.4 ML-EM algorithms

We do not have to use a least-squares objective function; there are many different ways to set up an objective function. If we use the Poisson noise model, or simply use the nonnegativity constraint, we will get a special objective function. By minimizing that special objective function, a multiplicative update algorithm known as the ML-EM algorithm is derived; it can be symbolically expressed as

x_next = x_current × Backproject{Measurement / Project(x_current)} / Backproject{1},

where 1 is a vector whose elements are all 1’s; its size is that of the projection data vector. In this algorithm, the data discrepancy is calculated as a ratio instead of a difference. The distinguishing feature of this algorithm is its nonnegativity: if the initial image does not contain any negative pixels or voxels, the image values will never become negative.
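
A minimal ML-EM sketch in the same notation is shown below; the small constant eps is only there to guard against division by zero, and a realistic implementation would also build the imaging physics into A.

```python
import numpy as np

def mlem(A, P, n_iter=50):
    """ML-EM: multiplicative update; stays nonnegative if started nonnegative."""
    eps = 1e-12
    x = np.ones(A.shape[1])                # positive initial image
    sens = A.T @ np.ones(A.shape[0])       # Backproject{1}: sensitivity image
    for _ in range(n_iter):
        proj = A @ x                       # forward projection of the current image
        ratio = P / (proj + eps)           # data discrepancy as a ratio
        x *= (A.T @ ratio) / (sens + eps)  # backproject the ratio and normalize
    return x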

Now let us explain the name of this algorithm: ML-EM. The objective function of this algorithm can be a likelihood function, which is the joint probability density function of Poisson random variables. We are looking for a solution (i.e., the reconstructed image) that can maximize this likelihood function. Therefore, this is a maximum likelihood algorithm.

When we try to maximize or minimize a function (e.g., an objective function or a likelihood function), we usually take the partial derivatives with respect to all of its unknowns (i.e., the pixel or voxel values), set these derivatives to zero, and solve for the unknowns. It turns out that our Poisson likelihood function is too complicated for us to optimize. We take the expectation value (or the statistical mean value) of the likelihood function. This is the “E” step, and it simplifies the problem significantly. We then find the maximum of the expected likelihood function. This is the “M” step. Therefore, we have the name “EM,” the expectation-maximization.

This ML-EM algorithm is also called the Richardson–Lucy (or Lucy–Richardson) algorithm, because Richardson and Lucy developed it for image deblurring applications in 1972 and 1974, respectively. There are many EM algorithms in different fields of research. The usual ML-EM algorithm is derived and used particularly for emission data reconstruction. There are ML-EM algorithms for transmission data, too, but they are not as popular.

6.5 OS-EM algorithm

In ART, the image is updated after each projection ray is considered. On the other hand, in gradient descent methods and in the ML-EM algorithm, the image is updated only when all projection rays are considered. One way to speed up the convergence rate of an iterative algorithm is to make more frequent image updates.

In an OS-EM algorithm, the projection views are grouped in different sets (called subsets), the algorithm goes through the subsets in a specified order, and the image is updated after each subset is considered. Figure 6.10 shows an example of how the projection views are divided into subsets. There are many strategies for dividing the views into subsets.

Increasing the number of subsets accelerates the convergence rate but may increase the noise as well. Roughly speaking, if you have N subsets, you may accelerate the ML-EM algorithm about N times. Modest acceleration of approximately 10 times is possible with very little increase in noise.
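
An OS-EM sketch built on the same multiplicative update follows; the rows of A (projection rays) are split into interleaved subsets here, which is only one of many possible grouping strategies.

```python
import numpy as np

def osem(A, P, n_subsets=4, n_iter=10):
    """OS-EM: one ML-EM-style update per subset of projection rays."""
    eps = 1e-12
    m, n = A.shape
    x = np.ones(n)
    subsets = [np.arange(s, m, n_subsets) for s in range(n_subsets)]  # interleaved rays
    for _ in range(n_iter):
        for idx in subsets:
            As, Ps = A[idx], P[idx]
            sens = As.T @ np.ones(len(idx))          # sensitivity of this subset only
            x *= (As.T @ (Ps / (As @ x + eps))) / (sens + eps)
    return x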

6.6 Noise handling

Nowadays, many people choose an iterative image reconstruction algorithm over an analytical algorithm simply because the iterative algorithm in general can provide images containing less noise with the same or better resolution. We will investigate how the analytical and iterative algorithms handle the noise in this section.


Fig. 6.10: The total projection views are divided into subsets.

6.6.1 Analytical methods – windowing

In an analytical algorithm, noise regulation is achieved via the application of a window function when the projection data are filtered. The filter in an image reconstruction algorithm is always a high-pass filter (e.g., the ramp filter) in which the high-frequency components are enhanced more than the low-frequency components. In order to suppress the high-frequency noise, a window function is always applied to the ramp filter (see Figure 6.11). Basically, the noise regulation strategy in an analytical algorithm is to control the bandwidth. Thus, both high-frequency noise and high-frequency signal are discarded.

6.6.2 Iterative methods – stopping early

There are many ways to control the noise in an iterative algorithm. We can first consider a rough noise propagation model of a linear iterative algorithm:

Error_image ≈ λn(ω) × Error_projections,

where Error_image is the error magnitude in the reconstructed image, Error_projections is the error magnitude in the projections, and λn(ω) is the algorithm’s transfer function, which depends on the frequency ω and the iteration number n.

We can compare an iterative reconstruction algorithm with an SVD matrix pseudo-inverse solution. You may imagine that λn(ω) contains the information of both the singular values and singular vectors of the imaging matrix A. The frequency components are in the singular vectors. As the iteration number n increases, more singular vectors join λn(ω). The iteration number is somehow related to the cutoff index in an SVD pseudo-inversion expression. With a larger iteration number n, λn(ω) contains components with higher frequencies. In some linear algorithms, this relationship can be simplified to

Error_image ≤ κ × Error_projections,

where κ is similar to the condition number of matrix A, and κ is defined as the ratio of the largest singular value σ1 over the cutoff singular value σn. This simplification is reasonable because the “worst” noise influence comes from the frequency components (i.e., the singular vector) corresponding to the current smallest singular value σn. In the SVD pseudo-inverse method, the reconstructed image is a sum of many terms. Each term is a product of a component (i.e., the singular vector) and the reciprocal of its corresponding singular value 1/σ. The largest gain comes from 1/σn, which corresponds to a singular vector with very high frequencies.


Fig. 6.11: Application of a window function to the ramp filter.

You can imagine that λn(ω) is a windowed ramp filter (see Figure 6.11). The width of the window increases as the iteration number increases. We immediately see that one way to control noise is to control the iteration number.

This analogy between an iterative reconstruction and an SVD pseudo-inverse is not mathematically rigorous, but it shows the similarity between the two approaches and gives us some insight into how an iterative algorithm behaves.

Stopping early is the simplest way to regulate the noise. However, iterative algorithms do not have a uniform convergence rate throughout the image. After an iterative algorithm is stopped, the resultant image will have nonuniform resolution. If you would like your reconstructed image to have uniform resolution, one remedy is to over-iterate (i.e., not to stop early) and then apply a post-filter to suppress the noise.

6.6.3 Iterative methods – choosing pixels

The second way to reduce image noise is to reduce the errors in the data. This approach to regularization is a unique feature of iterative algorithms. The errors between the projections P and the model AX, Error_data, consist of two parts: deterministic errors and random errors. The deterministic errors are generated by the nonideal system model AX. First of all, discretizing the continuous object into pixels (or voxels) may cause errors. One must consider the trade-offs when deciding on the pixel size. Smaller pixels give a more accurate model but increase the number of unknowns to be solved for. Larger pixels make the image model less accurate, but fewer unknowns can make the inverse problem more stable.

Using nonoverlapping uniform pixels or voxels to model an image is not ideal, because this model contains many discontinuities and introduces artificial high-frequency components into the image. Some people have tried to use overlapping, nonuniform pixels (or voxels) such as blobs (see Figure 6.12), which results in improved image quality. This gives a more realistic, band-limited image model.

One drawback of using blobs as image voxels is the increased computational complexity. An alternative approach has been investigated to achieve the same effect with better computational efficiency. This strategy uses the traditional nonoverlapping voxels in the image, but a low-pass filter is applied to the backprojected image. The kernel of the low-pass filter is chosen as the 3D “profile” of the blob. In other words, the backprojected image is convolved in 3D with the blob profile (see Figure 6.13).
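
A hedged sketch of this blob-equivalent filtering is shown below; a Gaussian profile is used only as a stand-in for the 3D blob kernel (practical implementations typically use Kaiser-Bessel blobs), and the function name and parameter are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blob_filtered_backprojection(backprojected_volume, blob_sigma=1.0):
    """Approximate the blob basis by low-pass filtering a conventional backprojection.

    The Gaussian used here is only a stand-in for the 3D profile of a
    Kaiser-Bessel blob; the idea is the same: convolve the backprojected
    image with the blob profile instead of using overlapping blob voxels.
    """
    return gaussian_filter(backprojected_volume, sigma=blob_sigma)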


Fig. 6.12: Using overlapping blobs to replace the traditional voxels can better model the image.


Fig. 6.13: An alternative approach to get the blob effect.

To make the inverse problem more stable, as a rule of thumb, we would select the pixel size larger than the detector bin size; this makes the number of image pixels smaller than the number of detector bins (see Figure 6.14). In practice, it is advantageous to choose a large array size (with a small bin dimension) on the detector during data acquisition. This makes a big difference in noise control in an iterative algorithm, especially when the system resolution is modeled in the projector/backprojector. A balanced selection of the detector bin size is half the size of the image pixel. If the image size is 256 × 256 × 256 and there is no option on the scanner to acquire 512 × 512 projections, you can acquire data using the 256 × 256 mode and interpolate the data into 512 × 512 arrays during iterative image reconstruction.
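
As a small illustration of the interpolation step just described, the sketch below upsamples one measured view from 256 × 256 to 512 × 512 with bilinear interpolation; the choice of interpolator (here SciPy's zoom) and the function name are assumptions for illustration, not a prescription.

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_projection(view_256):
    """Interpolate one 256 x 256 projection view onto a 512 x 512 grid."""
    return zoom(view_256, 2, order=1)     # order=1: bilinear interpolation

# Example: view_512 = upsample_projection(measured_view)   # applied view by view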


Fig. 6.14: It is advantageous to use a detector bin size that is smaller than the image pixel size.

6.6.4 Iterative methods – accurate modeling

Modeling the system’s point spread function (see Figure 6.15) and the patient-induced attenuation and scattering in the matrix A will significantly reduce the deterministic errors between the projections P and the model AX. If you are not ready to model the imaging physics in the matrix A, you still have a choice between line-length weighting and area weighting of the ray path within the pixel when calculating the elements of the matrix A (see Figure 6.16). The freedom to model the imaging system with various geometries and physics effects is the main advantage of using an iterative reconstruction algorithm. We are able to control the errors between the projection data and the model to some degree; we can at least control their deterministic part with good system modeling. Smaller data modeling errors result in smaller errors in the image. With reduced data modeling error, we can increase the iteration number to get better image resolution with the same or reduced image noise.


Fig. 6.15: Modeling the system distance-dependent resolution and sensitivity in the projector.


Fig. 6.16: Area weighting is a better model than line-length weighting but has a higher computational cost.

There is a fourth way to control the noise in an iterative algorithm, which addresses the random part of the error: we can model the noise distribution in the objective function. Section 6.7 will be dedicated to this topic.

The fifth way is to use prior knowledge about the image we are looking for, in addition to the projection data. This topic will be covered in Section 6.8.

6.7 Noise modeling as a likelihood function

In order for the noise model to work, we must have redundant measurements; otherwise, the noise model has no effect on the solution. In Figure 6.17, there are two lines, L1 and L2, representing two independent measurements described by two independent linear equations. These measurements can be noisy or noiseless. The solution is the intersection of the two lines, regardless of the presence of noise and regardless of whether you trust L1 or L2 more. Due to noise, this intersection may not be the true solution at all. There is nothing we can do to improve the solution if we have only two measurements.

What if we have three measurements L1, L2, and L3 (see Figure 6.18)? Because of the noise, the three lines do not intersect at one point. How should we pick a reasonable solution? A wise decision would depend on the noisiness of each measurement: we should trust the measurements with less noise more, and the measurements with more noise less.

If we use the variance σ² to characterize the noisiness of a measurement, we can assign a weighting factor 1/σ² to that measurement. Thus we can use a variance-based weighting scheme to select a solution. Our old least-squares objective function

χ² = ||AX − P||² = (AX − P)^T (AX − P)


Fig. 6.17: Two lines intersect at one point, which is the solution of the corresponding two equations.


Fig. 6.18: If the system has redundant measurements and is not consistent, one can use the noisiness to weight each measurement and find an approximate solution. In this example, we assume that L3 is noisier than L1, and L2 is less noisy than L1.

becomes a weighted least-squares objective function

χ_W² = (AX − P)^T W (AX − P),

where W is the diagonal matrix W = diag{1/σ_1², 1/σ_2², ..., 1/σ_N²} and N is the number of projections. In many cases, it is advantageous to make the definition of the weighting matrix W more general. For example, we can let W = diag{1/σ_1^γ, 1/σ_2^γ, ..., 1/σ_N^γ}, where the value of γ can be different from 2. The weighting W sometimes causes low-frequency shadowing artifacts in the reconstruction. A careful selection of the parameter γ may reduce these artifacts.
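
A brief sketch of the weighted least-squares solution that this objective leads to, obtained here through the weighted normal equations A^T W A X = A^T W P; the matrix, data, and variances are stand-in values.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 12, 9
A = rng.random((m, n))                    # stand-in system matrix
P = rng.random(m)                         # stand-in projections
variances = rng.uniform(0.5, 2.0, m)      # stand-in noise variance per projection

W = np.diag(1.0 / variances)              # weights 1/sigma_i^2 (i.e., gamma = 2)
# Minimizing (AX - P)^T W (AX - P) leads to A^T W A X = A^T W P
X_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ P)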

This weighted least-squares objective function can also be obtained through the likelihood function by assuming Gaussian noise in the projections. The ith projection measurement p_i is a Gaussian random variable whose mean value is μ_i = Σ_j a_ij x_j = A_i X and whose variance is σ_i². Here, A_i is the ith row of the matrix A. The Gaussian distribution density function gives

Prob(p_i) = (1/(√(2π) σ_i)) exp(−(A_i X − p_i)² / (2σ_i²)).

We assume that all projections are statistically independent. The likelihood function is the joint probability density function by considering all projections together:

Prob(P) = Π_i (1/(√(2π) σ_i)) exp(−(A_i X − p_i)² / (2σ_i²)).

Our goal is to find an image X that maximizes the above likelihood function, hence the term “maximum likelihood solution.” Taking the logarithm of the likelihood function, we have

ln(Prob(P)) = −(1/2) Σ_i (A_i X − p_i)²/σ_i² + Σ_i ln(1/(√(2π) σ_i)).

The second term in the above equation is a constant. Therefore, maximizing the likelihood function is equivalent to minimizing the following weighted least-squares objective function:

χ_W² = Σ_i (A_i X − p_i)²/σ_i² = (AX − P)^T W (AX − P).

If the data noise is not Gaussian, the above approach of setting up a likelihood function and an objective function still applies. However, the resultant objective function is different.

6.8 Including prior knowledge (Bayesian)

We sometimes know more about the image we are looking for than the measurements alone tell us. We can incorporate this prior knowledge by adding an extra term to the objective function.

For example, if we know in advance that the image X is very smooth, we can add a penalty term such as ||∇X||², a measure of the image roughness, to suppress sharp jumps and encourage smoothness.
