1.8 Compositing as Bayesian Joint Estimation

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

1.8 Compositing as Bayesian Joint Estimation

Up to this point in the discussion, we have not characterized the noise terms inherent in the imaging process, nor have we used them to improve the estimation of the light quantity present in the original scene. In this section we develop a technique to incorporate a noise model into the estimation process.

Our approach for creating an HDR image from N input LDR images begins with the construction of a notional N-dimensional inverse camera response function that incorporates the different exposure and weighting values between the input images. Then we could use this to estimate the photoquantity $\hat{q}$ $\hat{q}$ at each point by writing $\hat{q} (x) = f^{- 1} (f_{1}, f_{2}, \dots, f_{N}) / k_{1}$ $\hat{q} (x) = f^{- 1} (f_{1}, f_{2}, \dots, f_{N}) / k_{1}$ . In this case f⁻¹ is a joint estimator that could potentially be implemented as an N-dimensional lookup table (LUT). Recognizing the impracticality of this for large N, we consider pairwise recursive estimation for larger N values in the next section. The joint estimator f⁻¹(f₁,f₂,…,f_N) may be referred to more precisely as a comparametric inverse camera response function because it always has the domain of a comparagram and the range of the inverse of the response function of the camera under consideration.

Pairwise estimation

Assume we have N LDR images that are a constant change in exposure value apart, so that $Δ EV = {log}_{2} k_{i + 1} - {log}_{2} k_{i}$ $Δ EV = {log}_{2} k_{i + 1} - {log}_{2} k_{i}$ is a positive constant ∀ i ∈{1,…,N − 1}, where k_i is the exposure of the ith image. Now consider specializing to the case N = 2 so we have two exposures, one at k₁ = 1 (without loss of generality, because exposures have meaning only in proportion to one another) and the other at k₂ = k. Our estimate of the photoquantity may then be written as $\hat{q} (x) = f_{Δ EV}^{- 1} (f_{1}, f_{2}),$ $\hat{q} (x) = f_{Δ EV}^{- 1} (f_{1}, f_{2}),$ where $Δ EV = {log}_{2} k$ $Δ EV = {log}_{2} k$ .

To apply this pairwise estimator to three input LDR images, each with a constant difference in exposure between them, we can proceed by writing

$\begin{array}{l} f (\hat{q}) = f (f_{Δ EV}^{- 1} (f (f_{Δ EV}^{- 1} (f_{1}, f_{2})), f (f_{Δ EV}^{- 1} (f_{2}, f_{3})))) . \end{array}$ $\begin{array}{l} f (\hat{q}) = f (f_{Δ EV}^{- 1} (f (f_{Δ EV}^{- 1} (f_{1}, f_{2})), f (f_{Δ EV}^{- 1} (f_{2}, f_{3})))) . \end{array}$

(1.54)

In this expression, we first estimate the photoquantity based on images 1 and 2, and then the photoquantity based on images 2 and 3, then we combine these estimates using the same joint estimator, by first putting each of the earlier round (or “level”) of estimates through a virtual camera f, which is the camera response function.

This process may be expanded to any number N of input LDR images, by use of the recursive relation

$f_{i}^{(j + 1)} = f (f_{Δ EV}^{- 1} (f_{i}^{(j)}, f_{i + 1}^{(j)})),$ $f_{i}^{(j + 1)} = f (f_{Δ EV}^{- 1} (f_{i}^{(j)}, f_{i + 1}^{(j)})),$

where j = 1,…,N − 1,i = 1,…,N − j, and $f_{1}^{(N)}$ $f_{1}^{(N)}$ is the final output image, and in the base case, $f_{i}^{(1)}$ $f_{i}^{(1)}$ is the ith input image. This recursive process may be understood graphically as in Fig. 1.20. This process forms a graph with estimates of photoquantities as the nodes, and comparametric mappings between the nodes as the edges. A single estimation step using a CCRF is illustrated in Fig. 1.21.

f01-20-9780081004128 — Figure 1.20 Graph structure of pairwise comparametric image compositing. The HDR image $f_{1}^{(4)}$ $f_{1}^{(4)}$ is composited from the LDR source camera images $f_{1 \dots 4}^{(1)}$ $f_{1 \dots 4}^{(1)}$ . Nodes $f_{i}^{(j > 1)}$ $f_{i}^{(j > 1)}$ are rendered here by the scaling and rounding of the output from the comparametric camera response function (CCRF). To illustrate the details captured in the highlights and lowlights in the LDR medium of this paper, we include a spatiotonally mapped LDR rendering of $f_{1}^{(4)}$ $f_{1}^{(4)}$ .

f01-21-9780081004128 — Figure 1.21 CCRF-based compositing of a single pixel. The floating-point tonal values f₁ and f₂ are the arguments to the CCRF $f ° f_{Δ EV}^{- 1}$ $f ° f_{Δ EV}^{- 1}$ , which returns a refined estimate of an ideal camera response to the scene being photographed. This virtual camera’s exposure setting is equal to the exposure of the lower-exposure image f₁.

si171_e — Figure 1.21 CCRF-based compositing of a single pixel. The floating-point tonal values f₁ and f₂ are the arguments to the CCRF $f ° f_{Δ EV}^{- 1}$ $f ° f_{Δ EV}^{- 1}$ , which returns a refined estimate of an ideal camera response to the scene being photographed. This virtual camera’s exposure setting is equal to the exposure of the lower-exposure image f₁.

For efficient implementation, rather than computing at runtime or storing values of f⁻¹(f₁,f₂), we can store f(f⁻¹(f₁,f₂)). We call this the comparametric camera response function (CCRF). It is the comparametric inverse camera response function evaluated via (or “imaged” through, because we are in effect using a virtual camera) the camera response function f. This means at runtime we require N(N − 1)/2 recursive lookups, and we can perform all pairwise comparisons at each level in parallel, where a level is a row in Fig. 1.20.

The reason we can use the same CCRF throughout is because each virtual comparametric camera f ° f⁻¹ returns an exposure that is at the same exposure point as the less exposed of the two input images (recall that we set k₁ = 1), so Δev between images remains constant at each subsequent level.

In comparison with Fig. 3 in Mann and Mann (2001), wherein the objective is to recover the camera response function and its inverse, in this case we are using a similar hierarchical structure to instead combine information from multiple source images to create a single composite image.

The memory required to store the entire pyramid, including the source images, is N(N + 1) times the amount of memory needed to store a single uncompressed source image with floating-point pixels. Multichannel estimation (eg, for color images) can be done by use of separate response functions for each channel, at a cost in compute operations and memory storage that is proportional to the number of channels.

Alternative graph topology

Other connection topologies are possible — for example, we can trade memory usage for speed by compositing using the following form for the case N = 4 :

$f (\hat{q}) = f (f_{2 Δ EV}^{- 1} (f (f_{Δ EV}^{- 1} (f_{1}, f_{2})), f (f_{Δ EV}^{- 1} (f_{3}, f_{4})))),$ $f (\hat{q}) = f (f_{2 Δ EV}^{- 1} (f (f_{Δ EV}^{- 1} (f_{1}, f_{2})), f (f_{Δ EV}^{- 1} (f_{3}, f_{4})))),$

in which case we perform only three lookups at runtime, instead of six with the previous structure. However, we must store twice as much lookup information in memory: for $f ° f_{Δ EV}^{- 1}$ $f ° f_{Δ EV}^{- 1}$ as before, and for $f ° f_{2 Δ EV}^{- 1}$ $f ° f_{2 Δ EV}^{- 1}$ , because the results of the inner expressions are no longer Δev apart, but instead are twice as far apart in exposure value, 2Δev, as shown in Fig. 1.22. As a recursive relation for $N = 2^{n}, n \in N$ $N = 2^{n}, n \in N$ we have

$f_{i}^{(j + 1)} = f (f_{j Δ EV}^{- 1} (f_{2 i - 1}^{(j)}, f_{2 i}^{(j)})),$ $f_{i}^{(j + 1)} = f (f_{j Δ EV}^{- 1} (f_{2 i - 1}^{(j)}, f_{2 i}^{(j)})),$

where $j = 1, \dots, {log}_{2} N$ $j = 1, \dots, {log}_{2} N$ and i = 1,…,N/2^j−1. The final output image is $f_{1}^{({log}_{2} N + 1)}$ $f_{1}^{({log}_{2} N + 1)}$ , and $f_{i}^{(1)}$ $f_{i}^{(1)}$ is the ith input image. This form requires N − 1 lookups. In general, by combining this approach with the previous graph structure, we can see that comparametric image composition can always be done in O(N) lookups $\forall N \in N$ $\forall N \in N$ .

f01-22-9780081004128 — Figure 1.22 Example of an alternative graph structure for pairwise comparametric image compositing.

The alternative form described so far is only a single option of many possible configurations. For example, use of multiple LUTs provides less locality of reference, causing cache misses in the memory hierarchy, and uses more memory. In a memory-constrained environment, or one in which memory access is slow, we could use a single LUT while still using this alternate topology, as in

$\begin{array}{l} \hat{q} (x) = & f_{Δ EV}^{- 1} (f (f_{Δ EV}^{- 1} (f_{1}, f_{2}) / 2), f (f_{Δ EV}^{- 1} (f_{3}, f_{4}) / 2)) . \end{array}$ $\begin{array}{l} \hat{q} (x) = & f_{Δ EV}^{- 1} (f (f_{Δ EV}^{- 1} (f_{1}, f_{2}) / 2), f (f_{Δ EV}^{- 1} (f_{3}, f_{4}) / 2)) . \end{array}$

(1.55)

This comes at the expense of our performing more arithmetic operations per comparametric lookup.

Constructing the CCRF

To create a CCRF f ° f⁻¹(f₁,f₂,…,f_N), the ingredients required are a camera response function f(q) and an algorithm for creating an estimate $\hat{q}$ $\hat{q}$ of photoquantity by combining multiple measurements. Once these have been selected, f ° f⁻¹ is the camera response evaluated at the output of the joint estimator, and is a function of two or more tonal inputs f_i.

To create a LUT means sampling through the possible tonal values, so, for example, to create a 1024 × 1024 LUT we could execute our $\hat{q}$ $\hat{q}$ estimation algorithm for all combinations of $f_{1}, f_{2} \in {0, \frac{1}{1023}, \frac{2}{1023}, \dots, 1}$ $f_{1}, f_{2} \in {0, \frac{1}{1023}, \frac{2}{1023}, \dots, 1}$ and store the result of $f (\hat{q})$ $f (\hat{q})$ in a matrix indexed by [1023f₁,1023f₂], assuming zero-based array indexing. Intermediate values may be estimated by linear or other interpolation.

Incremental updates

In the common situation that there is a single camera capturing images in sequence, we can easily perform updates of the final composited image incrementally, using partial updates, by only updating the buffers dependent on the new input.

1.8.1 Example Joint Estimator

In this section we describe a simple joint photoquantity estimator, using nonlinear optimization to compute a CCRF. This method executes in real time for HDR video, using pairwise comparametric image compositing. Examples of the results of this estimator can be seen in Figs. 1.20 and 1.22.

1.8.1.1 Bayesian probabilistic model for the CCRF

In this section we propose a simple method for estimating a CCRF. First, we select a comparametric model, which determines the analytical form of the camera response function. As an example, we illustrate our compositing approach using the “preferred saturation” camera model (Mann, 2001), for which an analytical is known and can be verified by use of the approach in the previous section.

The next step is to determine the model parameters, as in Fig. 1.23; however, any camera model with good empirical fit may be used with this method.

f01-23-9780081004128 — Figure 1.23 Comparametric model fitting. Preferred saturation model parameters were found via nonlinear optimization, by the method of least squares with the Levenberg-Marquardt algorithm. The optimal comparametric model function, determined per color channel, is plotted directly on empirical comparasums to verify a good fit. Comparasums are sums of comparagrams from the same sensor with the same difference in exposure value ΔEV. They are shown range compressed with the $log$ $log$ function, and color inverted, to show finer variation. The best results for comparametric compositing are found when the camera response function model parameters are optimized against a range of k values. Here k₁ = 1 and k₂ = 8, k₃ = 64, and k₄ = 512, which implies that for comparametric image compositing we would use ΔEV = 3.

Let scalars f₁ and f₂ form a Wyckoff set from a camera with zero-mean Gaussian noise, and let random variables X_i = f_i − f(k_iq),i ∈{1,2} be the difference between the observation and the model, with k₁ = 1 and k₂ = k.

We can estimate the variances of X_i can be estimated between exposures by calculating the interquartile range along each column (for X₁) and row (for X₂) of the comparagram with the Δev of interest (ie, using the “fatness” of the comparagram). A robust statistical formula, based on the quartiles of the normal distribution, gives $\hat{σ} \approx interquartile range / 1.349$ $\hat{σ} \approx interquartile range / 1.349$ , which can be stored in two one-dimensional vectors.

Discontinuities in ${\hat{σ}}_{X_{i}}$ ${\hat{σ}}_{X_{i}}$ with respect to f_i can be mitigated by Gaussian blurring of the sample statistics, as shown in Fig. 1.24. Using interpolation between samples of the standard deviation, and extrapolation beyond the first and last samples, we can estimate for any value of f₁ or f₂ the corresponding constant $σ_{X_{1}}$ $σ_{X_{1}}$ or $σ_{X_{2}}$ $σ_{X_{2}}$ .

f01-24-9780081004128 — Figure 1.24 Trace plot of estimated standard deviations from a comparagram. Each estimate is proportional to the IQR, calculated from each column f₁ and row f₂ of a comparagram; here with ΔEV = 3 as given in Fig. 1.23. Gaussian smoothing is applied to reduce discontinuities due to edge effects, quantization, and other noise.

The probability of $\hat{q}$ $\hat{q}$ , given f₁ and f₂, is

$\begin{array}{l} P (q = \hat{q} | f_{1}, f_{2}) & = \frac{P (q) P (f 1 | q, f 2) P (f 2 | q)}{P (f 1, f 2)} \\ = \frac{P (q) P (f_{1} | q) P (f_{2} | q)}{P (f_{1} | f_{2})} \\ = \frac{P (q) P (f_{1} | q) P (f_{2} | q)}{\int_{0}^{\infty} P (f_{1} | q) P (f_{2} | q) d q} \\ \propto P (q = \hat{q}) P (f_{1} | q) P (f_{2} | q) . \end{array}$ $\begin{array}{l} P (q = \hat{q} | f_{1}, f_{2}) & = \frac{P (q) P (f 1 | q, f 2) P (f 2 | q)}{P (f 1, f 2)} \\ = \frac{P (q) P (f_{1} | q) P (f_{2} | q)}{P (f_{1} | f_{2})} \\ = \frac{P (q) P (f_{1} | q) P (f_{2} | q)}{\int_{0}^{\infty} P (f_{1} | q) P (f_{2} | q) d q} \\ \propto P (q = \hat{q}) P (f_{1} | q) P (f_{2} | q) . \end{array}$

si195_e

For simplicity, we choose a uniform prior, which gives us $P_{prior} (q = \hat{q}) = constant$ $P_{prior} (q = \hat{q}) = constant$ . Using X_i, we have

$\begin{array}{l} P_{model} (f_{i} | q) & = normal (μ_{X_{i}} = 0, σ_{X_{i}}^{2}) \\ = \frac{1}{\sqrt{2 π} σ_{X_{i}}} exp (- \frac{{(f_{i} - f (k_{i} q))}^{2}}{2 σ_{X_{i}}^{2}}) . \end{array}$ $\begin{array}{l} P_{model} (f_{i} | q) & = normal (μ_{X_{i}} = 0, σ_{X_{i}}^{2}) \\ = \frac{1}{\sqrt{2 π} σ_{X_{i}}} exp (- \frac{{(f_{i} - f (k_{i} q))}^{2}}{2 σ_{X_{i}}^{2}}) . \end{array}$

si197_e

To maximize $P (q = \hat{q} | f_{1}, f_{2})$ $P (q = \hat{q} | f_{1}, f_{2})$ with respect to q, we remove constant factors and equivalently minimize $- log (P)$ $- log (P)$ . Then the optimal value of q, given f₁ and f₂, is

$q = \underset{q}{argmin} (\frac{{(f_{1} - f (q))}^{2}}{σ_{X_{1}}^{2}} + \frac{{(f_{2} - f (k q))}^{2}}{σ_{X_{2}}^{2}}) .$ $q = \underset{q}{argmin} (\frac{{(f_{1} - f (q))}^{2}}{σ_{X_{1}}^{2}} + \frac{{(f_{2} - f (k q))}^{2}}{σ_{X_{2}}^{2}}) .$

si200_e

In practice, good estimates of optimal q values can be found with use of, for example, the Levenberg-Marquardt algorithm.

1.8.2 Discussion Regarding Compositing via the CCRF

Use of direct computation for nonlinear iterative methods as in Pal et al. (2004) is not feasible for real-time HDR video, because the time required to converge to a solution on a per-pixel basis is too long.

For our simplistic probabilistic model given in Section 1.8.1, it takes more than 1 min (approximately 65 s) to compute each output frame with a single processor. With the method proposed in Section 1.8, the multicore speedup is more than 2500 times for CPU-based computation, and 3800 times for graphics processing unit (GPU)-based computation (versus CPU), as shown in Table 1.3.

Table 1.3

Performance of Pairwise Composition Versus Direct Calculation of a Composite HDR Image on Four Input LDR Images

	Method
	Direct Calculation	CCRF, Full Update	CCRF, Incremental
Platform	Speed (output frames per second)			Speedup
CPU (serial)	0.0154	51	78	5065 times
CPU (threaded)	0.103	191	265	2573 times
GPU	–	272	398	–

t0020

The selection of the size of the LUT depends on the range of exposures for which it is used. It was found empirically that 1024 × 1024 samples of a CCRF is enough for the practical dynamic range of our typical setups. Further increases in the size of the LUT resulted in no noticeable improvement in output video quality.

Because GPUs implement floating-point texture lookup with linear interpolation in hardware, and can execute highly parallelized code, the GPU execution would seem to be a natural application of general-purpose GPU computation. However, for this application much of the time is spent waiting for data transfer between the host and the GPU — the pairwise partial update is useful in this context because we can reuse partial results from the previous estimate, and transfer only the new data.

1.8.3 Review of Analytical Comparametric Equations

In this section have we developed the general solution to any comparametric problem. This solution converts the comparametric system from a functional equation to a separable ordinary differential equation. The solution to this differential equation can then be used to perform image compositing based on photoquantity estimation.

We have further illustrated comparametric compositing as a novel computational method that uses multidimensional LUTs, recursively if necessary, to estimate HDR output from LDR inputs. The runtime cost is fixed irrespective of the algorithm implemented if it can be expressed as a comparametric lookup. Pairwise estimation decouples the specific compositing algorithm from runtime, enabling a flexible architecture for real-time applications, such as HDR video, that require fast computation. Our experiments show that we approach a data transfer barrier rather than a compute-time limit. We demonstrated a speedup of three orders of magnitude for nonlinear optimization–based photoquantity estimation.

1.9 Efficient Implementation of HDR Reconstruction via CCRF Compression

High-quality HDR typically requires large computational and memory requirements. We seek to provide a system for efficient real-time computation of HDR reconstruction from multiple LDR samples, using very limited memory. The result translates into hardware such as field-programmable gate array (FPGA) chips that can fit within eyeglass frames or other miniature devices.

The CCRF enables an HDR compositing method that performs a pairwise estimate by taking two pixel values p_i and p_j at an exposure difference of Δev and outputs $\hat{q}$ $\hat{q}$ , the photoquantigraphic quantity (Ali and Mann, 2012). The CCRF results can be stored in a LUT. The LUT contains N × N elements of precomputed CCRF results on pairs of discretized p_i and p_j values. The benefit of the use of a CCRF is that the final estimate of photoquantity can incorporate multiple samples at different exposures to determine a single estimate of the photoquantity at each pixel.

Quadtree representation

The CCRF LUT can be represented in a tree structure. For N × N elements, we can generate a quadtree (a parent node in a such tree contains four child nodes) to fully represent the CCRF LUT. Such a quadtree is a complete tree with ${log}_{4} N^{2}$ ${log}_{4} N^{2}$ or ${log}_{2} N$ ${log}_{2} N$ levels.

One method of generating such a tree is to recursively divide a unit square into four quadrants (four smaller but equally sized squares). We can visualize the center of a divided unit square as the parent node of the four quadrants. The center of each quadrant is considered a child node. Such a process is performed recursively in each quadrant until the root unit square is divided into N × N equally sized squares. The bottom nodes of the quadtree are the leaves of the tree, each of which stores the CCRF lookup value of the corresponding pixel pair (p_i, p_j).

Reducing the quadtree

Storing the leaves of the complete quadtree representation of the CCRF costs as much space as the CCRF LUT itself. To reduce the number of elements needed for storage, we proceed to interpolate the CCRF value ${\hat{p}}_{i, j}$ ${\hat{p}}_{i, j}$ of a specific pair (p_i, p_j) on the basis of its neighbor CCRF lookups. The value ${\hat{p}}_{i, j}$ ${\hat{p}}_{i, j}$ interpolated on the basis of its neighbors is compared against the actual CCRF lookup value p_i,j, which gives e_i,j, the error per lookup entry:

$\begin{array}{l} e_{i, j} = | {\hat{p}}_{i, j} - p_{i, j} | . \end{array}$ $\begin{array}{l} e_{i, j} = | {\hat{p}}_{i, j} - p_{i, j} | . \end{array}$

(1.56)

We accept the approximated result if e_i,j is within a fraction of $\frac{1}{2^{D}}$ $\frac{1}{2^{D}}$ as the error threshold e_th:

$\begin{array}{l} e_{th} = α \cdot \frac{1}{2^{D}}, \end{array}$ $\begin{array}{l} e_{th} = α \cdot \frac{1}{2^{D}}, \end{array}$

(1.57)

where α is the fraction constant and D is the bit depth of the pixel value. We define the neighbor CCRF points as the four corner CCRF points of the square. We interpolate all CCRF (p_i, p_j) within the same square on the basis of these four corner values. We denote $I (e_{i, j}, e_{th})$ $I (e_{i, j}, e_{th})$ as the indicator function of unsatisfied error condition:

$\begin{array}{l} I (e_{i, j}, e_{th}) = \{\begin{matrix} 1, & if e_{i, j} \geq e_{th}, \\ 0, & otherwise, \end{matrix} \end{array}$ $\begin{array}{l} I (e_{i, j}, e_{th}) = \{\begin{matrix} 1, & if e_{i, j} \geq e_{th}, \\ 0, & otherwise, \end{matrix} \end{array}$

si210_e (1.58)

We divide the square into four quadrants if

$\begin{array}{l} \sum I (e_{i, j}, e_{th}) > 0, \forall p_{i, j} \in square. \end{array}$ $\begin{array}{l} \sum I (e_{i, j}, e_{th}) > 0, \forall p_{i, j} \in square. \end{array}$

(1.59)

The purpose of the division is to obtain new corner values that are closer to the point. The closer corner values to the point for interpolation may yield lower e_i,j as the CCRF LUT generally varies smoothly over a continuous and wide range of p_i and p_j values. Therefore, we expect that the density of the divisions corresponds to the local gradient of the CCRF LUT: the higher the local gradient, the more recursive divisions are required to bring corners closer to the point, whereas the points formed of a large and smooth region of the CCRF with low local gradient share the same corners.

Error weighting and tree depth criteria

The error of the interpolation against the original lookup value of the input pair (p_i, p_j) should be within e_th. This error is affected by the interpolation method. Empirically, we find that bilinear interpolation works better than quartic interpolation in terms of minimizing the number of lookup points while satisfying the error constraint.

Statistically, we observe that the most frequently accessed CCRF values lie along the comparagram, as shown in Fig. 1.25. Therefore, higher precision on interpolation may not be necessary for CCRF lookup points that are distant from the comparagram. This suggests that the error constraint for the pair (p_i, p_j) should vary depending on its likelihood of occurrence. To further compress the CCRF LUT, we can scale e_i,j by the number of observed occurrences of the pair (p_i, p_j). This information is obtained through the construction of the comparagram. For each entry of the CCRF lookup, we weight the interpolation error by directly multiplying it by the occurrence count observed on the comparagram:

$\begin{array}{l} e_{i, j} \cdot (B_{i, j} + 1), \end{array}$ $\begin{array}{l} e_{i, j} \cdot (B_{i, j} + 1), \end{array}$

(1.60)

where B_i,j is the count of the number of occurrences on the comparagram entry of (p_i, p_j). The result with the weighted errors also favors bilinear interpolation over quartic interpolation in terms of minimizing the number of entries of the CCRF for storage, as shown in Fig. 1.26. Reconstruction of the original CCRF from this quadtree is shown in Fig. 1.27, with the corresponding error shown in Fig. 1.28.

f01-25-9780081004128 — Figure 1.25 A comparagram is a joint histogram of pixel values from images of the same scene taken with different exposures. For any given sensor, the comparagram is directly related to the camera response function. Areas that are dark in this plot correspond to joint values that are expected to occur in practice.

f01-26-9780081004128 — Figure 1.26 Quadtree-based representation of the CCRF based on weighting from the comparagram shown in Fig. 1.25. Areas of rapid change or high use are more finely subdivided for greater accuracy. Inside each square the CCRF value is approximated by bilinear interpolation based on the corner values.

f01-27-9780081004128 — Figure 1.27 Reconstruction of the CCRF LUT based on a compressed quadtree with an error constraint of within one pixel value (α set to 1.0). The error bound is weighted by the expected usage, so values used more often have a smaller error bound.

f01-28-9780081004128 — Figure 1.28 The absolute difference between the original and the reconstructed CCRF LUTs. The allowed error is greater in areas of the table that correspond to highly unlikely situations.

The CCRF LUT we use has 1024 entries for p_i and 1024 entries for p_j. Therefore, there is no point in constructing a tree that has a depth of more than ${log}_{2} 1024 = 10$ ${log}_{2} 1024 = 10$ . We may constrain the depth of the tree to fewer than 10 levels as long as the error constraint is met. This affects the resulting number of entries in the CCRF quadtree, as well as the number of iterations required to find a leaf node.

Corner value access

Each node of the quadtree is the center point of the square that contains it. To access the corner values of a leaf node, we can perform a recursive comparison of pair (p_i, p_j) with (p_x, p_y) of the nonleaf nodes in the tree until a leaf node has been reached, seen in Algorithm 1. The leaf nodes contain memory addresses of the corresponding corner values. The corner values of a leaf node are stored in the memory for retrieval.

1.9.1 Implementation

The algorithm can be implemented efficiently on a medium-sized FPGA. Given a finalized quadtree data structure, a system can be generated with software. The four corner values are stored in ROMs implemented with on-chip block RAM, and then selected by multiplexer chains based on the inputs f₁ and f₂. An arithmetic circuit that follows can then calculate the result on the basis of bilinear interpolation. Thus, as shown in Fig. 1.29, the system consists of two major parts: an addressing circuit and an interpolation circuit.

f01-29-9780081004128 — Figure 1.29 The top-level system consists of two main parts: the addressing circuit and the interpolation circuit. Input pixel values are normalized first to 16-bit fixed-point representation (f₁ and f₂) before entering the boundary block. According to the boundary conditions of the two values, the circuit generates controlling signals to the multiplexer tree, which selects the correct address that corresponds to the original quadtree node. The address is then used in the interpolation circuit to retrieve the stored values that are necessary for bilinear interpolation. After arithmetic circuit operation, the output ff can be further combined with ff from another instance of the same circuit.

Because the size of the quadtree can grow to 10 levels, a C program is written to generate the implementation of the two circuits in Verilog hardware description language. Given a compressed data structure, this program generates four ROM initialization files and a circuit that retrieves corner values stored in the ROMs based on the inputs.

1.9.1.1 Addressing circuit

Each of the quadtree leaves needs a unique address. This address is used to retrieve the corresponding corner values from ROMs. As shown in Fig. 1.30, the circuit outputs the address by comparing f₁ and f₂ with constant boundary values, in the same way as we traverse the quadtree. The main function of the boundary comparator is to send the controlling signal to traverse the multiplexer tree, on the basis of the given input pair. At each level, it compares the input values with the prestored center coordinate values, and determines which branch (if exits) it should take next. Otherwise, the current node is a quadtree leaf and a valid address will be selected.

f01-30-9780081004128 — Figure 1.30 The relation between the original quadtree and its multiplexer implementation makes it very easy to generate the Verilog using the same quadtree data structure in software (ie, generate software that generates the Verilog code that describes the hardware design). Efficient use of four-to-one multiplexers in the six-LUT FPGA architecture can significantly reduce resource usage (ie, each multiplexer is mapped to one logic slice, instead of three slices if a two-to-one multiplexer is used) and code generation algorithm.

The algorithm generates the circuit by traversing the quadtree. Because a new unique address is needed for every leaf being visited, a global counter is used to determine the addresses. The width of the circuit data path is then determined with the last address generated (ie, the maximum address).

Algorithm 1

Recursive quadtree search

1.9.1.2 Interpolation circuit

The circuit takes the address and uses it to look up values that are prestored in the block RAM. These values can be used to perform an arithmetic operation (as shown in Fig. 1.31) for bilinear interpolation. To maintain high throughput, the intermediate stages are pipelined with use of registers.

f01-31-9780081004128 — Figure 1.31 This circuit performs pipelined arithmetic that is necessary for bilinear interpolation. The inputs to this circuit are load from block RAM, which is initialized according to the compressed quadtree structure. The dotted red (dark gray in print versions) lines indicate the stage after which the data can be pipelined in order to have higher throughput.

1.9.2 Compression Performance

The compressed CCRF LUT requires storage of all corner values per pair (p_i, p_j) for interpolation. Without any compression, this method would require as many as four times the storage space because of redundancies of identical CCRF values shared between adjacent lookups. However, the compression is able to reduce the number of lookup entries by a factor greater than four times. This compression factor depends on the selection of α.

We wrote the compression in the C programming language to output the compressed CCRF LUT. In Table 1.4, we list the minimum compression factor and the expected error over the entire CCRF range, for four selections of α.

Table 1.4

Quadtree Properties Depending on the Choice of α

Maximum Depth = 8	α = 1	α = 1/2	α = 1/4	α = 1/16
Number of entries	3315	4828	6508	10,807
Compression factor	79.1	54.2	161.1	97.0
Mean depth	5.8	6.2	6.6	7.4
Expected depth	7.7	7.8	7.8	7.9
Error constraint	0.0039	0.0019	0.00098	0.00024
Expected error	0.00042	0.00038	0.00036	0.00035

t0025

Notes: The compression factor is calculated on the basis of the number of CCRF lookup entries required divided by the number of the entries after compression. The CCRF without compression contains 10⁶ floating-point entries.

The minimum compression factor summarizes the amount of compression achieved by taking the ratio of the total number of entries in the CCRF LUT to the number in the compressed one:

$\begin{array}{l} Minimum compression factor = \frac{N^{2}}{number of leaf nodes \times 4} . \end{array}$ $\begin{array}{l} Minimum compression factor = \frac{N^{2}}{number of leaf nodes \times 4} . \end{array}$

(1.61)

The constant 4 is the upper bound of the maximum redundancy of the corner value storage that overlaps with adjacent CCRF lookup points.

1.9.2.1 Hardware resource usage

For resource estimation without consideration of optimization effort performed by the CAD tool, we have

$\begin{array}{l} Number of multiplexers & = number of nonleaf nodes, \\ Number of comparators & = number nonleaf nodes \times 2, \\ Number of addresses & = number of leaves . \end{array}$ $\begin{array}{l} Number of multiplexers & = number of nonleaf nodes, \\ Number of comparators & = number nonleaf nodes \times 2, \\ Number of addresses & = number of leaves . \end{array}$

si222_e

The actual number of resources needed to implement the system is much less (almost halved) than what we expected. The detailed data gathered from both expectation (using counters) and the CAD outputs are summarized in Table 1.5.

Table 1.5

The Resource Usage of the Implementation on a Kintex 7 (Xilinx XC7K325T) FPGA

	Depth = 4	Depth = 6	Depth = 8	Depth = 10
Expected slice usage	63	633	3315	21,207
Portion used (%)	0.0309	0.31	1.6	10
Actual slice usage	2	119	737	11,101
Portion used (%)	9.8 × 10⁻⁴	0.058	0.36	5.5

t0030

Notes: The total number of logic slices used in this FPGA is 203,800. The difference between the expected and actual usage is due to the optimization present in the synthesis process, enabling more efficient use of resources.

1.9.3 Conclusion

This section presented an architecture for accurate HDR imaging that is amenable for implementation on highly memory constrained low-power FPGA platforms. Construction of the compositing function is performed by nonlinear optimization of a Bayesian formulation of the compositing problem, where the selected prior creates an accurate estimator that is smooth for robustness and enhanced compression. The estimator is solved over a regular grid in the unit square, forming a two-dimensional LUT. Implementation of this solution on an FPGA then relies on the compression of the LUT into a quadtree form that allows random access, and uses bilinear interpolation to approximate values for intermediate points. This form allows selective control over error bounds, depending on the expected use of the table, which is easily obtained for a particular sensor. This results in compression of more than 60 times relative to the original LUT, with visually indistinguishable results.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 1.8 Compositing as Bayesian Joint Estimation

Create new playlist

Sign In

Sign Up

1.8 Compositing as Bayesian Joint Estimation

Pairwise estimation

Alternative graph topology

Constructing the CCRF

Incremental updates

1.8.1 Example Joint Estimator

1.8.1.1 Bayesian probabilistic model for the CCRF

1.8.2 Discussion Regarding Compositing via the CCRF

1.8.3 Review of Analytical Comparametric Equations

1.9 Efficient Implementation of HDR Reconstruction via CCRF Compression

Quadtree representation

Reducing the quadtree

Error weighting and tree depth criteria

Corner value access

1.9.1 Implementation

1.9.1.1 Addressing circuit

1.9.1.2 Interpolation circuit

1.9.2 Compression Performance

1.9.2.1 Hardware resource usage

1.9.3 Conclusion

Table of Contents for
1.8 Compositing as Bayesian Joint Estimation