5

Fundamentals of Distributed Estimation

Chee-Yee Chong, Kuo-Chu Chang and Shozo Mori

CONTENTS

5.1    Introduction

5.2    Distributed Estimation Architectures

5.2.1    Fusion Architecture Graph

5.2.1.1    Singly Connected Fusion Architectures

5.2.1.2    Multiply Connected Fusion Architectures

5.2.2    Information Graph

5.2.2.1    Singly Connected Information Graphs for Singly Connected Fusion Architectures

5.2.2.2    Multiply Connected Information Graphs for Hierarchical Fusion

5.2.2.3    Information Graph for Distributed Architectures

5.2.3    Information Communicated and Common Prior Knowledge

5.2.4    Selecting Appropriate Architectures

5.3    Bayesian Distributed Fusion Algorithm

5.3.1    Bayesian Distributed Estimation Problem and Solution

5.3.2    Bayesian Distributed Fusion for Gaussian Random Vectors

5.4    Optimal Bayesian Distributed Fusion for Different Architectures

5.4.1    Hierarchical Architecture

5.4.1.1    Hierarchical Fusion without Feedback

5.4.1.2    Hierarchical Fusion with Feedback

5.4.2    Arbitrary Distributed Fusion Architecture

5.5    Suboptimal Bayesian Distributed Fusion Algorithms

5.5.1    Naïve Fusion

5.5.2    Channel Filter Fusion

5.5.3    Chernoff Fusion

5.5.4    Bhattacharyya Fusion

5.6    Distributed Estimation for Gaussian Distributions or Estimates with Error Covariances

5.6.1    Maximum A Posteriori Fusion or Best Least Unbiased Estimate

5.6.2    Cross-Covariance Fusion

5.7    Distributed Estimation for Object Tracking

5.7.1    Deterministic Dynamics

5.7.2    Nondeterministic Dynamics

5.7.2.1    Augmented State Vector and Approximation

5.7.2.2    Using Cross-Covariance at a Single Time

5.8    Distributed Estimation for Object Classification

5.8.1    Distributed Object Classification Architectures

5.8.2    Distributed Classification Algorithms

5.9    Summary

5.10  Bibliographic Notes

References

5.1    INTRODUCTION

Many applications, such as target tracking, robotics, and manufacturing, increasingly use multiple sensors or data sources to provide information. Multiple sensors provide better coverage than a single sensor, either over a larger geographical area or a broader spectrum. By generating more measurements, they can improve detection and false alarm performance. Improved accuracy (location, classification) can also result from the viewing or phenomenological diversity provided by multiple sensors. For example, similar sensors that are not co-located can provide more accurate measurements of target location by exploiting different viewing angles, while dissimilar sensors such as radar and optical can observe different features for better object recognition.

The measurements from multiple sensors can be processed or fused at a central site or multiple sites. The centralized fusion architecture requires communicating all the measurements to a single site and is theoretically optimal because the information in the measurements is not degraded by any intermediate processing. When the sensors are geographically distributed, it may make sense to also distribute the processing, with each processing site responsible for the measurements from one or more sensors. These sites can communicate their results to other fusion sites for further processing. The distributed fusion architecture has many advantages such as lower bandwidth by communicating processing results rather than measurements, availability of processing results for local functions such as sensor management, distribution of the processing load to multiple sites, and less vulnerability because there is no single point of failure. Furthermore, a properly designed distributed fusion system can provide modularity and scalability for rapid incorporation of more sensors.

Because of these advantages, there are many examples of distributed fusion systems including net-centric military systems, robotics teams, and wireless sensor networks, where centralized processing is not practical. However, many technical issues need to be addressed for distributed fusion systems to achieve high performance. The first issue is selecting the appropriate fusion architecture that connects sensors with the processors or agents at the fusion sites and how the data are shared with other sites in the network. The fusion architecture also specifies the information flow between the agents. The second issue is how the data should be processed by each agent to provide the best performance. For example, a fusion agent has to recognize when common information occurs in any received data to avoid double counting when fusing the data.

This chapter presents the fundamental concepts for distributed data fusion. In particular, we focus on the estimation problem where the goal of fusion is to compute an estimate of the state from measurements collected by multiple sensors. The state may be continuous and time varying, such as the position and velocity of a vehicle in object tracking. It may also be discrete and static, such as the class of an object in object classification. We focus on estimation and exclude the data association issues that are important in object tracking; these issues will be discussed in Chapter 6.

The rest of this chapter is structured as follows. Section 5.2 discusses distributed fusion architectures, their advantages and disadvantages, the use of the information graph to represent information flow, and the selection of an appropriate architecture. Section 5.3 presents the Bayesian fusion equation for combining two probability functions, or their means and covariances. Section 5.4 shows how the information graph can be used to keep track of information flow in a distributed estimation system and how it can be used to derive fusion equations for various fusion architectures. Section 5.5 discusses some suboptimal but practical approaches that are based on approximations of the optimal approach. Section 5.6 presents algorithms for fusing estimates characterized by means and covariances. Section 5.7 discusses distributed fusion for object tracking when the state is continuous and time-varying. Section 5.8 discusses distributed fusion for object classification when the state is a discrete and static random variable. Section 5.9 provides a summary, and Section 5.10 contains some bibliographic notes.

Much has been published on distributed estimation over the last three decades with summaries provided in Chong et al. (1990) and Liggins and Chang (2009). Our discussion focuses on algorithms that are non-iterative, i.e., we will not address the consensus problem (Teneketzis and Varaiya 1988, Olfati-Saber 2005). We also view decentralized estimation (Durrant-Whyte et al. 1990) as a special case of distributed estimation. Furthermore, we sometimes use fusion and estimation to mean the same thing, and consider conditional probability (density) as a form of estimate.

5.2    DISTRIBUTED ESTIMATION ARCHITECTURES

The basic components of a distributed estimation system are sensors, processors (estimation or fusion agents), and users. Sensors generate measurements or data on the objects of interest. The measurements contain information on the object state such as position, velocity, or class. Estimation or fusion agents process sensor data or results received from other fusion agents to generate better estimates. Users are the consumers of the fusion results. A user can be the controller in a robotic system or a commander in a surveillance system. In a distributed fusion or estimation system, there are multiple sensors, processors, and users. These components are usually distributed geographically and connected together by a communication network.

The fusion architecture (Chong 1998) consists of three components. At the system level, the communication graph represents network connectivity between the components. When sensors collect measurements and processors fuse estimates at multiple times, the information graph represents the detailed information flow from the sensors to the processors. Finally, the information content communicated also has to be specified.

5.2.1    FUSION ARCHITECTURE GRAPH

The fusion architecture graph represents the connectivity of the fusion system as determined by the communication network. The nodes of the graph represent the sensors and processors, and the directed edges are the communication paths between the components. There are two main types of system architectures based on the number of communication paths from a particular sensor to the processor.

5.2.1.1    Singly Connected Fusion Architectures

In a singly connected fusion architecture, there is a single path between any sensor–processor pair. Figure 5.1 shows four examples of singly connected fusion architectures—centralized, decoupled, replicated centralized, and hierarchical without feedback.

In the centralized architecture, measurements from all sensors are sent to a single fusion site or agent to be processed. Theoretically this architecture produces the best performance since there is no information loss. However, centralization implies high communication load over the network, high processing load at the fusion site, and low survivability due to a single point of failure. The decoupled architecture partitions the sensors into multiple sets with a fusion site responsible for each set. This architecture is appropriate when there is a natural partitioning of the sensors so that the sensors in the same set can help each other but those outside the set provide little additional information. This architecture has the lowest computation and communication requirements. However, the performance can be poor if the sensors cannot be partitioned easily. In the replicated centralized architecture, multiple fusion sites process data from overlapping sets of sensors. There is no communication among the fusion sites. This architecture has high performance and reliability due to the multiple sites processing the same data. However, it also has high communication and processing costs.


FIGURE 5.1 Singly connected fusion architectures.

These three architectures do not allow communication among the fusion sites. Thus there is a single information path from a sensor to a fusion site. This allows the use of simple fusion algorithms since the double counting or rumor propagation problem does not exist. These architectures are useful since they serve as benchmarks for comparing the performance of other distributed fusion architectures.

In the hierarchical (without feedback) architecture, the fusion sites are arranged in a hierarchy with the low-level fusion sites processing sensor data to form local estimates. These estimates are sent to a high-level fusion site to be combined. In order to realize the benefit of reduced communication, the communication rate from the low-level site to the high level should be lower than the sensor observation rate. As compared to the centralized architecture, the hierarchical architecture has the advantage of lower communication, lower processing cost when the low-level site processes data from a smaller set of sensors, and increased reliability. However, multiple information paths can occur if the sensors and fusion sites collect measurements and process at multiple times.

5.2.1.2    Multiply Connected Fusion Architectures

In a multiply connected fusion architecture, there are multiple communication paths between a sensor–processor pair. Figure 5.2 shows four examples of multiply connected fusion architectures—hierarchical with sensor sharing, hierarchical with feedback, peer-to-peer, and broadcast.

In the hierarchical with sensor sharing architecture, the measurements from one sensor are processed by multiple fusion sites. This makes sense when that sensor is particularly powerful. However, high-level fusion is difficult because the common information from that sensor cannot be removed easily. In the hierarchical with feedback architecture, the accuracy of the local estimates can be enhanced by feeding back high-level estimates (which include information from more sensors) to the low level where the data are to be combined. In this feedback architecture, information flows in both directions, from low level to high level and also from high level to low level.


FIGURE 5.2 Multiply connected fusion architectures.

In the peer-to-peer architecture, a fusion agent has two-way communication with another fusion agent (with only neighbors in a decentralized architecture). In the broadcast architecture, a fusion agent broadcasts its results to multiple fusion agents, who can also broadcast their own results. These two are examples of fully distributed architectures where the communication is dynamic and not specified a priori. For example, a fusion site may send its results to another fusion site depending on the results or in response to a request for information from another site. Such architectures can adapt dynamically to the current situation. In general, multiply connected fusion architectures are more robust against failures, but algorithms are more difficult to develop because of the multiple information paths.

5.2.2    INFORMATION GRAPH

The fusion architecture graph characterizes information paths at a high level. It does not describe how each measurement or fusion result flows through the system, and in particular it does not portray the effects of repeated sensor observations and fusion processing over time. For example, the architectures in Figures 5.1 and 5.2 do not represent the relationship between the estimates and the sensor data at different times, which is needed to identify the common information and avoid double counting or data incest. The information graph (Chong et al. 1982, 1983, 1985, 1986, 1987, Chong and Mori 2004) represents the detailed information flow and transactions in a fusion architecture specified by communication paths. It also supports the development of optimal and suboptimal fusion algorithms. A similar graph model can be found in McLaughlin et al. (2004, 2005).

The nodes in the information graph represent information events. The observation node represents the observation event of a sensor at a specific time; the fusion node represents a fusion event at a fusion site at a specific time. There are two main types of fusion events: fusion of sensor observation with the local fusion result, and fusion of the processing results from other sites with the local results.

The directed edges or links represent the communication between information nodes. Note that the observation node is a leaf node with no predecessors and its successor nodes are always fusion nodes. The predecessor node of a fusion node may be an observation node or another fusion node. A fusion node may have other fusion nodes as successors or no successor nodes.

The edges in the graph can be used to trace the information available to a node. A directed path from Node A to Node B means that Node B has access to the information at Node A, and in general each node has access to the information of its predecessor nodes. The specific information available depends on what is communicated. Sensor data are transmitted from an observation node to a fusion node, but usually estimates are communicated between fusion nodes. If the estimate is a sufficient statistic, then the maximum information at a node consists of the sensor data of all its ancestor observation nodes.

A main problem in distributed fusion is identifying the common information shared by two estimates that need to be fused. The information graph provides a useful tool to discover the source of this common information. If two fusion nodes have common ancestors, then the estimates at these nodes contain the information of the common ancestors. If two fusion nodes have no common predecessor, there is no sharing of information except for the prior. The following are some examples of information graphs.

5.2.2.1    Singly Connected Information Graphs for Singly Connected Fusion Architectures


FIGURE 5.3 Singly connected information graph for singly connected fusion architectures.

5.2.2.2    Multiply Connected Information Graphs for Hierarchical Fusion

Figure 5.4 shows the information graph for the hierarchical fusion without feedback architecture. Even though the fusion architecture graph is singly connected when there is no feedback from the high-level site, the information graph (on the left) is multiply connected due to repeated communication and fusion. For example, both fusion nodes $H$ and $L$ have the predecessor node $\bar{L}$, that is, the information at $\bar{L}$ is included in the information at both $H$ and $L$. Thus fusion of $H$ and $L$ has to make sure that the common information of $\bar{L}$ is not double counted. This multiply connected information graph can be transformed into a singly connected graph by modifying the processing and communication strategies. One approach is to have the local fusion site send only the new information acquired since the last time it communicated with the high-level fusion site. This is equivalent to deleting the edge at the local site after each communication (local restart and sending new information in Figure 5.4). Effectively, the local fusion site maintains a separate estimator whose output is communicated. The other approach to obtaining a singly connected graph is not to allow memory at the high-level fusion site. Then the fusion nodes will only have observation nodes from each sensor as predecessors (global restart and no memory in Figure 5.4).


FIGURE 5.4 Information graph for hierarchical fusion without feedback.

Figure 5.5 shows the information graph for hierarchical fusion with feedback. As in hierarchical fusion without feedback, the multiply-connected information graph for high level fusion can be converted to a singly connected network if the local fusion site sends fusion results that do not rely on feedback or the sensor observations before the last communication. Effectively, the local site keeps two sets of books—an optimal estimate for local use based on all sensor observations and an estimate for communication based only on new local observations received since the last communication. Similarly, the low-level fusion site can obtain a singly connected fusion graph by deleting the appropriate edges.


FIGURE 5.5 Information graph for hierarchical fusion with feedback.


FIGURE 5.6 Information graph for hierarchical fusion with common sensor.

Figure 5.6 shows the information graph for hierarchical fusion with a common sensor. In this case, the information graph is inherently multiply connected and it is difficult to convert it into a singly connected information graph.

5.2.2.3    Information Graph for Distributed Architectures

The information graphs for general distributed architectures are usually multiply connected because of the possible communication paths. However, it is sometimes possible to convert them to singly connected information graphs by designing the appropriate information exchange. Figure 5.7 shows how the information graphs for peer-to-peer and broadcast fusion architectures can be made singly connected if each local fusion site only communicates the new information received since the last communication. The dotted lines are information paths that are only maintained for generating the local estimates but not for communication.


FIGURE 5.7 Peer-to-peer and broadcast architectures.


FIGURE 5.8 Information graph for cyclic architecture.

The information graph can become complicated if the fusion sites only communicate their fusion results. Figure 5.8 shows a cyclic fusion architecture where site 1 sends its local fusion result to site 3, site 3 to site 2, and site 2 to site 1. It is difficult to identify new information because of the loopy communication in this architecture. From the information graph, the most recent common predecessors of B and C are D and E. The common predecessors of D and E are the nodes F and G, which have the same information as H.

5.2.3    INFORMATION COMMUNICATED AND COMMON PRIOR KNOWLEDGE

Defining the distributed fusion architecture also requires specifying the type of data communicated. The data can be the sensor measurements collected by the site or processing results, which can be estimates or probabilities of the state. Choosing what to communicate is a tradeoff between bandwidth and amount of information. Communicating measurements requires the most bandwidth but provides the most information. Processing is easy for the fusion site receiving the measurements because the measurement errors are generally independent. Effectively, each fusion site performs centralized fusion of measurements and the information graph is singly connected.

When processing results such as estimates or probabilities are communicated, additional information is frequently needed for optimal fusion by the receiver. For example, network topology or information pedigree is needed to construct the information graph to identify the common information. Optimal fusion may also require knowing other estimates. When such information is not available, or the communicated estimates are not sufficient statistics, fusion can only be suboptimal. For example, optimal fusion for tracking objects with nonzero process noise requires knowledge of state estimates at multiple previous times. The fusion will be suboptimal when the state estimate is only available at the current time.

5.2.4    SELECTING APPROPRIATE ARCHITECTURES

The fusion architecture has a significant impact on the development and performance of the distributed fusion system. A fusion architecture can be evaluated by the amount of information generated, communication bandwidth, algorithm complexity, and robustness. The following are some general guidelines for selecting fusion architectures:

•  Use all sensor data for optimal performance. A fusion site should have access to as much sensor data as possible and a fusion node in the information graph should include information on all observation nodes (ancestors) that can be communicated to the fusion node.

•  Compress sensor data for efficient communication. Less communication bandwidth is needed if the information in multiple observation nodes can be captured by a single intermediate fusion node. However, compression may result in information loss and introduce multiply connected information paths.

•  Find architectures with singly connected information paths. Then the information to be fused will not contain common information and the fusion algorithm will be relatively simple.

•  Use redundant paths for robustness/survivability. Each observation node should have multiple paths to reach a fusion node. However, redundancy may result in more processing/communication cost and/or more complicated fusion algorithms.

5.3    BAYESIAN DISTRIBUTED FUSION ALGORITHM

The goal of distributed estimation is to generate an “optimal” estimate for each fusion site given the information available to the fusion site. We assume that local estimates (or probabilities) and not measurements are communicated to the fusion site. The advantage is local use of estimates and lower bandwidth due to data compression.

When measurements are communicated, as in centralized fusion, the fusion algorithm can exploit the independent measurement errors or the conditional independence of the measurements given the state or variable to be estimated. When only local estimates are shared across the network, this conditional independence may be lost due to common information resulting from prior communication. In some cases, the “state” may not be large enough due to internal variables not included in the estimates. These are issues that have to be addressed in developing distributed fusion algorithms.

The following sections will develop the optimal Bayesian distributed fusion algorithm for a general object state. For object tracking, this state is a temporal sequence of states (e.g., position, velocity) at each time, and the observation is a temporal sequence of measurements, e.g., range and angle for a radar. For object classification, the state is the object class and attributes such as object size, and the observations are observed features such as measured length.

5.3.1    BAYESIAN DISTRIBUTED ESTIMATION PROBLEM AND SOLUTION

Let x be the state to be estimated. The state may be a continuous random variable such as the position and velocity of an object or a discrete random variable such as the class of an object. Let p(x) be the prior probability density function for a continuous variable or the probability distribution for a discrete variable.

Suppose the measurement sets at two fusion nodes (as in the information graph), node 1 and node 2, are

$$Z_1 = \{z_{11}, z_{12}, z_{13}, \ldots\}, \qquad Z_2 = \{z_{21}, z_{22}, z_{23}, \ldots\}$$

(5.1)

These measurements may come from multiple sensors at different times or the same time. Assume the measurements are conditionally independent given x, i.e.,

$$p(z_{ij}, \ldots, z_{mn}, \ldots \mid x) = p(z_{ij} \mid x) \cdots p(z_{mn} \mid x) \cdots$$

(5.2)

This assumption is valid if the measurement errors are independent across sensors and over time.

The fusion nodes compute the local posterior probabilities $p(x \mid Z_1)$ and $p(x \mid Z_2)$. The goal of distributed estimation is to compute the posterior probability $p(x \mid Z_1 \cup Z_2)$ given all the measurements $Z_1 \cup Z_2$.

The fused information set $Z_1 \cup Z_2$ is the union of each node's private information and the common information (Figure 5.9), i.e.,

$$Z_1 \cup Z_2 = (Z_1 \setminus Z_2) \cup (Z_2 \setminus Z_1) \cup (Z_1 \cap Z_2)$$

(5.3)

where $\setminus$ denotes set difference. Then the assumption (5.2) of conditional independence of the measurements given the state implies that

$$p(Z_1 \cup Z_2 \mid x) = p(Z_1 \setminus Z_2 \mid x)\, p(Z_2 \setminus Z_1 \mid x)\, p(Z_1 \cap Z_2 \mid x) = \frac{p(Z_1 \mid x)\, p(Z_2 \mid x)}{p(Z_1 \cap Z_2 \mid x)}$$

(5.4)


FIGURE 5.9 Decomposition into private and common information.

Bayes' rule leads to the Bayesian distributed fusion equation

$$p(x \mid Z_1 \cup Z_2) = \frac{p(Z_1 \mid x)\, p(Z_2 \mid x)\, p(x)}{p(Z_1 \cap Z_2 \mid x)\, p(Z_1 \cup Z_2)} = C^{-1}\, \frac{p(x \mid Z_1)\, p(x \mid Z_2)}{p(x \mid Z_1 \cap Z_2)}$$

(5.5)

where the normalizing constant is given by

$$C = \frac{p(Z_1 \cup Z_2)\, p(Z_1 \cap Z_2)}{p(Z_1)\, p(Z_2)}$$

(5.6)

This Bayesian distributed fusion equation states that the fused posterior probability $p(x \mid Z_1 \cup Z_2)$ is the product of the local probabilities $p(x \mid Z_1)$ and $p(x \mid Z_2)$, divided by the common probability $p(x \mid Z_1 \cap Z_2)$, which is included in each of the local probabilities.

The Bayesian fusion equation can be used to derive optimal fusion equations for the state of interest as long as the measurements are conditionally independent given the state. The key is identifying the common information that has to be removed to avoid double counting. This common information is usually a prior probability or estimate, or the information shared during the last communication. Thus, fusion requires knowing the common probability $p(x \mid Z_1 \cap Z_2)$ in addition to $p(x \mid Z_1)$ and $p(x \mid Z_2)$.

5.3.2    BAYESIAN DISTRIBUTED FUSION FOR GAUSSIAN RANDOM VECTORS

Suppose the state $x$ is a Gaussian random vector with known mean and covariance, and the measurements are also Gaussian with zero-mean errors and known covariances. Then the local estimates are Gaussian with means $\hat{x}_i$ and covariances $P_i$, and the fused estimate is also Gaussian with mean $\hat{x}_{1\cup 2}$ and covariance $P_{1\cup 2}$. The fusion equation (5.5) becomes

$$P_{1\cup 2}^{-1}\hat{x}_{1\cup 2} = P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2 - P_{1\cap 2}^{-1}\hat{x}_{1\cap 2}$$

(5.7)

$$P_{1\cup 2}^{-1} = P_1^{-1} + P_2^{-1} - P_{1\cap 2}^{-1}$$

(5.8)

where $\hat{x}_{1\cap 2}$ and $P_{1\cap 2}$ are the mean and covariance of the state estimate given the common information.

Equations 5.7 and 5.8 are the information matrix form of the fusion equations because the inverse of the covariance matrix is the information matrix. Equation 5.8 states that the information matrix of the fused estimate is the sum of the information matrices of the local estimates minus the information matrix of the common estimate. Equation 5.7 states that the information state of the fused estimate is the sum of the local information states minus the common information state. As in the general case, optimal fusion requires knowing $\hat{x}_{1\cap 2}$ and $P_{1\cap 2}$ in addition to the estimates and covariances to be fused.
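As a concrete illustration, the following minimal numpy sketch (the function name and argument conventions are ours, not from the text) implements Equations 5.7 and 5.8:

```python
import numpy as np

def fuse_information(x1, P1, x2, P2, xc, Pc):
    """Fuse Gaussian estimates (x1, P1) and (x2, P2) whose shared
    information is the common estimate (xc, Pc), per Equations 5.7/5.8."""
    I1, I2, Ic = np.linalg.inv(P1), np.linalg.inv(P2), np.linalg.inv(Pc)
    P_fused = np.linalg.inv(I1 + I2 - Ic)               # Equation 5.8
    x_fused = P_fused @ (I1 @ x1 + I2 @ x2 - Ic @ xc)   # Equation 5.7
    return x_fused, P_fused
```

Subtracting the common information matrix is what prevents the shared data from being counted twice.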

The information matrix fusion equations can be derived directly from the information filter equations (Chong 1979). Suppose each fusion node i = 1, 2, has the observation equation

$$Z_i = H_i x + v_i$$

(5.9)

where

$H_i$ is the observation matrix

$v_i$ is zero-mean independent observation noise with error covariance $R_i$

Then the information filter form of the estimate $\hat{x}_i$ is given by

$$P_i^{-1}\hat{x}_i = \bar{P}^{-1}\bar{x} + H_i^T R_i^{-1} Z_i$$

(5.10)

with error covariance given by

$$P_i^{-1} = \bar{P}^{-1} + H_i^T R_i^{-1} H_i$$

(5.11)

where $\bar{x}$ and $\bar{P}$ are the mean and covariance of $x$. Given the measurements $Z_1$ and $Z_2$, the optimal estimate $\hat{x}$ and its error covariance $P$ are given by the information filter equations

$$P^{-1}\hat{x} = \bar{P}^{-1}\bar{x} + H^T R^{-1} Z$$

(5.12)

$$P^{-1} = \bar{P}^{-1} + H^T R^{-1} H$$

(5.13)

where the measurement vector Z, observation matrix H, and noise covariance matrix R are

$$Z = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix}, \qquad H = \begin{bmatrix} H_1 \\ H_2 \end{bmatrix}, \qquad R = \begin{bmatrix} R_1 & 0 \\ 0 & R_2 \end{bmatrix}$$

(5.14)

Since

$$H^T R^{-1} Z = H_1^T R_1^{-1} Z_1 + H_2^T R_2^{-1} Z_2$$

(5.15)

$$H^T R^{-1} H = H_1^T R_1^{-1} H_1 + H_2^T R_2^{-1} H_2$$

(5.16)

Equations 5.12 and 5.13 become

$$P^{-1}\hat{x} = \bar{P}^{-1}\bar{x} + H_1^T R_1^{-1} Z_1 + H_2^T R_2^{-1} Z_2$$

(5.17)

$$P^{-1} = \bar{P}^{-1} + H_1^T R_1^{-1} H_1 + H_2^T R_2^{-1} H_2$$

(5.18)

These are the information fusion equations used in Durrant-Whyte et al. (1990) to fuse measurements communicated by the fusion agents. Substituting (5.10) and (5.11) into (5.17) and (5.18) produces the information matrix fusion equations, analogous to Equations 5.7 and 5.8:

$$P^{-1}\hat{x} = P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2 - \bar{P}^{-1}\bar{x}$$

(5.19)

$$P^{-1} = P_1^{-1} + P_2^{-1} - \bar{P}^{-1}$$

(5.20)
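The derivation can be checked numerically. The following sketch (all numbers are illustrative, not from the text) builds two local information-filter estimates via (5.10) and (5.11) and verifies that fusing them with (5.19) and (5.20) reproduces the centralized estimate of (5.17) and (5.18):

```python
import numpy as np

rng = np.random.default_rng(0)
x_bar, P_bar = np.array([0.0, 1.0]), np.eye(2)      # prior on the state
H1, R1 = np.array([[1.0, 0.0]]), np.array([[0.5]])  # sensor 1 model
H2, R2 = np.array([[0.0, 1.0]]), np.array([[0.2]])  # sensor 2 model
x_true = rng.multivariate_normal(x_bar, P_bar)
z1 = H1 @ x_true + rng.multivariate_normal([0.0], R1)
z2 = H2 @ x_true + rng.multivariate_normal([0.0], R2)

def info_filter(H, R, z, x_bar, P_bar):
    """Local information filter, Equations 5.10 and 5.11."""
    P = np.linalg.inv(np.linalg.inv(P_bar) + H.T @ np.linalg.inv(R) @ H)
    x = P @ (np.linalg.inv(P_bar) @ x_bar + H.T @ np.linalg.inv(R) @ z)
    return x, P

x1, P1 = info_filter(H1, R1, z1, x_bar, P_bar)
x2, P2 = info_filter(H2, R2, z2, x_bar, P_bar)

# Fuse the local estimates, removing the shared prior (5.19, 5.20)
P = np.linalg.inv(np.linalg.inv(P1) + np.linalg.inv(P2)
                  - np.linalg.inv(P_bar))
x = P @ (np.linalg.inv(P1) @ x1 + np.linalg.inv(P2) @ x2
         - np.linalg.inv(P_bar) @ x_bar)

# Centralized estimate from the stacked measurements (5.17, 5.18)
Hc = np.vstack([H1, H2])
Rc = np.block([[R1, np.zeros((1, 1))], [np.zeros((1, 1)), R2]])
xc, Pc = info_filter(Hc, Rc, np.concatenate([z1, z2]), x_bar, P_bar)
assert np.allclose(x, xc) and np.allclose(P, Pc)
```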

5.4    OPTIMAL BAYESIAN DISTRIBUTED FUSION FOR DIFFERENT ARCHITECTURES

The Bayesian distributed fusion equation assumes a hierarchical architecture with no feedback. Furthermore, the local estimates are only fused once by the fusion agent. However, with the help of the information graph, this equation (in either general or linear form) can be used to derive optimal fusion equations for complex architectures and to identify the information that needs to be communicated in addition to the estimates to be fused. Extending it to fuse more than two local estimates is also straightforward. The following sections contain some examples.

5.4.1    HIERARCHICAL ARCHITECTURE

5.4.1.1    Hierarchical Fusion without Feedback

Consider the example of Figure 5.4 with F3 as the fusion site. When there is no feedback from the high level, the common information in the received estimate $p(x \mid Z_L)$ and the current estimate $p(x \mid Z_H)$ is the estimate $p(x \mid Z_{\bar{L}})$ last communicated from the low-level fusion site F2. From (5.5), the fused estimate or probability function is given by

$$p(x \mid Z_H \cup Z_L) = C^{-1}\, \frac{p(x \mid Z_H)\, p(x \mid Z_L)}{p(x \mid Z_{\bar{L}})}$$

(5.21)

When the probability distribution is Gaussian, applying Equations 5.7 and 5.8 yields

$$P_{H\cup L}^{-1} = P_H^{-1} + P_L^{-1} - P_{\bar{L}}^{-1}$$

(5.22)

$$P_{H\cup L}^{-1}\hat{x}_{H\cup L} = P_H^{-1}\hat{x}_H + P_L^{-1}\hat{x}_L - P_{\bar{L}}^{-1}\hat{x}_{\bar{L}}$$

(5.23)

where the subscripts represent the information nodes.

5.4.1.2    Hierarchical Fusion with Feedback

For the hierarchical architecture with feedback in Figure 5.5, fusion takes place at both levels. For fusion at the low-level node F1, the common predecessor of $L_1$ and $H$ is $\bar{L}$, the fusion node of the last communication from low level to high level. For fusion of $H$ at the high-level node F3 with the estimate at $L_2$ from F2, the common predecessor is $\bar{H}$, the fusion node of the last communication from high level to low level. For low-level fusion, the common information shared (from the information graph) is the last estimate sent to the high level. Thus the fusion equation for the low level is

$$p(x \mid Z_H \cup Z_{L_1}) = C^{-1}\, \frac{p(x \mid Z_H)\, p(x \mid Z_{L_1})}{p(x \mid Z_{\bar{L}})}$$

(5.24)

Similarly, the high-level fusion equation is

$$p(x \mid Z_H \cup Z_{L_2}) = C^{-1}\, \frac{p(x \mid Z_H)\, p(x \mid Z_{L_2})}{p(x \mid Z_{\bar{H}})}$$

(5.25)

When the variables are Gaussian, the low-level fusion equations are

$$P_{H\cup L_1}^{-1} = P_H^{-1} + P_{L_1}^{-1} - P_{\bar{L}}^{-1}$$

(5.26)

$$P_{H\cup L_1}^{-1}\hat{x}_{H\cup L_1} = P_H^{-1}\hat{x}_H + P_{L_1}^{-1}\hat{x}_{L_1} - P_{\bar{L}}^{-1}\hat{x}_{\bar{L}}$$

(5.27)

and the high-level fusion equations are

$$P_{H\cup L_2}^{-1} = P_H^{-1} + P_{L_2}^{-1} - P_{\bar{H}}^{-1}$$

(5.28)

$$P_{H\cup L_2}^{-1}\hat{x}_{H\cup L_2} = P_H^{-1}\hat{x}_H + P_{L_2}^{-1}\hat{x}_{L_2} - P_{\bar{H}}^{-1}\hat{x}_{\bar{H}}$$

(5.29)

5.4.2    ARBITRARY DISTRIBUTED FUSION ARCHITECTURE

The optimal fusion algorithm for arbitrary distributed fusion architectures is found by repeated application of the Bayesian fusion equation (5.5). The algorithm starts by identifying the common predecessor nodes of the information nodes whose estimates are to be fused. If there is only one common predecessor node, then the information at that node becomes the $p(x \mid Z_1 \cap Z_2)$ in the denominator of (5.5). If there are multiple common predecessor nodes, then (5.5) is used again to express $p(x \mid Z_1 \cap Z_2)$ in terms of the information (probability) at these nodes and the information at their common predecessor nodes. The process is repeated until each conditional probability involves only one information node. Thus the fusion equation for the general fusion architecture consists of a product of probabilities representing information to be fused and divisions representing redundant information to be removed. The general fusion equation has the form (Chong et al. 1987, 1990)

$$p\left(x \,\middle|\, \bigcup_{i=1}^{N} Z_i\right) = C^{-1} \prod_{j \in J} p(x \mid Z_j)^{\alpha(j)}$$

(5.30)

where

J is a set of predecessor nodes of the fusion node

C is a normalizing constant

α(·) is either +1 or −1 depending on whether information is to be added or deleted

For the Gaussian case, the fusion equations are

$$P^{-1}\hat{x} = \sum_{j \in J} \alpha(j)\, P_j^{-1}\hat{x}_j$$

(5.31)

$$P^{-1} = \sum_{j \in J} \alpha(j)\, P_j^{-1}$$

(5.32)

The hierarchical fusion equations discussed earlier are special cases of these equations. For the cyclic architecture of Figure 5.8, repeated application of the fusion equation results in the following equation:

$$p(x \mid Z_A) = C^{-1}\, \frac{p(x \mid Z_B)\, p(x \mid Z_C)}{p(x \mid Z_{D \cup E})} = C^{-1}\, \frac{p(x \mid Z_B)\, p(x \mid Z_C)\, p(x \mid Z_{F \cup G})}{p(x \mid Z_D)\, p(x \mid Z_E)} = C^{-1}\, \frac{p(x \mid Z_B)\, p(x \mid Z_C)\, p(x \mid Z_H)}{p(x \mid Z_D)\, p(x \mid Z_E)}$$

(5.33)

The equations for the Gaussian case are

$$P_A^{-1}\hat{x}_A = P_B^{-1}\hat{x}_B + P_C^{-1}\hat{x}_C - P_D^{-1}\hat{x}_D - P_E^{-1}\hat{x}_E + P_H^{-1}\hat{x}_H$$

(5.34)

$$P_A^{-1} = P_B^{-1} + P_C^{-1} - P_D^{-1} - P_E^{-1} + P_H^{-1}$$

(5.35)
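The signed sums (5.31) and (5.32) lend themselves to a simple helper. The sketch below (our own naming) fuses any list of information-graph nodes, with the cyclic example of Equations 5.34 and 5.35 shown in the comment:

```python
import numpy as np

def fuse_nodes(terms):
    """General information-graph fusion, Equations 5.31 and 5.32.
    `terms` is a list of (sign, x_j, P_j), with sign = +1 to add a
    node's information and -1 to remove redundant information."""
    dim = terms[0][1].shape[0]
    info, info_state = np.zeros((dim, dim)), np.zeros(dim)
    for sign, x_j, P_j in terms:
        I_j = np.linalg.inv(P_j)
        info += sign * I_j
        info_state += sign * (I_j @ x_j)
    P = np.linalg.inv(info)
    return P @ info_state, P

# Cyclic architecture of Figure 5.8 (Equations 5.34 and 5.35):
# x_A, P_A = fuse_nodes([(+1, x_B, P_B), (+1, x_C, P_C),
#                        (-1, x_D, P_D), (-1, x_E, P_E),
#                        (+1, x_H, P_H)])
```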

5.5    SUBOPTIMAL BAYESIAN DISTRIBUTED FUSION ALGORITHMS

The optimal distributed fusion algorithm described in the previous section is based upon identifying and removing redundant information using the information graph. When the bandwidth does not support communication of information pedigree, such as in ad hoc wireless sensor networks, the relevant part of the information graph cannot be constructed by the fusion node. Even if the information pedigree can be communicated, in a dynamic network with possible failures and adaptive communication strategies, the optimal distributed fusion algorithm may not be practical due to the long pedigree information needed for de-correlation. This section presents several practical and scalable algorithms (Chang et al. 2010) based on approximations of the optimal algorithms (5.30) through (5.32). To simplify the notation, we again focus on the fusion of two information nodes with either probability functions $p_1(x)$ and $p_2(x)$, or estimates $\hat{x}_1$ and $\hat{x}_2$ with error covariances $P_1$ and $P_2$. The fusion result is represented by a probability function $p(x)$ or an estimate $\hat{x}$ with error covariance $P$.

5.5.1    NAÏVE FUSION

Naïve fusion ignores the dependence in the information to be fused, i.e., it drops the denominator in the optimal Bayesian fusion equation (5.5). Thus the naïve fusion algorithm is

$$p(x) = C^{-1} p_1(x)\, p_2(x)$$

(5.36)

where C is the normalizing constant. For the Gaussian case, the common information is similarly ignored in Equations 5.7 and 5.8, resulting in the following equations for the fused state estimate and error covariance

$$P^{-1} = P_1^{-1} + P_2^{-1}, \qquad P^{-1}\hat{x} = P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2$$

(5.37)

By not subtracting the prior information matrix (the inverse of the prior covariance matrix), the computed fused error covariance is smaller than the true error covariance, so the naïve fusion estimate may be overconfident.

The naïve fusion equation for the Gaussian case is sometimes called the convex combination equation because it can be shown that the fused estimate is given by

$$\hat{x} = P_2(P_1 + P_2)^{-1}\hat{x}_1 + P_1(P_1 + P_2)^{-1}\hat{x}_2$$

(5.38)

For the cyclic architecture of Figure 5.8, naïve fusion only retains $p(x \mid Z_B)$ and $p(x \mid Z_C)$.
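A minimal sketch of Gaussian naïve fusion (Equation 5.37), with a scalar check that it matches the convex combination form (5.38):

```python
import numpy as np

def naive_fusion(x1, P1, x2, P2):
    """Naive fusion (Equation 5.37): ignores any dependence between
    the estimates, so the fused covariance tends to be optimistic."""
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(I1 + I2)
    return P @ (I1 @ x1 + I2 @ x2), P

# Check against the convex combination form, Equation 5.38
x1, P1 = np.array([1.0]), np.array([[2.0]])
x2, P2 = np.array([3.0]), np.array([[1.0]])
x, P = naive_fusion(x1, P1, x2, P2)
S = np.linalg.inv(P1 + P2)
assert np.allclose(x, P2 @ S @ x1 + P1 @ S @ x2)
```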

5.5.2    CHANNEL FILTER FUSION

The channel filter (Grime and Durrant-Whyte 1994, Nicholson et al. 2001, Bourgault and Durrant-Whyte 2004) can be viewed as a first-order approximation of the optimal fusion algorithm. The distributed estimation system consists of a number of channels, each defined by a pair of transmitting and receiving nodes. In the channel filter, the fusion node keeps track of the communication history for all the information nodes from which it receives data. When it receives a new estimate to be fused from a node, it retrieves the most recent estimate from that node and considers it the only common information to be removed, ignoring earlier information nodes that may have contributed to the common information; in that sense, the channel filter is only a first-order approximation to the optimal information graph approach.

Specifically, the channel filter fusion equation is given as

$$p(x) = C^{-1}\, \frac{p_1(x)\, p_2(x)}{\bar{p}(x)}$$

(5.39)

where

C is a normalizing constant

$\bar{p}(x)$ is the probability function received over the same channel at the previous communication time; it is the common "prior information" to be removed in the fusion formula, with mean $\bar{x}$ and covariance $\bar{P}$ when Gaussian.

When both $p_1(x)$ and $p_2(x)$ are Gaussian with means and covariances $\hat{x}_1$, $P_1$ and $\hat{x}_2$, $P_2$, respectively, the fused state estimate and corresponding error covariance are given by

$$P^{-1} = P_1^{-1} + P_2^{-1} - \bar{P}^{-1}$$

(5.40)

$$P^{-1}\hat{x} = P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2 - \bar{P}^{-1}\bar{x}$$

(5.41)

The first-order approximation of channel filter fusion is suboptimal because it does not account for all common information shared by the estimates to be fused. However, it may be only slightly suboptimal if the redundancy occurred long before the current processing time. For the cyclic architecture of Figure 5.8, the channel filter approximates (5.33) by

$$p(x \mid Z_A) = C^{-1}\, \frac{p(x \mid Z_B)\, p(x \mid Z_C)}{p(x \mid Z_D)}$$

(5.42)

and ignores the other terms in the optimal fusion equation. Similarly, the fusion equations for the Gaussian case become

$$P_A^{-1}\hat{x}_A = P_B^{-1}\hat{x}_B + P_C^{-1}\hat{x}_C - P_D^{-1}\hat{x}_D$$

(5.43)

$$P_A^{-1} = P_B^{-1} + P_C^{-1} - P_D^{-1}$$

(5.44)
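A sketch of Gaussian channel filter fusion (Equations 5.40 and 5.41); the argument names are ours:

```python
import numpy as np

def channel_filter_fusion(x1, P1, x2, P2, x_chan, P_chan):
    """Channel filter fusion (Equations 5.40 and 5.41): remove only
    the estimate (x_chan, P_chan) last received over this channel,
    treating it as the sole common information."""
    I1, I2, Ic = (np.linalg.inv(P1), np.linalg.inv(P2),
                  np.linalg.inv(P_chan))
    P = np.linalg.inv(I1 + I2 - Ic)
    return P @ (I1 @ x1 + I2 @ x2 - Ic @ x_chan), P
```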

5.5.3    CHERNOFF FUSION

Chernoff fusion also completely ignores the dependence in the information to be fused. However, instead of assigning equal weights as in naïve fusion, the fusion formula allows different weights for the probabilities to be fused, resulting in

$$p(x) = C^{-1} p_1^{w}(x)\, p_2^{1-w}(x)$$

(5.45)

where $w \in [0, 1]$ is a parameter chosen to minimize some criterion. The fusion algorithm is called Chernoff fusion when the criterion to be minimized is the Chernoff information (Cover and Thomas 1991) defined by the normalizing constant C. It can be shown that the resulting fused probability function that minimizes the Chernoff information is the one "halfway" between the two original densities in terms of the Kullback–Leibler distance (Cover and Thomas 1991). When both $p_1(x)$ and $p_2(x)$ are Gaussian, the resulting fused density is also Gaussian with mean and covariance given by

$$P^{-1} = w P_1^{-1} + (1 - w) P_2^{-1}$$

(5.46)

$$P^{-1}\hat{x} = w P_1^{-1}\hat{x}_1 + (1 - w) P_2^{-1}\hat{x}_2$$

(5.47)

This formula is identical to the covariance intersection (CI) fusion technique (Chong and Mori 2001, Nicholson et al. 2001, 2002, Hurley 2002, Julier 2006, Julier et al. 2006). Therefore, the CI technique can be considered as a special case of (5.45). In theory, Chernoff fusion can be used to combine any two arbitrary probabilities in a log-linear fashion. However, the resulting fused probability may not preserve the same form as the original ones. Also in general, obtaining the proper weighting parameter to satisfy a certain criterion may involve extensive search or computation.
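The following sketch implements the Gaussian Chernoff/CI form (Equations 5.46 and 5.47). Since finding the weight may require search, this version simply picks w to minimize the determinant of the fused covariance, one common CI criterion; minimizing the Chernoff information itself is another choice:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ci_fusion(x1, P1, x2, P2):
    """Covariance intersection (Equations 5.46 and 5.47), with the
    weight w chosen to minimize det(P) as one possible criterion."""
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)
    fused_cov = lambda w: np.linalg.inv(w * I1 + (1.0 - w) * I2)
    res = minimize_scalar(lambda w: np.linalg.det(fused_cov(w)),
                          bounds=(0.0, 1.0), method="bounded")
    w = res.x
    P = fused_cov(w)
    return P @ (w * I1 @ x1 + (1.0 - w) * I2 @ x2), P, w
```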

5.5.4    BHATTACHARYYA FUSION

Bhattacharyya fusion is a special case of Chernoff fusion (5.45) with the parameter w set to 0.5. The normalizing constant of (5.45) then becomes the Bhattacharyya bound $B = \int \sqrt{p_1(x)\, p_2(x)}\, dx$. The fusion algorithm is

$$p(x) = B^{-1} \sqrt{p_1(x)\, p_2(x)}$$

(5.48)

When both $p_1(x)$ and $p_2(x)$ are Gaussian, the fusion equation can be written as

$$P^{-1} = \frac{1}{2}\left(P_1^{-1} + P_2^{-1}\right)$$

(5.49)

$$P^{-1}\hat{x} = \frac{1}{2}\left(P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2\right)$$

or

$$\hat{x} = \left(P_1^{-1} + P_2^{-1}\right)^{-1}\left(P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2\right)$$

(5.50)

Therefore, Bhattacharyya fusion is similar to naïve fusion for the Gaussian case, but the resulting fused covariance is twice that of naïve fusion. Note that the fusion equation can be rewritten as

$$P^{-1} = \frac{1}{2}\left(P_1^{-1} + P_2^{-1}\right) = \left(P_1^{-1} + P_2^{-1}\right) - \frac{1}{2}\left(P_1^{-1} + P_2^{-1}\right)$$

(5.51)

$$P^{-1}\hat{x} = \frac{1}{2}\left(P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2\right) = \left(P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2\right) - \frac{1}{2}\left(P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2\right)$$

(5.52)

This formula replaces the common prior information of (5.40) and (5.41) for the channel filter by the average of the two sets of information to be fused, namely, $\bar{P}^{-1} \approx \frac{1}{2}(P_1^{-1} + P_2^{-1})$ and $\bar{P}^{-1}\bar{x} \approx \frac{1}{2}(P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2)$. In other words, instead of removing the common prior information from the previous communication as in the channel filter, Bhattacharyya fusion approximates the common information by the "average" of the two locally available information sets.
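A sketch of the Gaussian Bhattacharyya rule (Equations 5.49 and 5.50); note the fused mean coincides with naïve fusion while the covariance is doubled:

```python
import numpy as np

def bhattacharyya_fusion(x1, P1, x2, P2):
    """Bhattacharyya fusion for Gaussians: Chernoff fusion with
    w = 0.5 (Equations 5.49 and 5.50)."""
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(0.5 * (I1 + I2))   # twice the naive covariance
    x = np.linalg.solve(I1 + I2, I1 @ x1 + I2 @ x2)  # Equation 5.50
    return x, P
```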

5.6    DISTRIBUTED ESTIMATION FOR GAUSSIAN DISTRIBUTIONS OR ESTIMATES WITH ERROR COVARIANCES

In Section 5.5, we presented several suboptimal algorithms that avoid the exact identification and removal of redundant information using the information graph. These algorithms can be viewed as approximations of the optimal fusion algorithm for general probability functions. This section presents fusion algorithms that are optimal according to some criteria when the information to be fused is either Gaussian or can be represented by estimates with error covariances.

In the following, we assume that the state to be estimated has mean $\bar{x}$ and covariance $\bar{P}$, and the estimates to be fused are $\hat{x}_1$ and $\hat{x}_2$ with error covariances $P_1$ and $P_2$ and cross-covariance $P_{12} = P_{21}^T$. Note that in addition to the common prior $\bar{x}$ and $\bar{P}$, there is additional dependence between $\hat{x}_1$ and $\hat{x}_2$ represented by the cross-covariance $P_{12}$. Thus removing the common prior alone is not sufficient for generating the best fused estimate.

5.6.1    MAXIMUM A POSTERIORI FUSION OR BEST LEAST UNBIASED ESTIMATE

Let $z = \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix}$ be the augmented vector of the estimates to be fused. Assume $z$ and $x$ are jointly Gaussian with means $\bar{z}$ and $\bar{x}$, and covariances

$$P_{xz} = P_{zx}^T \triangleq E\left\{(x - \bar{x})(z - \bar{z})^T\right\}$$

(5.53)

$$P_{zz} \triangleq E\left\{(z - \bar{z})(z - \bar{z})^T\right\}$$

(5.54)

Then given z, p(x | z) is also Gaussian with mean and covariance given by (Anderson and Moore 1979)

$$\hat{x} = \bar{x} + P_{xz} P_{zz}^{-1}(z - \bar{z})$$

(5.55)

$$P = \bar{P} - P_{xz} P_{zz}^{-1} P_{zx}$$

(5.56)

Note that (5.55) is also the maximum a posteriori (MAP) estimate (Mori et al. 2002, Chang et al. 2004) and can be expressed as

$$\hat{x} = \bar{x} + W_1(\hat{x}_1 - \bar{x}) + W_2(\hat{x}_2 - \bar{x}) = W_0\bar{x} + W_1\hat{x}_1 + W_2\hat{x}_2$$

(5.57)

with $W_0 = I - W_1 - W_2$ and $P_{0i} = E\left\{(x - \bar{x})(\hat{x}_i - \bar{x})^T\right\}$ for i = 1, 2, where

$$[W_1 \;\; W_2] = P_{xz} P_{zz}^{-1} = [P_{01} \;\; P_{02}] \begin{bmatrix} P_1 & P_{12} \\ P_{21} & P_2 \end{bmatrix}^{-1}$$

(5.58)

If $\hat{x}_1$ and $\hat{x}_2$ are not jointly Gaussian but the moments are known, (5.57) is the best linear unbiased estimate (BLUE) (Zhu and Li 1999, Li et al. 2003). Note that the MAP estimate or BLUE requires more information for its calculation. In addition to the common prior $\bar{x}$ and $\bar{P}$, and the estimates $\hat{x}_1$ and $\hat{x}_2$ with error covariances $P_1$ and $P_2$, it also requires the cross-covariance $P_{12} = P_{21}^T$ between the estimates and the cross-covariances $P_{01}$ and $P_{02}$ between the state and the estimates. If the estimates $\hat{x}_1$ and $\hat{x}_2$ are generated from measurements with independent errors, (5.57) and (5.58) reduce to the standard fusion equations (5.19) and (5.20).
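A sketch of the MAP/BLUE rule (Equations 5.55 through 5.58), assuming all the required cross-covariances are available as inputs:

```python
import numpy as np

def map_fusion(x_bar, P_bar, x1, P1, x2, P2, P12, P01, P02):
    """MAP/BLUE fusion (Equations 5.55 through 5.58). P12 is the
    cross-covariance between the estimates; P01, P02 are the
    cross-covariances between the state and each estimate."""
    Pxz = np.hstack([P01, P02])                    # [P01  P02]
    Pzz = np.block([[P1, P12], [P12.T, P2]])
    W = Pxz @ np.linalg.inv(Pzz)                   # [W1  W2], Equation 5.58
    n = x_bar.size
    W1, W2 = W[:, :n], W[:, n:]
    x = x_bar + W1 @ (x1 - x_bar) + W2 @ (x2 - x_bar)  # Equation 5.57
    P = P_bar - W @ Pzz @ W.T                      # Equation 5.56
    return x, P
```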

5.6.2    CROSS-COVARIANCE FUSION

The cross-covariance fusion rule (Bar-Shalom and Campo 1986) considers explicitly the cross-covariance of the local estimates to be fused. The fusion rule is given by

$$\hat{x} = W_1\hat{x}_1 + W_2\hat{x}_2$$

(5.59)

where

$$W_i = (P_j - P_{ji})(P_1 + P_2 - P_{12} - P_{21})^{-1}$$

(5.60)

for i = 1, 2 with j = 3 − i. Since $W_1 + W_2 = I$, the fused estimate is unbiased if the local estimates are also unbiased. It can be shown that Equation 5.59 maximizes the classical likelihood function $p(\hat{x}_1, \hat{x}_2 \mid x)$ with $x$ viewed as a parameter. Thus, the cross-covariance fusion rule is also the maximum likelihood fusion rule. As shown in Chang et al. (1997), Equation 5.59 is the unique BLUE solution without a priori information, i.e., the linear solution obtained without using a priori information (initial condition). This follows from the fact that the MAP estimate becomes the maximum likelihood estimate when the prior covariance becomes very large.

If we ignore the cross-covariance $P_{ij}$, (5.60) becomes, for i = 1, 2 with j = 3 − i,

$$W_i = P_j(P_1 + P_2)^{-1} = \left(P_1^{-1} + P_2^{-1}\right)^{-1} P_i^{-1}$$

(5.61)

which is the fusion rule obtained by treating the two estimates $\hat{x}_1$ and $\hat{x}_2$ as if they were two conditionally independent observations of $x$. This is again the convex combination rule.

Since $\det\begin{bmatrix} P_1 & P_{12} \\ P_{21} & P_2 \end{bmatrix} = \det\left(P_1 - P_{12} P_2^{-1} P_{21}\right)\det(P_2)$, ignoring the cross-covariance by setting $P_{12} = 0$ increases the size of the ellipsoid defined by the joint covariance matrix $\begin{bmatrix} P_1 & P_{12} \\ P_{21} & P_2 \end{bmatrix}$. Thus, the simplified fusion rule (5.61) is obtained by inflating the joint covariance matrix.
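A sketch of the cross-covariance (Bar-Shalom–Campo) rule of Equations 5.59 and 5.60; only the fused estimate is computed since the text gives the weights but not a fused covariance expression:

```python
import numpy as np

def cross_covariance_fusion(x1, P1, x2, P2, P12):
    """Bar-Shalom-Campo fusion (Equations 5.59 and 5.60), where P12
    is the cross-covariance of the two local estimation errors."""
    S = np.linalg.inv(P1 + P2 - P12 - P12.T)
    W1 = (P2 - P12.T) @ S   # weight on x1, Equation 5.60 with i = 1
    W2 = (P1 - P12) @ S     # weight on x2, Equation 5.60 with i = 2
    return W1 @ x1 + W2 @ x2
```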

5.7    DISTRIBUTED ESTIMATION FOR OBJECT TRACKING

In this section, we discuss how the general approach for distributed estimation can support object tracking (Liggins et al. 1997, Chong et al. 2000). Multi-object tracking involves two steps: associating measurements to form object tracks, and estimating the states of the objects given the tracks. Our discussion will focus on single object state estimation or filtering. The association problem in object tracking will be addressed in Chapter 6.

For object state estimation, the state of the object is a random process that evolves according to a dynamic model given by the transition probability $p(x_{k+1} \mid x_k)$, where $x_k$ is the state of the object at time $t_k$. Measurements are generated from the state according to a measurement model $p(z_k \mid x_k)$. The objective of object state estimation is to generate the estimate of the state, $p(x_k \mid Z_k)$, given the cumulative measurements $Z_k = (z_0, z_1, \ldots, z_k)$. Recursive state estimation or filtering consists of two steps: predicting $p(x_k \mid Z_k)$ to the time of the next measurement to obtain $p(x_{k+1} \mid Z_k)$, and updating with the current measurement to generate $p(x_{k+1} \mid Z_{k+1})$. Since the prediction step uses only the object dynamic model and does not depend on measurements, distributed estimation focuses on the update step.

We assume a hierarchical fusion architecture to discuss the approach. Each low-level fusion agent $i$ generates an updated estimate of the object state given its local measurements, $p(x_k \mid Z_{ik})$ where $Z_{ik} = (z_{i0}, z_{i1}, \ldots, z_{ik})$. The high-level fusion site or agent combines the low-level (updated) estimates to form the fused estimate $p(x_k \mid Z_k)$ where $Z_k = Z_{1k} \cup Z_{2k}$.

5.7.1    DETERMINISTIC DYNAMICS

An object is said to have deterministic dynamics if its future state is determined completely by the current state, i.e., the state transition probability is a delta function. Ballistic missiles and space objects are examples of objects with deterministic dynamics. It can be easily shown that conditional independence of the measurements $z_{ik}$ given $x_k$ for all $i$ and $k$ implies conditional independence of the cumulative measurements $Z_{ik}$ given $x_k$ for all $i$ and $k$, i.e.,

$$p(Z_{1k}, Z_{2k} \mid x_k) = p(Z_{1k} \mid x_k)\, p(Z_{2k} \mid x_k)$$

(5.62)

Thus the Bayesian distributed fusion equation can be used and

$$p(x_k \mid Z_{1k}, Z_{2k}) = C^{-1}\, \frac{p(x_k \mid Z_{1k})\, p(x_k \mid Z_{2k})}{p(x_k \mid Z_{k-1})}$$

(5.63)

where

C is a normalizing constant

$p(x_k \mid Z_{k-1})$ is the common prior that can be extrapolated from $p(x_{k-1} \mid Z_{k-1})$ provided by the fusion site

When the random variables are Gaussian, the fusion equations are

$$P_{k|k}^{-1}\hat{x}_{k|k} = P_{1,k|k}^{-1}\hat{x}_{1,k|k} + P_{2,k|k}^{-1}\hat{x}_{2,k|k} - P_{k|k-1}^{-1}\hat{x}_{k|k-1}$$

(5.64)

$$P_{k|k}^{-1} = P_{1,k|k}^{-1} + P_{2,k|k}^{-1} - P_{k|k-1}^{-1}$$

(5.65)

where $\hat{x}_{i,k|l}$ and $P_{i,k|l}$ are the estimate and error covariance of $x_k$ given $Z_{il}$, and $\hat{x}_{k|l}$ and $P_{k|l}$ are the fused estimate and error covariance of $x_k$ given $Z_l$.

If there is no feedback from the fusion site, the fusion equation is

$$p(x_k \mid Z_{1k}, Z_{2k}) = C^{-1}\, \frac{p(x_k \mid Z_{1k})\, p(x_k \mid Z_{2k})\, p(x_k \mid Z_{k-1})}{p(x_k \mid Z_{1,k-1})\, p(x_k \mid Z_{2,k-1})}$$

(5.66)

When the random variables are Gaussian, the fusion equations become

$$P_{k|k}^{-1}\hat{x}_{k|k} = P_{1,k|k}^{-1}\hat{x}_{1,k|k} - P_{1,k|k-1}^{-1}\hat{x}_{1,k|k-1} + P_{2,k|k}^{-1}\hat{x}_{2,k|k} - P_{2,k|k-1}^{-1}\hat{x}_{2,k|k-1} + P_{k|k-1}^{-1}\hat{x}_{k|k-1}$$

(5.67)

$$P_{k|k}^{-1} = P_{1,k|k}^{-1} - P_{1,k|k-1}^{-1} + P_{2,k|k}^{-1} - P_{2,k|k-1}^{-1} + P_{k|k-1}^{-1}$$

(5.68)

For deterministic object dynamics, the fusion equations reconstruct the optimal centralized estimate regardless of the number of sensor revisits between fusion times. This is not the case for nondeterministic object dynamics.
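For the no-feedback case, the sketch below generalizes Equations 5.67 and 5.68 to a list of local agents; each tuple carries an agent's updated estimate and its previous estimate predicted to the current time, and (x_pred, P_pred) is the prediction of the previous fused estimate (all names are ours):

```python
import numpy as np

def track_fusion_no_feedback(local_pairs, x_pred, P_pred):
    """Hierarchical track fusion without feedback, Equations 5.67
    and 5.68. `local_pairs` is a list of (x_u, P_u, x_p, P_p): each
    agent's updated estimate and its predicted prior estimate."""
    info = np.linalg.inv(P_pred)
    info_state = info @ x_pred
    for x_u, P_u, x_p, P_p in local_pairs:
        info += np.linalg.inv(P_u) - np.linalg.inv(P_p)
        info_state += np.linalg.inv(P_u) @ x_u - np.linalg.inv(P_p) @ x_p
    P = np.linalg.inv(info)
    return P @ info_state, P
```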

5.7.2    NONDETERMINISTIC DYNAMICS

When the object has nondeterministic dynamics, the cumulative measurements $Z_{ik}$ are no longer conditionally independent given $x_k$. Effectively, the common process noise of the nondeterministic dynamics destroys the conditional independence. Then the fusion equations (5.63) through (5.68) are no longer optimal or exact unless the low-level fusion agents communicate with the high-level agent after each observation time. For hierarchical fusion with feedback, the high-level fusion agent also has to send the fused estimate back to the local agents after each fusion.

5.7.2.1    Augmented State Vector and Approximation

Let $X_k = [x_0, x_1, \ldots, x_k]$ be the augmented state vector consisting of the states at multiple observation times. Then the cumulative measurements $Z_{ik}$ are conditionally independent given $X_k$, and the optimal fusion equations are (5.63) through (5.68) with $x_k$ replaced by $X_k$. However, this approach may not be practical because the probability density functions or covariance matrices involved are high-dimensional.

5.7.2.2    Using Cross-Covariance at a Single Time

For problems that can be represented by Gaussian distributions or means and covariances, the approach of Section 5.6 can be used to handle the conditional dependence due to nondeterministic dynamics. Specifically, let $\hat{x}_{1,k|k}$ and $\hat{x}_{2,k|k}$ be the estimates to be fused with error covariances $P_{1,k|k}$ and $P_{2,k|k}$, cross-covariance $P_{12,k|k} = P_{21,k|k}^T$, and common prior $\hat{x}_{k|k-1}$ with covariance $P_{k|k-1}$. Then the MAP, BLUE, or cross-covariance fusion rules can be used by replacing $\hat{x}_i$, $P_i$, $P_{12}$, $\bar{x}$, and $\bar{P}$ with $\hat{x}_{i,k|k}$, $P_{i,k|k}$, $P_{12,k|k}$, $\hat{x}_{k|k-1}$, and $P_{k|k-1}$, respectively, in the fusion equations. Chapter 6 has a comparison of the different fusion rules for nondeterministic dynamics.

5.8    DISTRIBUTED ESTIMATION FOR OBJECT CLASSIFICATION

The general fusion approach in Section 5.3 can be used for distributed object classification (Chong and Mori 2005) where the state of interest is a discrete and constant random variable representing the object class. When the conditional independence assumption is satisfied, optimal distributed object classification can be performed using (5.5) of Section 5.3. In general, selecting object class as the state will not satisfy the conditional independence assumption because measurements containing class information may also depend on other variables such as viewing angles. In the following, we will consider hierarchical fusion at a single time to focus on the information that should be used in fusion. More complicated communication patterns will require checking for common information and removing it, using approximate algorithms if necessary. Chapter 9 contains a more detailed discussion on distributed object classification.

5.8.1    DISTRIBUTED OBJECT CLASSIFICATION ARCHITECTURES

The common fusion architectures for object classification are centralized measurement fusion, decision fusion, and probability fusion. In centralized measurement fusion, measurements containing object class information are fused at a central site. This architecture is theoretically optimal because the central site has access to all the measurements but requires the most communication. In decision fusion, each local site performs classification using the local measurements and sends the decision to the fusion site. Decisions require very little bandwidth to communicate but may not contain enough information for generating a good decision after fusion. Thus we will focus on probability fusion, in which each local site generates a conditional probability of the object class from the local measurements, and the fusion site combines the conditional probabilities to form the centralized conditional probability.


FIGURE 5.10 Bayesian network for object classification.

The key to high-performance probability fusion is determining the state used for generating the probability. In general, the object class is not sufficient as a state for optimal fusion because the measurements may depend on other object attributes in addition to the class (Chong and Mori 2004). Consider the example in Figure 5.10, where a Bayesian network is used to show that the measurements $z_1$ and $z_2$ depend on the object class $x_C$ through the static object attribute $x_S$, such as size, and the dynamic attribute $x_D$, such as viewing angle. As shown in Figure 5.10, the measurements are conditionally independent given $x_S$ and $x_D$, i.e.,

$$p(z_1, z_2 \mid x_S, x_D) = p(z_1 \mid x_S, x_D)\, p(z_2 \mid x_S, x_D)$$

(5.69)

but are not conditionally independent given only $x_C$, i.e.,

$$p(z_1, z_2 \mid x_C) = \int p(z_1, z_2, x_S, x_D \mid x_C)\, dx_S\, dx_D = \int p(z_1, z_2 \mid x_S, x_D)\, p(x_S, x_D \mid x_C)\, dx_S\, dx_D \neq p(z_1 \mid x_C)\, p(z_2 \mid x_C)$$

(5.70)

Thus, for optimal distributed fusion, the state to be communicated should be $x_S$ and $x_D$.

5.8.2    DISTRIBUTED CLASSIFICATION ALGORITHMS

For optimal distributed classification, the object state in the probabilities should make the measurements conditionally independent. For the example in Figure 5.10, the state consists of the static attribute $x_S$ and the dynamic attribute $x_D$. Then the optimal fusion equation is

$$p(x_S, x_D \mid z_1, z_2) = C^{-1}\, \frac{p(x_S, x_D \mid z_1)\, p(x_S, x_D \mid z_2)}{p(x_S, x_D)}$$

(5.71)

From this, the object class probability can be computed as

$$p(x_C \mid z_1, z_2) = \int p(x_C \mid x_S, x_D)\, p(x_S, x_D \mid z_1, z_2)\, dx_S\, dx_D$$

(5.72)

When only the probabilities of the object class are communicated, naïve fusion can be used to obtain an approximate solution

$$p(x_C \mid z_1, z_2) = C^{-1}\, \frac{p(x_C \mid z_1)\, p(x_C \mid z_2)}{p(x_C)}$$

(5.73)

Note that in this case the information ignored is not the common prior due to communication but the states that lead to conditional independence.
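A sketch of this approximate fusion over discrete class probability vectors (Equation 5.73); the class probabilities below are illustrative only:

```python
import numpy as np

def fuse_class_probabilities(p1, p2, prior):
    """Naive fusion of class posteriors (Equation 5.73): multiply the
    local posteriors, divide by the common prior, and renormalize."""
    fused = p1 * p2 / prior
    return fused / fused.sum()

p1 = np.array([0.7, 0.2, 0.1])   # posterior from local site 1
p2 = np.array([0.6, 0.3, 0.1])   # posterior from local site 2
prior = np.full(3, 1.0 / 3.0)    # common prior over 3 classes
print(fuse_class_probabilities(p1, p2, prior))
```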

5.9    SUMMARY

This chapter presented the fundamental concepts for distributed estimation, which are crucial for developing distributed fusion algorithms. We discussed various distributed fusion architectures, their advantages and disadvantages, the use of the information graph to represent information flow, and the selection of an appropriate architecture. We presented the Bayesian fusion equation for combining two probability functions, and the corresponding equation for estimates given by means and covariances. The Bayesian fusion equation, when used with the information graph, can be used to derive fusion equations for various fusion architectures. Since the optimal fusion equation can be complicated, requiring pedigree or network information for complicated architectures, it is often necessary to approximate it with suboptimal algorithms for implementation in real systems. When the estimates to be fused are Gaussian or can be characterized by means and covariances, there are several linear combination rules such as MAP, BLUE, and cross-covariance fusion. We also showed that the distributed estimation approach can be used for object tracking and object classification.

5.10  BIBLIOGRAPHIC NOTES

Research in distributed estimation started around 1980, addressing the problem of reconstructing the optimal estimate from the local estimates (Chong 1979, Speyer 1979, Willsky et al. 1982, Castanon and Teneketzis 1985). A general distributed estimation approach (Chong et al. 1982, 1983, 1985, 1987) for arbitrary architectures was investigated under the Distributed Sensor Networks (DSN) program sponsored by the Defense Advanced Research Projects Agency (DARPA). By using the information graph to track information flow in the system, the optimal fusion algorithm avoids double counting of information or data incest. The DSN program also developed general distributed tracking algorithms (Chong et al. 1986, 1990). Around 1990, researchers in the United Kingdom and Australia developed similar decentralized fusion algorithms (Durrant-Whyte et al. 1990, Grime and Durrant-Whyte 1994) that avoid data incest, and CI algorithms (Nicholson et al. 2001, 2002) to address unknown correlation between local estimates to be fused. Bar-Shalom and Campo (1986) developed the first fusion algorithm that uses the cross-covariance between the local estimates. This paper was followed by the BLUE fusion algorithm (Zhu and Li 1999, Li et al. 2003) and the MAP fusion rule (Mori et al. 2002, Chang et al. 2004). The last two papers also contain performance evaluation of fusion algorithms, along with Chong and Mori (2001) and Chang et al. (2010).

REFERENCES

Anderson, B. D. O. and J. B. Moore. 1979. Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall.

Bar-Shalom, Y. and L. Campo. 1986. The effects of the common process noise on the two-sensor fused-track covariance. IEEE Transactions on Aerospace and Electronic Systems, 22: 803–805.

Bourgault, F. and H. F. Durrant-Whyte. 2004. Communication in general decentralized filter and the coordinated search strategy. In Proceedings of the 7th International Conference on Information Fusion, Stockholm, Sweden.

Castanon, D. and D. Teneketzis. 1985. Distributed estimation algorithms for nonlinear systems. IEEE Transactions on Automatic Control, 30: 418–425.

Chang, K. C., C. Y. Chong, and S. Mori. 2010. Analytical and computational evaluation of scalable fusion algorithms. IEEE Transactions on Aerospace and Electronic Systems, 46: 2022–2034.

Chang, K. C., R. K. Saha, and Y. Bar-Shalom. 1997. On optimal track-to-track fusion. IEEE Transactions on Aerospace and Electronic Systems, 33: 1271–1276.

Chang, K. C., T. Zhi, S. Mori, and C. Y. Chong. 2004. Performance evaluation for MAP state estimate fusion. IEEE Transactions on Aerospace and Electronic Systems, 40: 706–714.

Chong, C. Y. 1979. Hierarchical estimation. In Proceedings of MIT/ONR Workshop on C3, Monterey, CA.

Chong, C. Y. 1998. Distributed architectures for data fusion. In Proceedings of the 1st International Conference on Multisource Multisensor Information Fusion, Las Vegas, NV.

Chong, C. Y., K. C. Chang, and S. Mori. 1986. Distributed tracking in distributed sensor networks. In Proceedings of 1986 American Control Conference, Seattle, WA.

Chong, C. Y. and S. Mori. 2001. Convex combination and covariance intersection algorithms in distributed fusion. In Proceedings of the 4th International Conference on Information Fusion, Montréal, Québec, Canada.

Chong, C. Y. and S. Mori. 2004. Graphical models for nonlinear distributed estimation. In Proceedings of the 7th International Conference on Information Fusion, Stockholm, Sweden.

Chong, C. Y. and S. Mori. 2005. Distributed fusion and communication management for target identification. In Proceedings of the 8th International Conference on Information Fusion, Philadelphia, PA.

Chong, C. Y., S. Mori, and K. C. Chang. 1985. Information fusion in distributed sensor networks. In Proceedings of 1985 American Control Conference, Boston, MA.

Chong, C. Y., S. Mori, and K. C. Chang. 1987. Adaptive distributed estimation. In Proceedings of 26th IEEE Conference on Decision and Control, Los Angeles, CA.

Chong, C. Y., S. Mori, and K. C. Chang. 1990. Distributed multitarget multisensor tracking. In Multitarget Multi-Sensor Tracking: Advanced Applications, ed. Y. Bar-Shalom, pp. 247–295. Norwood, MA: Artech House.

Chong, C. Y., S. Mori, K. C. Chang, and W. H. Barker. 2000. Architectures and algorithms for track association and fusion. IEEE Aerospace and Electronic Systems Magazine, 15: 5–13.

Chong, C. Y., S. Mori, E. Tse, and R. P. Wishner. 1982. Distributed estimation in distributed sensor networks. In Proceedings of 1982 American Control Conference, Arlington, VA.

Chong, C. Y., E. Tse, and S. Mori. 1983. Distributed estimation in networks. In Proceedings of 1983 American Control Conference, San Francisco, CA.

Cover, T. M. and J. A. Thomas. 1991. Elements of Information Theory. New York: Wiley.

Durrant-Whyte, H. F., B. S. Y. Rao, and H. Hu. 1990. Toward a fully decentralized architecture for multi-sensor data fusion. In Proceedings of 1990 IEEE International Conference on Robotics and Automation, Cincinnati, OH.

Grime, S. and H. Durrant-Whyte (eds.). 1994. Communication in decentralized systems. In IFAC Control Engineering Practice. Oxford, U.K.: Pergamon Press.

Hurley, M. 2002. An information-theoretic justification for covariance intersection and its generalization. In Proceedings of the 5th International Conference on Information Fusion, Annapolis, MD.

Julier, S. J. 2006. An empirical study into the use of Chernoff information for robust, distributed fusion of Gaussian mixture models. In Proceedings of the 9th International Conference on Information Fusion, Florence, Italy.

Julier, S. J., J. K. Uhlmann, J. Walters, R. Mittu, and K. Palaniappan. 2006. The challenge of scalable and distributed fusion of disparate sources of information. In Proceedings of SPIE Conference on Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications, Vol. 6242, Kissimmee, FL.

Li, X., R. Y. Zhu, J. Wang, and C. Han. 2003. Optimal linear estimation fusion—Part I: Unified fusion rules. IEEE Transactions on Information Theory, 49: 2192–2208.

Liggins, M. E. and K. C. Chang. 2009. Distributed fusion architectures, algorithms, and performance within a network-centric architecture. In Handbook of Multisensor Data Fusion: Theory and Practice, eds. M. E. Liggins, D. H. Hall, and J. Llinas. Boca Raton, FL: CRC Press.

Liggins II, M. E., C. Y. Chong, I. Kadar, M. G. Alford, V. Vannicola, and S. Thomopoulos. 1997. Distributed fusion architectures and algorithms for target tracking. Proceedings of the IEEE, 85: 95–107.

McLaughlin, S. P., R. J. Evans, and V. Krishnamurthy. 2005. A graph theoretic approach to data incest management in network centric warfare. In Proceedings of the 8th International Conference on Information Fusion, Philadelphia, PA.

McLaughlin, S. P., V. Krishnamurthy, and R. J. Evans. 2004. Bayesian network model for data incest in a distributed sensor network. In Proceedings of the 7th International Conference on Information Fusion, Stockholm, Sweden.

Mori, S., W. H. Barker, C. Y. Chong, and K. C. Chang. 2002. Track association and track fusion with non-deterministic target dynamics. IEEE Transactions on Aerospace and Electronic Systems, 38: 659–668.

Nicholson, D., S. J. Julier, and J. K. Uhlmann. 2001. DDF: An evaluation of covariance intersection. In Proceedings of the 4th International Conference on Information Fusion, Montréal, Québec, Canada.

Nicholson, D., C. M. Lloyd, S. J. Julier, and J. K. Uhlmann. 2002. Scalable distributed data fusion. In Proceedings of the 5th International Conference on Information Fusion, Annapolis, MD.

Olfati-Saber, R. 2005. Distributed Kalman filter with embedded consensus filters. In Proceedings of the 44th IEEE Conference on Decision and Control, Seville, Spain.

Speyer, J. L. 1979. Computation and transmission requirements for a decentralized linear-quadratic-Gaussian control problem. IEEE Transactions on Automatic Control, 24: 266–269.

Teneketzis, D. and P. Varaiya. 1988. Consensus in distributed estimation. In Advances in Statistical Signal Processing, ed. H. V. Poor, pp. 361–386. Greenwich, CT: JAI Press.

Willsky, A., M. Bello, D. Castanon, B. Levy, and G. Verghese. 1982. Combining and updating of local estimates along sets of one-dimensional tracks. IEEE Transactions on Automatic Control, 27: 799–813.

Zhu, Y. and X. R. Li. 1999. Best linear unbiased estimation fusion. In Proceedings of the 2nd International Conference on Information Fusion, Sunnyvale, CA.
