## 1. Introduction

Modern tracking and surveillance system development is increasingly driving technological trends and algorithm developments toward networks of dislocated sensors. Besides classical target tracking, many other applications can be found, for instance, in robotics, manufacturing, health care, and financial economics. A multisensor network can exploit spatial diversity to compensate for the weak attributes of a single sensor such as limited field of view or high measurement noise. Also, heterogeneous sensors can reveal a more complete situational awareness and a more precise estimate of the underlying phenomena. Additionally, a sensor network is more robust against a single point of failure in comparison to a standalone system, if its architecture is chosen carefully.

Despite its unquestioned advantages, a multisensor network must cope with design-specific challenges, for instance, a full transmission of all the measurements to a fusion center can be hindered, when the communication links are unreliable or constrained in bandwidth or costs. A well-known approach to cope with the limited bandwidth data links is to apply data processing on the sensor sites to generate local track parameters that are transmitted to the fusion center. The latter then reconstructs the global track parameters by an application of a *Track-to-Track Fusion* (T2TF) scheme. Depending on the scenario constraints, this is a nontrivial task, since the local tracks are mutually correlated due to the common process noise. The first known solution in the literature to the T2TF problem proposed to apply an information filter-based multisensor fusion algorithm in [1], which later became famous as the “tracklet fusion.” However, the tracklet fusion also requires a transmission from each sensor after each time step in order to reconstruct the optimal solution.

This chapter presents the theory and the derivation of the *distributed Kalman filter* (DKF), which is an optimal solution of the T2TF problem under Kalman filter assumptions with respect to the *mean squared error* (MSE). Assuming Kalman conditions means that linear Gaussian models are provided for the motion model and all measurement models of the sensors. Moreover, it is assumed that measurement-to-track (at the sensors) and track-to-track (at the fusion center) association has been solved. The DKF approach requires, however, all remote sensor models to be known at each local sensor site, which is infeasible in most practical scenarios. Therefore, this chapter also presents a solution based on the *accumulated state density* (ASD), which is closely related to the DKF but does not require the measurement models to be known. Surveys that reflect the history of research in distributed estimation can be found, for instance, in [2, 3].

This chapter is structured as follows: Section 2 summarizes the problem formulation. A basic approach to T2TF is given in Section 3, where we present the least squares solution. Section 4 presents a simple fusion methodology, which is easy to compute and provides practical results for various applications. The reason why this approach is suboptimal is investigated in Section 5 by means of a recursive computation of the cross-covariances of the local tracks. In Section 6, the product representation of Gaussian probability densities is introduced, which is the basis for the derivation of the distributed Kalman filter in Section 7. An alternative derivation in terms of information parameters is provided in Section 8. Since the local measurement models are often unknown in practical applications, the distributed accumulated state density filter is introduced in Section 9. The chapter concludes with Section 10.

## 2. Problem formulation

Throughout this chapter, it is assumed that all *S* sensors produce their measurements

where

Since the measurement processes across all sensors

The true state of the target itself is modeled as a time-variant stochastic process, where the transition from time

where *transition model*:

Based on the local processors, each sensor node produces a *track* at time *fusion center* (FC). It should be noted that the FC may also not be centralized, distinguished instance in the architecture, but each and every processing node can act as a FC. The introduction of a distinguished FC is only for clarification of different computation layers. An excellent overview of pros and cons of various data fusion layers can be found in [4].

The T2TF problem can now be stated as follows: compute a fused estimate

## 3. Least squares estimate

In order to compute an estimate as a well-suited combination of the local tracks, it is useful to consider the joint likelihood function given by the following Gaussian:

where

Since the joint likelihood from above is proportional to an exponential function:

the *maximum likelihood* (ML) estimate can be computed in terms of the least squares:

where

Therefore, the ML estimate is given by:

For information fusion applications, it is also important to have a consistent estimate of the squared error, in other words, we need to compute the corresponding error covariance:

(12) |

The last equation holds due to the fact that

## 4. Naïve fusion

It is obvious that for the ML estimate, it is assumed that the cross-covariances *Naïve fusion*. It implies that the joint error covariance is given in block-diagonal form:

As a direct consequence of the *matrix inversion lemma* (see Appendix 12.1), the inverse can be obtained in closed form solution:

Filling into the maximum likelihood formulas directly yields.

Thus, by means of a simple approximation of the ML estimate, we have obtained a first practical fusion rule for the FC, which we call *convex combination* due to its structure for further usage. The fusion scheme as a whole can be outlined schematically as in the flowing **Figure 1**. Each sensor node

## 5. What makes the Naïve fusion naïve?

For the Naïve fusion, we have assumed that the cross-covariances vanish. It is worth to be aware of the structure of the cross-covariances to see the conditions whether this holds or does not hold. This can be achieved by a recursive computation of the posterior cross-covariance *prediction-filtering* cycle.

### 5.1. Cross-covariance prediction

For the prediction step, it is assumed that a previous posterior cross-covariance

(17) |

where the last equality holds due to the fact that the estimation errors at time

### 5.2. Cross-covariance filtering

In the filtering step, both sensors compute their posterior parameters based on the produced measurements

(18) |

For these equations, we have used the fact that the prior estimate error

Concluding the calculations from this section, we have obtained a recursive solution for the cross-covariances:

One can see that the cross-covariances are zero, if and only if the process noise covariance

## 6. Gaussian product representation

The basic concept of the distributed Kalman filter is to make the local parameters stochastically independent, even if process noise is present. This is achieved by a *product representation*, which directly follows from the fact that.

(20) |

Thus, the Gaussian product representation is equivalent to uncorrelated track parameters for each processing node. It should be noted that the product representation is not normalized, that is, the integral for

## 7. Derivation of the distributed Kalman filter

For the DKF, we are going to modify the local processing scheme for each sensor in order to have the product representation hold at each instant of time. Then, when the fusion center receives the parameters from all sensors, the convex combination can be applied to compute the optimal global estimate. Note that the convex combination does not consider a local prior of the fusion center; therefore, the result will be independent from previous transmissions. This can be of great benefit, if communication channels with unreliable links have to be considered, since the full information on the target state is distributed in the sensor network. However, for completeness, it should also be noted that the modified local parameters are not optimal anymore in a local sense. One could say that local optimality is given up for the sake of global optimality [5].

In the following sections, the derivation of a prediction-filtering recursion of the DKF is discussed.

### 7.1. DKF prediction

For the prediction, it is assumed that the previous posterior is given in product representation:

For notational simplicity, we have conditioned the posterior on the full data set

To derive a closed form solution for the prediction of product representation, it is required to *globalize* the covariances

where

then the same fused density will be obtained, which is easily verified by means of the convex combination. It should be noted that the remote error covariances

The prediction formulas can now be obtained by a marginalization of the joint density of the current and the last time step:

The last equality holds due to the Markov property of the system. Filling in our linear Gaussian transition model and the previous posterior yields

By means of a simple algebraic manipulation, it is possible to factorize the transition kernel Gaussian up to proportionality:

(26) |

Thus, we can factorize the integration term of the global prior completely:

An application of the product formula (Section 12.2 in the appendix) yields:

where

(29) |

At this point, we have derived factorized prediction formulas for the DKF prediction, and to our knowledge, the remaining integral is part of the normalization constant. This, however, is not trivial, since the parameter

for some auxiliary variables

### 7.2. DKF filtering

Let

Due to the mutual independence of the measurement noises, the joint likelihood function is given by:

This is particularly useful for the structure of the product representation used for the DKF. Filling in the linear Gaussian models and neglecting the normalization constant in the denominator directly yields:

Thus, the product formula again can be applied to compute the posterior parameters:

where

(35) |

Again, we have omitted the factors, which are independent of

## 8. Information filter formulation of the DKF

In [6], an elegant derivation of the DKF formulas was published based on the *information filter* (IF). The IF uses information matrices, which are inverted covariances, and information states, which are information matrices multiplied with states. The optimal, centralized update formulas for

where

This summation structure can be used to provide a closed prediction-filtering cycle.

### 8.1. Information DKF prediction

The prediction of the state is easier than a direct transition of the information parameters. Based on the fused estimate, one can obtain.

(38) |

Thus, we have given the local predicted state parameters as:

Analogously, one obtains for the prior covariance:

Thus, if we set

### 8.2. Information DKF filtering

For the filtering, it is assumed that each sensor has computed its local information contribution parameter

As a direct consequence, the updated parameters of the local processors follow the standard IF filtering equations:

For the globalized information matrix, the remote information parameters from the sensor models are used:

It is important to note that the local processing nodes compute both the local pseudo information matrix

## 9. Distributed accumulated state density filter

The DKF from Sections 7 and 8 can be considered a big step toward distributed state estimation, tracking, and information inference. However, in practical applications, the exact solution is often hindered by the fact that the exact remote sensor model parameters are unknown and can only be approximated based on local state estimates. The good news is that there is another exact solution based on the *accumulated state density* (ASD). The distributed ASD equations turn the spatial correlations into temporal correlations of successive states. Originally, the ASD equations were introduced to solve the out-of-sequence problem, which handles delayed transmissions of measurements into an ongoing fusion process in an optimal manner. Therefore, the temporal correlations can well be coped with the ASD approach.

At first, let us introduce the ASD state

where *Rauch-Tung-Striebel* (RTS) smoothing equations are inherently integrated in the ASD posterior, since all states are conditioned on the full set of measurements up to time

A recursive computation of the ASD posterior can be achieved by using the Bayes theorem:

Since the measurements conditioned on the whole trajectory only depend on the state at time

The second factor can be reformulated as follows:

where we have used the Markov property of the system in the last equation. This recursive representation can now be repeated on the term

where we have neglected the normalization constant in the denominator. Filling in our Gaussian models and using the factorization of the transition model from above equation yields

Since the initial density usually is based on a first measurement, we can assume that it factorizes into independent local track starts:

When the posterior is fully factorized in the number of sensors and in the time steps, each processing node can compute the resulting ASD Gaussian with mean

where the parameters are given by:

We have used a short notation such that

Thus, when the FC receives the local ASD parameters, the optimal fused estimate can be obtained via the convex combination:

For a continuous state estimation process, it is convenient to formulate the distributed ASD solution in terms of a prediction-filtering cycle.

### 9.1. Distributed ASD prediction

For the prediction step, it is assumed that the local processing node

### 9.2. Distributed ASD filtering

Since the prior is factorized in form of a product representation and the current measurements from time

## 10. Conclusion

In this chapter, we have introduced the least squares solution to the track-to-track fusion problem, where cross-covariances of the track estimation errors are required. Neglecting the cross-covariances has led us to the Naïve fusion, a simple but powerful fusion algorithm for practical applications. By recursive computation of the cross-covariances, we have seen that they primarily depend on the process noise of the state transition kernel. Since a centralized computation of the cross-covariances is infeasible in practical applications, more sophisticated solutions are required for optimal fusion results. The distributed Kalman filter, which uses the product representation to keep the local parameters decorrelated achieved this. However, this approach only works, if the local processors know all measurement models at each time step. Then, the distributed accumulated state density filter uses the temporal correlations to factorize the global posterior density. This approach does not require remote sensor models and is therefore, well suited for extensions with measurement ambiguity or nonlinear measurement functions.

In the study, one can find more extensions based on the distributed Kalman filter to overcome the lack of knowledge on the remote sensor models. In [8] and the references therein, a debiasing matrix is introduced to compensate for globally biased gain matrices of the local filters. An application of the tracklet fusion based on the distributed accumulated state density filter can be found in [9]. Then, in [10], the information filter formulation of the distributed Kalman filter also was extended to scenarios with input information on the transition process.