The continuing trend toward connected sensors (“internet of things” and” ubiquitous computing”) drives a demand for powerful distributed estimation methodologies. In tracking applications, the distributed Kalman filter (DKF) provides an optimal solution under Kalman filter conditions. The optimal solution in terms of the estimation accuracy is also achieved by a centralized fusion algorithm, which receives all associated measurements. However, the centralized approach requires full communication of all measurements at each time step, whereas the DKF works at arbitrary communication rates since the calculation is fully distributed. A more recent methodology is based on ”accumulated state density” (ASD), which augments the states from multiple time instants to overcome spatial cross-correlations. This chapter explains the challenges in distributed tracking. Then, possible solutions are derived, which include the DKF and ASD approach.
- distributed Kalman filter
- target tracking
- multisensor fusion
- information fusion
- accumulated state density
Modern tracking and surveillance system development is increasingly driving technological trends and algorithm developments toward networks of dislocated sensors. Besides classical target tracking, many other applications can be found, for instance, in robotics, manufacturing, health care, and financial economics. A multisensor network can exploit spatial diversity to compensate for the weak attributes of a single sensor such as limited field of view or high measurement noise. Also, heterogeneous sensors can reveal a more complete situational awareness and a more precise estimate of the underlying phenomena. Additionally, a sensor network is more robust against a single point of failure in comparison to a standalone system, if its architecture is chosen carefully.
Despite its unquestioned advantages, a multisensor network must cope with design-specific challenges, for instance, a full transmission of all the measurements to a fusion center can be hindered, when the communication links are unreliable or constrained in bandwidth or costs. A well-known approach to cope with the limited bandwidth data links is to apply data processing on the sensor sites to generate local track parameters that are transmitted to the fusion center. The latter then reconstructs the global track parameters by an application of a Track-to-Track Fusion (T2TF) scheme. Depending on the scenario constraints, this is a nontrivial task, since the local tracks are mutually correlated due to the common process noise. The first known solution in the literature to the T2TF problem proposed to apply an information filter-based multisensor fusion algorithm in , which later became famous as the “tracklet fusion.” However, the tracklet fusion also requires a transmission from each sensor after each time step in order to reconstruct the optimal solution.
This chapter presents the theory and the derivation of the distributed Kalman filter (DKF), which is an optimal solution of the T2TF problem under Kalman filter assumptions with respect to the mean squared error (MSE). Assuming Kalman conditions means that linear Gaussian models are provided for the motion model and all measurement models of the sensors. Moreover, it is assumed that measurement-to-track (at the sensors) and track-to-track (at the fusion center) association has been solved. The DKF approach requires, however, all remote sensor models to be known at each local sensor site, which is infeasible in most practical scenarios. Therefore, this chapter also presents a solution based on the accumulated state density (ASD), which is closely related to the DKF but does not require the measurement models to be known. Surveys that reflect the history of research in distributed estimation can be found, for instance, in [2, 3].
This chapter is structured as follows: Section 2 summarizes the problem formulation. A basic approach to T2TF is given in Section 3, where we present the least squares solution. Section 4 presents a simple fusion methodology, which is easy to compute and provides practical results for various applications. The reason why this approach is suboptimal is investigated in Section 5 by means of a recursive computation of the cross-covariances of the local tracks. In Section 6, the product representation of Gaussian probability densities is introduced, which is the basis for the derivation of the distributed Kalman filter in Section 7. An alternative derivation in terms of information parameters is provided in Section 8. Since the local measurement models are often unknown in practical applications, the distributed accumulated state density filter is introduced in Section 9. The chapter concludes with Section 10.
2. Problem formulation
Throughout this chapter, it is assumed that all S sensors produce their measurements at each time step of the same target with its true state in a synchronized way. The extension to the unsynchronized case is straightforward by means of standard Kalman filter predictions, and is therefore omitted for the sake of notational simplicity. The measurement equation for sensor is given by
where is the Gaussian distributed, zero-mean random variable, which models the noise of the sensing process. Therefore, the likelihood for a single measurement is fully described by the Gaussian
Since the measurement processes across all sensors are mutually independent, the joint likelihood of all sensor data produced at time factorizes:
The true state of the target itself is modeled as a time-variant stochastic process, where the transition from time to time is given by the following motion equation:
where is the Gaussian distributed, zero-mean random variable to model the process noise of the system. Analogously to the likelihood, this provides the probability density function for the transition model:
Based on the local processors, each sensor node produces a track at time in terms of an estimate and a corresponding estimation error covariance . In a T2TF scheme, these parameters are the only information, which is transmitted to a fusion center (FC). It should be noted that the FC may also not be centralized, distinguished instance in the architecture, but each and every processing node can act as a FC. The introduction of a distinguished FC is only for clarification of different computation layers. An excellent overview of pros and cons of various data fusion layers can be found in .
The T2TF problem can now be stated as follows: compute a fused estimate of the state and a consistent error covariance by means of the local tracks and knowledge on their models:
3. Least squares estimate
In order to compute an estimate as a well-suited combination of the local tracks, it is useful to consider the joint likelihood function given by the following Gaussian:
where are the track covariances on the block-diagonal entries and are the cross-covariances on the off-diagonal entries of the joint error covariance.
Since the joint likelihood from above is proportional to an exponential function:
the maximum likelihood (ML) estimate can be computed in terms of the least squares:
where , and for the joint error covariance have been introduced for notational simplicity. A closed form solution of the ML estimates can be obtained by setting the gradient with respect to the state to zero:
Therefore, the ML estimate is given by:
For information fusion applications, it is also important to have a consistent estimate of the squared error, in other words, we need to compute the corresponding error covariance:
The last equation holds due to the fact that is the joint covariance. Concluding the derivations from above equation, one can obtain:
4. Naïve fusion
It is obvious that for the ML estimate, it is assumed that the cross-covariances are known. Since this might not be given in practical scenarios, a simple approximation is to assume them to be zero. This approach is called Naïve fusion. It implies that the joint error covariance is given in block-diagonal form:
As a direct consequence of the matrix inversion lemma (see Appendix 12.1), the inverse can be obtained in closed form solution:
Filling into the maximum likelihood formulas directly yields.
Thus, by means of a simple approximation of the ML estimate, we have obtained a first practical fusion rule for the FC, which we call convex combination due to its structure for further usage. The fusion scheme as a whole can be outlined schematically as in the flowing Figure 1. Each sensor node processes its produced sensor data by means of a local filter, which results in a track in terms of an estimate together with an error covariance. These parameters are transmitted to the fusion center, which applies the convex combination to compute the fused result.
5. What makes the Naïve fusion naïve?
For the Naïve fusion, we have assumed that the cross-covariances vanish. It is worth to be aware of the structure of the cross-covariances to see the conditions whether this holds or does not hold. This can be achieved by a recursive computation of the posterior cross-covariance of two sensors with indices and , which process their data by means of local Kalman filters. At the beginning of the estimation process at tracks are not yet correlated, that is, , due to the fact that initial measurements are mutually uncorrelated. A recursive computation can be achieved by a prediction-filtering cycle.
5.1. Cross-covariance prediction
For the prediction step, it is assumed that a previous posterior cross-covariance has been computed. The prior parameters are obtained by means of the motion model:
where the last equality holds due to the fact that the estimation errors at time of both sensor processors are uncorrelated to the process noise .
5.2. Cross-covariance filtering
In the filtering step, both sensors compute their posterior parameters based on the produced measurements and respectively. It is assumed that the local processors have applied local Kalman filter update steps with Kalman gains and . The cross-covariance posterior matrix can be obtained by a straightforward computation:
For these equations, we have used the fact that the prior estimate error is independent of the measurement noise , and that and are mutually independent.
Concluding the calculations from this section, we have obtained a recursive solution for the cross-covariances:
One can see that the cross-covariances are zero, if and only if the process noise covariance vanishes. In other words, if the tracks refer to a deterministically moving target and all sensors do local Kalman filtering, then the convex combination equations yield the optimal fusion result. If this is not the case, as maybe in most practical applications, then the Naïve fusion method is an approximation and its degree of approximation depends primarily on the level of the process noise.
6. Gaussian product representation
The basic concept of the distributed Kalman filter is to make the local parameters stochastically independent, even if process noise is present. This is achieved by a product representation, which directly follows from the fact that.
Thus, the Gaussian product representation is equivalent to uncorrelated track parameters for each processing node. It should be noted that the product representation is not normalized, that is, the integral for is not unity. This, however, is not of practical relevance since the fused estimate density is a Gaussian and the parameters of which are provided by the convex combination.
7. Derivation of the distributed Kalman filter
For the DKF, we are going to modify the local processing scheme for each sensor in order to have the product representation hold at each instant of time. Then, when the fusion center receives the parameters from all sensors, the convex combination can be applied to compute the optimal global estimate. Note that the convex combination does not consider a local prior of the fusion center; therefore, the result will be independent from previous transmissions. This can be of great benefit, if communication channels with unreliable links have to be considered, since the full information on the target state is distributed in the sensor network. However, for completeness, it should also be noted that the modified local parameters are not optimal anymore in a local sense. One could say that local optimality is given up for the sake of global optimality .
In the following sections, the derivation of a prediction-filtering recursion of the DKF is discussed.
7.1. DKF prediction
For the prediction, it is assumed that the previous posterior is given in product representation:
For notational simplicity, we have conditioned the posterior on the full data set , where are from all sensors and all time steps up to time , of which the local tracks are sufficient statistics. At the initialization phase, that is, , the product representation is trivial, since the initial estimates can be based on first measurements, which are mutually independent.
To derive a closed form solution for the prediction of product representation, it is required to globalize the covariances of the local processing nodes at first. This process changes the local parameters such that the same previous posterior density is factorized; however, the local covariances will be unified to a single . Rigorously speaking, this matrix does not represent a meaningful covariance in the sense of an expected estimation error squared anymore. Still, the fused result will be optimal since the global density is not changed during this process. Thus, if we set
then the same fused density will be obtained, which is easily verified by means of the convex combination. It should be noted that the remote error covariances are required to compute the globalized covariance matrix . Since Kalman filter conditions are assumed, they can be computed without data transmission, as they do not depend on the local sensor measurements. Therefore, it is sufficient to be aware of the remote sensor models.
The prediction formulas can now be obtained by a marginalization of the joint density of the current and the last time step:
The last equality holds due to the Markov property of the system. Filling in our linear Gaussian transition model and the previous posterior yields
By means of a simple algebraic manipulation, it is possible to factorize the transition kernel Gaussian up to proportionality:
Thus, we can factorize the integration term of the global prior completely:
An application of the product formula (Section 12.2 in the appendix) yields:
At this point, we have derived factorized prediction formulas for the DKF prediction, and to our knowledge, the remaining integral is part of the normalization constant. This, however, is not trivial, since the parameter depends on . A successive application of the product formula yields the desired result:
for some auxiliary variables and . All factors, which are independent of have been omitted. This proves that the integral is independent of the state variable . Concluding the derivations from above, we have derived the prediction formulas of the local estimation parameters as:
7.2. DKF filtering
Let denotes the set of measurements produced by all sensors at time . The posterior density can be inferred by using the Bayes theorem:
Due to the mutual independence of the measurement noises, the joint likelihood function is given by:
This is particularly useful for the structure of the product representation used for the DKF. Filling in the linear Gaussian models and neglecting the normalization constant in the denominator directly yields:
Thus, the product formula again can be applied to compute the posterior parameters:
Again, we have omitted the factors, which are independent of .
8. Information filter formulation of the DKF
In , an elegant derivation of the DKF formulas was published based on the information filter (IF). The IF uses information matrices, which are inverted covariances, and information states, which are information matrices multiplied with states. The optimal, centralized update formulas for sensors based on the predicted information parameters and are given by:
where and are the local information contribution from the current measurements at time , which were received by the FC. If we want to distribute the computation to nodes, we will have them uncorrelated as in the DKF previously. Since the fused estimate is obtained via the convex combination, the local information parameters are summed up:
This summation structure can be used to provide a closed prediction-filtering cycle.
8.1. Information DKF prediction
The prediction of the state is easier than a direct transition of the information parameters. Based on the fused estimate, one can obtain.
Thus, we have given the local predicted state parameters as:
Analogously, one obtains for the prior covariance:
Thus, if we set , then the convex combination yields the exact global fused covariance.
8.2. Information DKF filtering
For the filtering, it is assumed that each sensor has computed its local information contribution parameter and from its own sensor model and, in addition, the information matrix contributions from all remote sensors by using the individual sensor models which are again assumed to be known. Then, the information state is updated via.
As a direct consequence, the updated parameters of the local processors follow the standard IF filtering equations:
For the globalized information matrix, the remote information parameters from the sensor models are used:
It is important to note that the local processing nodes compute both the local pseudo information matrix and the global fused error covariance , which is required for the prediction to the next time step.
9. Distributed accumulated state density filter
The DKF from Sections 7 and 8 can be considered a big step toward distributed state estimation, tracking, and information inference. However, in practical applications, the exact solution is often hindered by the fact that the exact remote sensor model parameters are unknown and can only be approximated based on local state estimates. The good news is that there is another exact solution based on the accumulated state density (ASD). The distributed ASD equations turn the spatial correlations into temporal correlations of successive states. Originally, the ASD equations were introduced to solve the out-of-sequence problem, which handles delayed transmissions of measurements into an ongoing fusion process in an optimal manner. Therefore, the temporal correlations can well be coped with the ASD approach.
At first, let us introduce the ASD state as
where refers to the current time of the filtering process and refers to the initialization time. The ASD approach now considers the conditional joint density , that is, the posterior of the full trajectory of the target between and . In particular, the individual state densities of a single instant of time can be obtained via marginalization. Also, it should be noted that the Rauch-Tung-Striebel (RTS) smoothing equations are inherently integrated in the ASD posterior, since all states are conditioned on the full set of measurements up to time .
A recursive computation of the ASD posterior can be achieved by using the Bayes theorem:
Since the measurements conditioned on the whole trajectory only depend on the state at time , the joint likelihood function is given by:
The second factor can be reformulated as follows:
where we have used the Markov property of the system in the last equation. This recursive representation can now be repeated on the term . A successive application of this procedure yields
where we have neglected the normalization constant in the denominator. Filling in our Gaussian models and using the factorization of the transition model from above equation yields
Since the initial density usually is based on a first measurement, we can assume that it factorizes into independent local track starts:
When the posterior is fully factorized in the number of sensors and in the time steps, each processing node can compute the resulting ASD Gaussian with mean and covariance matrix :
where the parameters are given by:
We have used a short notation such that are the smoothed state estimates and are the covariances, respectively, which result from the Rauch-Tung-Striebel equations. Also, the combined retrodiction gain matrices are known from the RTS smoother:
Thus, when the FC receives the local ASD parameters, the optimal fused estimate can be obtained via the convex combination:
For a continuous state estimation process, it is convenient to formulate the distributed ASD solution in terms of a prediction-filtering cycle.
9.1. Distributed ASD prediction
For the prediction step, it is assumed that the local processing node has computed the previous filtering parameters and , which refer to time . Then, the prior parameters are given by:
9.2. Distributed ASD filtering
Since the prior is factorized in form of a product representation and the current measurements from time are mutually uncorrelated, local Kalman filters can be applied to obtain the posterior parameters:
In this chapter, we have introduced the least squares solution to the track-to-track fusion problem, where cross-covariances of the track estimation errors are required. Neglecting the cross-covariances has led us to the Naïve fusion, a simple but powerful fusion algorithm for practical applications. By recursive computation of the cross-covariances, we have seen that they primarily depend on the process noise of the state transition kernel. Since a centralized computation of the cross-covariances is infeasible in practical applications, more sophisticated solutions are required for optimal fusion results. The distributed Kalman filter, which uses the product representation to keep the local parameters decorrelated achieved this. However, this approach only works, if the local processors know all measurement models at each time step. Then, the distributed accumulated state density filter uses the temporal correlations to factorize the global posterior density. This approach does not require remote sensor models and is therefore, well suited for extensions with measurement ambiguity or nonlinear measurement functions.
In the study, one can find more extensions based on the distributed Kalman filter to overcome the lack of knowledge on the remote sensor models. In  and the references therein, a debiasing matrix is introduced to compensate for globally biased gain matrices of the local filters. An application of the tracklet fusion based on the distributed accumulated state density filter can be found in . Then, in , the information filter formulation of the distributed Kalman filter also was extended to scenarios with input information on the transition process.
A.1. Matrix inversion lemma
Let , and be matrices of a block matrix such that and are invertable, and also such that the Schur-complements and have full rank. Then, the inversion of the block matrix is given by:
In particular, it holds that
Proof. Let the inverted block matrix be given by submatrices and . By definition, it holds that
where and are the identity and zero matrix, respectively. A matrix multiplication of the first and second equality yields two blocks of equations:
which we call block A and
which we call block B. Resolving block A for and yields the first version of the inverted block matrix, whereas resolving block B yields the second version.
A.2. Product formula for Gaussian densities
For two Gaussian distributed random variables and , it holds that
A proof can be found, for instance, in .