Open access peer-reviewed chapter

Distributed Kalman Filter

Written By

Felix Govaers

Submitted: 20 April 2017 Reviewed: 25 October 2017 Published: 20 December 2017

DOI: 10.5772/intechopen.71941

From the Edited Volume

Kalman Filters - Theory for Advanced Applications

Edited by Ginalber Luiz de Oliveira Serra

Chapter metrics overview

1,530 Chapter Downloads

View Full Metrics


The continuing trend toward connected sensors (“internet of things” and” ubiquitous computing”) drives a demand for powerful distributed estimation methodologies. In tracking applications, the distributed Kalman filter (DKF) provides an optimal solution under Kalman filter conditions. The optimal solution in terms of the estimation accuracy is also achieved by a centralized fusion algorithm, which receives all associated measurements. However, the centralized approach requires full communication of all measurements at each time step, whereas the DKF works at arbitrary communication rates since the calculation is fully distributed. A more recent methodology is based on ”accumulated state density” (ASD), which augments the states from multiple time instants to overcome spatial cross-correlations. This chapter explains the challenges in distributed tracking. Then, possible solutions are derived, which include the DKF and ASD approach.


  • distributed Kalman filter
  • target tracking
  • multisensor fusion
  • information fusion
  • accumulated state density

1. Introduction

Modern tracking and surveillance system development is increasingly driving technological trends and algorithm developments toward networks of dislocated sensors. Besides classical target tracking, many other applications can be found, for instance, in robotics, manufacturing, health care, and financial economics. A multisensor network can exploit spatial diversity to compensate for the weak attributes of a single sensor such as limited field of view or high measurement noise. Also, heterogeneous sensors can reveal a more complete situational awareness and a more precise estimate of the underlying phenomena. Additionally, a sensor network is more robust against a single point of failure in comparison to a standalone system, if its architecture is chosen carefully.

Despite its unquestioned advantages, a multisensor network must cope with design-specific challenges, for instance, a full transmission of all the measurements to a fusion center can be hindered, when the communication links are unreliable or constrained in bandwidth or costs. A well-known approach to cope with the limited bandwidth data links is to apply data processing on the sensor sites to generate local track parameters that are transmitted to the fusion center. The latter then reconstructs the global track parameters by an application of a Track-to-Track Fusion (T2TF) scheme. Depending on the scenario constraints, this is a nontrivial task, since the local tracks are mutually correlated due to the common process noise. The first known solution in the literature to the T2TF problem proposed to apply an information filter-based multisensor fusion algorithm in [1], which later became famous as the “tracklet fusion.” However, the tracklet fusion also requires a transmission from each sensor after each time step in order to reconstruct the optimal solution.

This chapter presents the theory and the derivation of the distributed Kalman filter (DKF), which is an optimal solution of the T2TF problem under Kalman filter assumptions with respect to the mean squared error (MSE). Assuming Kalman conditions means that linear Gaussian models are provided for the motion model and all measurement models of the sensors. Moreover, it is assumed that measurement-to-track (at the sensors) and track-to-track (at the fusion center) association has been solved. The DKF approach requires, however, all remote sensor models to be known at each local sensor site, which is infeasible in most practical scenarios. Therefore, this chapter also presents a solution based on the accumulated state density (ASD), which is closely related to the DKF but does not require the measurement models to be known. Surveys that reflect the history of research in distributed estimation can be found, for instance, in [2, 3].

This chapter is structured as follows: Section 2 summarizes the problem formulation. A basic approach to T2TF is given in Section 3, where we present the least squares solution. Section 4 presents a simple fusion methodology, which is easy to compute and provides practical results for various applications. The reason why this approach is suboptimal is investigated in Section 5 by means of a recursive computation of the cross-covariances of the local tracks. In Section 6, the product representation of Gaussian probability densities is introduced, which is the basis for the derivation of the distributed Kalman filter in Section 7. An alternative derivation in terms of information parameters is provided in Section 8. Since the local measurement models are often unknown in practical applications, the distributed accumulated state density filter is introduced in Section 9. The chapter concludes with Section 10.


2. Problem formulation

Throughout this chapter, it is assumed that all S sensors produce their measurements zksRm at each time step tk of the same target with its true state xkRn in a synchronized way. The extension to the unsynchronized case is straightforward by means of standard Kalman filter predictions, and is therefore omitted for the sake of notational simplicity. The measurement equation for sensor s1S is given by


where vksΝvks0Rks is the Gaussian distributed, zero-mean random variable, which models the noise of the sensing process. Therefore, the likelihood for a single measurement is fully described by the Gaussian


Since the measurement processes across all sensors Zk=zk1zkS are mutually independent, the joint likelihood of all sensor data produced at time tk factorizes:


The true state of the target itself is modeled as a time-variant stochastic process, where the transition from time tk1 to time tk is given by the following motion equation:


where wkΝwk0Qk is the Gaussian distributed, zero-mean random variable to model the process noise of the system. Analogously to the likelihood, this provides the probability density function for the transition model:


Based on the local processors, each sensor node produces a track at time tk in terms of an estimate xkks and a corresponding estimation error covariance Pkks. In a T2TF scheme, these parameters are the only information, which is transmitted to a fusion center (FC). It should be noted that the FC may also not be centralized, distinguished instance in the architecture, but each and every processing node can act as a FC. The introduction of a distinguished FC is only for clarification of different computation layers. An excellent overview of pros and cons of various data fusion layers can be found in [4].

The T2TF problem can now be stated as follows: compute a fused estimate xkk of the state xk and a consistent error covariance Pkk by means of the local tracks and knowledge on their models:


3. Least squares estimate

In order to compute an estimate as a well-suited combination of the local tracks, it is useful to consider the joint likelihood function given by the following Gaussian:


where Pkks,s=Pkks are the track covariances on the block-diagonal entries and Pkki,jcovxkki,xkkjxk=Pkkj,iT are the cross-covariances on the off-diagonal entries of the joint error covariance.

Since the joint likelihood from above is proportional to an exponential function:


the maximum likelihood (ML) estimate can be computed in terms of the least squares:


where xkk1:S=xkk1TxkkSTT, I1:S=IIT, and Pkk1:S for the joint error covariance have been introduced for notational simplicity. A closed form solution of the ML estimates can be obtained by setting the gradient with respect to the state to zero:


Therefore, the ML estimate is given by:


For information fusion applications, it is also important to have a consistent estimate of the squared error, in other words, we need to compute the corresponding error covariance:

cov[ xkk|xk ]=E[ (xkxkk)(xkxkk)T ]E[ (xkxkk)2 ]E[ (xk((I 1:S)T(P kk 1:S)1(I1:S))1(I1:S)T(Pkk1:S)1xkk1:S)2 ]E [ ( ((I1:S)T(Pkk1:S)1(I1:S))1( ((I1:S)T(P(k|k)1:S)1(I1:S))xk (I1:S)T(Pkk1:S)1xkk1:S ) )2 ]=((I1:S)T(Pkk1:S)1(I1:S))1(I1:S)T(Pkk1:S)1E[ ((I1:S)xkxkk1:S)2 ](Pkk1:S)1(I1:S)((I1:S)T(Pkk1:S)1(I1:S))1=((I1:S)T(Pkk1:S)1(I1:S))1.E12

The last equation holds due to the fact that EI1:Sxkxkk1:S2=Pkk1:S is the joint covariance. Concluding the derivations from above equation, one can obtain:


4. Naïve fusion

It is obvious that for the ML estimate, it is assumed that the cross-covariances Pkki,j,i,j1S are known. Since this might not be given in practical scenarios, a simple approximation is to assume them to be zero. This approach is called Naïve fusion. It implies that the joint error covariance is given in block-diagonal form:


As a direct consequence of the matrix inversion lemma (see Appendix 12.1), the inverse can be obtained in closed form solution:


Filling into the maximum likelihood formulas directly yields.


Thus, by means of a simple approximation of the ML estimate, we have obtained a first practical fusion rule for the FC, which we call convex combination due to its structure for further usage. The fusion scheme as a whole can be outlined schematically as in the flowing Figure 1. Each sensor node s processes its produced sensor data by means of a local filter, which results in a track in terms of an estimate together with an error covariance. These parameters are transmitted to the fusion center, which applies the convex combination to compute the fused result.

Figure 1.

Fusion scheme of the Naive Fusion approach.


5. What makes the Naïve fusion naïve?

For the Naïve fusion, we have assumed that the cross-covariances vanish. It is worth to be aware of the structure of the cross-covariances to see the conditions whether this holds or does not hold. This can be achieved by a recursive computation of the posterior cross-covariance Pkki,j of two sensors with indices i and j, which process their data by means of local Kalman filters. At the beginning of the estimation process at t0 tracks are not yet correlated, that is, P00i,j=0, due to the fact that initial measurements are mutually uncorrelated. A recursive computation can be achieved by a prediction-filtering cycle.

5.1. Cross-covariance prediction

For the prediction step, it is assumed that a previous posterior cross-covariance Pkk1i,j has been computed. The prior parameters are obtained by means of the motion model:

Pkk1i,j=E[ (xkxkk1i)(xkxkk1j)T ]=E[ (Fkk1xk1+wkFkk1xk1k1i)(Fkk1xk1+wkFkk1xk1k1j)T ]=Fkk1E[ (xk1xk1k1i)(xk1xk1k1j)T ]Fkk1TFkk1E[ (xk1xk1k1i)wkT ]E[ wk(xk1xk1k1j)T ]Fkk1T+E[ wkwkT ]=Fkk1Pk1k1i,jFkk1T+Qk,E17

where the last equality holds due to the fact that the estimation errors at time tk1 of both sensor processors are uncorrelated to the process noise wk.

5.2. Cross-covariance filtering

In the filtering step, both sensors compute their posterior parameters based on the produced measurements zki and zkj, respectively. It is assumed that the local processors have applied local Kalman filter update steps with Kalman gains Wkk1i and Wkk1j. The cross-covariance posterior matrix can be obtained by a straightforward computation:

Pkki,j=E[ (xkxkki)(xkxkkj)T ]=E[ (xkxkk1iWkk1i(zkiHkixkk1i))(xkxkk1jWkk1j(zkjHkjxkk1j))T ]=E [ (xkxkk1iWkk1i(Hkixk+vkiHkixkk1i))( xkxkk1jWkk1j (Hkjxk+vkjHkjxkk1j) )T ]=E[ ((IWkk1iHki)(xkxkk1i)Wkk1ivki)((IWkk1jHkj)(xkxkk1j)Wkk1jvkj)T ]=(IWkk1iHki)E[ (xkxkk1i)(xkxkk1j)T ](IWkk1jHki)T+Wkk1iE[ vkivkjT ]Wkk1jT=(IWkk1iHki)Pkk1i,j(IWkk1iHki)TE18

For these equations, we have used the fact that the prior estimate error xkxkk1i is independent of the measurement noise vki, and that vki and vkj are mutually independent.

Concluding the calculations from this section, we have obtained a recursive solution for the cross-covariances:


One can see that the cross-covariances are zero, if and only if the process noise covariance Qk vanishes. In other words, if the tracks refer to a deterministically moving target and all sensors do local Kalman filtering, then the convex combination equations yield the optimal fusion result. If this is not the case, as maybe in most practical applications, then the Naïve fusion method is an approximation and its degree of approximation depends primarily on the level of the process noise.


6. Gaussian product representation

The basic concept of the distributed Kalman filter is to make the local parameters stochastically independent, even if process noise is present. This is achieved by a product representation, which directly follows from the fact that.


Thus, the Gaussian product representation is equivalent to uncorrelated track parameters for each processing node. It should be noted that the product representation is not normalized, that is, the integral for S>1 is not unity. This, however, is not of practical relevance since the fused estimate density is a Gaussian and the parameters of which are provided by the convex combination.


7. Derivation of the distributed Kalman filter

For the DKF, we are going to modify the local processing scheme for each sensor in order to have the product representation hold at each instant of time. Then, when the fusion center receives the parameters from all sensors, the convex combination can be applied to compute the optimal global estimate. Note that the convex combination does not consider a local prior of the fusion center; therefore, the result will be independent from previous transmissions. This can be of great benefit, if communication channels with unreliable links have to be considered, since the full information on the target state is distributed in the sensor network. However, for completeness, it should also be noted that the modified local parameters are not optimal anymore in a local sense. One could say that local optimality is given up for the sake of global optimality [5].

In the following sections, the derivation of a prediction-filtering recursion of the DKF is discussed.

7.1. DKF prediction

For the prediction, it is assumed that the previous posterior is given in product representation:


For notational simplicity, we have conditioned the posterior on the full data set Zk1=Z1k1,,Zsk1, where Zsk1Z1sZk1s are from all sensors and all time steps up to time tk1, of which the local tracks are sufficient statistics. At the initialization phase, that is, k1=0, the product representation is trivial, since the initial estimates can be based on first measurements, which are mutually independent.

To derive a closed form solution for the prediction of product representation, it is required to globalize the covariances Pk1k1s of the local processing nodes at first. This process changes the local parameters such that the same previous posterior density is factorized; however, the local covariances will be unified to a single Pk1k1. Rigorously speaking, this matrix does not represent a meaningful covariance in the sense of an expected estimation error squared anymore. Still, the fused result will be optimal since the global density is not changed during this process. Thus, if we set




then the same fused density will be obtained, which is easily verified by means of the convex combination. It should be noted that the remote error covariances Pk1k1s are required to compute the globalized covariance matrix Pk1k1. Since Kalman filter conditions are assumed, they can be computed without data transmission, as they do not depend on the local sensor measurements. Therefore, it is sufficient to be aware of the remote sensor models.

The prediction formulas can now be obtained by a marginalization of the joint density of the current and the last time step:


The last equality holds due to the Markov property of the system. Filling in our linear Gaussian transition model and the previous posterior yields


By means of a simple algebraic manipulation, it is possible to factorize the transition kernel Gaussian up to proportionality:


Thus, we can factorize the integration term of the global prior completely:


An application of the product formula (Section 12.2 in the appendix) yields:




At this point, we have derived factorized prediction formulas for the DKF prediction, and to our knowledge, the remaining integral is part of the normalization constant. This, however, is not trivial, since the parameter ys depends on xk. A successive application of the product formula yields the desired result:


for some auxiliary variables y¯¯ and Y¯¯. All factors, which are independent of xk have been omitted. This proves that the integral is independent of the state variable xk. Concluding the derivations from above, we have derived the prediction formulas of the local estimation parameters as:


7.2. DKF filtering

Let Zk denotes the set of measurements produced by all sensors at time tk. The posterior density can be inferred by using the Bayes theorem:


Due to the mutual independence of the measurement noises, the joint likelihood function is given by:


This is particularly useful for the structure of the product representation used for the DKF. Filling in the linear Gaussian models and neglecting the normalization constant in the denominator directly yields:


Thus, the product formula again can be applied to compute the posterior parameters:




Again, we have omitted the factors, which are independent of xk.


8. Information filter formulation of the DKF

In [6], an elegant derivation of the DKF formulas was published based on the information filter (IF). The IF uses information matrices, which are inverted covariances, and information states, which are information matrices multiplied with states. The optimal, centralized update formulas for S sensors based on the predicted information parameters Pkk11 and Pkk11xkk1 are given by:


where iks=HksTRks1zks and Iks=HksTRks1Hks are the local information contribution from the current measurements at time tk, which were received by the FC. If we want to distribute the computation to S nodes, we will have them uncorrelated as in the DKF previously. Since the fused estimate is obtained via the convex combination, the local information parameters are summed up:


This summation structure can be used to provide a closed prediction-filtering cycle.

8.1. Information DKF prediction

The prediction of the state is easier than a direct transition of the information parameters. Based on the fused estimate, one can obtain.


Thus, we have given the local predicted state parameters as:


Analogously, one obtains for the prior covariance:


Thus, if we set Pkk1s=SPkk1=Fkk1SPk1k1Fkk1T+SQk, then the convex combination yields the exact global fused covariance.

8.2. Information DKF filtering

For the filtering, it is assumed that each sensor has computed its local information contribution parameter iks and Iks from its own sensor model and, in addition, the information matrix contributions Ikl from all remote sensors l by using the individual sensor models which are again assumed to be known. Then, the information state is updated via.


As a direct consequence, the updated parameters of the local processors follow the standard IF filtering equations:


For the globalized information matrix, the remote information parameters from the sensor models are used:


It is important to note that the local processing nodes compute both the local pseudo information matrix Pkks1 and the global fused error covariance Pkk, which is required for the prediction to the next time step.


9. Distributed accumulated state density filter

The DKF from Sections 7 and 8 can be considered a big step toward distributed state estimation, tracking, and information inference. However, in practical applications, the exact solution is often hindered by the fact that the exact remote sensor model parameters are unknown and can only be approximated based on local state estimates. The good news is that there is another exact solution based on the accumulated state density (ASD). The distributed ASD equations turn the spatial correlations into temporal correlations of successive states. Originally, the ASD equations were introduced to solve the out-of-sequence problem, which handles delayed transmissions of measurements into an ongoing fusion process in an optimal manner. Therefore, the temporal correlations can well be coped with the ASD approach.

At first, let us introduce the ASD state xk:n as


where tk refers to the current time of the filtering process and tn refers to the initialization time. The ASD approach now considers the conditional joint density pxk:nzk, that is, the posterior of the full trajectory of the target between tn and tk. In particular, the individual state densities of a single instant of time can be obtained via marginalization. Also, it should be noted that the Rauch-Tung-Striebel (RTS) smoothing equations are inherently integrated in the ASD posterior, since all states are conditioned on the full set of measurements up to time tk [7].

A recursive computation of the ASD posterior can be achieved by using the Bayes theorem:


Since the measurements conditioned on the whole trajectory only depend on the state at time tk, the joint likelihood function is given by:


The second factor can be reformulated as follows:


where we have used the Markov property of the system in the last equation. This recursive representation can now be repeated on the term pxk1:nZk1. A successive application of this procedure yields


where we have neglected the normalization constant in the denominator. Filling in our Gaussian models and using the factorization of the transition model from above equation yields


Since the initial density usually is based on a first measurement, we can assume that it factorizes into independent local track starts:


When the posterior is fully factorized in the number of sensors and in the time steps, each processing node can compute the resulting ASD Gaussian with mean xk:nks and covariance matrix Pk:nks [7]:


where the parameters are given by:


We have used a short notation such that xlk=Exlzk are the smoothed state estimates and Plk=covxlzk are the covariances, respectively, which result from the Rauch-Tung-Striebel equations. Also, the combined retrodiction gain matrices are known from the RTS smoother:


Thus, when the FC receives the local ASD parameters, the optimal fused estimate can be obtained via the convex combination:


For a continuous state estimation process, it is convenient to formulate the distributed ASD solution in terms of a prediction-filtering cycle.

9.1. Distributed ASD prediction

For the prediction step, it is assumed that the local processing node s has computed the previous filtering parameters xk1:nk1s and Pk1:nk1s, which refer to time tk1. Then, the prior parameters are given by:


9.2. Distributed ASD filtering

Since the prior is factorized in form of a product representation and the current measurements from time tk are mutually uncorrelated, local Kalman filters can be applied to obtain the posterior parameters:


10. Conclusion

In this chapter, we have introduced the least squares solution to the track-to-track fusion problem, where cross-covariances of the track estimation errors are required. Neglecting the cross-covariances has led us to the Naïve fusion, a simple but powerful fusion algorithm for practical applications. By recursive computation of the cross-covariances, we have seen that they primarily depend on the process noise of the state transition kernel. Since a centralized computation of the cross-covariances is infeasible in practical applications, more sophisticated solutions are required for optimal fusion results. The distributed Kalman filter, which uses the product representation to keep the local parameters decorrelated achieved this. However, this approach only works, if the local processors know all measurement models at each time step. Then, the distributed accumulated state density filter uses the temporal correlations to factorize the global posterior density. This approach does not require remote sensor models and is therefore, well suited for extensions with measurement ambiguity or nonlinear measurement functions.

In the study, one can find more extensions based on the distributed Kalman filter to overcome the lack of knowledge on the remote sensor models. In [8] and the references therein, a debiasing matrix is introduced to compensate for globally biased gain matrices of the local filters. An application of the tracklet fusion based on the distributed accumulated state density filter can be found in [9]. Then, in [10], the information filter formulation of the distributed Kalman filter also was extended to scenarios with input information on the transition process.

A. Appendix

A.1. Matrix inversion lemma

Let A,B,C, and D be matrices of a block matrix such that A and D are invertable, and also such that the Schur-complements DCA1B and ABD1C have full rank. Then, the inversion of the block matrix is given by:


In particular, it holds that


Proof. Let the inverted block matrix be given by submatrices E,F,G, and H. By definition, it holds that


where I and O are the identity and zero matrix, respectively. A matrix multiplication of the first and second equality yields two blocks of equations:


which we call block A and


which we call block B. Resolving block A for E,F,G, and H yields the first version of the inverted block matrix, whereas resolving block B yields the second version.

A.2. Product formula for Gaussian densities

For two Gaussian distributed random variables x and z, it holds that




A proof can be found, for instance, in [11].


  1. 1. Chong CY. Hierarchical estimation. Proceedings of the 2nd MIT/ONR Workshop on Distributed Information and Decision Systems Motivated by Naval Command Control Communication Systems, Monterey, CA, USA. 1979
  2. 2. Mahmoud MS, Khalid HM. Distributed Kalman Filtering: A Bibliographic Review. IET Control Theory & Applications; 2013
  3. 3. Campbell ME, Ahmed NR. Distributed Data Fusion: Neighbors, Rumors, and the Art of Collective Knowledge. IEEE Control Systems; 2016
  4. 4. Hall D, Chong C-Y, Llinas J, Martin Liggins II, editors. Distributed Data Fusion for Network-Centric Operations. CRC Press; 2013
  5. 5. Govaers F, Koch W. An exact solution to track-to-track-fusion at arbitrary communication rates. In: IEEE Transactions on Aerospace and Electronic Systems. Vol. 48. 2012
  6. 6. Chong CY, Mori S. Optimal Fusion For Non-Zero Process Noise. Proceedings of the 16th International Conference on Information Fusion, Istanbul. 2013
  7. 7. Koch W, Govaers F. On accumulated state densities with applications to out-of-sequence measurement processing. In: IEEE Transactions on Aerospace and Electronic Systems. Vol. 47. 2011
  8. 8. Reinhardt M. Linear Estimation in Interconnected Sensor Systems with Information Costraints. PhD thesis: KIT Scientific Publishing; 2014
  9. 9. Govaers F, Chong CY, Mori S, Koch W. Comparison of Augmented State Track Fusion Methods for Non-Full-Rate Communication. 18th International Conference on Information Fusion, Washington, DC. 2015
  10. 10. Pfaff F, Noack B, Hanebeck UD, Govaers F, Koch W. Information Form Distributed Kalman Filtering (IDKF) with Explicit Inputs. 20th International Conference on Information Fusion, Xi’an, China. 2017
  11. 11. Koch W. Tracking and Sensor Data Fusion: Methodological Framework and Selected Applications. Springer; 2014:251

Written By

Felix Govaers

Submitted: 20 April 2017 Reviewed: 25 October 2017 Published: 20 December 2017