Kalman filtering in its distributed information form is reviewed and applied to a network of Global Navigation Satellite System (GNSS) receivers. We show, by employing consensus-based data-fusion rules between GNSS receivers, how the consensus-based Kalman filter (CKF) of individual receivers can deliver GNSS parameter solutions with a precision performance comparable to that of their network-derived, fusion-center-dependent counterparts. This is relevant as in the near future the proliferation of low-cost receivers will give rise to a significant increase in the number of GNSS users. With the CKF or other distributed filtering techniques, GNSS users can therefore achieve high-precision solutions without relying on a centralized computing center.
- distributed filtering
- consensus-based Kalman filter (CKF)
- global navigation satellite systems (GNSS)
- GNSS networks
- GNSS ionospheric observables
Kalman filtering in its decentralized and distributed forms has received increasing attention in the sensor network community and has been extensively studied in recent years, see e.g. [1, 2, 3, 4, 5, 6, 7, 8]. While in the traditional centralized Kalman filter setup all sensor nodes have to send their measurements to a computing (fusion) center to obtain the state estimate, in distributed filtering schemes the nodes share only limited information with their neighboring nodes (i.e. a subset of all other nodes) and yet obtain state estimates comparable to those of the centralized filter in a minimum-mean-squared-error sense. This particular feature of distributed filters can make the data communication between the nodes cost-effective and develop the nodes' capacity to perform
Next to sensor networks, distributed filtering can therefore benefit several other applications such as formation flying of aerial vehicles, cooperative robotics and disciplines that concern the Global Navigation Satellite Systems (GNSS). The latter is the topic of the present contribution. GNSS has proven to be an efficient tool for the determination of time-varying parameters that are of importance to Earth-science disciplines like positioning, deformation, timing and atmosphere [11, 12]. Parameter estimation in GNSS often relies on the data processing of a
The structure of this contribution is as follows. We first briefly review the principles of the standard Kalman filter and its information form in Section 2. The additivity property of the information filter that makes it particularly useful for distributed processing is also highlighted. In Section 3 we discuss the average consensus rules by which the sensor nodes agree to fuse each other's information. Different consensus protocols are discussed and a 'probabilistic' measure for the evaluation of their convergence rates is proposed. Section 4 is devoted to the CKF algorithmic steps. Its two-time-scale nature is highlighted and a three-step recursion for evaluating the consensus-based error variance matrix is developed. In Section 5 we apply the CKF theory to a small-scale network of GNSS receivers collecting ionospheric observables over time. Conducting a precision analysis, we compare the precision of the network-based ionospheric solutions with those of their single-receiver and consensus-based counterparts. It is shown how the CKF of each receiver responds to an increase in the number of iterative communications between the neighboring nodes. Concluding remarks and future outlook are provided in Section 6.
2. Kalman filtering
Consider a time series of observable random vectors . The goal is to
2.1. The Kalman filter standard assumptions
To predict the state-vectors in an optimal sense, one often uses the minimum mean squared error (MMSE) principle as the optimality criterion, see e.g. [19, 20, 21, 22, 23, 24, 25]. In case no restrictions are placed on the class of predictors, the MMSE predictor is given by the conditional mean , known as the Best Predictor (BP). The BP is unbiased, but generally nonlinear, with exceptions, for instance, in the Gaussian case. In case and are jointly Gaussian, the BP becomes linear and identical to its linear counterpart, i.e. the Best Linear Predictor (BLP)
Eq. (1) implies that (1) the BLP is unbiased, i.e. , and that (2) the prediction error of a BLP is always uncorrelated with observables on which the BLP is based, i.e. . These two basic properties can be alternatively used to uniquely specify a BLP .
The Kalman filter is a
for the time instances , with being the Kronecker delta. The nonsingular matrix denotes the transition matrix and the random vector is the system noise. The system noise is thus assumed to have a zero mean, to be uncorrelated in time and to be uncorrelated with the initial state-vector . The transition matrix from epoch to is denoted as . Thus and (the identity matrix).
for , with being the known design matrix. Thus the zero-mean measurement noise is assumed to be uncorrelated in time and to be uncorrelated with the initial state-vector and the system noise .
2.2. The three-step recursion
That the initial error variance matrix is identical to the variance matrix follows from the equality .
that are both uncorrelated with the previous data . Hence, the time update is given by
The error variance matrix follows by applying the covariance propagation law to (8), together with .
that are both uncorrelated with . Therefore, cannot be predicted by the previous data , showing that contains truly
since and . The measurement update reads then
The error variance matrix follows by an application of the covariance propagation law, together with . Application of the covariance propagation law to (10) gives the variance matrix of as follows
2.3. A remark on the filter initialization
In the derivation of the Kalman filter one assumes the mean of the random initial state-vector , in Eq. (3), to be known, see e.g. [30, 31, 32, 33, 34, 35, 36, 37]. This is because the BLP structure (1) requires knowledge of the means and . Since in many, if not most, applications the means of the state-vectors are unknown, such a derivation is not appropriate. As shown in Ref. , one can do away with the need to know both the initial mean and the variance matrix , given in Eq. (3). The corresponding three-step recursion then follows the Best Linear Unbiased Prediction (BLUP) principle instead of that of the BLP. The BLUP is also an MMSE predictor, but within a more restrictive class of predictors. It replaces the means and by their corresponding Best Linear Unbiased Estimators (BLUEs). Within such a BLUP recursion, the initialization Eq. (7) is revised and takes place at time instance in the presence of the data . Provided that matrix is of full column rank, the predictor follows from solving the normal equations
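As a minimal numerical sketch of this data-driven initialization (function and variable names are ours, not the paper's), the first state prediction amounts to a weighted least-squares solve of the normal equations built from the first data batch, so no a-priori state mean is needed:

```python
import numpy as np

def blup_init(A0, R0, z0):
    """Initialize the filter from the first data batch z0 with design
    matrix A0 (assumed of full column rank) and noise variance R0,
    by solving the normal equations N x = r."""
    R0_inv = np.linalg.inv(R0)
    N = A0.T @ R0_inv @ A0          # normal matrix
    r = A0.T @ R0_inv @ z0          # right-hand-side vector
    x0 = np.linalg.solve(N, r)      # initial state predictor
    P0 = np.linalg.inv(N)           # its error variance matrix
    return x0, P0
```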
The above error variance matrix is thus
showing that . Matrices and () are used for two
The MMSE of the BLUP recursion is never smaller than that of the Kalman filter, as the Kalman filter makes use of additional information, namely, the known mean and variance matrix . When the stated information is available, the BLUP recursion is shown to encompass the Kalman filter as a special case . In the following we therefore assume that the means of the state-vectors are unknown, a situation that often applies to GNSS applications.
2.4. Filtering in information form
The three-step recursion presented in Eqs. (7), (9) and (12) concerns the time-evolution of the predictor and the error variance matrix . As shown in Eq. (15), both and can be determined by the normal matrix and the right-hand-side vector . One can therefore alternatively develop recursion concerning the time-evolution of and . From a computational point of view, such recursion is found to be very suitable when the inverse-variance or
Given the definition above, the information filter recursion concerning the time-evolution of and would then follow from the recursion Eqs. (15), (9) and (12), along with the following matrix-inversion equalities
The algorithmic steps of the information filter are presented in Figure 1 . In the absence of data, the filter is initialized by the zero information and . In the presence of the data , the corresponding normal matrix and right-hand-side vector are added to the time update information and to obtain the measurement update information and . The transition matrix and inverse-variance matrix would then be used to time update the previous information and .
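The two updates of the information filter can be sketched as follows; this is a generic information-form cycle under the stated model assumptions (invertible transition matrix, additive zero-mean noises), with our own variable names rather than the paper's notation:

```python
import numpy as np

def measurement_update(Ybar, ybar, A, R, z):
    """Add the normal matrix A^T R^-1 A and right-hand side A^T R^-1 z
    of the new data z to the time-update information (Ybar, ybar)."""
    R_inv = np.linalg.inv(R)
    return Ybar + A.T @ R_inv @ A, ybar + A.T @ R_inv @ z

def time_update(Y, y, Phi, Q):
    """Propagate the measurement-update information (Y, y) through the
    invertible transition matrix Phi with process-noise variance Q."""
    Phi_inv = np.linalg.inv(Phi)
    M = Phi_inv.T @ Y @ Phi_inv                 # noise-free information propagation
    Ybar = np.linalg.inv(np.linalg.inv(M) + Q)  # account for the system noise
    ybar = Ybar @ (Phi @ np.linalg.solve(Y, y)) # information vector of the prediction
    return Ybar, ybar
```

In the absence of data the recursion starts from zero information (`Ybar = 0`, `ybar = 0`), matching the initialization described above.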
Matrix can be, for instance, structured by the columns of the identity matrix corresponding to the columns of on which the sub-matrix is positioned. The special case
Thus instead of , the inverse-matrix and are assumed available.
2.5. Additivity property of the information measurement update
As stated previously, the information filter delivers outcomes equivalent to those of the Kalman filter recursion. Thus any particular preference for the information filter must be attributed to the
As shown in Figure 1 , the information measurement update is
Accordingly, the observable vector is partitioned into sub-vectors (), each having its own design matrix and measurement noise vector . One can think of a network of sensor nodes in which each node collects its own observable vector , all aiming to determine a common state-vector . Let us further assume that the nodes collect their observables independently from one another. This yields
Thus the measurement noise vectors () are assumed to be
According to Eq. (24), the measurement information of each node, say and , is individually added to the information states and , that is
Now consider the situation where each node runs its own
The local states and would then be measurement updated as (cf. 26)
that are equal to the central states and , respectively. In this way one has multiple distributed local filters , where each recursively delivers results identical to those of a central filter.
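The additivity property can be verified numerically: the central normal matrix and right-hand-side vector equal the number of nodes times the network-wide averages of the local contributions. The data below are synthetic and the variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(42)
n_nodes, n_state, n_obs = 5, 3, 4

# per-node design matrices, noise variances and observables (synthetic)
A = [rng.standard_normal((n_obs, n_state)) for _ in range(n_nodes)]
R = [(0.5 + 0.1 * i) * np.eye(n_obs) for i in range(n_nodes)]
z = [rng.standard_normal(n_obs) for _ in range(n_nodes)]

# central normal matrix/right-hand side of the stacked measurement model
N_central = sum(Ai.T @ np.linalg.inv(Ri) @ Ai for Ai, Ri in zip(A, R))
r_central = sum(Ai.T @ np.linalg.inv(Ri) @ zi for Ai, Ri, zi in zip(A, R, z))

# the same quantities written as n times the network-wide averages
N_avg = np.mean([Ai.T @ np.linalg.inv(Ri) @ Ai for Ai, Ri in zip(A, R)], axis=0)
r_avg = np.mean([Ai.T @ np.linalg.inv(Ri) @ zi for Ai, Ri, zi in zip(A, R, z)], axis=0)

assert np.allclose(N_central, n_nodes * N_avg)
assert np.allclose(r_central, n_nodes * r_avg)
```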
To compute the average quantities and , node may need to receive all other information and (). In other words, node would require direct connections to
3. Average consensus rules
In the previous section, we briefly discussed the potential applicability of the information filter as a tool for handling the measurement model Eq. (22) in a distributed manner. With the representation Eq. (28) however, one may be inclined to conclude that such applicability is limited to the case where the nodes , have ‘direct’ communication connections to one another in order to receive/send their measurement information and ().
Instead of having direct connections, the idea is now to relax such a stringent requirement by assuming that the nodes are linked to each other
3.1. Communication graphs
The way the nodes interact with each other to transfer information is referred to as the interaction topology between the nodes. The interaction topology is often described by a directed graph whose vertices and edges, respectively, represent the nodes and communication links . The interaction topology may also undergo a
is therefore assumed to be connected. We define the
3.2. Consensus protocols
Given the right-hand-vector , suppose that node aims to obtain the average for which all other vectors () are required to be available (cf. 27). But the node only has access to those of its neighbors, i.e. the vectors (). For the first session , it would then seem to be reasonable to compute a
with . Choosing a set of weights , the nodes agree on the consensus protocol (31) to iteratively fuse their information vectors . Here and in the following, we use the letter ‘’ for the ‘
The question that now comes to the fore is how to choose the weights such that the approximation gets close to through the iteration Eq. (31). More precisely, the stated iteration becomes favorable if when for all nodes . To address this question, we use a multivariate formulation. Let be the size of the vectors (). We define the higher-dimensional vector . The multivariate version of Eq. (31) reads then
The weight matrix is structured by () and .
where the -vector contains ones. If the condition Eq. (34) is met, the set of nodes can
Examples of such consensus protocols are given in Table 1 . As shown, the weights form a symmetric weight matrix , i.e. . In all protocols presented, self-weights are chosen so that the condition is satisfied. The weights of Protocols 1 and 2 belong to the class of ‘maximum-degree’ weights, while those of Protocol 3 are referred to as ‘Metropolis’ weights . The weights of Protocols 1 and 3 are driven by the degrees (number of neighbors) of nodes , denoted by . For instance, in network (a) of Figure 2 we have as node 1 has 4 neighbors, while as node 14 has 7 neighbors. Protocol 4 is only applicable to networks like (b) in Figure 2 , i.e. when each node has
To provide insight into the applicability of the protocols given in Table 1 , we apply them to the networks of Figure 2 . Twenty values (scalars), say (), are generated whose average is equal to 5, i.e. . Each value is assigned to its corresponding node. For network (a), Protocols 1, 2 and 3 are applied separately, whereas Protocol 4 is only applied to network (b). The corresponding results, up to 30 iterations, are presented in Figure 3 . As shown, the iterated values () get closer to their average (i.e. ) as the number of iterative communications increases.
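An experiment of this kind can be reproduced along the following lines with the 'Metropolis' weights of Protocol 3 (a common choice is w_ij = 1/(1+max(d_i, d_j)) for neighbors, which we assume here); the graph below is an illustrative 20-node ring, not the networks of Figure 2:

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric Metropolis weight matrix: w_ij = 1/(1+max(d_i, d_j))
    for neighboring nodes i != j, self-weights so each row sums to one."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# a connected 20-node ring and values whose network average is exactly 5
n = 20
adj = np.zeros((n, n), dtype=bool)
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = True
W = metropolis_weights(adj)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(n)
x0 = x0 - x0.mean() + 5.0       # shift so the average is exactly 5

x = x0.copy()
for _ in range(30):             # 30 iterative communications
    x = W @ x                   # consensus protocol, Eq. (31)
```

Since W is symmetric and row-stochastic it is doubly stochastic, so every iteration preserves the network average while shrinking the spread of the node values around it.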
3.3. On convergence of consensus states
Figure 3 shows that the states () converge to their average , but with
Now let the initial states have the same mean and the same variance matrix (), but be mutually uncorrelated. An application of the covariance propagation law to (35), together with , gives
Thus the closer the squared matrix to , the smaller the variance matrix Eq. (36) becomes. In the limit when , the stated variance matrix tends to zero. This is what one would expect, since . Under the conditions stated in Eq. (34), matrices have as the largest absolute value of their eigenvalues . A symmetric weight matrix can then be expressed in its
The above equation shows that the entries of the variance matrix (36) are largely driven by the second largest eigenvalue of , i.e. . The smaller the scalar , the faster the quantity tends to zero, as . The scalar is thus often used as a measure to judge the convergence performance of the protocols . For the networks of Figure 2 , of Protocols 1, 2 and 3 is about 0.92, 0.97 and 0.91, respectively. As Protocol 3 has the smallest , it is expected to have the best performance. Note that in Protocol 4 the weight matrix varies in every session, so its performance cannot be judged by a single eigenvalue . One can therefore think of another means of measuring the convergence performance. Due to the randomness of the information vectors (), one may propose 'probabilistic' measures such as
to evaluate the convergence rates of the protocols, where . Eq. (39) refers to the probability that the maximum-norm of the difference vectors () is not larger than a given positive scalar for a fixed number of iterations . The higher the probability Eq. (39), the better the performance of a protocol. For the scalar case , Eq. (39) is reduced to
which is the probability that the absolute differences () are not larger than q times the standard-deviation . For the networks of Figure 2 , 100,000 normally-distributed vectors as samples of are simulated to evaluate the probability (40). The results for Protocols 1, 2, 3 and 4 are presented in Figure 4 . The stated probability is plotted as a function of for three numbers of iterative communications and 30. As shown, Protocol 3 gives rise to the highest probabilities, while Protocol 2 delivers the lowest. After 10 iterations, the probability of having absolute differences smaller than one-fifth of the standard-deviation (i.e. ) is about 80% for Protocol 1, whereas it is less than 5% for Protocol 2. After 30 iterations, the stated probability increases to 80% for Protocol 2, and is close to 100% for Protocols 1 and 3.
Figure 4 demonstrates that the convergence performance of Protocol 4 is clearly better than that of Protocol 2, as it delivers higher probabilities (for the networks of Figure 2 ). Such a conclusion, however, cannot be drawn on the basis of the results of Figure 3 . This shows that results obtained on the basis of specific 'realizations' of () are not necessarily representative.
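For a fixed symmetric weight matrix, the eigenvalue-based convergence measure of Section 3.3 is straightforward to compute; the 3-node path-graph weight matrix below is illustrative:

```python
import numpy as np

def second_largest_abs_eigenvalue(W):
    """Convergence measure of a symmetric weight matrix W: the second
    largest absolute eigenvalue. The smaller it is, the faster W^s
    approaches the averaging matrix (1/n) * ones((n, n))."""
    eig = np.sort(np.abs(np.linalg.eigvalsh(W)))
    return eig[-2]

# symmetric, row-stochastic weights on a 3-node path graph
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])

lam2 = second_largest_abs_eigenvalue(W)
n = W.shape[0]
J = np.ones((n, n)) / n
# the distance of W^s to the averaging matrix decays roughly like lam2**s
err = [np.linalg.norm(np.linalg.matrix_power(W, s) - J) for s in (1, 5, 10)]
```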
4. Consensus-based Kalman filters
4.1. Two time-scale approach
In Section 2.5 we discussed how the additivity property of the measurement update Eq. (26) offers possibilities for developing multiple distributed local filters , each delivering local states and equal to their central counterparts and . In doing so, each node has to evaluate the averages and at
Under the assumption , the CKF recursion follows from the Kalman filter recursion by adding an extra step, namely the 'consensus update'. The algorithmic steps of the CKF in information form are presented in Figure 6 ; compare the recursion with that of the information filter given in Figure 1 . Similar to the information filter, the CKF at node is initialized by the zero information and . In the presence of the data , node computes its local normal matrix and right-hand-side vector and sends them to its neighboring nodes (). In the consensus update, iterative communications between the neighboring nodes are carried out to approximate the averages and by and , respectively. After a finite number of communications , the consensus states and are, respectively, added to the time update information and to obtain their measurement update versions and at node (cf. 28). The time update goes along the same lines as that of the information filter.
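The consensus-based measurement update can be sketched as follows for all nodes at once; this is our simplified reading of the step (scaling the consensus state by the number of nodes n so that, with perfect consensus, the sum of all local contributions is recovered, cf. Eq. (28)):

```python
import numpy as np

def ckf_consensus_measurement_update(Ybar, ybar, N_loc, r_loc, W, s_max):
    """One CKF measurement update for all n nodes: run s_max consensus
    iterations (weight matrix W) on the local normal matrices and
    right-hand-side vectors, then add n times the consensus states to
    each node's time-update information (Ybar[i], ybar[i])."""
    n = len(N_loc)
    N_cons, r_cons = list(N_loc), list(r_loc)
    for _ in range(s_max):
        N_new = [sum(W[i, j] * N_cons[j] for j in range(n)) for i in range(n)]
        r_new = [sum(W[i, j] * r_cons[j] for j in range(n)) for i in range(n)]
        N_cons, r_cons = N_new, r_new
    Y_upd = [Ybar[i] + n * N_cons[i] for i in range(n)]
    y_upd = [ybar[i] + n * r_cons[i] for i in range(n)]
    return Y_upd, y_upd
```

With a fully connected network (all weights 1/n) a single iteration reproduces the central measurement update at every node; with sparser weights the update is only an approximation that improves with s_max.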
4.2. Time evolution of the CKF error covariances
With the consensus-based information filter presented in Figure 6 , it is therefore feasible to develop multiple distributed filters, all running in parallel over time. By taking recourse to an average-consensus protocol, not all nodes need to be directly linked, thereby allowing non-neighboring nodes to also benefit from each other's information states. The price one has to pay for this attractive feature of the CKF is that the local predictors
will have a
Note that the terms and are uncorrelated. With as the entries of the product matrix in Eq. (34), one obtains
since . With this in mind, an application of the covariance propagation law to (42) results in the error variance matrix
In Figure 7 we present the three-step recursion of the error variance matrix (for node ). As shown, the node would need an
To better appreciate the recursion given in Figure 7 , let us consider a special case where a
The central error variance matrix would then simply follow by inverting the sum of all normal matrices over nodes and time instances. Collecting observables up to and including time instance , the stated variance matrix reads . We now compare with its consensus-based local counterpart at node , i.e. . The aforementioned assumptions, together with , give
in which the scalar is given by
Substitution into the stated recursion provides us with the time-evolution of the error variance matrix as follows ( Figure 7 )
This shows that the consensus-based error variance matrix is times its central counterpart . With the vector , application of the Cauchy-Schwarz inequality gives the lower-bound
as . Thus the scalar is never smaller than , i.e. , showing that the performance of the consensus-based predictor is never better than that of its central version . The lower-bound Eq. (48) is reached when , i.e. when (). According to Eq. (34), this can be realized if , for which the number of iterations may need to be reasonably large. The conclusion is therefore that the local filters at nodes generate information matrices whose inverses differ from the actual error variance matrices of the predictors , i.e. .
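The variance inflation of this special case can be evaluated numerically. The form gamma = n * sum_j w_ij^2, with w_ij the entries of a row of the iterated weight matrix, is our reading of the Cauchy-Schwarz argument above (sum_j w_ij = 1 implies gamma >= 1, with equality for uniform weights 1/n); the weight matrix below is illustrative:

```python
import numpy as np

def gamma_factor(W, s, i=0):
    """Variance inflation of node i after s consensus iterations:
    gamma = n * sum_j w_ij^2 over the i-th row of W^s. By the
    Cauchy-Schwarz inequality gamma >= 1, with equality when all
    weights equal 1/n (perfect consensus)."""
    n = W.shape[0]
    w = np.linalg.matrix_power(W, s)[i]
    return n * float(np.sum(w ** 2))

# symmetric, row-stochastic weights on a 3-node path graph
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])

# gamma decreases monotonically towards the lower bound 1 as s grows
gammas = [gamma_factor(W, s) for s in (1, 5, 20)]
```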
5. Applications to GNSS
The purpose of this section is to demonstrate how the CKF theory, discussed in Section 4, can play a pivotal role in applications for which the GNSS measurements of a
As previously discussed, consensus-based algorithms, and in particular the CKF, can be employed to process network data in a distributed filtering scheme, i.e. no computing center is required. To illustrate this applicability, we simulate a network of 13 GNSS receivers located in Perth, Western Australia ( Figure 8 ). As shown in the figure, each node (white circle) represents a receiver having data links (red lines) to its neighbors with inter-station distances up to 4 km. We therefore assume that the receivers receive each other's data within ranges of no more than 4 km. For instance, receiver 1 is directly connected to receivers 2 and 6, but not to receiver 3 (the inter-station distance between receivers 1 and 3 is about 8 km).
5.1. GNSS ionospheric observables: Dynamic and measurement models
Although the GNSS observables contain information on various positioning and non-positioning parameters, here we restrict ourselves to
Let the scalar denote the pseudo-range ionospheric observable that the receiver collects from satellite at time instance . The corresponding measurement model, formed by the between-satellite differences , reads (cf. 5)
where the term within refers to the first-order slant ionospheric delays, and denotes the between-satellite differential code biases (DCBs). We use a regional single-layer model [45, 46] to represent the slant ionospheric delays in terms of 1) the vertical total electron content (TEC) , and 2) and 3) the south-to-north and west-to-east spatial gradients of , respectively. The corresponding known coefficients follow from Ref. 
with . The angles and , respectively, denote the longitude and latitude of the ionospheric piercing points (IPPs) corresponding to the receiver-to-satellite line-of-sight (see Figure 9 ). They are computed with respect to those of the reference IPP at time instance , i.e. and . The angle denotes the zenith angle of the IPPs. These angles are computed based on the mean Earth’s radius km and height of layer 450 km. The measurement noises are assumed to be mutually uncorrelated with the dispersion (cf. 6)
forming the variance matrices in Eq. (6), where is the satellite elevation angle. The scalar is set to cm as the zenith-referenced standard-deviation of the GPS ‘geometry-free’ pseudo-range measurements .
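A common way to realize such an elevation-dependent noise model, assumed here since the paper's exact mapping function is not reproduced above, is to scale the zenith-referenced standard-deviation by 1/sin(elevation); `sigma0` is a placeholder for the zenith-referenced value:

```python
import numpy as np

def iono_obs_std(elev_deg, sigma0):
    """Standard-deviation of the geometry-free pseudo-range ionospheric
    observable at a given elevation angle (degrees), assuming the common
    1/sin(elevation) scaling of the zenith-referenced sigma0."""
    return sigma0 / np.sin(np.deg2rad(elev_deg))

# at zenith (90 deg) the noise equals sigma0; it grows at low elevations
stds = [iono_obs_std(e, sigma0=1.0) for e in (90.0, 30.0, 10.0)]
```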
Suppose that a number of satellites , are tracked by the network receivers during the observational campaign. The state-vector sought is structured as
Thus the state-vector contains three TEC parameters , , and between-satellite DCBs (). The dynamic model is assumed to be given by (cf. 2, 4 and 21)
Thus the DCBs are assumed constant in time, while the temporal behavior of the TEC parameters , , is captured by a random-walk process. The corresponding zero-mean process noises are assumed to be mutually uncorrelated, having the standard-deviations mm/and mm/rad/.
5.2. Observational campaign
The network receivers (), shown in Figure 8 , are assumed to track GPS satellites over 16 hours from 8:00 to 24:00 Perth local time, on 02-06-2016. The observation sampling rate is set to minute. Thus the number of observational epochs (time instances) is 960. As to the data sending rate (cf. 5), we assume three different sending rates and 15 seconds. Thus the number of iterative communications between the neighboring receivers takes the values and 12. The consensus protocol 3 ( Table 1 ) is applied to the CKF of each receiver.
As the satellites revolve around the Earth, not all of them are simultaneously visible to the 'small-scale' network of Figure 8 . Their visibility over time is shown in Figure 10 (left panel), in which satellites with elevation angles smaller than 10 degrees are excluded. There are 31 GPS satellites (i.e. ), with PRN 4 absent (PRN refers to the satellite identifier). PRN 22 has the maximum duration of visibility, while PRN 21 has the minimum. Note also that PRNs 2, 6, 16, 17, 19, 26 and 32 disappear (set) and reappear (re-rise); that is why their visibility is shown via two separate time intervals. Figure 10 (right panel) shows the trajectories of the ionospheric pierce points on the ionospheric single layer that are made by the receiver-to-satellite line-of-sight paths. It is the spatial distribution of these points that drives the coefficients , , in Eq. (49).
In the following we present precision analyses on the
5.3. Central (network-based) versus local (single-receiver) solutions
Before discussing the precision performance of the CKF solutions, we first compare the network-based (central) TEC solutions with the solutions obtained from the data of one single receiver only (referred to as the local solutions). At the filter initialization, the standard-deviations of the local TEC solutions are times larger than those of the central TEC solutions (i.e. the square-root of the number of nodes). This is because each of the 13 network receivers independently provides equally precise solutions; the central solution then follows by averaging all 13 local solutions. Due to the common dynamic model Eq. (53), however, the local solutions become correlated over time. After the filter initialization, the central solution would therefore
5.4. Role of CKF in improving local solutions
With the results of Figure 11 , we observed that the central TEC solutions considerably outperform their local counterparts in the sense of delivering more precise outcomes, i.e. the local-to-central standard-deviation ratios are considerably larger than 1. We now employ the CKF for each node (receiver) to
Next to the solutions of the TEC parameters , and , we also analyze CKF solutions of the between-satellite DCBs () in Eq. (52). Because of the difference in the satellites' visibility over time (cf. Figure 10 ), the DCBs' standard-deviations are quite distinct and depend strongly on the duration of satellite visibility. The longer a pair of satellites is visible, the smaller the expected standard-deviation. We now consider the time required for the between-satellite DCB solutions to reach standard-deviations smaller than 0.5 nanoseconds. Because of the stated difference in the standard-deviations, each between-satellite DCB corresponds to a different required time. For the central filter, the minimum value of such required time is 7 minutes, with the 25th percentile at 12 minutes, the median at 38, the 75th percentile at 63 and the maximum at 84 minutes. Thus 84 minutes after filter initialization, all central DCB solutions have standard-deviations smaller than 0.5 nanoseconds. Such percentiles can be represented by a 'boxplot'. We compute the stated percentiles for all the CKF solutions and compare their boxplots with the central one in Figure 13 . The results are presented for three different data sending rates seconds (top), seconds (middle) and seconds (bottom). As shown, the more iterative communications, the more similar the boxplots become, i.e. the nodes (receivers) are reaching consensus. Similar to the TEC solutions, the DCB precision performance of the CKF corresponding to receivers 4, 7, 9 and 8 is nearly the same as that of the central filter, irrespective of the number of iterative communications. This follows from the fact that the stated receivers have the maximum number of neighbors ( Figure 8 ), thus efficiently approximating the averages and in Eq. (28) after a few iterations.
On the other hand, the receivers with the minimum number of neighbors require more iterative communications in order for their CKF precision performance to approach that of the central filter.
6. Concluding remarks and future outlook
In this contribution we reviewed Kalman filtering in its information form and showed how the additive measurement update (28) can be realized by employing average-consensus rules, even when
We developed a three-step recursion of the CKF error variance matrix ( Figure 7 ). This recursion conveys useful information about the precision performance of the local filters , thereby enabling one to a-priori design and analyze sensor networks with different numbers of iterative communications. As an illustrative example, we applied the stated recursion to a small-scale network of GNSS receivers and showed the role taken by the CKF in improving the precision of the solutions at each single receiver. In the near future the proliferation of low-cost receivers will give rise to an increase in the number of GNSS users. Employing the CKF or other distributed filtering techniques, GNSS users can therefore potentially deliver high-precision parameter solutions without the need for a computing center.
The second author is the recipient of an Australian Research Council Federation Fellowship (project number FF0883188). This support is gratefully acknowledged.