Open access peer-reviewed chapter

Sequential Mini-Batch Noise Covariance Estimator

Written By

Hee-Seung Kim, Lingyi Zhang, Adam Bienkowski, Krishna R. Pattipati, David Sidoti, Yaakov Bar-Shalom and David L. Kleinman

Reviewed: 07 November 2022 Published: 19 December 2022

DOI: 10.5772/intechopen.108917

From the Edited Volume

Kalman Filter - Engineering Applications

Edited by Yuri V. Kim


Abstract

Noise covariance estimation in an adaptive Kalman filter is a problem of significant practical interest in a wide array of industrial applications. Reliable algorithms for estimating these covariances are scarce, and the necessary and sufficient conditions for their identifiability were in dispute until very recently. This chapter presents the necessary and sufficient conditions for the identifiability of noise covariances, and then develops sequential mini-batch stochastic optimization algorithms for estimating them. The optimization criterion is the minimization of the sum of the normalized temporal cross-correlations of the innovations; it is based on the property that the innovations of an optimal Kalman filter are uncorrelated over time. Our approach enforces the structural constraints on the noise covariances and ensures the symmetry and positive definiteness of the estimated covariance matrices. It is applicable to non-stationary and multiple-model systems, in which the noise covariances can occasionally jump up or down by an unknown level. Validation on several test cases demonstrates the computational efficiency and accuracy of the proposed method.

Keywords

  • Adaptive Kalman filtering
  • Minimal polynomial
  • Noise covariance estimation
  • Stochastic gradient descent (SGD)
  • Mini-batch SGD

1. Introduction

This chapter addresses the following learning problem: given a vector time series and a library of models for the time evolution of the data (e.g., a Wiener process, a white noise acceleration model, also called the nearly constant velocity model, or a white noise jerk model, also called the nearly constant acceleration model), find suitable process and measurement noise covariances and select the best dynamic model for the time series. This problem is of considerable interest in a number of applications, including fault diagnosis, robotics, signal processing, navigation, and target tracking [1, 2].

The Kalman filter (KF) [3] is the optimal minimum mean square error (MMSE) state estimator for linear systems with mutually uncorrelated Gaussian white process and measurement noises, and is the best linear state estimator when the noises are non-Gaussian with known covariances. However, the noise covariances are unknown or only partially known in many practical applications.

We derived the necessary and sufficient conditions for the identifiability of unknown noise covariances, and presented a batch optimization algorithm for their estimation using the sum of the normalized temporal cross-correlations of the innovation sequence as the optimization criterion [4]. The motivation for this optimization metric stems from the fact that the innovations of an optimal Kalman filter are white, meaning that they are uncorrelated over time [2]. In [5], we proposed a sequential mini-batch stochastic gradient descent (SGD) algorithm that required multiple passes through the measurements for estimating noise covariances. We also presented its applicability to non-stationary systems by detecting changes in noise covariances. In this chapter, we present a practical single-pass stochastic gradient descent algorithm for noise covariance estimation in non-stationary systems. Extensions to multiple models where the system behavior can stem from a member of a known subset of models are discussed in [6].

1.1 Prior work

The key to noise covariance estimation is an expression for the covariance of the state estimation error and of the innovations of any stable, but not necessarily optimal, filter as a function of the noise covariances. This expression serves as the foundational building block for correlation-based noise covariance estimation methods. Pioneering contributions using this approach were made in [7, 8, 9]. Sarkka and Nummenmaa [10] proposed a recursive noise-adaptive Kalman filter for linear state space models using variational Bayesian approximations. However, variational methods generally require tuning hyper-parameters to converge to the correct covariance parameters, and these algorithms often converge to local minima.

In [5], we presented a computationally efficient and accurate sequential estimation algorithm that is a major improvement over the batch estimation algorithm in [4]. The novelties of this algorithm stem from its sequential nature and the use of mini-batches, adaptive step size rules and dynamic thresholds for convergence in the stochastic gradient descent (SGD) algorithm. The innovation cross-correlations are obtained by a sequential fading memory filter. We applied a change-point detection algorithm described in [11] to extract the change points in noise covariances for non-stationary systems.

This chapter seeks to develop a streaming algorithm that reads the measurements exactly once, thus making it real-time and practical. The only caveat is that the noise covariances are assumed to jump up or down only occasionally, by an unknown magnitude. Extensions of this algorithm to a multiple-model setting may be found in [6].

1.2 Organization of the chapter

The organization of the chapter is as follows. Section 2 presents the mathematical formulation of the sequential mini-batch gradient descent algorithm for estimating the unknown noise covariances. In this section, we also present an overview of our approach, based on fading memory filter-based innovation correlation estimation and an accelerated SGD update of the Kalman gain. In Section 3, we show that our single-pass method can track unknown noise covariances in non-stationary systems. Lastly, we conclude the chapter with a brief summary of the contributions in Section 4.


2. Sequential mini-batch SGD method for estimating process and measurement noise covariances

Consider a discrete-time linear dynamic system

$x_{k+1} = F x_k + \Gamma v_k$   (1)
$z_k = H x_k + w_k$   (2)

where $x_k$ is the $n_x$-dimensional state vector and $v_k$ is a zero-mean white Gaussian process noise sequence with unknown covariance $Q_k$ in the plant equation (1). In the measurement equation (2), $z_k$ is the $n_z$-dimensional measurement vector and $w_k$ is a zero-mean white Gaussian measurement noise sequence with unknown covariance $R_k$. Here, $F$ is the $n_x \times n_x$ state transition matrix, $\Gamma$ is the $n_x \times n_v$ noise gain matrix, and $H$ is the $n_z \times n_x$ measurement matrix. The two noise sequences and the initial state error are assumed to be mutually uncorrelated. We assume that the noise covariances $Q_k$ and $R_k$ are piecewise constant, such that the filter reaches a steady state between any two jumps of unknown magnitude.

Given $Q_k$ and $R_k$, the Kalman filter involves the consecutive processes of prediction and update given by [2, 3]:

$\hat{x}_{k+1|k} = F \hat{x}_{k|k}$   (3)
$\nu_{k+1} = z_{k+1} - H \hat{x}_{k+1|k}$   (4)
$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + W_{k+1} \nu_{k+1}$   (5)
$P_{k+1|k} = F P_{k|k} F^\top + \Gamma Q_k \Gamma^\top$   (6)
$S_{k+1} = H P_{k+1|k} H^\top + R_k$   (7)
$W_{k+1} = P_{k+1|k} H^\top S_{k+1}^{-1}$   (8)
$P_{k+1|k+1} = (I_{n_x} - W_{k+1} H) P_{k+1|k} (I_{n_x} - W_{k+1} H)^\top + W_{k+1} R_k W_{k+1}^\top$   (9)

The Kalman filter predicts the next state estimate at time index $k+1$, given the observations up to time index $k$, via (3), and the concomitant predicted state estimation error covariance via (6), using the system dynamics, the updated state error covariance $P_{k|k}$ at time index $k$ and the process noise covariance $Q_k$. The updated state estimate at time $k+1$ in (5) incorporates the measurement at time $k+1$ via the Kalman gain matrix in (8), which depends on the innovation covariance $S_{k+1}$ (which in turn depends on the measurement noise covariance $R_k$ and the predicted state error covariance $P_{k+1|k}$). The updated state error covariance $P_{k+1|k+1}$ is computed via (9); this is the Joseph form, which is less sensitive to round-off error because it guarantees that the updated state covariance matrix remains positive definite.
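For concreteness, one cycle of (3)-(9) can be written directly in code. The following is a minimal NumPy sketch (function and variable names are illustrative, not a library API):

```python
import numpy as np

def kf_step(x, P, z, F, Gamma, H, Q, R):
    """One Kalman filter cycle, Eqs. (3)-(9); a minimal illustrative sketch."""
    # Prediction, Eqs. (3) and (6)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Gamma @ Q @ Gamma.T
    # Innovation and its covariance, Eqs. (4) and (7)
    nu = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    # Kalman gain, Eq. (8)
    W = P_pred @ H.T @ np.linalg.inv(S)
    # Update, Eq. (5), and Joseph-form covariance, Eq. (9)
    x_upd = x_pred + W @ nu
    I_WH = np.eye(len(x)) - W @ H
    P_upd = I_WH @ P_pred @ I_WH.T + W @ R @ W.T
    return x_upd, P_upd, nu, S, W
```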

2.1 Identifiability conditions for estimating Q and R

The necessary and sufficient conditions for identifiability of the covariances in adaptive Kalman filters were in dispute until very recently [4, 7, 8, 9, 12]. When $Q$ and $R$ are unknown, consider the innovations corresponding to a stable, suboptimal closed-loop filter matrix $\bar{F} = F(I_{n_x} - WH)$, given by [4, 13]:

$\nu_k = H \bar{F}^m (x_{k-m} - \hat{x}_{k-m|k-m-1}) + H \sum_{j=0}^{m-1} \bar{F}^{m-1-j} (\Gamma v_{k-m+j} - F W w_{k-m+j}) + w_k$   (10)

Given the innovation sequence (10), a weighted sum of innovations, $\xi_k$, can be computed as

$\xi_k = \sum_{i=0}^{m} a_i \nu_{k-i}$   (11)

where the weights $a_i$ are the coefficients of the minimal polynomial of the closed-loop filter matrix $\bar{F}$, that is, $\sum_{i=0}^{m} a_i \bar{F}^{m-i} = 0$ with $a_0 = 1$. It is easy to see that $\xi_k$ is the sum of two moving average processes driven by the process noise and the measurement noise, respectively, given by [4]:

$\xi_k = \sum_{l=1}^{m} B_l v_{k-l} + \sum_{l=0}^{m} G_l w_{k-l}$   (12)

Here, $B_l$ and $G_l$ are given by

$B_l = H \sum_{i=0}^{l-1} a_i \bar{F}^{l-i-1} \Gamma$   (13)
$G_l = a_l I_{n_z} - H \sum_{i=0}^{l-1} a_i \bar{F}^{l-i-1} F W, \quad G_0 = I_{n_z}$   (14)

Then, the cross-covariance between $\xi_k$ and $\xi_{k-j}$, denoted $L_j$, can be obtained as

$L_j = E[\xi_k \xi_{k-j}^\top] = \sum_{i=j+1}^{m} B_i Q B_{i-j}^\top + \sum_{i=j}^{m} G_i R G_{i-j}^\top$   (15)

The noise covariance matrices $Q = [q_{ij}]$ of dimension $n_v \times n_v$ and $R = [r_{ij}]$ of dimension $n_z \times n_z$ are symmetric and positive definite. By converting the noise covariance matrices and the $L_j$ matrices to vectors, Zhang et al. [4] show that they are related by the noise covariance identifiability matrix $\mathcal{I}$ via

$\mathcal{I} \begin{bmatrix} \mathrm{vec}(Q) \\ \mathrm{vec}(R) \end{bmatrix} = \begin{bmatrix} \mathrm{vec}(L_0) \\ \mathrm{vec}(L_1) \\ \vdots \\ \mathrm{vec}(L_m) \end{bmatrix}$   (16)

As shown in [4], if the matrix $\mathcal{I}$ has full column rank, then the unknown noise covariance matrices $Q$ and $R$ are identifiable. However, directly solving the linear equations in (16) for $Q$ and $R$ is highly ill-conditioned and prone to numerical errors.
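The rank condition can be checked numerically. The sketch below assembles $\mathcal{I}$ from (13)-(16) under two stated simplifications: the characteristic polynomial of $\bar{F}$ (which also satisfies $\sum_i a_i \bar{F}^{m-i} = 0$, by the Cayley-Hamilton theorem) stands in for the minimal polynomial, and the rank test ignores the symmetry of $Q$ and $R$, so it is slightly conservative. Identifiability then corresponds to the returned rank equaling $n_v^2 + n_z^2$.

```python
import numpy as np

def identifiability_matrix(F, Gamma, H, W):
    """Assemble the identifiability matrix of Eq. (16) for a stabilizing
    gain W and return it together with its rank; illustrative sketch."""
    nx, nz, nv = F.shape[0], H.shape[0], Gamma.shape[1]
    Fbar = F @ (np.eye(nx) - W @ H)            # closed-loop filter matrix
    a = np.poly(Fbar)                          # characteristic poly, a[0] = 1
    m = len(a) - 1
    powers = [np.linalg.matrix_power(Fbar, p) for p in range(m)]
    B = [np.zeros((nz, nv))]                   # B_0 unused (sum starts at l = 1)
    G = [np.eye(nz)]                           # G_0 = I, Eq. (14)
    for l in range(1, m + 1):
        acc = np.zeros((nx, nx))
        for i in range(l):
            acc = acc + a[i] * powers[l - i - 1]
        B.append(H @ acc @ Gamma)              # Eq. (13)
        G.append(a[l] * np.eye(nz) - H @ acc @ F @ W)   # Eq. (14)
    rows = []
    for j in range(m + 1):
        # vec(L_j) = sum_i (B_{i-j} kron B_i) vec(Q) + (G_{i-j} kron G_i) vec(R)
        blockQ = np.zeros((nz * nz, nv * nv))
        for i in range(j + 1, m + 1):
            blockQ = blockQ + np.kron(B[i - j], B[i])
        blockR = np.zeros((nz * nz, nz * nz))
        for i in range(j, m + 1):
            blockR = blockR + np.kron(G[i - j], G[i])
        rows.append(np.hstack([blockQ, blockR]))
    I_mat = np.vstack(rows)
    return I_mat, np.linalg.matrix_rank(I_mat)
```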

2.2 Recursive fading memory-based innovation correlation estimation

We compute the sample correlation matrix $\hat{C}^{k}_{\mathrm{seq}}(i)$ at sample $k$ for time lag $i$ as a weighted combination of the correlation matrix $\hat{C}^{k-1}_{\mathrm{seq}}(i)$ at the previous sample $(k-1)$ and the current product of the innovations $\nu_k$ and $\nu_{k-i}$. The tuning parameter $\lambda$, a positive constant between 0 and 1, is the weight associated with the previous sample correlation matrix. The current $M$ sample correlation matrices at time $k$ serve as the initial values for the next pair of samples in the recursive computation. Let $N$ denote the number of measurement samples. Then,

$\hat{C}^{k}_{\mathrm{seq}}(i) = (1-\lambda)\, \nu_k \nu_{k-i}^\top + \lambda\, \hat{C}^{k-1}_{\mathrm{seq}}(i)$,   (17)
$\hat{C}^{0}_{\mathrm{seq}}(i) = 0, \quad i = 0, 1, 2, \ldots, M-1; \quad k = M, \ldots, N$   (18)
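A direct implementation of the recursion (17)-(18) follows; the class keeps the last $M$ innovations and the $M$ lagged correlation estimates (the class name and the default value of $\lambda$ are illustrative assumptions):

```python
import numpy as np

class FadingMemoryCorrelation:
    """Recursive innovation-correlation estimator, Eqs. (17)-(18)."""
    def __init__(self, nz, M, lam=0.95):
        self.M, self.lam = M, lam
        self.C = np.zeros((M, nz, nz))       # C_hat_seq(i), i = 0..M-1
        self.buffer = []                     # last M innovations

    def update(self, nu):
        self.buffer.append(nu)
        if len(self.buffer) > self.M:
            self.buffer.pop(0)
        if len(self.buffer) == self.M:       # recursion starts at k = M, Eq. (18)
            for i in range(self.M):
                nu_lag = self.buffer[-1 - i]                 # nu_{k-i}
                self.C[i] = (1 - self.lam) * np.outer(nu, nu_lag) \
                            + self.lam * self.C[i]           # Eq. (17)
        return self.C
```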

2.3 Objective function and the gradient

The ensemble cross-correlation of a steady-state suboptimal Kalman filter is related to the closed-loop filter matrix $\bar{F} = F(I_{n_x} - WH)$, the matrix $F$, the measurement matrix $H$, the steady-state predicted covariance matrix $\bar{P}$, the steady-state filter gain $W$ and the steady-state innovation covariance $C_0$ via [8, 9]:

$C_i = E[\nu_k \nu_{k-i}^\top] = H \bar{F}^{i-1} F (\bar{P} H^\top - W C_0), \quad i \geq 1$   (19)

To avoid the scaling effects of the measurements, the objective function $\Psi$ formulated in [4] minimizes the sum of the $C_i$, normalized by the corresponding diagonal elements of $C_0$, for $i > 0$. Formally, we define the objective function $\Psi$ to be minimized with respect to $W$ as

$\Psi = \frac{1}{2} \mathrm{tr} \sum_{i=1}^{M-1} \left[ \mathrm{diag}(C_0)^{-1/2}\, C_i\, \mathrm{diag}(C_0)^{-1}\, C_i^\top\, \mathrm{diag}(C_0)^{-1/2} \right]$   (20)

where $\mathrm{diag}(C)$ denotes the diagonal matrix formed from the diagonal of $C$. We can rewrite the objective function by substituting (19) into (20) as

$\Psi = \frac{1}{2} \mathrm{tr} \sum_{i=1}^{M-1} \left[ \phi_i\, X\, \varphi\, X^\top \right]$   (21)

where

$\phi_i = (H \bar{F}^{i-1} F)^\top\, \varphi\, (H \bar{F}^{i-1} F)$   (22)
$X = \bar{P} H^\top - W C_0$   (23)
$\varphi = [\mathrm{diag}(C_0)]^{-1}$   (24)

The gradient of the objective function, $\nabla_W \Psi$, can be computed as [4]:

$\nabla_W \Psi = -\sum_{i=1}^{M-1} (H \bar{F}^{i-1} F)^\top \varphi\, C_i\, \varphi\, C_0 \;-\; F^\top Z F X \;-\; \sum_{i=1}^{M-1} \sum_{l=0}^{i-2} \left( C_{l+1}\, \varphi\, C_i^\top\, \varphi\, H \bar{F}^{i-l-2} F \right)^\top$   (25)

where

$Z = \bar{F}^\top Z \bar{F} + \frac{1}{2} \sum_{i=1}^{M-1} \left[ (H \bar{F}^{i-1} F)^\top \varphi\, C_i\, \varphi\, H + H^\top \varphi\, C_i^\top\, \varphi\, H \bar{F}^{i-1} F \right]$   (26)

The matrix $Z$ in (26) is obtained by solving a Lyapunov equation; its contribution is often small and can be neglected in (25) for computational efficiency.

In computing the objective function and the gradient, we replace the $C_i$ by their sample estimates $\hat{C}_{\mathrm{seq}}(i)$. With this replacement, noise covariance estimation becomes a data-dependent stochastic optimization/learning problem.
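As an illustration of this replacement, the sketch below evaluates the model-based objective by combining (19)-(20) with sample estimates of $\bar{P}$ and $C_0$, and obtains the gradient by central finite differences as a simple stand-in for the closed form (25)-(26):

```python
import numpy as np

def psi_model(W, F, H, Pbar, C0, M):
    """Objective Psi of Eq. (20) with the model correlations of Eq. (19);
    Pbar and C0 are treated as data (e.g., sample estimates)."""
    nx = F.shape[0]
    Fbar = F @ (np.eye(nx) - W @ H)            # closed-loop filter matrix
    X = Pbar @ H.T - W @ C0                    # Eq. (23)
    d = np.sqrt(np.diag(C0))                   # diag(C_0)^{1/2}
    psi, Fpow = 0.0, np.eye(nx)                # Fpow holds Fbar^{i-1}
    for _ in range(1, M):
        Ci = H @ Fpow @ F @ X                  # Eq. (19)
        Cn = Ci / np.outer(d, d)               # normalized correlation
        psi += 0.5 * np.trace(Cn @ Cn.T)       # summand of Eq. (20)
        Fpow = Fpow @ Fbar
    return psi

def fd_gradient(W, *args, h=1e-6):
    """Central finite-difference gradient of psi_model with respect to W;
    a simple stand-in for the closed-form gradient (25)-(26)."""
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += h
        Wm[idx] -= h
        g[idx] = (psi_model(Wp, *args) - psi_model(Wm, *args)) / (2.0 * h)
    return g
```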

2.4 Estimation of Q and R

2.4.1 Estimation of R

We define $\mu_k$ as the post-fit residual sequence of the Kalman filter, which is related to the innovations $\nu_k$ via

$\mu_k = z_k - H \hat{x}_{k|k} = (I_{n_z} - HW)\, \nu_k; \quad k = 1, 2, \ldots, N$   (27)

From the joint covariance of the innovation sequence $\nu_k$ and the post-fit residual sequence $\mu_k$, and the Schur determinant identity [14, 15], one can show that [4]:

$G = E[\mu_k \mu_k^\top] = R S^{-1} R$   (28)

where $S$ is the innovation covariance. Knowing the sample estimates of $G$ and $S = \hat{C}_{\mathrm{seq}}(0)$, the measurement noise covariance $R$ can be estimated. Because (28) can be interpreted as a continuous-time algebraic Riccati equation or as a simultaneous diagonalization problem in linear algebra [15], $R$ can be estimated by solving a continuous-time Riccati equation as in [4, 16] or by solving the simultaneous diagonalization problem via Cholesky decomposition and eigendecomposition.
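To make the simultaneous-diagonalization route concrete: writing $S = L L^\top$ (Cholesky) and $M = L^{-1} R L^{-\top}$, (28) becomes $M^2 = L^{-1} G L^{-\top}$, so $M$ is the symmetric positive semidefinite square root of $L^{-1} G L^{-\top}$ and $R = L M L^\top$. A sketch (illustrative, with no guards for ill-conditioned sample estimates):

```python
import numpy as np

def estimate_R(G, S):
    """Solve G = R S^{-1} R, Eq. (28), for symmetric positive definite R
    via Cholesky and eigendecomposition."""
    L = np.linalg.cholesky(S)                 # S = L L^T
    Linv = np.linalg.inv(L)
    Ghat = Linv @ G @ Linv.T                  # Ghat = M^2, M = L^{-1} R L^{-T}
    evals, V = np.linalg.eigh(Ghat)
    M = V @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ V.T   # PSD square root
    return L @ M @ L.T                        # R = L M L^T
```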

2.4.2 Estimation of Q

Since the process noise covariance $Q$ and the steady-state updated covariance $P$ are generally coupled, $Q$ and $P$ can be obtained via a Gauss-Seidel type iterative computation, given the estimated $R$. The Wiener process is an exception for which an explicit non-iterative solution $Q = W S W^\top$ is possible [4]. Let $t$ and $l$ denote the outer and inner iteration indices, starting with $t = 0$ and $l = 0$. The initial steady-state updated covariance $P_0$ can be computed as the solution of the Lyapunov equation

$P_0 = \tilde{F} P_0 \tilde{F}^\top + W R W^\top + (I_{n_x} - WH)\, \Gamma Q_t \Gamma^\top (I_{n_x} - WH)^\top; \quad Q_0 = W S W^\top$   (29)

where $\tilde{F} = (I_{n_x} - WH) F$. We iteratively update $P$ as in (30) until convergence:

$P_{l+1} = \left[ (F P_l F^\top + \Gamma Q_t \Gamma^\top)^{-1} + H^\top R^{-1} H \right]^{-1}$   (30)

Given the converged $P$, $Q$ is updated in the $t$-loop until the estimate of $Q$ converges:

$Q_{t+1} = \Gamma^{+} \left[ (P + W S W^\top - F P F^\top)_{t+1} + \lambda_Q I_{n_x} \right] (\Gamma^{+})^\top$   (31)

where $\lambda_Q$ is a regularization parameter used for ill-conditioned estimation problems and $\Gamma^{+}$ denotes the Moore-Penrose pseudo-inverse of $\Gamma$.
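The iteration (29)-(31) can be sketched as follows; the fixed iteration counts, the symmetrization step, and the use of SciPy's discrete Lyapunov solver are illustrative choices rather than part of the algorithm specification:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def estimate_Q(F, Gamma, H, W, S, R, lam_Q=0.0, n_outer=20, n_inner=50):
    """Gauss-Seidel iteration of Eqs. (29)-(31) for Q given R; sketch with
    fixed iteration counts standing in for convergence tests."""
    nx = F.shape[0]
    I = np.eye(nx)
    Ft = (I - W @ H) @ F                      # F_tilde
    Q = W @ S @ W.T                           # Q_0 = W S W^T, Eq. (29)
    Gp = np.linalg.pinv(Gamma)                # pseudo-inverse of the noise gain
    Rinv_term = H.T @ np.linalg.inv(R) @ H
    for _ in range(n_outer):
        # Eq. (29): P_0 from a discrete Lyapunov equation
        const = W @ R @ W.T \
                + (I - W @ H) @ Gamma @ Q @ Gamma.T @ (I - W @ H).T
        P = solve_discrete_lyapunov(Ft, const)
        # Eq. (30): inner fixed-point updates of P
        for _ in range(n_inner):
            P = np.linalg.inv(
                np.linalg.inv(F @ P @ F.T + Gamma @ Q @ Gamma.T) + Rinv_term)
        # Eq. (31): update Q (lam_Q regularizes ill-conditioned problems)
        Q = Gp @ (P + W @ S @ W.T - F @ P @ F.T + lam_Q * I) @ Gp.T
        Q = 0.5 * (Q + Q.T)                   # enforce symmetry
    return Q
```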

2.5 Updating the gain W sequentially

The estimation algorithm sequentially computes the $M$ sample covariance matrices at every measurement sample $k$ as in (17). Let $B$ be the mini-batch size for updating the Kalman filter gain $W$ in the SGD. Our proposed method updates the gain $W$ whenever the sample index $k$ is divisible by the mini-batch size $B$. Compared to the batch estimation algorithm, the sequential mini-batch SGD algorithm has more opportunities to converge to a better local minimum of (20) because it updates the filter gain frequently [5]. The generic form of the gain update is given by

$W_{r+1} = W_r - \alpha_r \nabla_{W_r} \Psi$   (32)

where $r$ is the update index starting with $r = 0$. In our previous research [5], we explored the performance of accelerated SGD methods (e.g., bold driver [17], constant step size, subgradient [18], RMSProp [19], Adam [20], Adadelta [21]) for adapting the step size $\alpha_r$ in (32). The root mean square propagation (RMSProp) method is used for the estimation procedure in this chapter. RMSProp keeps track of a moving average of the squared incremental gradients for each gain element and adapts the step size element-wise as follows:

$\tau_{r,ij} = \gamma\, \tau_{r-1,ij} + (1-\gamma)\, [\nabla_{W_r} \Psi]_{ij}^2; \quad \tau_0 = 0$   (33)
$\alpha_{r,ij} = \dfrac{\alpha_0}{\sqrt{\tau_{r,ij}} + \varepsilon}$   (34)

Here, $\gamma = 0.9$ is the default value and $\varepsilon = 10^{-8}$ prevents division by zero.
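Packaged as a helper, the update (32)-(34) reads as follows (the class name and the initial step size $\alpha_0$ are illustrative assumptions):

```python
import numpy as np

class RMSPropGain:
    """Element-wise RMSProp adaptation of the gain, Eqs. (32)-(34)."""
    def __init__(self, W0, alpha0=1e-3, gamma=0.9, eps=1e-8):
        self.W, self.alpha0 = W0.copy(), alpha0
        self.gamma, self.eps = gamma, eps
        self.tau = np.zeros_like(W0)          # tau_0 = 0, Eq. (33)

    def step(self, grad):
        # Eq. (33): moving average of squared gradients
        self.tau = self.gamma * self.tau + (1 - self.gamma) * grad ** 2
        # Eq. (34): element-wise step size, then gain update, Eq. (32)
        alpha = self.alpha0 / (np.sqrt(self.tau) + self.eps)
        self.W -= alpha * grad
        return self.W
```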

The pseudocode for the sequential mini-batch SGD estimation algorithm for a non-stationary system is included as Algorithm 1.

Algorithm 1 Pseudocode of sequential mini-batch SGD algorithm.

1: input: $W_0$, $Q_0$, $R_0$, $\alpha_0$, $B$  {$W_0$: initial gain, $Q_0$: initial $Q$, $R_0$: initial $R$, $\alpha_0$: initial step size, $B$: batch size}

2: $r = 0$  {initialize the update index $r$}

3: for $k = 1$ to $N$ do  {$N$: number of samples}

4:  compute the innovation $\nu_k$

5:  if $k > N_b + M$ then  {$N_b$: number of burn-in samples}

6:   compute $\hat{C}^{k}_{\mathrm{seq}}(i)$, $i = 0, 1, 2, \ldots, M-1$

7:   if $\mathrm{mod}(k, B) = 0$ then

8:    compute the objective function $\Psi$

9:    compute the gradient $\nabla_W \Psi$

10:   update the step size $\alpha_r$

11:   update the gain $[W_{r+1}]_{ij} = [W_r]_{ij} - \alpha_{r,ij}\, [\nabla_{W_r} \Psi]_{ij}$

12:   update $R_{r+1}$ and $Q_{r+1}$

13:   $r = r + 1$

14:   end if

15:  end if

16: end for
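For readers who prefer running code, the loop below is one possible Python rendering of Algorithm 1, wired together from the illustrative sketches given earlier in this section (FadingMemoryCorrelation, psi_model, fd_gradient, estimate_R, estimate_Q and RMSPropGain); the finite-difference gradient stands in for the closed form (25)-(26), and guards for early, ill-conditioned sample estimates are omitted:

```python
import numpy as np

def single_pass_sgd(zs, F, Gamma, H, W0, Q0, R0, M=5, B=64, Nb=50, lam=0.95):
    """One possible rendering of Algorithm 1; illustrative, not definitive."""
    nx, nz = F.shape[0], H.shape[0]
    x, P = np.zeros(nx), np.eye(nx)
    W, Q, R = W0.copy(), Q0.copy(), R0.copy()
    corr = FadingMemoryCorrelation(nz, M, lam)
    opt = RMSPropGain(W0)
    Ghat = np.zeros((nz, nz))                 # fading-memory estimate of E[mu mu^T]
    I_x, I_z = np.eye(nx), np.eye(nz)
    r = 0                                     # update index, as in Algorithm 1
    for k, z in enumerate(zs, start=1):
        # fixed-gain filter recursion with the current W, Q, R
        x_pred = F @ x
        nu = z - H @ x_pred                   # innovation, Eq. (4)
        x = x_pred + W @ nu                   # update, Eq. (5)
        Pbar = F @ P @ F.T + Gamma @ Q @ Gamma.T                    # Eq. (6)
        P = (I_x - W @ H) @ Pbar @ (I_x - W @ H).T + W @ R @ W.T    # Eq. (9)
        C = corr.update(nu)                   # Eqs. (17)-(18)
        mu = (I_z - H @ W) @ nu               # post-fit residual, Eq. (27)
        Ghat = (1 - lam) * np.outer(mu, mu) + lam * Ghat
        if k > Nb + M and k % B == 0:         # mini-batch gain update
            grad = fd_gradient(W, F, H, Pbar, C[0], M)
            W = opt.step(grad)                # Eqs. (32)-(34)
            R = estimate_R(Ghat, C[0])        # Section 2.4.1
            Q = estimate_Q(F, Gamma, H, W, C[0], R)   # Section 2.4.2
            r += 1
    return W, Q, R
```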


3. Numerical examples

In [5], we provided evidence that multi-pass sequential mini-batch stochastic gradient descent (SGD) algorithms improve the computational efficiency of the batch estimation algorithm on a number of test cases used in [2, 7, 8, 9, 12], and also showed their applicability to non-stationary systems when coupled with a change-point detection algorithm [11]. In [22], we proposed a single-pass sequential mini-batch SGD estimation algorithm that accesses the measurements exactly once for non-stationary systems, modifying the example used in [12] so that the process and measurement noise covariances change periodically.

In this section, we illustrate the utility of the proposed single-pass sequential mini-batch SGD estimation algorithm by applying it to diverse examples involving a detectable (but not completely observable) system, non-stationary systems and a bearings-only tracking problem.

For the non-stationary systems, we assume that the process and measurement noise covariances occasionally change by an unknown level. Here, "occasionally" means that the jumps are infrequent enough for the Kalman filter to reach steady state before a jump in the noise covariances. We define $N_{sg}$ as the number of subgroups within which the noise covariances are constant. Given $N$ observation samples, each subgroup has constant noise covariances over $N/N_{sg}$ samples. In this section, we consider two non-stationary scenarios for tracking time-varying $Q$ and $R$ with $N_{sg} = 5$ subgroups. We also consider the bearings-only tracking problem, where $Q$ changes continuously.

Note that the number of "burn-in" samples and the number of lags are $N_b = 50$ and $M = 5$, respectively, in the estimation procedure. The root mean square propagation (RMSProp) method is applied to update the filter gain. All Monte Carlo simulations were run on a computer with an Intel Core i7-8665U processor and 16 GB of RAM.

We used the averaged normalized innovation squared (NIS) metric [2] to measure the consistency of the proposed algorithm.

$\bar{\varepsilon}_\nu(k) = \dfrac{1}{N_{mc}} \sum_{i=1}^{N_{mc}} \left( \nu_k^{(i)} \right)^\top S^{-1} \nu_k^{(i)}$   (35)

where $N_{mc}$ is the number of Monte Carlo runs. The root mean square error (RMSE) in position and velocity is computed using

$\mathrm{RMSE}_k = \sqrt{ \dfrac{1}{N_{mc}} \sum_{i=1}^{N_{mc}} \left\| x_k^{\mathrm{true},(i)} - \hat{x}_k^{(i)} \right\|^2 }$   (36)
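Both metrics are straightforward to compute from stored Monte Carlo outputs; the sketch below assumes a hypothetical array layout and, for simplicity, an innovation covariance sequence common to all runs:

```python
import numpy as np

def nis_and_rmse(nus, Ss, xs_true, xs_est):
    """Averaged NIS, Eq. (35), and RMSE, Eq. (36), across Monte Carlo runs.
    Assumed layout: nus[i][k] is the innovation of run i at time k, Ss[k]
    the innovation covariance at time k, and xs_true[i][k], xs_est[i][k]
    the true and estimated states."""
    Nmc, N = len(nus), len(nus[0])
    nis, rmse = np.zeros(N), np.zeros(N)
    for k in range(N):
        Sinv = np.linalg.inv(Ss[k])
        nis[k] = np.mean([nus[i][k] @ Sinv @ nus[i][k] for i in range(Nmc)])
        rmse[k] = np.sqrt(np.mean([np.sum((xs_true[i][k] - xs_est[i][k]) ** 2)
                                   for i in range(Nmc)]))
    return nis, rmse
```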

3.1 Case 1: A detectable (but not completely observable) system that satisfies the identifiability conditions

Mehra [8] stated, without proof, that a necessary and sufficient condition for noise covariance estimation is that the system be completely observable. This example, due to Odelson et al. [7], demonstrates that this condition is not necessary. The example does satisfy the full column rank condition for the identifiability matrix in (16).

Odelson et al. [7] proposed a noise covariance estimation method based on the autocovariance least-squares formulation. This method computes the covariances from the residuals of the state estimation. Note that the incompletely observable (but detectable¹) system used in [7] is described by

$F = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.2 \end{bmatrix}, \quad H = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$   (37)

where $F$ is the non-singular transition matrix, $H$ is the constant output matrix, and $\Gamma$ is the constant input matrix in (1) and (2). Note that this system is a hypothetical numerical example. The process noise $v_k$ and the measurement noise $w_k$ are assumed to be uncorrelated zero-mean Gaussian white noise sequences with covariances

$E[v_k v_j^\top] = Q\, \delta_{kj}$   (38)
$E[w_k w_j^\top] = R\, \delta_{kj}$   (39)

In this scenario, the true $R$ values for the five subgroups are [0.30, 0.81, 0.49, 0.72, 0.42], and the true $Q$ values are [0.16, 0.49, 0.25, 0.36, 0.20]; the values change every 10,000 samples. Table 1 shows the results of 100 Monte Carlo simulations based on the single-pass SGD algorithm in estimating $Q$ and $R$. As can be seen, the estimated $Q$ and $R$ are close to their corresponding true values. In this scenario, the single-pass SGD estimation method has a speedup factor of 31 over the batch and multi-pass SGD estimation methods (not shown).

| Subgroup | $R$ | $Q$ | $\bar{P}_{11}$ | $\bar{P}_{22}$ | $W_{11}$ | $W_{21}$ |
|---|---|---|---|---|---|---|
| 1st | 0.30 / 0.31 (0.06) | 0.16 / 0.15 (0.04) | 0.16 / 0.15 (0.04) | 0.66 / 0.62 (0.18) | 0.35 / 0.33 (0.03) | 0.70 / 0.67 (0.03) |
| 2nd | 0.81 / 0.86 | 0.49 / 0.42 | 0.49 / 0.43 | 2.01 / 1.74 | 0.38 / 0.34 | 0.76 / 0.67 |
| 3rd | 0.49 / 0.49 | 0.25 / 0.24 | 0.25 / 0.24 | 1.03 / 0.97 | 0.34 / 0.34 | 0.68 / 0.67 |
| 4th | 0.72 / 0.71 | 0.36 / 0.35 | 0.36 / 0.35 | 1.48 / 1.44 | 0.33 / 0.34 | 0.67 / 0.67 |
| 5th | 0.42 / 0.41 | 0.20 / 0.20 | 0.20 / 0.21 | 0.83 / 0.84 | 0.33 / 0.34 | 0.66 / 0.67 |

Entries are truth / mean; the RMSE (in parentheses) is reported for the first subgroup.

Table 1.

Single-pass SGD estimation for Case 1 (100 MC Runs; 50,000 samples; M = 5; RMSProp; Batch size = 64).

Figure 1 demonstrates that the sequential mini-batch gradient descent algorithm can track $Q$ and $R$ correctly. Here, the trajectories of the $Q$ and $R$ estimates are smoothed by a simple first-order fading memory filter with a smoothing weight of 0.7. Figure 1e shows the averaged NIS of the SGD (RMSProp; batch size of 64) method with the 95% probability region [0.74, 1.30], and shows that the SGD-based Kalman filter is consistent. The NIS values are large only immediately after a jump in the noise variances, because adaptation requires a few samples.

Figure 1.

Trajectories of Q and R estimates without signal smoothing and with a smoothing weight of 0.7 for Case 1.

3.2 Case 2: A five-state inertial navigation system with diagonal Q and R

For estimating the unknown noise covariance parameters and the optimal Kalman filter gain for part of an inertial navigation system (INS), Mehra [8] proposed an iterative innovation correlation-based method starting from an arbitrary initial stabilizing gain. Inertial navigation [25] involves tracking the position and orientation of an object relative to a known starting orientation and velocity, using measurements provided by accelerometers and gyroscopes. These systems have found universal use in military and commercial applications [26].

Since the earth is not flat, inertial navigation systems need to keep tilting the platform (with respect to inertial space) to keep the axes of the accelerometers horizontal. Here, small error sources that drive the Schuler loop cause the navigation errors, and these errors are "damped" by making use of external velocity measurements, such as those furnished by a Doppler radar [27, 28].

In this problem, Mehra [8] used a system based on the damped Schuler-loop error propagation forced by exponentially correlated as well as white noise input. The system matrices for this navigation system are given by

$F = \begin{bmatrix} 0.75 & -1.74 & -0.3 & 0 & -0.15 \\ 0.09 & 0.91 & -0.0015 & 0 & -0.008 \\ 0 & 0 & 0.95 & 0 & 0 \\ 0 & 0 & 0 & 0.55 & 0 \\ 0 & 0 & 0 & 0 & 0.905 \end{bmatrix}; \quad H = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix}; \quad \Gamma = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 24.64 & 0 & 0 \\ 0 & 0.835 & 0 \\ 0 & 0 & 1.83 \end{bmatrix}$   (40)

where the system is discretized using a time step of 0.1 seconds. In this five-state system, the first two states represent a velocity damping term and the velocity error, respectively, and the other three states model the correlated noise processes. The noise corresponding to states 3 and 5 impacts both the velocity error and the velocity damping term; the fourth state impacts the sensor error in state 2 only.

In this problem, the true values corresponding to each subgroup, with $N_{sg} = 5$ subgroups and $N = 100{,}000$ samples, are as in (41). Each parameter of $Q$ and $R$ changes every 20,000 samples.

$\begin{bmatrix} R_{11} \\ R_{22} \end{bmatrix} = \begin{bmatrix} 0.25 & 0.56 & 0.64 & 0.42 & 0.36 \\ 0.25 & 0.25 & 0.49 & 0.16 & 0.04 \end{bmatrix}; \quad \begin{bmatrix} Q_{11} \\ Q_{22} \\ Q_{33} \end{bmatrix} = \begin{bmatrix} 0.25 & 0.64 & 0.49 & 0.25 & 0.49 \\ 0.25 & 0.36 & 0.56 & 0.16 & 0.04 \\ 0.36 & 0.49 & 0.64 & 0.25 & 0.09 \end{bmatrix}$   (41)

where each column corresponds to one of the five subgroups.

Table 2 shows the results of 100 Monte Carlo simulations for estimating the noise parameters using the RMSProp update. The estimated parameters are close to the corresponding true values. Given 100,000 samples, the proposed method with a batch size of 64 requires 1891 seconds for 100 Monte Carlo simulations, i.e., 18.91 seconds per run. The batch and multi-pass SGD estimation methods need more than 3000 seconds for a single MC run (not shown); the single-pass SGD algorithm thus has a speedup factor of 158.6.

(a) $R$, $Q$ and $\bar{P}$ estimates for Case 2

| Subgroup | $R_{11}$ | $R_{22}$ | $Q_{11}$ | $Q_{22}$ | $Q_{33}$ |
|---|---|---|---|---|---|
| 1st | 0.25 / 0.23 (0.34) | 0.25 / 0.24 (0.04) | 0.25 / 0.24 (0.05) | 0.25 / 0.23 (0.05) | 0.36 / 0.35 (0.06) |
| 2nd | 0.56 / 0.52 | 0.25 / 0.25 | 0.64 / 0.62 | 0.36 / 0.31 | 0.49 / 0.44 |
| 3rd | 0.64 / 0.85 | 0.49 / 0.50 | 0.49 / 0.50 | 0.56 / 0.52 | 0.64 / 0.71 |
| 4th | 0.42 / 0.44 | 0.16 / 0.16 | 0.25 / 0.26 | 0.16 / 0.16 | 0.25 / 0.25 |
| 5th | 0.36 / 0.30 | 0.04 / 0.04 | 0.49 / 0.48 | 0.04 / 0.04 | 0.09 / 0.08 |

| Subgroup | $\bar{P}_{11}$ | $\bar{P}_{22}$ | $\bar{P}_{33}$ | $\bar{P}_{44}$ | $\bar{P}_{55}$ |
|---|---|---|---|---|---|
| 1st | 19.46 / 18.84 (4.12) | 0.33 / 0.31 (0.05) | 306.26 / 297.69 (73.27) | 0.23 / 0.22 (0.05) | 4.02 / 3.86 (0.71) |
| 2nd | 43.76 / 42.11 | 0.43 / 0.40 | 766.59 / 744.98 | 0.34 / 0.30 | 5.45 / 4.99 |
| 3rd | 37.81 / 40.15 | 0.66 / 0.67 | 600.91 / 621.04 | 0.52 / 0.49 | 7.33 / 8.00 |
| 4th | 18.49 / 18.96 | 0.22 / 0.22 | 303.75 / 312.24 | 0.15 / 0.15 | 2.78 / 2.78 |
| 5th | 29.31 / 28.55 | 0.07 / 0.07 | 574.37 / 562.90 | 0.04 / 0.04 | 0.97 / 0.93 |

(b) $W$ estimates for Case 2

| Subgroup | $W_{11}$ | $W_{21}$ | $W_{31}$ | $W_{41}$ | $W_{51}$ |
|---|---|---|---|---|---|
| 1st | 0.94 / 0.96 (0.02) | 2.78×10⁻³ / 3.56×10⁻³ (0.01) | −2.80 / −2.75 (0.04) | 2.22×10⁻⁵ / 6.50×10⁻³ (0.01) | 0.04 / 0.04 (0.01) |
| 2nd | 0.96 / 0.98 | 3.22×10⁻³ / 4.32×10⁻³ | −2.91 / −2.88 | 8.76×10⁻⁴ / 8.37×10⁻³ | 0.03 / 0.01 |
| 3rd | 0.94 / 0.95 | 2.84×10⁻³ / 3.84×10⁻³ | −2.79 / −2.83 | 1.19×10⁻⁴ / 8.79×10⁻³ | 0.04 / 0.03 |
| 4th | 0.94 / 0.95 | 3.84×10⁻³ / 2.08×10⁻⁴ | −2.80 / −2.84 | 6.80×10⁻⁴ / 3.89×10⁻⁴ | 0.03 / 0.02 |
| 5th | 0.98 / 1.00 | 3.50×10⁻³ / 1.33×10⁻⁴ | −3.03 / −3.00 | 1.09×10⁻³ / 2.10×10⁻⁴ | 0.01 / 0.01 |

| Subgroup | $W_{12}$ | $W_{22}$ | $W_{32}$ | $W_{42}$ | $W_{52}$ |
|---|---|---|---|---|---|
| 1st | 0.94 / 0.94 (0.08) | 0.38 / 0.39 (0.02) | −1.63 / −1.65 (0.25) | 0.23 / 0.23 (0.02) | −0.94 / −0.93 (0.05) |
| 2nd | 1.02 / 1.00 | 0.40 / 0.42 | −1.65 / −1.70 | 0.27 / 0.25 | −1.02 / −0.98 |
| 3rd | 0.86 / 0.96 | 0.36 / 0.37 | −1.56 / −1.68 | 0.26 / 0.25 | −0.86 / −0.93 |
| 4th | 0.99 / 0.98 | 0.39 / 0.39 | −1.57 / −1.69 | 0.23 / 0.23 | −0.98 / −0.99 |
| 5th | 1.19 / 1.06 | 0.44 / 0.46 | −1.26 / −1.77 | 0.20 / 0.22 | −1.17 / −1.11 |

Entries are truth / mean; the RMSE (in parentheses) is reported for the first subgroup.

Table 2.

Single-pass SGD estimation for Case 2 (100 MC Runs, 100,000 samples; M = 5; RMSProp; Batch size = 64).

Figure 2 shows the trajectories of the estimated $Q$ and $R$ with signal smoothing (smoothing weight of 0.7). For this example, accurate estimation of $R_{11}$ is known to be hard, as shown in Figure 2d; the reason is that $R_{11}$ is dominated by the state uncertainty, i.e., the measurement noise is "buried" in a much larger innovation [4]. In spite of the difficulty in estimating $R_{11}$, the filter is consistent as measured by the NIS, as shown in Figure 2f.

Figure 2.

Trajectories of Q and R estimates with signal smoothing (smoothing weight = 0.7) for Case 2.

3.3 Case 3: Bearings-only tracking problem

In many practical situations, it is hard to obtain a closed-form solution for state estimation because the noise covariances are often unknown and the dynamics are nonlinear. Arasaratnam and Haykin [29] proposed a nonlinear filter, the cubature Kalman filter, that can be applied to bearings-only measurements for estimating the position and velocity of a target. This setting is based on measurements from a passive sensor that measures only the direction of arrival of a signal emitted by the target [2]. The so-called bearings-only tracking problem arises in a variety of practical applications, such as air traffic control, underwater sonar tracking and aircraft surveillance [2, 30, 31].

In this example, we consider a two-dimensional bearings-only tracking problem of a nearly-constant velocity target from a single moving observer used in [32]. The dynamics of the target (relative to the observer) are described by

$x_{k+1} = F x_k + \Gamma v_k - U_{k+1}$   (42)
$z_k = h(x_k) + w_k$   (43)

Formally, if the state vector of the target is $x^t_k = [\zeta_t, \eta_t, \dot{\zeta}_t, \dot{\eta}_t]^\top$ and the state vector of the observer is $x^o_k = [\zeta_o, \eta_o, \dot{\zeta}_o, \dot{\eta}_o]^\top$, with positions and velocities along the $\zeta$ and $\eta$ axes, then $x_k = x^t_k - x^o_k$ is the relative state vector of the target with respect to the observer, and the input vector is $U_k = x^o_k - F x^o_{k-1}$; $w_k$ is a zero-mean white Gaussian noise with variance $\sigma_\theta^2$. The nonlinear measurement is the bearing of the target from the observer's platform, given by $h(x_k) = \tan^{-1}(\zeta/\eta)$. Here, $\Gamma$ is the $4 \times 4$ identity matrix.

The system matrices for this problem are given by

$F = \begin{bmatrix} 1 & 0 & T & 0 \\ 0 & 1 & 0 & T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \Gamma = I_4, \quad Q = \begin{bmatrix} T^3/3 & 0 & T^2/2 & 0 \\ 0 & T^3/3 & 0 & T^2/2 \\ T^2/2 & 0 & T & 0 \\ 0 & T^2/2 & 0 & T \end{bmatrix} \tilde{q}_k$   (44)

where the sampling interval $T$ is 1 second. The zero-mean white process noise intensity is $\tilde{q}_0 = 9 \times 10^{-12}\ \mathrm{km}^2/\mathrm{s}^3$, except that it increases rapidly to $1.5\, \tilde{q}_0$ around sample index $k = 480$ and then decreases rapidly again around $k = 960$, as follows:

$\tilde{q}_k = \begin{cases} \tilde{q}_0 + 0.25\, \tilde{q}_0 \left[ 1 + \tanh(0.015(k - 480)) \right], & k \leq 720 \\ \tilde{q}_0 + 0.25\, \tilde{q}_0 \left[ 1 + \tanh(0.015(960 - k)) \right], & \text{otherwise} \end{cases}$   (45)
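This profile is easy to reproduce; a small sketch (with the assumed value of $\tilde{q}_0$ as a default):

```python
import numpy as np

def q_intensity(k, q0=9e-12):
    """Process-noise intensity profile of Eq. (45); q0 in km^2/s^3."""
    if k <= 720:
        return q0 + 0.25 * q0 * (1.0 + np.tanh(0.015 * (k - 480)))
    return q0 + 0.25 * q0 * (1.0 + np.tanh(0.015 * (960 - k)))
```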

The linearized measurement matrix, Hk, is the Jacobian of the measurement function given by

$H_k = \dfrac{\partial h(x_k)}{\partial x_k} = \begin{bmatrix} \dfrac{\eta_k}{\zeta_k^2 + \eta_k^2} & \dfrac{-\zeta_k}{\zeta_k^2 + \eta_k^2} & 0 & 0 \end{bmatrix}$   (46)

A total of 1920 measurement samples were generated for this scenario. The observer moves in a straight line at a speed of 5 knots, except for 480 seconds (between $k = 480$ and $k = 960$), during which it turns at $2.4^\circ/\mathrm{s}$, as shown in Figure 3 (these times are marked by cross signs).

Figure 3.

Observer and target trajectory (100 MC runs).

For a fair comparison of the estimation algorithms, we initialized all filters with the same mean and covariance using prior knowledge of the initial target range and the initial bearing measurement [33, 34]. Here, the initial target range and the initial bearing measurement are generated as $\bar{r} \sim \mathcal{N}(r, \sigma_r^2)$ and $\bar{\theta}_0 \sim \mathcal{N}(\theta, \sigma_\theta^2)$, respectively, where $r$ is the true initial target range and $\theta$ is the true initial bearing. The initial target speed is generated as $\bar{s} \sim \mathcal{N}(s, \sigma_s^2)$, where $s$ is the true initial target speed. Let $\sigma_{\langle \cdot \rangle}$ denote the standard deviation of the corresponding parameter. Assuming that the target is moving towards the observer, the initial course estimate can be obtained as $\bar{c} = \bar{\theta}_0 + \pi$. The initial state vector and the initial covariance are given by

$\hat{x}_0 = \begin{bmatrix} \hat{\zeta} \\ \hat{\eta} \\ \hat{\dot{\zeta}} \\ \hat{\dot{\eta}} \end{bmatrix} = \begin{bmatrix} \bar{r} \sin \bar{\theta}_0 \\ \bar{r} \cos \bar{\theta}_0 \\ \bar{s} \sin \bar{c} - \dot{\zeta}^o_0 \\ \bar{s} \cos \bar{c} - \dot{\eta}^o_0 \end{bmatrix}; \quad P_0 = \begin{bmatrix} P_{\zeta\zeta} & P_{\zeta\eta} & 0 & 0 \\ P_{\eta\zeta} & P_{\eta\eta} & 0 & 0 \\ 0 & 0 & P_{\dot{\zeta}\dot{\zeta}} & P_{\dot{\zeta}\dot{\eta}} \\ 0 & 0 & P_{\dot{\eta}\dot{\zeta}} & P_{\dot{\eta}\dot{\eta}} \end{bmatrix}$   (47)

where

$P_{\zeta\zeta} = \bar{r}^2 \sigma_\theta^2 \cos^2 \bar{\theta}_0 + \sigma_r^2 \sin^2 \bar{\theta}_0; \quad P_{\eta\eta} = \bar{r}^2 \sigma_\theta^2 \sin^2 \bar{\theta}_0 + \sigma_r^2 \cos^2 \bar{\theta}_0$   (48)
$P_{\zeta\eta} = P_{\eta\zeta} = (\sigma_r^2 - \bar{r}^2 \sigma_\theta^2) \sin \bar{\theta}_0 \cos \bar{\theta}_0; \quad P_{\dot{\zeta}\dot{\zeta}} = \bar{s}^2 \sigma_c^2 \cos^2 \bar{c} + \sigma_s^2 \sin^2 \bar{c}$   (49)
$P_{\dot{\eta}\dot{\eta}} = \bar{s}^2 \sigma_c^2 \sin^2 \bar{c} + \sigma_s^2 \cos^2 \bar{c}; \quad P_{\dot{\zeta}\dot{\eta}} = P_{\dot{\eta}\dot{\zeta}} = (\sigma_s^2 - \bar{s}^2 \sigma_c^2) \sin \bar{c} \cos \bar{c}$   (50)

where $r$ and $s$ are 5 km and 4 knots, respectively, and the target course is $140^\circ$. Here, $\sigma_r$ is 2 km, $\sigma_\theta$ is $1.5^\circ$, $\sigma_s$ is 2 knots and $\sigma_c = \pi/12$ for this problem.
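The initialization (47)-(50) can be packaged as follows (function and argument names are illustrative; angles are in radians):

```python
import numpy as np

def init_bearings_only(r_bar, theta0, s_bar, c_bar, vel_obs,
                       sig_r, sig_theta, sig_s, sig_c):
    """Initial state and covariance of Eqs. (47)-(50); vel_obs is the
    observer velocity (zeta_dot_o, eta_dot_o) at k = 0."""
    st, ct = np.sin(theta0), np.cos(theta0)
    sc, cc = np.sin(c_bar), np.cos(c_bar)
    x0 = np.array([r_bar * st,                    # zeta
                   r_bar * ct,                    # eta
                   s_bar * sc - vel_obs[0],       # zeta_dot
                   s_bar * cc - vel_obs[1]])      # eta_dot
    # position block, Eqs. (48)-(49)
    Pzz = (r_bar * sig_theta) ** 2 * ct ** 2 + sig_r ** 2 * st ** 2
    Pee = (r_bar * sig_theta) ** 2 * st ** 2 + sig_r ** 2 * ct ** 2
    Pze = (sig_r ** 2 - (r_bar * sig_theta) ** 2) * st * ct
    # velocity block, Eqs. (49)-(50)
    Pvv = (s_bar * sig_c) ** 2 * cc ** 2 + sig_s ** 2 * sc ** 2
    Pww = (s_bar * sig_c) ** 2 * sc ** 2 + sig_s ** 2 * cc ** 2
    Pvw = (sig_s ** 2 - (s_bar * sig_c) ** 2) * sc * cc
    P0 = np.block([[np.array([[Pzz, Pze], [Pze, Pee]]), np.zeros((2, 2))],
                   [np.zeros((2, 2)), np.array([[Pvv, Pvw], [Pvw, Pww]])]])
    return x0, P0
```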

Figure 4 shows a comparison of algorithms for the bearings-only tracking problem. The cubature Kalman filter (CKF) uses a third-degree spherical-radial cubature rule that provides a set of cubature points scaling linearly with the state-vector dimension [29]. Both the cubature Kalman filter and our single-pass SGD extended KF (EKF) method can track the target well, but the proposed method shows better computational efficiency than the CKF by a factor of 2.5 (not shown). The root mean square error (RMSE) in position and velocity over 100 Monte Carlo runs is shown in Figure 4c and d. Throughout the maneuver, the RMSE of the proposed single-pass SGD-EKF algorithm was slightly lower than that of the CKF.

Figure 4.

Comparison of estimation algorithms for the bearings-only tracking problem (100 MC runs).


4. Conclusions

In this chapter, we derived a single-pass sequential mini-batch SGD algorithm for estimating the noise covariances in an adaptive Kalman filter. We demonstrated the utility of the method using diverse examples involving a detectable (but not completely observable) system, a non-stationary system, and a nonlinear bearings-only tracking problem. The evaluation showed that the proposed method has acceptable state estimation root mean square error (RMSE) and exhibits filter consistency as measured by the normalized innovation squared (NIS) criterion.


Acknowledgments

This work was supported in part by the U.S. Office of Naval Research (ONR) and in part by the U.S. Naval Research Laboratory (NRL) under Grants N00014-18-1-1238, N00014-21-1-2187, and N00173-16-1-G905.


Abbreviations

CKF — Cubature Kalman filter
EKF — Extended Kalman filter
INS — Inertial navigation system
KF — Kalman filter
MMSE — Minimum mean square error
NIS — Normalized innovation squared
RMSE — Root mean square error
RMSProp — Root mean square propagation
SGD — Stochastic gradient descent

References

  1. 1. Auger F, Hilairet M, Guerrero JM, Monmasson E, Orlowska-Kowalska T, Katsura S. Industrial applications of the Kalman filter: A review. IEEE Transactions on Industrial Electronics. 2013;60(12):5458-5471
  2. 2. Bar-Shalom Y, Li XR, Kirubarajan T. Estimation with applications to tracking and navigation: theory algorithms and software. New York, NY: John Wiley & Sons; 2001
  3. 3. Kalman RE. A new approach to linear filtering and prediction problems. Journal of Basic Engineering. 1960;82(1):35-45
  4. 4. Zhang L, Sidoti D, Bienkowski A, Pattipati KR, Bar-Shalom Y, Kleinman DL. On the identification of noise covariances and adaptive Kalman Filtering: A new look at a 50 year-old problem. IEEE Access. 2020;8:59362-59388
  5. 5. Kim HS, Zhang L, Bienkowski A, Pattipati KR. Multi-pass sequential mini-batch stochastic gradient descent algorithms for noise covariance estimation in adaptive Kalman filtering. IEEE Access. 2021;9:99220-99234
  6. 6. Kim HS, Bienkowski A, Pattipati KR. A single-pass noise covariance estimation algorithm in multiple-model adaptive Kalman filtering for non-stationary systems. TechRxiv. 2022. DOI: 10.36227/techrxiv.14761005.v2
  7. 7. Odelson BJ, Rajamani MR, Rawlings JB. A new autocovariance least-squares method for estimating noise covariances. Automatica. 2006;42(2):303-308
  8. 8. Mehra R. On the identification of variances and adaptive Kalman filtering. IEEE Transactions on automatic control. 1970;15(2):175-184
  9. 9. Belanger PR. Estimation of noise covariance matrices for a linear time-varying stochastic process. Automatica. 1974;10(3):267-275
  10. 10. Sarkka S, Nummenmaa A. Recursive noise adaptive Kalman filtering by variational Bayesian approximations. IEEE Transactions on Automatic Control. 2009;54(3):596-600
  11. 11. Killick R, Fearnhead P, Eckley IA. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association. 2012;107(500):1590-1598
  12. 12. Neethling C, Young P. Comments on “Identification of optimum filter steady-state gain for systems with unknown noise covariances”. IEEE Transactions on Automatic Control. 1974;19(5):623-625
  13. 13. Tajima K. Estimation of steady-state Kalman filter gain. IEEE Transactions on Automatic Control. 1978;23(5):944-945
  14. 14. Bertsekas DP. Athena scientific optimization and computation series. In: Nonlinear Programming. Belmont, MA: Athena Scientific; 2016. Available from: http://www.athenasc.com/nonlinbook.html
  15. 15. Golub GH, Van Loan CF. Matrix computations. Vol. 3. Baltimore, MD: JHU press; 2013
  16. 16. Arnold WF, Laub AJ. Generalized eigenproblem algorithms and software for algebraic Riccati equations. Proceedings of the IEEE. 1984;72(12):1746-1754
  17. 17. Battiti R. Accelerated backpropagation learning: Two optimization methods. Complex Systems. 1989;3(4):331-342
  18. 18. Boyd S, Xiao L, Mutapcic A. Subgradient methods. lecture notes of EE392o, Stanford University, Autumn Quarter. 2003;2004:2004-2005. Available from: www.stanford.edu/class/ee392o/subgrad method.pdf
  19. 19. Tieleman T, Hinton G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning. 2012;4(2):26-31
  20. 20. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv. 2014. DOI: 10.48550/arXiv.1412.6980
  21. 21. Zeiler MD. Adadelta: an adaptive learning rate method. arXiv. 2012. DOI: 10.48550/arXiv.1212.5701
  22. 22. Kim HS, Zhang L, Bienkowski A, Pattipati KR. A single-pass noise covariance estimation algorithm in adaptive Kalman filtering for non-stationary systems. In: 2021 IEEE 24th International Conference on Information Fusion (FUSION). IEEE; 2021. pp. 556-563. DOI: 10.23919/FUSION49465.2021.9626861
  23. 23. Naets F, Cuadrado J, Desmet W. Stable force identification in structural dynamics using Kalman filtering and dummy-measurements. Mechanical Systems and Signal Processing. 2015;50:235-248
  24. 24. Simon D. Optimal state estimation: Kalman, H infinity, and nonlinear approaches. Hoboken, NJ: John Wiley & Sons; 2006
  25. 25. Woodman OJ. An introduction to inertial navigation. Technical Report, Computer Laboratory. Cambridge, UK: University of Cambridge; 2007 UCAM-CL-TR-696. DOI: 10.48456/tr-696
  26. 26. Kuritsky MM, Goldstein MS. Inertial navigation. Proceedings of the IEEE. 1983;71:1156-1176
  27. 27. Heller WG. Free-Inertial and Damped-Inertial Navigation Mechanization and Error Equations. Reading, MA: Analytic Sciences Corp; 1975
  28. 28. King A. Inertial navigation-forty years of evolution. GEC review. 1998;13(3):140-149
  29. 29. Arasaratnam I, Haykin S. Cubature Kalman Filters. IEEE Transactions on Automatic Control. 2009;54(6):1254-1269
  30. 30. Farina A. Target tracking with bearings – Only measurements. Signal Processing. 1999;78(1):61-78
  31. 31. Aidala VJ. Kalman Filter Behavior in Bearings-Only Tracking Applications. IEEE Transactions on Aerospace and Electronic Systems. 1979;AES-15(1):29-39
  32. 32. Leong PH, Arulampalam S, Lamahewa TA, Abhayapala TD. A Gaussian-Sum Based Cubature Kalman Filter for Bearings-Only Tracking. IEEE Transactions on Aerospace and Electronic Systems. 2013;49(2):1161-1176
  33. 33. Ristic B, Arulampalam S, Gordon N. Beyond the Kalman filter: Particle filters for tracking applications. Norwood, MA: Artech house; 2003
  34. 34. Kumar K, Bhaumik S, Arulampalam S. Tracking an Underwater Object with Unknown Sensor Noise Covariance Using Orthogonal Polynomial Filters. Sensors. 2022;22(13):4870

Notes

1. The pair $(F, H)$ in the system should be detectable in order for the continuous-time algebraic Riccati equation to have at least one positive semidefinite solution; in this case, at least one solution results in a marginally stable steady-state KF [23, 24].
