Open access peer-reviewed chapter

Sequential Mini-Batch Noise Covariance Estimator

Written By

Hee-Seung Kim, Lingyi Zhang, Adam Bienkowski, Krishna R. Pattipati, David Sidoti, Yaakov Bar-Shalom and David L. Kleinman

Reviewed: 07 November 2022 Published: 19 December 2022

DOI: 10.5772/intechopen.108917

From the Edited Volume

Kalman Filter - Engineering Applications

Edited by Yuri V. Kim


Abstract

Noise covariance estimation in an adaptive Kalman filter is a problem of significant practical interest in a wide array of industrial applications. Reliable algorithms for estimating these covariances are scarce, and the necessary and sufficient conditions for their identifiability were in dispute until very recently. This chapter presents the necessary and sufficient conditions for the identifiability of noise covariances, and then develops sequential mini-batch stochastic optimization algorithms for estimating them. The optimization criterion is the minimization of the sum of the normalized temporal cross-correlations of the innovations; it is based on the property that the innovations of an optimal Kalman filter are uncorrelated over time. Our approach enforces the structural constraints on the noise covariances and ensures the symmetry and positive definiteness of the estimated covariance matrices. It is applicable to non-stationary and multiple-model systems, in which the noise covariances can occasionally jump up or down by an unknown level. Validation on several test cases demonstrates the computational efficiency and accuracy of the proposed method.

Keywords

  • Adaptive Kalman filtering
  • Minimal polynomial
  • Noise covariance estimation
  • Stochastic gradient descent (SGD)
  • Mini-batch SGD

1. Introduction

This chapter addresses the following learning problem: given a vector time series and a library of models for the time evolution of the data (e.g., a Wiener process, a white noise acceleration model, also called the nearly constant velocity model, or a white noise jerk model, also called the nearly constant acceleration model), find suitable process and measurement noise covariances and select the best dynamic model for the time series. This problem is of considerable interest in a number of applications, including fault diagnosis, robotics, signal processing, navigation, and target tracking [1, 2].

The Kalman filter (KF) [3] is the optimal minimum mean square error (MMSE) state estimator for linear systems with mutually uncorrelated Gaussian white process and measurement noises, and is the best linear state estimator when the noises are non-Gaussian with known covariances. However, the noise covariances are unknown or only partially known in many practical applications.

We derived the necessary and sufficient conditions for the identifiability of unknown noise covariances, and presented a batch optimization algorithm for their estimation using the sum of the normalized temporal cross-correlations of the innovation sequence as the optimization criterion [4]. The motivation for this optimization metric stems from the fact that the innovations of an optimal Kalman filter are white, meaning that they are uncorrelated over time [2]. In [5], we proposed a sequential mini-batch stochastic gradient descent (SGD) algorithm that required multiple passes through the measurements for estimating noise covariances. We also presented its applicability to non-stationary systems by detecting changes in noise covariances. In this chapter, we present a practical single-pass stochastic gradient descent algorithm for noise covariance estimation in non-stationary systems. Extensions to multiple models where the system behavior can stem from a member of a known subset of models are discussed in [6].

1.1 Prior work

The key to noise covariance estimation is an expression for the covariance of the state estimation error and of the innovations of any stable, but not necessarily optimal, filter as a function of the noise covariances. This expression serves as the foundational building block for correlation-based noise covariance estimation methods. Pioneering contributions using this approach were made in [7, 8, 9]. Sarkka and Nummenmaa [10] proposed a recursive noise-adaptive Kalman filter for linear state space models using variational Bayesian approximations. However, variational methods generally require tuning hyper-parameters to converge to the correct covariance parameters, and these algorithms often converge to local minima.

In [5], we presented a computationally efficient and accurate sequential estimation algorithm that is a major improvement over the batch estimation algorithm in [4]. The novelties of this algorithm stem from its sequential nature and the use of mini-batches, adaptive step size rules and dynamic thresholds for convergence in the stochastic gradient descent (SGD) algorithm. The innovation cross-correlations are obtained by a sequential fading memory filter. We applied a change-point detection algorithm described in [11] to extract the change points in noise covariances for non-stationary systems.

This chapter seeks to develop a streaming algorithm that reads the measurements exactly once, thus making it real-time and practical. The only caveat is that the noise covariances are assumed to jump up or down only occasionally, by an unknown magnitude. Extensions of this algorithm to a multiple-model setting may be found in [6].

1.2 Organization of the chapter

The organization of the chapter is as follows. Section 2 presents the mathematical formulation of the sequential mini-batch gradient descent algorithm for estimating the unknown noise covariances. In this section, we also present an overview of our approach, based on fading memory filter-based innovation correlation estimation and an accelerated SGD update of the Kalman gain. In Section 3, we show that our single-pass method can track unknown noise covariances in non-stationary systems. Lastly, we conclude the chapter with a brief summary of the contributions in Section 4.


2. Sequential mini-batch SGD method for estimating process and measurement noise covariances

Consider a discrete-time linear dynamic system

$x_{k+1} = F x_k + \Gamma v_k$   (1)
$z_k = H x_k + w_k$   (2)

where $x_k$ is the $n_x$-dimensional state vector and $v_k$ is a zero-mean white Gaussian process noise sequence with unknown covariance $Q_k$ in the plant equation (1). In the measurement equation (2), $z_k$ is the $n_z$-dimensional measurement vector and $w_k$ is a zero-mean white Gaussian measurement noise sequence with unknown covariance $R_k$. Here, $F$ is the $n_x \times n_x$ state transition matrix, $\Gamma$ is the $n_x \times n_v$ noise gain matrix, and $H$ is the $n_z \times n_x$ measurement matrix. The two noise sequences and the initial state error are assumed to be mutually uncorrelated. We assume that the noise covariances $Q_k$ and $R_k$ are piecewise constant, such that the filter reaches a steady state between any two jumps of unknown magnitude.

Given $Q_k$ and $R_k$, the Kalman filter involves the consecutive processes of prediction and update given by [2, 3]:

$\hat{x}_{k+1|k} = F \hat{x}_{k|k}$   (3)
$\nu_{k+1} = z_{k+1} - H \hat{x}_{k+1|k}$   (4)
$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + W_{k+1} \nu_{k+1}$   (5)
$P_{k+1|k} = F P_{k|k} F^\top + \Gamma Q_k \Gamma^\top$   (6)
$S_{k+1} = H P_{k+1|k} H^\top + R_k$   (7)
$W_{k+1} = P_{k+1|k} H^\top S_{k+1}^{-1}$   (8)
$P_{k+1|k+1} = (I_{n_x} - W_{k+1} H) P_{k+1|k} (I_{n_x} - W_{k+1} H)^\top + W_{k+1} R_k W_{k+1}^\top$   (9)

The Kalman filter predicts the next state estimate at time index $k+1$, given the observations up to time index $k$, via (3), and the concomitant predicted state estimation error covariance via (6), using the system dynamics, the updated state error covariance $P_{k|k}$ at time index $k$ and the process noise covariance $Q_k$. The updated state estimate at time $k+1$ in (5) incorporates the measurement at time $k+1$ via the Kalman gain matrix in (8), which depends on the innovation covariance $S_{k+1}$ (which in turn depends on the measurement noise covariance $R_k$ and the predicted state error covariance $P_{k+1|k}$). The updated state error covariance $P_{k+1|k+1}$ is computed via (9); this is the Joseph form, which is less sensitive to round-off error because it guarantees that the updated state covariance matrix remains positive definite.
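For concreteness, one cycle of (3)-(9) can be written directly in code. The following is a minimal NumPy sketch (function and variable names are illustrative, not a library API):

```python
import numpy as np

def kf_step(x, P, z, F, Gamma, H, Q, R):
    """One Kalman filter cycle, Eqs. (3)-(9); a minimal illustrative sketch."""
    # Prediction, Eqs. (3) and (6)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Gamma @ Q @ Gamma.T
    # Innovation and its covariance, Eqs. (4) and (7)
    nu = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    # Kalman gain, Eq. (8)
    W = P_pred @ H.T @ np.linalg.inv(S)
    # Update, Eq. (5), and Joseph-form covariance, Eq. (9)
    x_upd = x_pred + W @ nu
    I_WH = np.eye(len(x)) - W @ H
    P_upd = I_WH @ P_pred @ I_WH.T + W @ R @ W.T
    return x_upd, P_upd, nu, S, W
```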

2.1 Identifiability conditions for estimating Q and R

The necessary and sufficient conditions for identifiability of the covariances in adaptive Kalman filters were in dispute until very recently [4, 7, 8, 9, 12]. When $Q$ and $R$ are unknown, consider the innovations corresponding to a stable, suboptimal closed-loop filter matrix $\bar{F} = F(I_{n_x} - WH)$, given by [4, 13]:

$\nu_k = H \bar{F}^m (x_{k-m} - \hat{x}_{k-m|k-m-1}) + H \sum_{j=0}^{m-1} \bar{F}^{m-1-j} (\Gamma v_{k-m+j} - F W w_{k-m+j}) + w_k$   (10)

Given the innovation sequence (10), a weighted sum of innovations, $\xi_k$, can be computed as

$\xi_k = \sum_{i=0}^{m} a_i \nu_{k-i}$   (11)

where the weights $a_i$ are the coefficients of the minimal polynomial of the closed-loop filter matrix $\bar{F}$, that is, $\sum_{i=0}^{m} a_i \bar{F}^{m-i} = 0$ with $a_0 = 1$. It is easy to see that $\xi_k$ is the sum of two moving average processes driven by the process noise and the measurement noise, respectively, given by [4]:

$\xi_k = \sum_{l=1}^{m} B_l v_{k-l} + \sum_{l=0}^{m} G_l w_{k-l}$   (12)

Here, $B_l$ and $G_l$ are given by

$B_l = H \sum_{i=0}^{l-1} a_i \bar{F}^{l-i-1} \Gamma$   (13)
$G_l = a_l I_{n_z} - H \sum_{i=0}^{l-1} a_i \bar{F}^{l-i-1} F W, \quad G_0 = I_{n_z}$   (14)

Then, the cross-covariance between $\xi_k$ and $\xi_{k-j}$, denoted $L_j$, can be obtained as

$L_j = E[\xi_k \xi_{k-j}^\top] = \sum_{i=j+1}^{m} B_i Q B_{i-j}^\top + \sum_{i=j}^{m} G_i R G_{i-j}^\top$   (15)

The noise covariance matrices $Q = [q_{ij}]$ of dimension $n_v \times n_v$ and $R = [r_{ij}]$ of dimension $n_z \times n_z$ are symmetric and positive definite. By converting the noise covariance matrices and the $L_j$ matrices to vectors, Zhang et al. [4] show that they are related by the noise covariance identifiability matrix $\mathcal{I}$ via

$\mathcal{I} \begin{bmatrix} \mathrm{vec}(Q) \\ \mathrm{vec}(R) \end{bmatrix} = \begin{bmatrix} \mathrm{vec}(L_0) \\ \mathrm{vec}(L_1) \\ \vdots \\ \mathrm{vec}(L_m) \end{bmatrix}$   (16)

As shown in [4], if the matrix $\mathcal{I}$ has full column rank, then the unknown noise covariance matrices $Q$ and $R$ are identifiable. However, directly solving the linear equations in (16) for $Q$ and $R$ is highly ill-conditioned and prone to numerical errors.
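The rank condition can be checked numerically. The sketch below assembles $\mathcal{I}$ from (13)-(16) under two stated simplifications: the characteristic polynomial of $\bar{F}$ (which also satisfies $\sum_i a_i \bar{F}^{m-i} = 0$, by the Cayley-Hamilton theorem) stands in for the minimal polynomial, and the rank test ignores the symmetry of $Q$ and $R$, so it is slightly conservative. Identifiability then corresponds to the returned rank equaling $n_v^2 + n_z^2$.

```python
import numpy as np

def identifiability_matrix(F, Gamma, H, W):
    """Assemble the identifiability matrix of Eq. (16) for a stabilizing
    gain W and return it together with its rank; illustrative sketch."""
    nx, nz, nv = F.shape[0], H.shape[0], Gamma.shape[1]
    Fbar = F @ (np.eye(nx) - W @ H)            # closed-loop filter matrix
    a = np.poly(Fbar)                          # characteristic poly, a[0] = 1
    m = len(a) - 1
    powers = [np.linalg.matrix_power(Fbar, p) for p in range(m)]
    B = [np.zeros((nz, nv))]                   # B_0 unused (sum starts at l = 1)
    G = [np.eye(nz)]                           # G_0 = I, Eq. (14)
    for l in range(1, m + 1):
        acc = np.zeros((nx, nx))
        for i in range(l):
            acc = acc + a[i] * powers[l - i - 1]
        B.append(H @ acc @ Gamma)              # Eq. (13)
        G.append(a[l] * np.eye(nz) - H @ acc @ F @ W)   # Eq. (14)
    rows = []
    for j in range(m + 1):
        # vec(L_j) = sum_i (B_{i-j} kron B_i) vec(Q) + (G_{i-j} kron G_i) vec(R)
        blockQ = np.zeros((nz * nz, nv * nv))
        for i in range(j + 1, m + 1):
            blockQ = blockQ + np.kron(B[i - j], B[i])
        blockR = np.zeros((nz * nz, nz * nz))
        for i in range(j, m + 1):
            blockR = blockR + np.kron(G[i - j], G[i])
        rows.append(np.hstack([blockQ, blockR]))
    I_mat = np.vstack(rows)
    return I_mat, np.linalg.matrix_rank(I_mat)
```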

2.2 Recursive fading memory-based innovation correlation estimation

We compute the sample correlation matrix $\hat{C}^{k}_{\mathrm{seq}}(i)$ at sample $k$ for time lag $i$ as a weighted combination of the correlation matrix $\hat{C}^{k-1}_{\mathrm{seq}}(i)$ at the previous sample $(k-1)$ and the current product of the innovations $\nu_k$ and $\nu_{k-i}$. The tuning parameter $\lambda$, a positive constant between 0 and 1, is the weight associated with the previous sample correlation matrix. The current $M$ sample correlation matrices at time $k$ serve as the initial values for the next pair of samples in the recursive computation. Let $N$ denote the number of measurement samples. Then,

$\hat{C}^{k}_{\mathrm{seq}}(i) = (1-\lambda)\, \nu_k \nu_{k-i}^\top + \lambda\, \hat{C}^{k-1}_{\mathrm{seq}}(i)$,   (17)
$\hat{C}^{0}_{\mathrm{seq}}(i) = 0, \quad i = 0, 1, 2, \ldots, M-1; \quad k = M, \ldots, N$   (18)
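A direct implementation of the recursion (17)-(18) follows; the class keeps the last $M$ innovations and the $M$ lagged correlation estimates (the class name and the default value of $\lambda$ are illustrative assumptions):

```python
import numpy as np

class FadingMemoryCorrelation:
    """Recursive innovation-correlation estimator, Eqs. (17)-(18)."""
    def __init__(self, nz, M, lam=0.95):
        self.M, self.lam = M, lam
        self.C = np.zeros((M, nz, nz))       # C_hat_seq(i), i = 0..M-1
        self.buffer = []                     # last M innovations

    def update(self, nu):
        self.buffer.append(nu)
        if len(self.buffer) > self.M:
            self.buffer.pop(0)
        if len(self.buffer) == self.M:       # recursion starts at k = M, Eq. (18)
            for i in range(self.M):
                nu_lag = self.buffer[-1 - i]                 # nu_{k-i}
                self.C[i] = (1 - self.lam) * np.outer(nu, nu_lag) \
                            + self.lam * self.C[i]           # Eq. (17)
        return self.C
```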

2.3 Objective function and the gradient

The ensemble cross-correlation of a steady-state suboptimal Kalman filter is related to the closed-loop filter matrix $\bar{F} = F(I_{n_x} - WH)$, the matrix $F$, the measurement matrix $H$, the steady-state predicted covariance matrix $\bar{P}$, the steady-state filter gain $W$ and the steady-state innovation covariance $C_0$ via [8, 9]:

$C_i = E[\nu_k \nu_{k-i}^\top] = H \bar{F}^{i-1} F (\bar{P} H^\top - W C_0), \quad i \geq 1$   (19)

To avoid the scaling effects of the measurements, the objective function $\Psi$ formulated in [4] minimizes the sum of the $C_i$, normalized by the corresponding diagonal elements of $C_0$, for $i > 0$. Formally, we define the objective function $\Psi$ to be minimized with respect to $W$ as

$\Psi = \frac{1}{2} \mathrm{tr} \sum_{i=1}^{M-1} \left[ \mathrm{diag}(C_0)^{-1/2}\, C_i\, \mathrm{diag}(C_0)^{-1}\, C_i^\top\, \mathrm{diag}(C_0)^{-1/2} \right]$   (20)

where $\mathrm{diag}(C)$ denotes the diagonal matrix formed from the diagonal of $C$. We can rewrite the objective function by substituting (19) into (20) as

$\Psi = \frac{1}{2} \mathrm{tr} \sum_{i=1}^{M-1} \left[ \phi_i\, X\, \varphi\, X^\top \right]$   (21)

where

$\phi_i = (H \bar{F}^{i-1} F)^\top\, \varphi\, (H \bar{F}^{i-1} F)$   (22)
$X = \bar{P} H^\top - W C_0$   (23)
$\varphi = [\mathrm{diag}(C_0)]^{-1}$   (24)

The gradient of the objective function, $\nabla_W \Psi$, can be computed as [4]:

$\nabla_W \Psi = -\sum_{i=1}^{M-1} (H \bar{F}^{i-1} F)^\top \varphi\, C_i\, \varphi\, C_0 \;-\; F^\top Z F X \;-\; \sum_{i=1}^{M-1} \sum_{l=0}^{i-2} \left( C_{l+1}\, \varphi\, C_i^\top\, \varphi\, H \bar{F}^{i-l-2} F \right)^\top$   (25)

where

$Z = \bar{F}^\top Z \bar{F} + \frac{1}{2} \sum_{i=1}^{M-1} \left[ (H \bar{F}^{i-1} F)^\top \varphi\, C_i\, \varphi\, H + H^\top \varphi\, C_i^\top\, \varphi\, H \bar{F}^{i-1} F \right]$   (26)

The matrix $Z$ in (26) is obtained by solving a Lyapunov equation; its contribution is often small and can be neglected in (25) for computational efficiency.

In computing the objective function and the gradient, we replace the $C_i$ by their sample estimates $\hat{C}_{\mathrm{seq}}(i)$. With this replacement, noise covariance estimation becomes a data-dependent stochastic optimization/learning problem.
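As an illustration of this replacement, the sketch below evaluates the model-based objective by combining (19)-(20) with sample estimates of $\bar{P}$ and $C_0$, and obtains the gradient by central finite differences as a simple stand-in for the closed form (25)-(26):

```python
import numpy as np

def psi_model(W, F, H, Pbar, C0, M):
    """Objective Psi of Eq. (20) with the model correlations of Eq. (19);
    Pbar and C0 are treated as data (e.g., sample estimates)."""
    nx = F.shape[0]
    Fbar = F @ (np.eye(nx) - W @ H)            # closed-loop filter matrix
    X = Pbar @ H.T - W @ C0                    # Eq. (23)
    d = np.sqrt(np.diag(C0))                   # diag(C_0)^{1/2}
    psi, Fpow = 0.0, np.eye(nx)                # Fpow holds Fbar^{i-1}
    for _ in range(1, M):
        Ci = H @ Fpow @ F @ X                  # Eq. (19)
        Cn = Ci / np.outer(d, d)               # normalized correlation
        psi += 0.5 * np.trace(Cn @ Cn.T)       # summand of Eq. (20)
        Fpow = Fpow @ Fbar
    return psi

def fd_gradient(W, *args, h=1e-6):
    """Central finite-difference gradient of psi_model with respect to W;
    a simple stand-in for the closed-form gradient (25)-(26)."""
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += h
        Wm[idx] -= h
        g[idx] = (psi_model(Wp, *args) - psi_model(Wm, *args)) / (2.0 * h)
    return g
```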

2.4 Estimation of Q and R

2.4.1 Estimation of R

We define $\mu_k$ as the post-fit residual sequence of the Kalman filter, which is related to the innovations $\nu_k$ via

$\mu_k = z_k - H \hat{x}_{k|k} = (I_{n_z} - HW)\, \nu_k; \quad k = 1, 2, \ldots, N$   (27)

From the joint covariance of the innovation sequence $\nu_k$ and the post-fit residual sequence $\mu_k$, and the Schur determinant identity [14, 15], one can show that [4]:

$G = E[\mu_k \mu_k^\top] = R S^{-1} R$   (28)

where $S$ is the innovation covariance. Knowing the sample estimates of $G$ and $S = \hat{C}_{\mathrm{seq}}(0)$, the measurement noise covariance $R$ can be estimated. Because (28) can be interpreted as a continuous-time algebraic Riccati equation or as a simultaneous diagonalization problem in linear algebra [15], $R$ can be estimated by solving a continuous-time Riccati equation as in [4, 16] or by solving the simultaneous diagonalization problem via Cholesky decomposition and eigendecomposition.
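To make the simultaneous-diagonalization route concrete: writing $S = L L^\top$ (Cholesky) and $M = L^{-1} R L^{-\top}$, (28) becomes $M^2 = L^{-1} G L^{-\top}$, so $M$ is the symmetric positive semidefinite square root of $L^{-1} G L^{-\top}$ and $R = L M L^\top$. A sketch (illustrative, with no guards for ill-conditioned sample estimates):

```python
import numpy as np

def estimate_R(G, S):
    """Solve G = R S^{-1} R, Eq. (28), for symmetric positive definite R
    via Cholesky and eigendecomposition."""
    L = np.linalg.cholesky(S)                 # S = L L^T
    Linv = np.linalg.inv(L)
    Ghat = Linv @ G @ Linv.T                  # Ghat = M^2, M = L^{-1} R L^{-T}
    evals, V = np.linalg.eigh(Ghat)
    M = V @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ V.T   # PSD square root
    return L @ M @ L.T                        # R = L M L^T
```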

2.4.2 Estimation of Q

Since the process noise covariance $Q$ and the steady-state updated covariance $P$ are generally coupled, $Q$ and $P$ can be obtained via a Gauss-Seidel type iterative computation, given the estimated $R$. The Wiener process is an exception for which an explicit non-iterative solution $Q = W S W^\top$ is possible [4]. Let $t$ and $l$ denote the outer and inner iteration indices, starting with $t = 0$ and $l = 0$. The initial steady-state updated covariance $P_0$ can be computed as the solution of the Lyapunov equation

$P_0 = \tilde{F} P_0 \tilde{F}^\top + W R W^\top + (I_{n_x} - WH)\, \Gamma Q_t \Gamma^\top (I_{n_x} - WH)^\top; \quad Q_0 = W S W^\top$   (29)

where $\tilde{F} = (I_{n_x} - WH) F$. We iteratively update $P$ as in (30) until convergence:

$P_{l+1} = \left[ (F P_l F^\top + \Gamma Q_t \Gamma^\top)^{-1} + H^\top R^{-1} H \right]^{-1}$   (30)

Given the converged $P$, $Q$ is updated in the $t$-loop until the estimate of $Q$ converges:

$Q_{t+1} = \Gamma^{+} \left[ (P + W S W^\top - F P F^\top)_{t+1} + \lambda_Q I_{n_x} \right] (\Gamma^{+})^\top$   (31)

where $\lambda_Q$ is a regularization parameter used for ill-conditioned estimation problems and $\Gamma^{+}$ denotes the Moore-Penrose pseudo-inverse of $\Gamma$.
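The iteration (29)-(31) can be sketched as follows; the fixed iteration counts, the symmetrization step, and the use of SciPy's discrete Lyapunov solver are illustrative choices rather than part of the algorithm specification:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def estimate_Q(F, Gamma, H, W, S, R, lam_Q=0.0, n_outer=20, n_inner=50):
    """Gauss-Seidel iteration of Eqs. (29)-(31) for Q given R; sketch with
    fixed iteration counts standing in for convergence tests."""
    nx = F.shape[0]
    I = np.eye(nx)
    Ft = (I - W @ H) @ F                      # F_tilde
    Q = W @ S @ W.T                           # Q_0 = W S W^T, Eq. (29)
    Gp = np.linalg.pinv(Gamma)                # pseudo-inverse of the noise gain
    Rinv_term = H.T @ np.linalg.inv(R) @ H
    for _ in range(n_outer):
        # Eq. (29): P_0 from a discrete Lyapunov equation
        const = W @ R @ W.T \
                + (I - W @ H) @ Gamma @ Q @ Gamma.T @ (I - W @ H).T
        P = solve_discrete_lyapunov(Ft, const)
        # Eq. (30): inner fixed-point updates of P
        for _ in range(n_inner):
            P = np.linalg.inv(
                np.linalg.inv(F @ P @ F.T + Gamma @ Q @ Gamma.T) + Rinv_term)
        # Eq. (31): update Q (lam_Q regularizes ill-conditioned problems)
        Q = Gp @ (P + W @ S @ W.T - F @ P @ F.T + lam_Q * I) @ Gp.T
        Q = 0.5 * (Q + Q.T)                   # enforce symmetry
    return Q
```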

2.5 Updating the gain W sequentially

The estimation algorithm sequentially computes the $M$ sample covariance matrices at every measurement sample $k$ as in (17). Let $B$ be the mini-batch size for updating the Kalman filter gain $W$ in the SGD. Our proposed method updates the gain $W$ whenever the sample index $k$ is divisible by the mini-batch size $B$. Compared to the batch estimation algorithm, the sequential mini-batch SGD algorithm has more opportunities to converge to a better local minimum of (20) because it updates the filter gain frequently [5]. The generic form of the gain update is given by

$W_{r+1} = W_r - \alpha_r \nabla_{W_r} \Psi$   (32)

where $r$ is the update index starting with $r = 0$. In our previous research [5], we explored the performance of accelerated SGD methods (e.g., bold driver [17], constant step size, subgradient [18], RMSProp [19], Adam [20], Adadelta [21]) for adapting the step size $\alpha_r$ in (32). The root mean square propagation (RMSProp) method is used for the estimation procedure in this chapter. RMSProp keeps track of a moving average of the squared incremental gradients for each gain element and adapts the step size element-wise as follows:

$\tau_{r,ij} = \gamma\, \tau_{r-1,ij} + (1-\gamma)\, [\nabla_{W_r} \Psi]_{ij}^2; \quad \tau_0 = 0$   (33)
$\alpha_{r,ij} = \dfrac{\alpha_0}{\sqrt{\tau_{r,ij}} + \varepsilon}$   (34)

Here, $\gamma = 0.9$ is the default value and $\varepsilon = 10^{-8}$ prevents division by zero.
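Packaged as a helper, the update (32)-(34) reads as follows (the class name and the initial step size $\alpha_0$ are illustrative assumptions):

```python
import numpy as np

class RMSPropGain:
    """Element-wise RMSProp adaptation of the gain, Eqs. (32)-(34)."""
    def __init__(self, W0, alpha0=1e-3, gamma=0.9, eps=1e-8):
        self.W, self.alpha0 = W0.copy(), alpha0
        self.gamma, self.eps = gamma, eps
        self.tau = np.zeros_like(W0)          # tau_0 = 0, Eq. (33)

    def step(self, grad):
        # Eq. (33): moving average of squared gradients
        self.tau = self.gamma * self.tau + (1 - self.gamma) * grad ** 2
        # Eq. (34): element-wise step size, then gain update, Eq. (32)
        alpha = self.alpha0 / (np.sqrt(self.tau) + self.eps)
        self.W -= alpha * grad
        return self.W
```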

The pseudocode for the sequential mini-batch SGD estimation algorithm for a non-stationary system is included as Algorithm 1.

Algorithm 1 Pseudocode of sequential mini-batch SGD algorithm.

1: input: $W_0$, $Q_0$, $R_0$, $\alpha_0$, $B$  {$W_0$: initial gain, $Q_0$: initial $Q$, $R_0$: initial $R$, $\alpha_0$: initial step size, $B$: batch size}

2: $r = 0$  {initialize the update index $r$}

3: for $k = 1$ to $N$ do  {$N$: number of samples}

4:  compute the innovation $\nu_k$

5:  if $k > N_b + M$ then  {$N_b$: number of burn-in samples}

6:   compute $\hat{C}^{k}_{\mathrm{seq}}(i)$, $i = 0, 1, 2, \ldots, M-1$

7:   if $\mathrm{mod}(k, B) = 0$ then

8:    compute the objective function $\Psi$

9:    compute the gradient $\nabla_W \Psi$

10:   update the step size $\alpha_r$

11:   update the gain $[W_{r+1}]_{ij} = [W_r]_{ij} - \alpha_{r,ij}\, [\nabla_{W_r} \Psi]_{ij}$

12:   update $R_{r+1}$ and $Q_{r+1}$

13:   $r = r + 1$

14:   end if

15:  end if

16: end for
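For readers who prefer running code, the loop below is one possible Python rendering of Algorithm 1, wired together from the illustrative sketches given earlier in this section (FadingMemoryCorrelation, psi_model, fd_gradient, estimate_R, estimate_Q and RMSPropGain); the finite-difference gradient stands in for the closed form (25)-(26), and guards for early, ill-conditioned sample estimates are omitted:

```python
import numpy as np

def single_pass_sgd(zs, F, Gamma, H, W0, Q0, R0, M=5, B=64, Nb=50, lam=0.95):
    """One possible rendering of Algorithm 1; illustrative, not definitive."""
    nx, nz = F.shape[0], H.shape[0]
    x, P = np.zeros(nx), np.eye(nx)
    W, Q, R = W0.copy(), Q0.copy(), R0.copy()
    corr = FadingMemoryCorrelation(nz, M, lam)
    opt = RMSPropGain(W0)
    Ghat = np.zeros((nz, nz))                 # fading-memory estimate of E[mu mu^T]
    I_x, I_z = np.eye(nx), np.eye(nz)
    r = 0                                     # update index, as in Algorithm 1
    for k, z in enumerate(zs, start=1):
        # fixed-gain filter recursion with the current W, Q, R
        x_pred = F @ x
        nu = z - H @ x_pred                   # innovation, Eq. (4)
        x = x_pred + W @ nu                   # update, Eq. (5)
        Pbar = F @ P @ F.T + Gamma @ Q @ Gamma.T                    # Eq. (6)
        P = (I_x - W @ H) @ Pbar @ (I_x - W @ H).T + W @ R @ W.T    # Eq. (9)
        C = corr.update(nu)                   # Eqs. (17)-(18)
        mu = (I_z - H @ W) @ nu               # post-fit residual, Eq. (27)
        Ghat = (1 - lam) * np.outer(mu, mu) + lam * Ghat
        if k > Nb + M and k % B == 0:         # mini-batch gain update
            grad = fd_gradient(W, F, H, Pbar, C[0], M)
            W = opt.step(grad)                # Eqs. (32)-(34)
            R = estimate_R(Ghat, C[0])        # Section 2.4.1
            Q = estimate_Q(F, Gamma, H, W, C[0], R)   # Section 2.4.2
            r += 1
    return W, Q, R
```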


3. Numerical examples

In [5], we provided evidence that multi-pass sequential mini-batch stochastic gradient descent (SGD) algorithms improve the computational efficiency of the batch estimation algorithm on a number of test cases used in [2, 7, 8, 9, 12], and also showed their applicability to non-stationary systems when coupled with a change-point detection algorithm [11]. In [22], we proposed a single-pass sequential mini-batch SGD estimation algorithm that accesses the measurements exactly once for non-stationary systems, modifying the example used in [12] so that the process and measurement noise covariances change periodically.

In this section, we illustrate the utility of the proposed single-pass sequential mini-batch SGD estimation algorithm by applying it to diverse examples involving a detectable (but not completely observable) system, non-stationary systems and a bearings-only tracking problem.

For the non-stationary systems, we assume that the process and measurement noise covariances occasionally change by an unknown level. Here, "occasionally" means that the jumps are infrequent enough for the Kalman filter to reach steady state before a jump in the noise covariances. We define $N_{sg}$ as the number of subgroups within which the noise covariances are constant. Given $N$ observation samples, each subgroup has constant noise covariances over $N/N_{sg}$ samples. In this section, we consider two non-stationary scenarios for tracking time-varying $Q$ and $R$ with $N_{sg} = 5$ subgroups. We also consider the bearings-only tracking problem, where $Q$ changes continuously.

Note that the number of "burn-in" samples and the number of lags are $N_b = 50$ and $M = 5$, respectively, in the estimation procedure. The root mean square propagation (RMSProp) method is applied to update the filter gain. All Monte Carlo simulations were run on a computer with an Intel Core i7-8665U processor and 16 GB of RAM.

We used the averaged normalized innovation squared (NIS) metric [2] to measure the consistency of the proposed algorithm.

$\bar{\varepsilon}_\nu(k) = \dfrac{1}{N_{mc}} \sum_{i=1}^{N_{mc}} \left( \nu_k^{(i)} \right)^\top S^{-1} \nu_k^{(i)}$   (35)

where $N_{mc}$ is the number of Monte Carlo runs. The root mean square error (RMSE) in position and velocity is computed using

$\mathrm{RMSE}_k = \sqrt{ \dfrac{1}{N_{mc}} \sum_{i=1}^{N_{mc}} \left\| x_k^{\mathrm{true},(i)} - \hat{x}_k^{(i)} \right\|^2 }$   (36)
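Both metrics are straightforward to compute from stored Monte Carlo outputs; the sketch below assumes a hypothetical array layout and, for simplicity, an innovation covariance sequence common to all runs:

```python
import numpy as np

def nis_and_rmse(nus, Ss, xs_true, xs_est):
    """Averaged NIS, Eq. (35), and RMSE, Eq. (36), across Monte Carlo runs.
    Assumed layout: nus[i][k] is the innovation of run i at time k, Ss[k]
    the innovation covariance at time k, and xs_true[i][k], xs_est[i][k]
    the true and estimated states."""
    Nmc, N = len(nus), len(nus[0])
    nis, rmse = np.zeros(N), np.zeros(N)
    for k in range(N):
        Sinv = np.linalg.inv(Ss[k])
        nis[k] = np.mean([nus[i][k] @ Sinv @ nus[i][k] for i in range(Nmc)])
        rmse[k] = np.sqrt(np.mean([np.sum((xs_true[i][k] - xs_est[i][k]) ** 2)
                                   for i in range(Nmc)]))
    return nis, rmse
```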

3.1 Case 1: A detectable (but not completely observable) system that satisfies the identifiability conditions

Mehra [8] stated, without proof, that a necessary and sufficient condition for noise covariance estimation is that the system be completely observable. This example, due to Odelson et al. [7], demonstrates that this condition is not necessary. The example does satisfy the full column rank condition for the identifiability matrix in (16).

Odelson et al. [7] proposed a noise covariance estimation method based on the autocovariance least-squares formulation. This method computes the covariances from the residuals of the state estimation. Note that the incompletely observable (but detectable¹) system used in [7] is described by

$F = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.2 \end{bmatrix}, \quad H = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$   (37)

where $F$ is the non-singular transition matrix, $H$ is the constant output matrix, and $\Gamma$ is the constant input matrix in (1) and (2). Note that this system is a hypothetical numerical example. The process noise $v_k$ and the measurement noise $w_k$ are assumed to be uncorrelated zero-mean Gaussian white noise sequences with covariances

$E[v_k v_j^\top] = Q\, \delta_{kj}$   (38)
$E[w_k w_j^\top] = R\, \delta_{kj}$   (39)

In this scenario, the true $R$ values for the five subgroups are [0.30, 0.81, 0.49, 0.72, 0.42], and the true $Q$ values are [0.16, 0.49, 0.25, 0.36, 0.20]; the values change every 10,000 samples. Table 1 shows the results of 100 Monte Carlo simulations based on the single-pass SGD algorithm in estimating $Q$ and $R$. As can be seen, the estimated $Q$ and $R$ are close to their corresponding true values. In this scenario, the single-pass SGD estimation method has a speedup factor of 31 over the batch and multi-pass SGD estimation methods (not shown).

| Subgroup | $R$ | $Q$ | $\bar{P}_{11}$ | $\bar{P}_{22}$ | $W_{11}$ | $W_{21}$ |
|---|---|---|---|---|---|---|
| 1st | 0.30 / 0.31 (0.06) | 0.16 / 0.15 (0.04) | 0.16 / 0.15 (0.04) | 0.66 / 0.62 (0.18) | 0.35 / 0.33 (0.03) | 0.70 / 0.67 (0.03) |
| 2nd | 0.81 / 0.86 | 0.49 / 0.42 | 0.49 / 0.43 | 2.01 / 1.74 | 0.38 / 0.34 | 0.76 / 0.67 |
| 3rd | 0.49 / 0.49 | 0.25 / 0.24 | 0.25 / 0.24 | 1.03 / 0.97 | 0.34 / 0.34 | 0.68 / 0.67 |
| 4th | 0.72 / 0.71 | 0.36 / 0.35 | 0.36 / 0.35 | 1.48 / 1.44 | 0.33 / 0.34 | 0.67 / 0.67 |
| 5th | 0.42 / 0.41 | 0.20 / 0.20 | 0.20 / 0.21 | 0.83 / 0.84 | 0.33 / 0.34 | 0.66 / 0.67 |

Entries are truth / mean; the RMSE (in parentheses) is reported for the first subgroup.

Table 1.

Single-pass SGD estimation for Case 1 (100 MC Runs; 50,000 samples; M = 5; RMSProp; Batch size = 64).

Figure 1 demonstrates that the sequential mini-batch gradient descent algorithm can track $Q$ and $R$ correctly. Here, the trajectories of the $Q$ and $R$ estimates are smoothed by a simple first-order fading memory filter with a smoothing weight of 0.7. Figure 1e shows the averaged NIS of the SGD (RMSProp; batch size of 64) method with the 95% probability region [0.74, 1.30], and shows that the SGD-based Kalman filter is consistent. The NIS values are large only immediately after a jump in the noise variances, because adaptation requires a few samples.

Figure 1.

Trajectories of Q and R estimates without signal smoothing and with a smoothing weight of 0.7 for Case 1.

3.2 Case 2: A five-state inertial navigation system with diagonal Q and R

For estimating the unknown noise covariance parameters and the optimal Kalman filter gain for part of an inertial navigation system (INS), Mehra [8] proposed an iterative innovation correlation-based method starting from an arbitrary initial stabilizing gain. Inertial navigation [25] involves tracking the position and orientation of an object relative to a known starting orientation and velocity, using measurements provided by accelerometers and gyroscopes. These systems have found universal use in military and commercial applications [26].

Since the earth is not flat, inertial navigation systems need to keep tilting the platform (with respect to inertial space) to keep the axes of the accelerometers horizontal. Here, small error sources that drive the Schuler loop cause the navigation errors, and these errors are "damped" by making use of external velocity measurements, such as those furnished by a Doppler radar [27, 28].

In this problem, Mehra [8] used a system based on the damped Schuler-loop error propagation forced by exponentially correlated as well as white noise input. The system matrices for this navigation system are given by

$F = \begin{bmatrix} 0.75 & -1.74 & -0.3 & 0 & -0.15 \\ 0.09 & 0.91 & -0.0015 & 0 & -0.008 \\ 0 & 0 & 0.95 & 0 & 0 \\ 0 & 0 & 0 & 0.55 & 0 \\ 0 & 0 & 0 & 0 & 0.905 \end{bmatrix}; \quad H = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix}; \quad \Gamma = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 24.64 & 0 & 0 \\ 0 & 0.835 & 0 \\ 0 & 0 & 1.83 \end{bmatrix}$   (40)

where the system is discretized using a time step of 0.1 seconds. In this five-state system, the first two states represent a velocity damping term and the velocity error, respectively, and the other three states model the correlated noise processes. The noise corresponding to states 3 and 5 impacts both the velocity error and the velocity damping term; the fourth state impacts the sensor error in state 2 only.

In this problem, the true values corresponding to each subgroup, with $N_{sg} = 5$ subgroups and $N = 100{,}000$ samples, are as in (41). Each parameter of $Q$ and $R$ changes every 20,000 samples.

$\begin{bmatrix} R_{11} \\ R_{22} \end{bmatrix} = \begin{bmatrix} 0.25 & 0.56 & 0.64 & 0.42 & 0.36 \\ 0.25 & 0.25 & 0.49 & 0.16 & 0.04 \end{bmatrix}; \quad \begin{bmatrix} Q_{11} \\ Q_{22} \\ Q_{33} \end{bmatrix} = \begin{bmatrix} 0.25 & 0.64 & 0.49 & 0.25 & 0.49 \\ 0.25 & 0.36 & 0.56 & 0.16 & 0.04 \\ 0.36 & 0.49 & 0.64 & 0.25 & 0.09 \end{bmatrix}$   (41)

where each column corresponds to one of the five subgroups.

Table 2 shows the results of 100 Monte Carlo simulations for estimating the noise parameters using the RMSProp update. The estimated parameters are close to the corresponding true values. Given 100,000 samples, the proposed method with a batch size of 64 requires 1891 seconds for 100 Monte Carlo simulations, i.e., 18.91 seconds per run. The batch and multi-pass SGD estimation methods need more than 3000 seconds for a single MC run (not shown); the single-pass SGD algorithm thus has a speedup factor of 158.6.

(a) $R$, $Q$ and $\bar{P}$ estimates for Case 2

| Subgroup | $R_{11}$ | $R_{22}$ | $Q_{11}$ | $Q_{22}$ | $Q_{33}$ |
|---|---|---|---|---|---|
| 1st | 0.25 / 0.23 (0.34) | 0.25 / 0.24 (0.04) | 0.25 / 0.24 (0.05) | 0.25 / 0.23 (0.05) | 0.36 / 0.35 (0.06) |
| 2nd | 0.56 / 0.52 | 0.25 / 0.25 | 0.64 / 0.62 | 0.36 / 0.31 | 0.49 / 0.44 |
| 3rd | 0.64 / 0.85 | 0.49 / 0.50 | 0.49 / 0.50 | 0.56 / 0.52 | 0.64 / 0.71 |
| 4th | 0.42 / 0.44 | 0.16 / 0.16 | 0.25 / 0.26 | 0.16 / 0.16 | 0.25 / 0.25 |
| 5th | 0.36 / 0.30 | 0.04 / 0.04 | 0.49 / 0.48 | 0.04 / 0.04 | 0.09 / 0.08 |

| Subgroup | $\bar{P}_{11}$ | $\bar{P}_{22}$ | $\bar{P}_{33}$ | $\bar{P}_{44}$ | $\bar{P}_{55}$ |
|---|---|---|---|---|---|
| 1st | 19.46 / 18.84 (4.12) | 0.33 / 0.31 (0.05) | 306.26 / 297.69 (73.27) | 0.23 / 0.22 (0.05) | 4.02 / 3.86 (0.71) |
| 2nd | 43.76 / 42.11 | 0.43 / 0.40 | 766.59 / 744.98 | 0.34 / 0.30 | 5.45 / 4.99 |
| 3rd | 37.81 / 40.15 | 0.66 / 0.67 | 600.91 / 621.04 | 0.52 / 0.49 | 7.33 / 8.00 |
| 4th | 18.49 / 18.96 | 0.22 / 0.22 | 303.75 / 312.24 | 0.15 / 0.15 | 2.78 / 2.78 |
| 5th | 29.31 / 28.55 | 0.07 / 0.07 | 574.37 / 562.90 | 0.04 / 0.04 | 0.97 / 0.93 |

(b) $W$ estimates for Case 2

| Subgroup | $W_{11}$ | $W_{21}$ | $W_{31}$ | $W_{41}$ | $W_{51}$ |
|---|---|---|---|---|---|
| 1st | 0.94 / 0.96 (0.02) | 2.78×10⁻³ / 3.56×10⁻³ (0.01) | −2.80 / −2.75 (0.04) | 2.22×10⁻⁵ / 6.50×10⁻³ (0.01) | 0.04 / 0.04 (0.01) |
| 2nd | 0.96 / 0.98 | 3.22×10⁻³ / 4.32×10⁻³ | −2.91 / −2.88 | 8.76×10⁻⁴ / 8.37×10⁻³ | 0.03 / 0.01 |
| 3rd | 0.94 / 0.95 | 2.84×10⁻³ / 3.84×10⁻³ | −2.79 / −2.83 | 1.19×10⁻⁴ / 8.79×10⁻³ | 0.04 / 0.03 |
| 4th | 0.94 / 0.95 | 3.84×10⁻³ / 2.08×10⁻⁴ | −2.80 / −2.84 | 6.80×10⁻⁴ / 3.89×10⁻⁴ | 0.03 / 0.02 |
| 5th | 0.98 / 1.00 | 3.50×10⁻³ / 1.33×10⁻⁴ | −3.03 / −3.00 | 1.09×10⁻³ / 2.10×10⁻⁴ | 0.01 / 0.01 |

| Subgroup | $W_{12}$ | $W_{22}$ | $W_{32}$ | $W_{42}$ | $W_{52}$ |
|---|---|---|---|---|---|
| 1st | 0.94 / 0.94 (0.08) | 0.38 / 0.39 (0.02) | −1.63 / −1.65 (0.25) | 0.23 / 0.23 (0.02) | −0.94 / −0.93 (0.05) |
| 2nd | 1.02 / 1.00 | 0.40 / 0.42 | −1.65 / −1.70 | 0.27 / 0.25 | −1.02 / −0.98 |
| 3rd | 0.86 / 0.96 | 0.36 / 0.37 | −1.56 / −1.68 | 0.26 / 0.25 | −0.86 / −0.93 |
| 4th | 0.99 / 0.98 | 0.39 / 0.39 | −1.57 / −1.69 | 0.23 / 0.23 | −0.98 / −0.99 |
| 5th | 1.19 / 1.06 | 0.44 / 0.46 | −1.26 / −1.77 | 0.20 / 0.22 | −1.17 / −1.11 |

Entries are truth / mean; the RMSE (in parentheses) is reported for the first subgroup.

Table 2.

Single-pass SGD estimation for Case 2 (100 MC Runs, 100,000 samples; M = 5; RMSProp; Batch size = 64).

Figure 2 shows the trajectories of the estimated $Q$ and $R$ with signal smoothing (smoothing weight of 0.7). For this example, accurate estimation of $R_{11}$ is known to be hard, as shown in Figure 2d; the reason is that $R_{11}$ is dominated by the state uncertainty, i.e., the measurement noise is "buried" in a much larger innovation [4]. In spite of the difficulty in estimating $R_{11}$, the filter is consistent as measured by the NIS, as shown in Figure 2f.

Figure 2.

Trajectories of Q and R estimates with signal smoothing (smoothing weight = 0.7) for Case 2.

3.3 Case 3: Bearings-only tracking problem

In many practical situations, it is hard to obtain a closed-form solution for state estimation because the noise covariances are often unknown and the dynamics are nonlinear. Arasaratnam and Haykin [29] proposed a nonlinear filter, the cubature Kalman filter, that can be applied to bearings-only measurements for estimating the position and velocity of a target. This setting is based on measurements from a passive sensor that measures only the direction of arrival of a signal emitted by the target [2]. The so-called bearings-only tracking problem arises in a variety of practical applications, such as air traffic control, underwater sonar tracking and aircraft surveillance [2, 30, 31].

In this example, we consider a two-dimensional bearings-only tracking problem of a nearly-constant velocity target from a single moving observer used in [32]. The dynamics of the target (relative to the observer) are described by

$x_{k+1} = F x_k + \Gamma v_k - U_{k+1}$   (42)
$z_k = h(x_k) + w_k$   (43)

Formally, if the state vector of the target is $x^t_k = [\zeta_t, \eta_t, \dot{\zeta}_t, \dot{\eta}_t]^\top$ and the state vector of the observer is $x^o_k = [\zeta_o, \eta_o, \dot{\zeta}_o, \dot{\eta}_o]^\top$, with positions and velocities along the $\zeta$ and $\eta$ axes, then $x_k = x^t_k - x^o_k$ is the relative state vector of the target with respect to the observer, and the input vector is $U_k = x^o_k - F x^o_{k-1}$; $w_k$ is a zero-mean white Gaussian noise with variance $\sigma_\theta^2$. The nonlinear measurement is the bearing of the target from the observer's platform, given by $h(x_k) = \tan^{-1}(\zeta/\eta)$. Here, $\Gamma$ is the $4 \times 4$ identity matrix.

The system matrices for this problem are given by

$F = \begin{bmatrix} 1 & 0 & T & 0 \\ 0 & 1 & 0 & T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \Gamma = I_4, \quad Q = \begin{bmatrix} T^3/3 & 0 & T^2/2 & 0 \\ 0 & T^3/3 & 0 & T^2/2 \\ T^2/2 & 0 & T & 0 \\ 0 & T^2/2 & 0 & T \end{bmatrix} \tilde{q}_k$   (44)

where the sampling interval $T$ is 1 second. The zero-mean white process noise intensity is $\tilde{q}_0 = 9 \times 10^{-12}\ \mathrm{km}^2/\mathrm{s}^3$, except that it increases rapidly to $1.5\, \tilde{q}_0$ around sample index $k = 480$ and then decreases rapidly again around $k = 960$, as follows:

$\tilde{q}_k = \begin{cases} \tilde{q}_0 + 0.25\, \tilde{q}_0 \left[ 1 + \tanh(0.015(k - 480)) \right], & k \leq 720 \\ \tilde{q}_0 + 0.25\, \tilde{q}_0 \left[ 1 + \tanh(0.015(960 - k)) \right], & \text{otherwise} \end{cases}$   (45)
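This profile is easy to reproduce; a small sketch (with the assumed value of $\tilde{q}_0$ as a default):

```python
import numpy as np

def q_intensity(k, q0=9e-12):
    """Process-noise intensity profile of Eq. (45); q0 in km^2/s^3."""
    if k <= 720:
        return q0 + 0.25 * q0 * (1.0 + np.tanh(0.015 * (k - 480)))
    return q0 + 0.25 * q0 * (1.0 + np.tanh(0.015 * (960 - k)))
```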

The linearized measurement matrix, Hk, is the Jacobian of the measurement function given by

$H_k = \dfrac{\partial h(x_k)}{\partial x_k} = \begin{bmatrix} \dfrac{\eta_k}{\zeta_k^2 + \eta_k^2} & \dfrac{-\zeta_k}{\zeta_k^2 + \eta_k^2} & 0 & 0 \end{bmatrix}$   (46)

A total of 1920 measurement samples were generated for this scenario. The observer moves in a straight line at a speed of 5 knots, except for 480 seconds (between $k = 480$ and $k = 960$), during which it turns at $2.4^\circ/\mathrm{s}$, as shown in Figure 3 (these times are marked by cross signs).

Figure 3.

Observer and target trajectory (100 MC runs).

For a fair comparison of the estimation algorithms, we initialized all filters with the same mean and covariance using prior knowledge of the initial target range and the initial bearing measurement [33, 34]. Here, the initial target range and the initial bearing measurement are generated as $\bar{r} \sim \mathcal{N}(r, \sigma_r^2)$ and $\bar{\theta}_0 \sim \mathcal{N}(\theta, \sigma_\theta^2)$, respectively, where $r$ is the true initial target range and $\theta$ is the true initial bearing. The initial target speed is generated as $\bar{s} \sim \mathcal{N}(s, \sigma_s^2)$, where $s$ is the true initial target speed. Let $\sigma_{\langle \cdot \rangle}$ denote the standard deviation of the corresponding parameter. Assuming that the target is moving towards the observer, the initial course estimate can be obtained as $\bar{c} = \bar{\theta}_0 + \pi$. The initial state vector and the initial covariance are given by

$\hat{x}_0 = \begin{bmatrix} \hat{\zeta} \\ \hat{\eta} \\ \hat{\dot{\zeta}} \\ \hat{\dot{\eta}} \end{bmatrix} = \begin{bmatrix} \bar{r} \sin \bar{\theta}_0 \\ \bar{r} \cos \bar{\theta}_0 \\ \bar{s} \sin \bar{c} - \dot{\zeta}^o_0 \\ \bar{s} \cos \bar{c} - \dot{\eta}^o_0 \end{bmatrix}; \quad P_0 = \begin{bmatrix} P_{\zeta\zeta} & P_{\zeta\eta} & 0 & 0 \\ P_{\eta\zeta} & P_{\eta\eta} & 0 & 0 \\ 0 & 0 & P_{\dot{\zeta}\dot{\zeta}} & P_{\dot{\zeta}\dot{\eta}} \\ 0 & 0 & P_{\dot{\eta}\dot{\zeta}} & P_{\dot{\eta}\dot{\eta}} \end{bmatrix}$   (47)

where

$P_{\zeta\zeta} = \bar{r}^2 \sigma_\theta^2 \cos^2 \bar{\theta}_0 + \sigma_r^2 \sin^2 \bar{\theta}_0; \quad P_{\eta\eta} = \bar{r}^2 \sigma_\theta^2 \sin^2 \bar{\theta}_0 + \sigma_r^2 \cos^2 \bar{\theta}_0$   (48)
$P_{\zeta\eta} = P_{\eta\zeta} = (\sigma_r^2 - \bar{r}^2 \sigma_\theta^2) \sin \bar{\theta}_0 \cos \bar{\theta}_0; \quad P_{\dot{\zeta}\dot{\zeta}} = \bar{s}^2 \sigma_c^2 \cos^2 \bar{c} + \sigma_s^2 \sin^2 \bar{c}$   (49)
$P_{\dot{\eta}\dot{\eta}} = \bar{s}^2 \sigma_c^2 \sin^2 \bar{c} + \sigma_s^2 \cos^2 \bar{c}; \quad P_{\dot{\zeta}\dot{\eta}} = P_{\dot{\eta}\dot{\zeta}} = (\sigma_s^2 - \bar{s}^2 \sigma_c^2) \sin \bar{c} \cos \bar{c}$   (50)

where $r$ and $s$ are 5 km and 4 knots, respectively, and the target course is $140^\circ$. Here, $\sigma_r$ is 2 km, $\sigma_\theta$ is $1.5^\circ$, $\sigma_s$ is 2 knots and $\sigma_c = \pi/12$ for this problem.
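The initialization (47)-(50) can be packaged as follows (function and argument names are illustrative; angles are in radians):

```python
import numpy as np

def init_bearings_only(r_bar, theta0, s_bar, c_bar, vel_obs,
                       sig_r, sig_theta, sig_s, sig_c):
    """Initial state and covariance of Eqs. (47)-(50); vel_obs is the
    observer velocity (zeta_dot_o, eta_dot_o) at k = 0."""
    st, ct = np.sin(theta0), np.cos(theta0)
    sc, cc = np.sin(c_bar), np.cos(c_bar)
    x0 = np.array([r_bar * st,                    # zeta
                   r_bar * ct,                    # eta
                   s_bar * sc - vel_obs[0],       # zeta_dot
                   s_bar * cc - vel_obs[1]])      # eta_dot
    # position block, Eqs. (48)-(49)
    Pzz = (r_bar * sig_theta) ** 2 * ct ** 2 + sig_r ** 2 * st ** 2
    Pee = (r_bar * sig_theta) ** 2 * st ** 2 + sig_r ** 2 * ct ** 2
    Pze = (sig_r ** 2 - (r_bar * sig_theta) ** 2) * st * ct
    # velocity block, Eqs. (49)-(50)
    Pvv = (s_bar * sig_c) ** 2 * cc ** 2 + sig_s ** 2 * sc ** 2
    Pww = (s_bar * sig_c) ** 2 * sc ** 2 + sig_s ** 2 * cc ** 2
    Pvw = (sig_s ** 2 - (s_bar * sig_c) ** 2) * sc * cc
    P0 = np.block([[np.array([[Pzz, Pze], [Pze, Pee]]), np.zeros((2, 2))],
                   [np.zeros((2, 2)), np.array([[Pvv, Pvw], [Pvw, Pww]])]])
    return x0, P0
```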

Figure 4 shows a comparison of algorithms for the bearings-only tracking problem. The cubature Kalman filter (CKF) uses a third-degree spherical-radial cubature rule that provides a set of cubature points scaling linearly with the state-vector dimension [29]. Both the cubature Kalman filter and our single-pass SGD extended KF (EKF) method can track the target well, but the proposed method shows better computational efficiency than the CKF by a factor of 2.5 (not shown). The root mean square error (RMSE) in position and velocity over 100 Monte Carlo runs is shown in Figure 4c and d. Throughout the maneuver, the RMSE of the proposed single-pass SGD-EKF algorithm was slightly lower than that of the CKF.

Figure 4.

Comparison of estimation algorithms for the bearings-only tracking problem (100 MC runs).


4. Conclusions

In this chapter, we derived a single-pass sequential mini-batch SGD algorithm for estimating the noise covariances in an adaptive Kalman filter. We demonstrated the utility of the method using diverse examples involving a detectable (but not completely observable) system, a non-stationary system, and a nonlinear bearings-only tracking problem. The evaluation showed that the proposed method has acceptable state estimation root mean square error (RMSE) and exhibits filter consistency as measured by the normalized innovation squared (NIS) criterion.


Acknowledgments

This work was supported in part by the U.S. Office of Naval Research (ONR) and in part by the U.S. Naval Research Laboratory (NRL) under Grants N00014-18-1-1238, N00014-21-1-2187, and N00173-16-1-G905.


Abbreviations

CKF — Cubature Kalman filter
EKF — Extended Kalman filter
INS — Inertial navigation system
KF — Kalman filter
MMSE — Minimum mean square error
NIS — Normalized innovation squared
RMSE — Root mean square error
RMSProp — Root mean square propagation
SGD — Stochastic gradient descent

References

  1. 1. Auger F, Hilairet M, Guerrero JM, Monmasson E, Orlowska-Kowalska T, Katsura S. Industrial applications of the Kalman filter: A review. IEEE Transactions on Industrial Electronics. 2013;60(12):5458-5471
  2. 2. Bar-Shalom Y, Li XR, Kirubarajan T. Estimation with applications to tracking and navigation: theory algorithms and software. New York, NY: John Wiley & Sons; 2001
  3. 3. Kalman RE. A new approach to linear filtering and prediction problems. Journal of Basic Engineering. 1960;82(1):35-45
  4. 4. Zhang L, Sidoti D, Bienkowski A, Pattipati KR, Bar-Shalom Y, Kleinman DL. On the identification of noise covariances and adaptive Kalman Filtering: A new look at a 50 year-old problem. IEEE Access. 2020;8:59362-59388
  5. 5. Kim HS, Zhang L, Bienkowski A, Pattipati KR. Multi-pass sequential mini-batch stochastic gradient descent algorithms for noise covariance estimation in adaptive Kalman filtering. IEEE Access. 2021;9:99220-99234
  6. 6. Kim HS, Bienkowski A, Pattipati KR. A single-pass noise covariance estimation algorithm in multiple-model adaptive Kalman filtering for non-stationary systems. TechRxiv. 2022. DOI: 10.36227/techrxiv.14761005.v2
  7. 7. Odelson BJ, Rajamani MR, Rawlings JB. A new autocovariance least-squares method for estimating noise covariances. Automatica. 2006;42(2):303-308
  8. 8. Mehra R. On the identification of variances and adaptive Kalman filtering. IEEE Transactions on automatic control. 1970;15(2):175-184
  9. 9. Belanger PR. Estimation of noise covariance matrices for a linear time-varying stochastic process. Automatica. 1974;10(3):267-275
  10. 10. Sarkka S, Nummenmaa A. Recursive noise adaptive Kalman filtering by variational Bayesian approximations. IEEE Transactions on Automatic Control. 2009;54(3):596-600
  11. 11. Killick R, Fearnhead P, Eckley IA. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association. 2012;107(500):1590-1598
  12. 12. Neethling C, Young P. Comments on “Identification of optimum filter steady-state gain for systems with unknown noise covariances”. IEEE Transactions on Automatic Control. 1974;19(5):623-625
  13. 13. Tajima K. Estimation of steady-state Kalman filter gain. IEEE Transactions on Automatic Control. 1978;23(5):944-945
  14. 14. Bertsekas DP. Athena scientific optimization and computation series. In: Nonlinear Programming. Belmont, MA: Athena Scientific; 2016. Available from: http://www.athenasc.com/nonlinbook.html
  15. 15. Golub GH, Van Loan CF. Matrix computations. Vol. 3. Baltimore, MD: JHU press; 2013
  16. 16. Arnold WF, Laub AJ. Generalized eigenproblem algorithms and software for algebraic Riccati equations. Proceedings of the IEEE. 1984;72(12):1746-1754
  17. 17. Battiti R. Accelerated backpropagation learning: Two optimization methods. Complex Systems. 1989;3(4):331-342
  18. 18. Boyd S, Xiao L, Mutapcic A. Subgradient methods. lecture notes of EE392o, Stanford University, Autumn Quarter. 2003;2004:2004-2005. Available from: www.stanford.edu/class/ee392o/subgrad method.pdf
  19. 19. Tieleman T, Hinton G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning. 2012;4(2):26-31
  20. 20. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv. 2014. DOI: 10.48550/arXiv.1412.6980
  21. 21. Zeiler MD. Adadelta: an adaptive learning rate method. arXiv. 2012. DOI: 10.48550/arXiv.1212.5701
  22. 22. Kim HS, Zhang L, Bienkowski A, Pattipati KR. A single-pass noise covariance estimation algorithm in adaptive Kalman filtering for non-stationary systems. In: 2021 IEEE 24th International Conference on Information Fusion (FUSION). IEEE; 2021. pp. 556-563. DOI: 10.23919/FUSION49465.2021.9626861
  23. 23. Naets F, Cuadrado J, Desmet W. Stable force identification in structural dynamics using Kalman filtering and dummy-measurements. Mechanical Systems and Signal Processing. 2015;50:235-248
  24. 24. Simon D. Optimal state estimation: Kalman, H infinity, and nonlinear approaches. Hoboken, NJ: John Wiley & Sons; 2006
  25. 25. Woodman OJ. An introduction to inertial navigation. Technical Report, Computer Laboratory. Cambridge, UK: University of Cambridge; 2007 UCAM-CL-TR-696. DOI: 10.48456/tr-696
  26. 26. Kuritsky MM, Goldstein MS. Inertial navigation. Proceedings of the IEEE. 1983;71:1156-1176
  27. 27. Heller WG. Free-Inertial and Damped-Inertial Navigation Mechanization and Error Equations. Reading, MA: Analytic Sciences Corp; 1975
  28. 28. King A. Inertial navigation-forty years of evolution. GEC review. 1998;13(3):140-149
  29. 29. Arasaratnam I, Haykin S. Cubature Kalman Filters. IEEE Transactions on Automatic Control. 2009;54(6):1254-1269
  30. 30. Farina A. Target tracking with bearings – Only measurements. Signal Processing. 1999;78(1):61-78
  31. 31. Aidala VJ. Kalman Filter Behavior in Bearings-Only Tracking Applications. IEEE Transactions on Aerospace and Electronic Systems. 1979;AES-15(1):29-39
  32. 32. Leong PH, Arulampalam S, Lamahewa TA, Abhayapala TD. A Gaussian-Sum Based Cubature Kalman Filter for Bearings-Only Tracking. IEEE Transactions on Aerospace and Electronic Systems. 2013;49(2):1161-1176
  33. 33. Ristic B, Arulampalam S, Gordon N. Beyond the Kalman filter: Particle filters for tracking applications. Norwood, MA: Artech house; 2003
  34. 34. Kumar K, Bhaumik S, Arulampalam S. Tracking an Underwater Object with Unknown Sensor Noise Covariance Using Orthogonal Polynomial Filters. Sensors. 2022;22(13):4870

Notes

1. The pair $(F, H)$ in the system should be detectable in order for the continuous-time algebraic Riccati equation to have at least one positive semidefinite solution; in this case, at least one solution results in a marginally stable steady-state KF [23, 24].
