Discrete-Time Minimum-Variance Prediction and Filtering

This book describes the classical smoothing, filtering and prediction techniques, together with some more recently developed embellishments for improving performance within applications. It aims to present the subject in an accessible way, so that it can serve as a practical guide for undergraduates and newcomers to the field. The material is organised as a ten-lecture course. The foundations are laid in Chapters 1 and 2, which explain minimum-mean-square-error solution construction and asymptotic behaviour. Chapters 3 and 4 introduce continuous-time and discrete-time minimum-variance filtering. Generalisations for missing data, deterministic inputs, correlated noises, direct feedthrough terms, output estimation and equalisation are described. Chapter 5 simplifies the minimum-variance filtering results for steady-state problems. Observability, Riccati equation solution convergence, asymptotic stability and Wiener filter equivalence are discussed. Chapters 6 and 7 cover the subject of continuous-time and discrete-time smoothing. The main fixed-lag, fixed-point and fixed-interval smoother results are derived. It is shown that the minimum-variance fixed-interval smoother attains the best performance. Chapter 8 attends to parameter estimation. As the above-mentioned approaches all rely on knowledge of the underlying model parameters, maximum-likelihood techniques within expectation-maximisation algorithms for joint state and parameter estimation are described. Chapter 9 is concerned with robust techniques that accommodate uncertainties within problem specifications. An extra term within the Riccati equations enables designers to trade off average error and peak error performance. Chapter 10 rounds off the course by applying the aforementioned linear techniques to nonlinear estimation problems. It is demonstrated that stepwise linearisations can be used within predictors, filters and smoothers, albeit by forsaking optimal performance guarantees.

This chapter describes the minimum-variance prediction and filtering results for the case where the signal model possesses a direct-feedthrough term. A simplification of the generalised regulator problem from control theory is presented, from which the solutions of output estimation, input estimation (or equalisation), state estimation and mixed filtering problems follow immediately.

Figure 1. The discrete-time system $\mathcal{S}$ operates on the input signal $w_k \in \mathbb{R}^m$ and produces the output $y_k \in \mathbb{R}^p$.

The Time-varying Signal Model
A discrete-time time-varying system $\mathcal{S}: \mathbb{R}^m \to \mathbb{R}^p$ is assumed to have the state-space representation

$$x_{k+1} = A_k x_k + B_k w_k, \qquad (4)$$

$$y_k = C_k x_k, \qquad (5)$$

where $A_k \in \mathbb{R}^{n \times n}$, $B_k \in \mathbb{R}^{n \times m}$ and $C_k \in \mathbb{R}^{p \times n}$ over a finite interval $k \in [0, N]$; a direct-feedthrough matrix $D_k \in \mathbb{R}^{p \times p}$ is accommodated later in the chapter. The $w_k$ is a stochastic white process with $E\{w_k\} = 0$ and $E\{w_j w_k^T\} = Q_k \delta_{jk}$ (3), in which $\delta_{jk}$ is the Kronecker delta function. This system is depicted in Fig. 1, in which $z^{-1}$ is the unit delay operator. It is interesting to note that, at time $k$, the current state $x_k$ depends on the past input $w_{k-1}$; that is, the state recursion introduces a one-step delay between the input process and the state.
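As an aside, the signal model (3) – (5) is straightforward to simulate. The following minimal sketch generates realisations of (4) and (5); the matrices are illustrative assumptions, not taken from the text:

```python
# Minimal simulation sketch of the signal model (4)-(5); the matrices
# A, B, C, Q below are illustrative assumptions, not from the text.
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 2, 1, 100
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # A_k, held constant for simplicity
B = np.array([[1.0], [0.5]])             # B_k
C = np.array([[1.0, 0.0]])               # C_k
Q = np.eye(m)                            # E{w_j w_k^T} = Q_k delta_jk, eq. (3)

x = np.zeros(n)
ys = []
for k in range(N):
    ys.append(C @ x)                     # y_k = C_k x_k, eq. (5)
    w = rng.multivariate_normal(np.zeros(m), Q)
    x = A @ x + B @ w                    # x_{k+1} = A_k x_k + B_k w_k, eq. (4)
```

Note that $y_k$ is formed before the state update, reflecting the one-step delay between the input process and the state.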

The State Prediction Problem
Suppose that observations of (5) are available, that is,

$$z_k = y_k + v_k, \qquad (6)$$

where $v_k$ is a white measurement noise process with $E\{v_k\} = 0$ and $E\{v_j v_k^T\} = R_k \delta_{jk}$ (7).

Figure 2. The state prediction problem. The objective is to design a predictor $\mathcal{P}$ which operates on the measurements and produces state estimates such that the variance of the error residual $e_{k/k-1}$ is minimised.

It was noted above that, for the state recursion (4), there is a one-step delay between the current state and the input process. Similarly, it is expected that there will be a one-step delay between the current state estimate and the input measurement. Consequently, it is customary to denote $\hat{x}_{k/k-1}$ as the state estimate at time $k$, given measurements at time $k - 1$.

The $\hat{x}_{k/k-1}$ is also known as the one-step-ahead state prediction. The objective here is to design a predictor $\mathcal{P}$ that operates on the measurements $z_k$ and produces an estimate, $\hat{y}_{k/k-1} = C_k \hat{x}_{k/k-1}$, of $y_k = C_k x_k$, so that the covariance, $E\{e_{k/k-1} e_{k/k-1}^T\}$, of the error residual, $e_{k/k-1} = y_k - \hat{y}_{k/k-1}$, is minimised. This problem is depicted in Fig. 2.

The Discrete-time Conditional Mean Estimate
The predictor derivation that follows relies on the discrete-time version of the conditional-mean or least-mean-square estimate derived in Chapter 3, which is set out as follows.

Consider a stochastic vector $\begin{bmatrix} \alpha^T & \beta^T \end{bmatrix}^T$ having means

$$E\left\{\begin{bmatrix} \alpha \\ \beta \end{bmatrix}\right\} = \begin{bmatrix} \mu_\alpha \\ \mu_\beta \end{bmatrix}$$

and covariances

$$E\left\{\begin{bmatrix} \alpha - \mu_\alpha \\ \beta - \mu_\beta \end{bmatrix}\begin{bmatrix} \alpha - \mu_\alpha \\ \beta - \mu_\beta \end{bmatrix}^T\right\} = \begin{bmatrix} \Sigma_{\alpha\alpha} & \Sigma_{\alpha\beta} \\ \Sigma_{\beta\alpha} & \Sigma_{\beta\beta} \end{bmatrix}.$$

The conditional-mean or least-mean-square estimate of $\alpha$ given $\beta$ is

$$\hat{\alpha} = E\{\alpha \,|\, \beta\} = \mu_\alpha + \Sigma_{\alpha\beta}\Sigma_{\beta\beta}^{-1}(\beta - \mu_\beta). \qquad (10)$$

The above formula is developed in [3] and established for Gaussian distributions in [4]. A derivation is requested in the problems. If $\alpha_k$ and $\beta_k$ are scalars then (10) degenerates to the linear regression formula, as is demonstrated below.
Example 1 (Linear regression [5]). The least-squares estimate $\hat{\alpha}_k = a\beta_k + b$ of $\alpha_k$, given data $\{\alpha_k, \beta_k\}$ over $[1, N]$, can be found by minimising the performance objective $J = \sum_{k=1}^{N} (\alpha_k - a\beta_k - b)^2$. Setting $\partial J/\partial a = \partial J/\partial b = 0$ yields $a = \sigma_{\alpha\beta}\sigma_{\beta\beta}^{-1}$ and $b = \mu_\alpha - a\mu_\beta$, where $\mu_\alpha$, $\mu_\beta$, $\sigma_{\alpha\beta}$ and $\sigma_{\beta\beta}$ denote the sample means and (co)variances of the data, so that $\hat{\alpha}_k = \mu_\alpha + \sigma_{\alpha\beta}\sigma_{\beta\beta}^{-1}(\beta_k - \mu_\beta)$, which is the scalar form of (10).
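A minimal numerical sketch of Example 1 follows; the synthetic data are an assumption for illustration. The slope and intercept are computed from the sample moments, matching the scalar form of (10):

```python
# Linear regression as the scalar conditional-mean formula (10);
# the data below are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(1)
beta = rng.normal(size=200)
alpha = 2.0 * beta + 1.0 + 0.1 * rng.normal(size=200)

a = np.cov(alpha, beta)[0, 1] / np.var(beta, ddof=1)   # sigma_ab / sigma_bb
b = alpha.mean() - a * beta.mean()
alpha_hat = a * beta + b                               # least-squares estimate
```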

Minimum-Variance Prediction
It follows from (1), (6), together with the assumptions $E\{w_k\} = 0$, $E\{v_k\} = 0$, that $E\{x_{k+1}\} = E\{A_k x_k\}$ and $E\{z_k\} = E\{C_k x_k\}$. It is assumed that similar results hold in the case of predicted state estimates, that is,

$$E\{\hat{x}_{k+1/k}\} = E\{A_k \hat{x}_{k/k-1}\} \quad \text{and} \quad E\{\hat{z}_{k/k-1}\} = E\{C_k \hat{x}_{k/k-1}\}. \qquad (11)$$

Substituting (11) into (10) and denoting by $K_k$ the predictor gain, which is designed in the next section, gives

$$\hat{x}_{k+1/k} = A_k \hat{x}_{k/k-1} + K_k (z_k - C_k \hat{x}_{k/k-1}). \qquad (12)$$

Thus, the optimal one-step-ahead predictor follows immediately from the least-mean-square (or conditional-mean) formula. A more detailed derivation appears in [4]. The structure of the optimal predictor is shown in Fig. 3.

Figure 3. The optimal one-step-ahead predictor $\mathcal{P}$, which produces estimates $\hat{x}_{k+1/k}$ given measurements $z_k$.
x  denote the state prediction error. It is shown below that the expectation of the prediction error is zero, that is, the predicted state estimate is unbiased.
Proof: The condition 0 / 0 x = x 0 is equivalent to 0 / 0 x  = 0, which is the initialisation step for an induction argument. Subtracting (12) from (1) gives and therefore From assumptions (3) and (7), the last two terms of the right-hand-side of (15) are zero. Thus, (13) follows by induction.

Design of the Predictor Gain
It is shown below that the optimum predictor gain is that which minimises the prediction error covariance

$$P_{k+1/k} = E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\}. \qquad (16)$$

Lemma 2: Suppose that $P_{k/k-1} = E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\} \geq 0$ over $[0, N]$; then the predictor gain

$$K_k = A_k P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1} \qquad (17)$$

minimises $P_{k+1/k}$.

Proof: It follows from (14) and the assumptions (3), (7) that

$$P_{k+1/k} = (A_k - K_k C_k) P_{k/k-1} (A_k - K_k C_k)^T + B_k Q_k B_k^T + K_k R_k K_k^T, \qquad (18)$$

which can be rearranged to give

$$P_{k+1/k} = A_k P_{k/k-1} A_k^T + B_k Q_k B_k^T - A_k P_{k/k-1} C_k^T \Sigma_k^{-1} C_k P_{k/k-1} A_k^T + (K_k - A_k P_{k/k-1} C_k^T \Sigma_k^{-1}) \Sigma_k (K_k - A_k P_{k/k-1} C_k^T \Sigma_k^{-1})^T, \qquad (19)$$

where $\Sigma_k = C_k P_{k/k-1} C_k^T + R_k$. By inspection of (19), the predictor gain (17) minimises $P_{k+1/k}$, which is then given by the Riccati difference equation $P_{k+1/k} = A_k P_{k/k-1} A_k^T + B_k Q_k B_k^T - K_k \Sigma_k K_k^T$.
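A minimal sketch of one predictor-gain and Riccati-equation step, using the assumed example matrices from earlier, may help fix ideas:

```python
# One step of the predictor gain (17) and error covariance recursion (18);
# A, B, C, Q, R and the initial P are illustrative assumptions.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]]); Q = np.eye(1); R = np.eye(1)
P = np.eye(2)                                   # P_{k/k-1}

Sig = C @ P @ C.T + R                           # Sigma_k
K = A @ P @ C.T @ np.linalg.inv(Sig)            # predictor gain (17)
P_next = ((A - K @ C) @ P @ (A - K @ C).T       # P_{k+1/k} via (18)
          + B @ Q @ B.T + K @ R @ K.T)
```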

Minimum-Variance Filtering
It can be seen from (12) that the predicted state estimate $\hat{x}_{k/k-1}$ is calculated using the previous measurement $z_{k-1}$, as opposed to the current data $z_k$. A state estimate given the data at time $k$, which is known as the filtered state, can similarly be obtained using the linear least-squares or conditional-mean formula. In Lemma 1 it was shown that the predicted state estimate is unbiased. Therefore, it is assumed that the expected value of the filtered state equals the expected value of the predicted state, namely,

$$E\{\hat{x}_{k/k}\} = E\{\hat{x}_{k/k-1}\}. \qquad (20)$$

Substituting (20) into (10) and denoting by $L_k$ the filter gain gives

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k (z_k - C_k \hat{x}_{k/k-1}), \qquad (21)$$

where $L_k$ is designed in the next section.

Lemma 3: The filtered state estimate (21) is unbiased, that is, $E\{x_k - \hat{x}_{k/k}\} = 0$.

Proof: Following the approach of [6], combining (4) – (6) results in $z_k = C_k A_{k-1} x_{k-1} + C_k B_{k-1} w_{k-1} + v_k$, which together with (21) yields the filter error

$$\tilde{x}_{k/k} = x_k - \hat{x}_{k/k} = (I - L_k C_k)\tilde{x}_{k/k-1} - L_k v_k. \qquad (23)$$

From (23) and the assumptions (3), (7), it follows that $E\{\tilde{x}_{k/k}\} = (I - L_k C_k)E\{\tilde{x}_{k/k-1}\}$. Hence, with the initial condition $\tilde{x}_{0/0} = 0$, the claim follows by induction.

Design of the Filter Gain
It is shown below that the optimum filter gain is that which minimises the covariance

$$P_{k/k} = E\{\tilde{x}_{k/k}\tilde{x}_{k/k}^T\}, \qquad (25)$$

where $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}$ is the filter error.

Lemma 4: The filter gain

$$L_k = P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1} \qquad (26)$$

within (21) minimises $P_{k/k}$.

Proof: It follows from (23) and the assumptions (3), (7) that

$$P_{k/k} = (I - L_k C_k) P_{k/k-1} (I - L_k C_k)^T + L_k R_k L_k^T, \qquad (28)$$

which can be rearranged as

$$P_{k/k} = P_{k/k-1} - P_{k/k-1} C_k^T \Sigma_k^{-1} C_k P_{k/k-1} + (L_k - P_{k/k-1} C_k^T \Sigma_k^{-1}) \Sigma_k (L_k - P_{k/k-1} C_k^T \Sigma_k^{-1})^T, \qquad (29)$$

where $\Sigma_k = C_k P_{k/k-1} C_k^T + R_k$. By inspection of (29), the filter gain (26) minimises $P_{k/k}$.

Example 2 (Data Fusion).
Consider a filtering problem in which there are two measurements of the same scalar state variable (possibly from different sensors), namely

$$z_k = \begin{bmatrix} 1 \\ 1 \end{bmatrix} x_k + v_k, \qquad E\{v_k v_k^T\} = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix},$$

from which it follows that the filter gain (26) weights each measurement in inverse proportion to its noise variance, namely

$$L_k = \frac{P_{k/k-1}}{P_{k/k-1}(\sigma_1^2 + \sigma_2^2) + \sigma_1^2 \sigma_2^2} \begin{bmatrix} \sigma_2^2 & \sigma_1^2 \end{bmatrix}.$$

That is, when the first measurement is noise free ($\sigma_1^2 = 0$), the gain becomes $\begin{bmatrix} 1 & 0 \end{bmatrix}$, the filter ignores the second measurement, and vice versa. Thus, the Kalman filter weights the data according to the prevailing measurement qualities.
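A quick numerical check of Example 2 (with assumed prior and noise variances) confirms the weighting behaviour:

```python
# Data-fusion gain (26) for two sensors observing one scalar state;
# the prior variance and noise variances are illustrative assumptions.
import numpy as np

P = np.array([[1.0]])                      # P_{k/k-1}
C = np.array([[1.0], [1.0]])               # both sensors observe x_k
for s1, s2 in [(0.0, 4.0), (4.0, 0.0), (1.0, 1.0)]:
    R = np.diag([s1, s2])
    L = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)   # filter gain (26)
    print(s1, s2, L)   # s1 = 0 gives L = [1, 0]: the noisy channel is ignored
```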

The Predictor-Corrector Form
The Kalman filter may be written in the following predictor-corrector form. The corrected (or filtered) error covariances and states are respectively given by

$$P_{k/k} = P_{k/k-1} - P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1} C_k P_{k/k-1}, \qquad (30)$$

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k (z_k - C_k \hat{x}_{k/k-1}), \qquad (31)$$

where $L_k = P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1}$; (31) is also known as the measurement update. The predicted state and error covariances are respectively given by

$$\hat{x}_{k+1/k} = A_k \hat{x}_{k/k}, \qquad (32)$$

$$P_{k+1/k} = A_k P_{k/k} A_k^T + B_k Q_k B_k^T. \qquad (33)$$

It can be seen from (31) that the corrected estimate, $\hat{x}_{k/k}$, is obtained using measurements up to time $k$. This contrasts with the prediction at time $k + 1$ in (32), which is based on all previous measurements. The output estimate is given by $\hat{y}_{k/k} = C_k \hat{x}_{k/k}$.
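The recursions (30) – (33) translate directly into code. The following minimal sketch (with assumed example matrices and simulated data) runs the predictor-corrector form:

```python
# Predictor-corrector Kalman filter, eqs. (30)-(33); the model matrices
# and simulated data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]]); Q = np.eye(1); R = 0.5 * np.eye(1)

x = np.zeros(2)                    # true state
xp, Pp = np.zeros(2), np.eye(2)    # \hat{x}_{0/-1}, P_{0/-1}
for k in range(50):
    z = C @ x + rng.multivariate_normal([0.0], R)      # measurement (6)
    L = Pp @ C.T @ np.linalg.inv(C @ Pp @ C.T + R)     # filter gain
    xc = xp + L @ (z - C @ xp)                         # corrected state (31)
    Pc = Pp - L @ C @ Pp                               # corrected covariance (30)
    xp = A @ xc                                        # predicted state (32)
    Pp = A @ Pc @ A.T + B @ Q @ B.T                    # predicted covariance (33)
    x = A @ x + B @ rng.multivariate_normal([0.0], Q)  # true state evolution (4)
```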

The A Posteriori Filter
The above predictor-corrector form is used in the construction of extended Kalman filters for nonlinear estimation problems (see Chapter 10). When state predictions are not explicitly required, the following one-line recursion for the filtered state can be employed. Substituting $\hat{x}_{k/k-1} = A_{k-1}\hat{x}_{k-1/k-1}$ from (32) into (31) gives

$$\hat{x}_{k/k} = (I - L_k C_k) A_{k-1} \hat{x}_{k-1/k-1} + L_k z_k. \qquad (34)$$

Hence, the output estimator may be written as

$$\hat{y}_{k/k} = C_k (I - L_k C_k) A_{k-1} \hat{x}_{k-1/k-1} + C_k L_k z_k. \qquad (35)$$

This form is called the a posteriori filter within [7], [8] and [9]. The absence of a direct feedthrough matrix above reduces the complexity of the robust filter designs described in [7], [8] and [9].

The Information Form
Algebraically equivalent recursions of the Kalman filter can be obtained by propagating a so-called corrected information state

$$\hat{s}_{k/k} = P_{k/k}^{-1} \hat{x}_{k/k} \qquad (36)$$

and a predicted information state

$$\hat{s}_{k/k-1} = P_{k/k-1}^{-1} \hat{x}_{k/k-1}. \qquad (37)$$

The expression

$$(P^{-1} + C^T R^{-1} C)^{-1} = P - P C^T (C P C^T + R)^{-1} C P, \qquad (38)$$

which is variously known as the Matrix Inversion Lemma, the Sherman-Morrison formula and Woodbury's identity, is used to derive the information filter; see [3], [4], [11], [14] and [15]. To confirm the above identity, premultiply both sides of (38) by $(P^{-1} + C^T R^{-1} C)$ to obtain the identity matrix on both sides, assuming that the indicated inverses exist. Applying (38) to (30) yields the corrected information matrix

$$P_{k/k}^{-1} = P_{k/k-1}^{-1} + C_k^T R_k^{-1} C_k. \qquad (39)$$

The predicted information matrix $P_{k+1/k}^{-1}$ can be obtained from the Matrix Inversion Lemma and (33), assuming that $A_k$ is invertible, namely,

$$P_{k+1/k}^{-1} = F_k M_k, \quad \text{where} \quad M_k = A_k^{-T} P_{k/k}^{-1} A_k^{-1} \quad \text{and} \quad F_k = I - M_k B_k (B_k^T M_k B_k + Q_k^{-1})^{-1} B_k^T. \qquad (41)$$

Another useful identity is

$$P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1} = (P_{k/k-1}^{-1} + C_k^T R_k^{-1} C_k)^{-1} C_k^T R_k^{-1}. \qquad (42)$$

From (42) and (39), the filter gain can be expressed as

$$L_k = P_{k/k} C_k^T R_k^{-1}. \qquad (43)$$

Premultiplying (39) by $P_{k/k}$ and rearranging gives

$$P_{k/k} P_{k/k-1}^{-1} = I - L_k C_k. \qquad (44)$$

It follows from (31), (36) and (44) that the corrected information state is given by

$$\hat{s}_{k/k} = \hat{s}_{k/k-1} + C_k^T R_k^{-1} z_k. \qquad (45)$$

The predicted information state follows from (37), (41) and the definition of $F_k$, namely,

$$\hat{s}_{k+1/k} = F_k A_k^{-T} \hat{s}_{k/k}. \qquad (46)$$

Recall from Lemma 1 and Lemma 3 that the predicted and filtered state estimates are unbiased. That is, the information states (scaled by the appropriate covariances) will be unbiased, provided that the filter is suitably initialised. The calculation cost and potential for numerical instability can influence decisions on whether to implement the predictor-corrector form (30) – (33) or the information form (39) – (46) of the Kalman filter. The filters have similar complexity; both require a p × p matrix inverse in the measurement updates (31) and (45). However, inverting the measurement covariance matrix for the information filter may be troublesome when the measurement noise is negligible.
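The information-form recursions (39), (41), (45) and (46) can be sketched as follows, assuming $A_k$ is invertible; the matrices, prior and measurement are again illustrative assumptions:

```python
# One cycle of the information filter, eqs. (39), (41), (45), (46);
# model matrices, prior and measurement are illustrative assumptions.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]]); Q = np.eye(1); R = 0.5 * np.eye(1)

Yp = np.eye(2)                     # predicted information matrix P_{k/k-1}^{-1}
sp = Yp @ np.array([0.5, -0.2])    # predicted information state (37)
z = np.array([0.3])

Yc = Yp + C.T @ np.linalg.inv(R) @ C        # corrected information matrix (39)
sc = sp + C.T @ np.linalg.inv(R) @ z        # corrected information state (45)

Ainv = np.linalg.inv(A)
M = Ainv.T @ Yc @ Ainv
F = np.eye(2) - M @ B @ np.linalg.inv(B.T @ M @ B + np.linalg.inv(Q)) @ B.T
Yp_next = F @ M                             # predicted information matrix (41)
sp_next = F @ Ainv.T @ sc                   # predicted information state (46)
```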

Comparison with Recursive Least Squares
The recursive least squares (RLS) algorithm is equivalent to the Kalman filter designed with the simplifications $A_k = I$ and $B_k = 0$; see the derivations within [10], [11]. For convenience, consider a more general RLS algorithm that retains the correct $A_k$ but relies on the simplifying assumption $B_k = 0$. Under these conditions, denote the RLS algorithm's predictor gain by

$$\bar{K}_k = A_k \bar{P}_{k/k-1} C_k^T (C_k \bar{P}_{k/k-1} C_k^T + R_k)^{-1}, \qquad (47)$$

where $\bar{P}_{k/k-1}$ is obtained from the Riccati difference equation

$$\bar{P}_{k+1/k} = A_k \bar{P}_{k/k-1} A_k^T - \bar{K}_k (C_k \bar{P}_{k/k-1} C_k^T + R_k) \bar{K}_k^T. \qquad (48)$$

It is argued below that the cost of the above model simplification is an increase in mean-square error.

Lemma 5: The error covariance $P_{k+1/k}$ exhibited by the predictor (12) operating with the RLS gain (47) satisfies

$$P_{k+1/k} \geq \bar{P}_{k+1/k}. \qquad (49)$$

Proof: From the approach of Lemma 2, the RLS algorithm's predicted error covariance is given by

$$P_{k+1/k} = (A_k - \bar{K}_k C_k) P_{k/k-1} (A_k - \bar{K}_k C_k)^T + \bar{K}_k R_k \bar{K}_k^T + B_k Q_k B_k^T. \qquad (50)$$

The last term on the right-hand side of (50) is nonzero, since the above RLS algorithm relies on the erroneous assumption $B_k Q_k B_k^T = 0$. Therefore (49) follows.
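The covariance consequences of the RLS simplification can be illustrated numerically; the matrices below are assumptions for the sketch:

```python
# Comparing the Riccati recursions with and without the B Q B^T term,
# cf. (48) and (50); the model matrices are illustrative assumptions.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]]); Q = np.eye(1); R = np.eye(1)

def riccati_step(P, BQB):
    K = A @ P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
    return (A - K @ C) @ P @ (A - K @ C).T + K @ R @ K.T + BQB

P_kf, P_rls = np.eye(2), np.eye(2)
for k in range(50):
    P_kf = riccati_step(P_kf, B @ Q @ B.T)   # Kalman: retains B_k Q_k B_k^T
    P_rls = riccati_step(P_rls, 0.0)         # RLS design: assumes B_k Q_k B_k^T = 0
# P_rls under-reports the achievable accuracy; the actual error covariance
# of the RLS gain on data from (4) includes the neglected B Q B^T term.
```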

Repeated Predictions
When there are gaps in the data record, or the data is irregularly spaced, state predictions can be calculated an arbitrary number of steps ahead. The one-step-ahead prediction is given by (32). The two, three and j-step-ahead predictions, given data at time $k$, are calculated as

$$\hat{x}_{k+2/k} = A_{k+1}\hat{x}_{k+1/k}, \quad \hat{x}_{k+3/k} = A_{k+2}\hat{x}_{k+2/k}, \quad \ldots, \quad \hat{x}_{k+j/k} = A_{k+j-1}\hat{x}_{k+j-1/k}; \qquad (51)$$

see also [4], [12]. The corresponding predicted error covariances are given by

$$P_{k+j/k} = A_{k+j-1} P_{k+j-1/k} A_{k+j-1}^T + B_{k+j-1} Q_{k+j-1} B_{k+j-1}^T. \qquad (52)$$

Another way to handle missing measurements at time $i$ is to set $C_i = 0$, which leads to the same predicted states and error covariances. However, the cost of relying on repeated predictions is an increased mean-square error, which is demonstrated below.

Lemma 6: The filtered and predicted error covariances satisfy

$$P_{k/k} \leq P_{k/k-1}. \qquad (53)$$

Proof: The claim follows by inspection of (30), since $P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1} C_k P_{k/k-1} \geq 0$. Thus, the filter outperforms the one-step-ahead predictor.
The monotonically increasing sequence of error variances shown in the figure demonstrates that degraded performance occurs during repeated predictions. Fig. 5 shows some sample trajectories of the model output (dotted line), the filter output (crosses) and the predictions (circles), assuming that $z_3, \ldots, z_8$ are unavailable. It can be seen from the figure that the prediction error increases with time $k$, which illustrates Lemma 6.
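The covariance growth under repeated predictions can be reproduced with a few lines; the matrices and starting estimate are illustrative assumptions:

```python
# j-step-ahead predictions (51) and covariances (52) across a data gap;
# the model matrices and starting estimate are illustrative assumptions.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[1.0], [0.5]]); Q = np.eye(1)
xp = np.array([1.0, -0.5]); Pp = np.eye(2)   # \hat{x}_{k+1/k}, P_{k+1/k}
for j in range(2, 7):                        # no measurements are available
    xp = A @ xp                              # (51): propagate the estimate
    Pp = A @ Pp @ A.T + B @ Q @ B.T          # (52): propagate the covariance
    print(j, np.trace(Pp))                   # error variance accumulates with j
```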

Accommodating Deterministic Inputs
Suppose that the signal model is described by

$$x_{k+1} = A_k x_k + B_k w_k + \mu_k, \qquad (58)$$

$$z_k = C_k x_k + v_k + \pi_k, \qquad (59)$$

where $\mu_k$ and $\pi_k$ are deterministic inputs (such as known non-zero means). The modifications to the Kalman recursions can be found by assuming

$$E\{\hat{x}_{k+1/k}\} = E\{A_k \hat{x}_{k/k} + \mu_k\} \quad \text{and} \quad E\{\hat{z}_k\} = E\{C_k \hat{x}_{k/k-1} + \pi_k\}. \qquad (60)$$

The filtered and predicted states are then given by

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k (z_k - C_k \hat{x}_{k/k-1} - \pi_k) \qquad (61)$$

and

$$\hat{x}_{k+1/k} = A_k \hat{x}_{k/k} + \mu_k, \qquad (62)$$

respectively. Subtracting (62) from (58) gives $\tilde{x}_{k+1/k}$, in which the deterministic terms cancel. Therefore, the predicted error covariance is unchanged. The filtered output is given by

$$\hat{y}_{k/k} = C_k \hat{x}_{k/k} + \pi_k. \qquad (63)$$
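A minimal sketch of the modified updates (61) – (62), with assumed matrices, gain and inputs:

```python
# Measurement and time updates (61)-(62) with deterministic inputs;
# the matrices, gain and inputs are illustrative assumptions.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]]); C = np.array([[1.0, 0.0]])
L = np.array([[0.4], [0.1]])        # filter gain from (26), assumed precomputed
xp = np.zeros(2)                    # \hat{x}_{k/k-1}
mu = np.array([0.2, 0.0]); pi = np.array([0.5]); z = np.array([1.0])

xc = xp + L @ (z - C @ xp - pi)     # filtered state (61): pi_k is subtracted
xp_next = A @ xc + mu               # predicted state (62): mu_k is added
```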

Correlated Process and Measurement Noises
Consider the case where the process and measurement noises are correlated,

$$E\left\{ \begin{bmatrix} w_j \\ v_j \end{bmatrix} \begin{bmatrix} w_k^T & v_k^T \end{bmatrix} \right\} = \begin{bmatrix} Q_k & S_k \\ S_k^T & R_k \end{bmatrix} \delta_{jk}. \qquad (66)$$

The generalisation of the optimal filter that takes the above into account was published by Kalman in 1963 [2]. The expressions for the state prediction,

$$\hat{x}_{k+1/k} = A_k \hat{x}_{k/k-1} + K_k (z_k - C_k \hat{x}_{k/k-1}), \qquad (67)$$

and the state prediction error,

$$\tilde{x}_{k+1/k} = (A_k - K_k C_k) \tilde{x}_{k/k-1} + B_k w_k - K_k v_k, \qquad (68)$$

remain the same. It follows from (68) that

$$P_{k+1/k} = (A_k - K_k C_k) P_{k/k-1} (A_k - K_k C_k)^T + B_k Q_k B_k^T + K_k R_k K_k^T - B_k S_k K_k^T - K_k S_k^T B_k^T. \qquad (69)$$

As before, the optimum predictor gain is that which minimises the prediction error covariance $P_{k+1/k}$.

Lemma 7: Suppose that $P_{k/k-1} \geq 0$ over $[0, N]$; then the state prediction (67) with the gain

$$K_k = (A_k P_{k/k-1} C_k^T + B_k S_k)(C_k P_{k/k-1} C_k^T + R_k)^{-1} \qquad (71)$$

minimises the prediction error covariance (69), which is then given by the Riccati difference equation

$$P_{k+1/k} = A_k P_{k/k-1} A_k^T + B_k Q_k B_k^T - K_k (C_k P_{k/k-1} C_k^T + R_k) K_k^T. \qquad (70)$$

Proof: It follows from (69) that

$$P_{k+1/k} = A_k P_{k/k-1} A_k^T + B_k Q_k B_k^T + K_k \Sigma_k K_k^T - K_k G_k^T - G_k K_k^T, \qquad (72)$$

where $G_k = A_k P_{k/k-1} C_k^T + B_k S_k$ and $\Sigma_k = C_k P_{k/k-1} C_k^T + R_k$, which can be rearranged to give

$$P_{k+1/k} = A_k P_{k/k-1} A_k^T + B_k Q_k B_k^T - G_k \Sigma_k^{-1} G_k^T + (K_k - G_k \Sigma_k^{-1}) \Sigma_k (K_k - G_k \Sigma_k^{-1})^T. \qquad (73)$$

By inspection of (73), the predictor gain (71) minimises $P_{k+1/k}$.

Thus, the predictor gain is calculated differently when $w_k$ and $v_k$ are correlated. The calculation of the filtered state and filtered error covariance are unchanged, viz.

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k (z_k - C_k \hat{x}_{k/k-1}), \quad L_k = P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1}, \qquad (74)$$

$$P_{k/k} = P_{k/k-1} - P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1} C_k P_{k/k-1}. \qquad (75)$$
However, $P_{k/k-1}$ is now obtained from the Riccati difference equation (70).
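A minimal sketch of the correlated-noise gain (71) and Riccati step (70), with an assumed cross-covariance:

```python
# Predictor gain (71) and Riccati step (70) with correlated noises;
# the matrices, including the cross-covariance S, are illustrative.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]]); Q = np.eye(1); R = np.eye(1)
S = 0.3 * np.eye(1)                           # E{w_j v_k^T} = S_k delta_jk
P = np.eye(2)                                 # P_{k/k-1}

Sig = C @ P @ C.T + R
K = (A @ P @ C.T + B @ S) @ np.linalg.inv(Sig)      # predictor gain (71)
P_next = A @ P @ A.T + B @ Q @ B.T - K @ Sig @ K.T  # Riccati equation (70)
```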

Including a Direct-Feedthrough Matrix
Suppose now that the signal model possesses a direct-feedthrough matrix, $D_k$, namely

$$x_{k+1} = A_k x_k + B_k w_k, \qquad (76)$$

$$y_k = C_k x_k + D_k w_k. \qquad (77)$$

Let the observations be denoted by $z_k = y_k + v_k = C_k x_k + \bar{v}_k$, where $\bar{v}_k = D_k w_k + v_k$. Under the assumptions (3) and (7), it follows that

$$E\left\{ \begin{bmatrix} w_j \\ \bar{v}_j \end{bmatrix} \begin{bmatrix} w_k^T & \bar{v}_k^T \end{bmatrix} \right\} = \begin{bmatrix} Q_k & Q_k D_k^T \\ D_k Q_k & D_k Q_k D_k^T + R_k \end{bmatrix} \delta_{jk}. \qquad (80)$$

The approach of the previous section may be used to obtain the minimum-variance predictor for the above system. Using (80) within Lemma 7 yields the predictor gain

$$K_k = (A_k P_{k/k-1} C_k^T + B_k Q_k D_k^T) \Omega_k^{-1}, \qquad (81)$$

where

$$\Omega_k = C_k P_{k/k-1} C_k^T + D_k Q_k D_k^T + R_k \qquad (82)$$

and $P_{k/k-1}$ is the solution of the Riccati difference equation

$$P_{k+1/k} = A_k P_{k/k-1} A_k^T + B_k Q_k B_k^T - K_k \Omega_k K_k^T. \qquad (83)$$

The filtered states can be calculated from (74), (82), (83) and $L_k = P_{k/k-1} C_k^T \Omega_k^{-1}$.
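A minimal sketch of the gains (81) and (82) – (83) for an assumed feedthrough matrix:

```python
# Predictor gain (81), innovation covariance (82) and Riccati step (83)
# with a direct-feedthrough matrix D; all matrices are illustrative.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]]); D = np.array([[0.2]])
Q = np.eye(1); R = np.eye(1); P = np.eye(2)

Om = C @ P @ C.T + D @ Q @ D.T + R                   # Omega_k, eq. (82)
K = (A @ P @ C.T + B @ Q @ D.T) @ np.linalg.inv(Om)  # predictor gain (81)
Lf = P @ C.T @ np.linalg.inv(Om)                     # filter gain L_k
P_next = A @ P @ A.T + B @ Q @ B.T - K @ Om @ K.T    # Riccati equation (83)
```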

Solution of the General Filtering Problem
The general filtering problem is shown in Fig. 7. Suppose that the system $\mathcal{S}_2$ has the realisation

$$x_{k+1} = A_k x_k + B_k w_k, \qquad (84)$$

$$y_{2,k} = C_{2,k} x_k + D_{2,k} w_k, \qquad (85)$$

and that a reference system $\mathcal{S}_1$ is modelled as

$$y_{1,k} = C_{1,k} x_k + D_{1,k} w_k. \qquad (86)$$

The objective is to produce estimates $\hat{y}_{1,k/k}$ of $y_{1,k}$ from the measurements

$$z_k = y_{2,k} + v_k \qquad (87)$$

so that the covariance of the output estimation error

$$e_k = y_{1,k} - \hat{y}_{1,k/k} \qquad (88)$$

is minimised. The predicted state follows immediately from the results of the previous sections, namely,

$$\hat{x}_{k+1/k} = A_k \hat{x}_{k/k-1} + K_k (z_k - C_{2,k} \hat{x}_{k/k-1}), \qquad (89)$$

in which $K_k$ and $P_{k/k-1}$ are obtained from (81) – (83) with $C_{2,k}$ and $D_{2,k}$ in place of $C_k$ and $D_k$. In view of the structure (89), an output estimate of the form

$$\hat{y}_{1,k/k} = C_{1,k} \hat{x}_{k/k-1} + L_k (z_k - C_{2,k} \hat{x}_{k/k-1}) \qquad (93)$$

is sought, where $L_k$ is a filter gain to be designed. Subtracting (93) from (86) gives

$$e_k = (C_{1,k} - L_k C_{2,k}) \tilde{x}_{k/k-1} + (D_{1,k} - L_k D_{2,k}) w_k - L_k v_k. \qquad (94)$$

It is shown below that an optimum filter gain can be found by minimising the output error covariance $E\{e_k e_k^T\}$.
Lemma 8: The filter gain

$$L_k = (C_{1,k} P_{k/k-1} C_{2,k}^T + D_{1,k} Q_k D_{2,k}^T) \Omega_k^{-1}, \quad \Omega_k = C_{2,k} P_{k/k-1} C_{2,k}^T + D_{2,k} Q_k D_{2,k}^T + R_k, \qquad (95)$$

within (93) minimises $E\{e_k e_k^T\}$.

Proof: It follows from (94) that

$$E\{e_k e_k^T\} = (C_{1,k} - L_k C_{2,k}) P_{k/k-1} (C_{1,k} - L_k C_{2,k})^T + (D_{1,k} - L_k D_{2,k}) Q_k (D_{1,k} - L_k D_{2,k})^T + L_k R_k L_k^T, \qquad (96)$$

which can be expanded to give

$$E\{e_k e_k^T\} = C_{1,k} P_{k/k-1} C_{1,k}^T + D_{1,k} Q_k D_{1,k}^T - G_k \Omega_k^{-1} G_k^T + (L_k - G_k \Omega_k^{-1}) \Omega_k (L_k - G_k \Omega_k^{-1})^T, \qquad (97)$$

where $G_k = C_{1,k} P_{k/k-1} C_{2,k}^T + D_{1,k} Q_k D_{2,k}^T$. By inspection of (97), the filter gain (95) minimises $E\{e_k e_k^T\}$.

The filter gain (95) has been generalised to include arbitrary $C_{1,k}$, $D_{1,k}$ and $D_{2,k}$. For state estimation, $C_{1,k} = I$ and $D_{1,k} = 0$, in which case (with $D_{2,k} = 0$) (95) reverts to the simpler form (26). The problem (84) – (88) can be written compactly in the following generalised regulator framework from control theory [13].
The application of the solution (99) – (100) to output estimation, input estimation (or equalisation), state estimation and mixed filtering problems is demonstrated in the example below, which considers a mixed filtering and equalisation problem depicted in Fig. 8.

Hybrid Continuous-Discrete Filtering
Often a system's dynamics evolve continuously but measurements can only be observed in discrete time increments. This problem is modelled in [20] as

$$\dot{x}(t) = A(t) x(t) + B(t) w(t), \qquad (104)$$

$$z_k = C_k x(k T_s) + v_k, \qquad (105)$$

in which $T_s$ is the sampling interval. Following the approach of [20], state estimates can be obtained from a hybrid of continuous-time and discrete-time filtering equations. The predicted states and error covariances are obtained from

$$\dot{\hat{x}}(t) = A(t) \hat{x}(t), \qquad \dot{P}(t) = A(t) P(t) + P(t) A^T(t) + B(t) Q(t) B^T(t). \qquad (106)$$

Define $\hat{x}_{k/k-1} = \hat{x}(t)$ and $P_{k/k-1} = P(t)$ at $t = k T_s$. The corrected states and error covariances are given by

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k (z_k - C_k \hat{x}_{k/k-1}), \qquad P_{k/k} = (I - L_k C_k) P_{k/k-1}, \qquad (107)$$

where $L_k = P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1}$. The above filter is a linear system having jumps at the discrete observation times. The states evolve according to the continuous-time dynamics (106) in between the sampling instants. This filter is applied in [20] for the recovery of cardiac dynamics from medical image sequences.
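A minimal sketch of the hybrid filter, using Euler integration of (106) between samples; the continuous-time matrices and data are assumptions:

```python
# Hybrid continuous-discrete filtering: integrate (106) between samples,
# then correct at t = k T_s; all matrices and data are illustrative.
import numpy as np

Ac = np.array([[0.0, 1.0], [-1.0, -0.4]]); Bc = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]]); Qc = np.eye(1); R = 0.1 * np.eye(1)
Ts, substeps = 0.1, 10
dt = Ts / substeps

x, P = np.zeros(2), np.eye(2)
for z in [np.array([0.10]), np.array([0.15])]:        # discrete measurements
    for _ in range(substeps):                         # continuous-time prediction
        x = x + dt * (Ac @ x)
        P = P + dt * (Ac @ P + P @ Ac.T + Bc @ Qc @ Bc.T)
    L = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)      # gain at the sample time
    x = x + L @ (z - C @ x)                           # corrected state
    P = P - L @ C @ P                                 # corrected covariance
```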

Conclusion
A linear, time-varying system $\mathcal{S}_2$ is assumed to have the realisation $x_{k+1} = A_k x_k + B_k w_k$ and $y_{2,k} = C_{2,k} x_k + D_{2,k} w_k$. In the general filtering problem, it is desired to estimate the output of a second reference system $\mathcal{S}_1$, which is modelled as $y_{1,k} = C_{1,k} x_k + D_{1,k} w_k$. The Kalman filter which estimates $y_{1,k}$ from the measurements $z_k = y_{2,k} + v_k$ at time $k$ is listed in Table 1.
"Louis Pasteur's theory of germs is ridiculous fiction." Pierre Pachet, Professor of Physiology at Toulouse, 1872 Smoothing, Filtering and Prediction: Estimating the Past, Present and Future 96 If the state-space parameters are known exactly then this filter minimises the predicted and corrected error covariances respectively. When there are gaps in the data record, or the data is irregularly spaced, state predictions can be calculated an arbitrary number of steps ahead, at the cost of increased mean-square-error.

Table 1. Main results for the general filtering problem.

ASSUMPTIONS: $E\{w_k\} = 0$, $E\{v_k\} = 0$, $E\{w_j w_k^T\} = Q_k \delta_{jk}$, $E\{v_j v_k^T\} = R_k \delta_{jk}$.

MAIN RESULTS:

Signals and system:
$x_{k+1} = A_k x_k + B_k w_k$, $y_{2,k} = C_{2,k} x_k + D_{2,k} w_k$, $y_{1,k} = C_{1,k} x_k + D_{1,k} w_k$, $z_k = y_{2,k} + v_k$.

Predictor gain, filter gain and Riccati difference equation:
$K_k = (A_k P_{k/k-1} C_{2,k}^T + B_k Q_k D_{2,k}^T) \Omega_k^{-1}$,
$L_k = (C_{1,k} P_{k/k-1} C_{2,k}^T + D_{1,k} Q_k D_{2,k}^T) \Omega_k^{-1}$,
$P_{k+1/k} = A_k P_{k/k-1} A_k^T + B_k Q_k B_k^T - K_k \Omega_k K_k^T$,
where $\Omega_k = C_{2,k} P_{k/k-1} C_{2,k}^T + D_{2,k} Q_k D_{2,k}^T + R_k$.

The filtering solution is specialised to output estimation with $C_{1,k} = C_{2,k}$ and $D_{1,k} = D_{2,k}$.
In the case of input estimation (or equalisation), $C_{1,k} = 0$ and $D_{1,k} = I$, which results in

$$\hat{w}_{k/k} = Q_k D_{2,k}^T \Omega_k^{-1} (z_k - C_{2,k} \hat{x}_{k/k-1}).$$

For problems where $C_{1,k} = I$ (state estimation) and $D_{1,k} = D_{2,k} = 0$, the filtered state calculation simplifies to

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + P_{k/k-1} C_{2,k}^T (C_{2,k} P_{k/k-1} C_{2,k}^T + R_k)^{-1} (z_k - C_{2,k} \hat{x}_{k/k-1}).$$

If the simplifications $B_k = D_{2,k} = 0$ are assumed and the pair $(A_k, C_{2,k})$ is retained, the Kalman filter degenerates to the RLS algorithm. However, the cost of this model simplification is an increase in mean-square error.

Problems
Problem 1. Show that the estimate $\hat{\alpha}$ of $\alpha$ given $\beta$ which minimises $E\{(\alpha - \hat{\alpha})(\alpha - \hat{\alpha})^T\}$ is given by the conditional-mean formula (10).

Problems 2 and 3. Derive the minimum-variance one-step-ahead predictor and the minimum-variance filter, respectively, for the model $x_{k+1} = A_k x_k + B_k w_k$, $y_k = C_k x_k$ and the measurements $z_k = y_k + v_k$.

Problem 4 [11], [14], [17], [18], [19]. Consider the standard discrete-time filter equations. Show that, as the sampling interval $\Delta t$ approaches zero, they lead to the continuous-time filter equations with gain $K(t_k) = P(t_k) C^T(t_k) R^{-1}(t_k)$. (Hint: Introduce the quantities $A_k = I + A(t_k)\Delta t$, $B(t_k) = B_k$, $C(t_k) = C_k$ and $\Delta t = t_k - t_{k-1}$.)

Problem 5. Derive the two-step-ahead predicted error covariance $P_{k+2/k}$.

Problem 6. Verify that the Riccati difference equation (18) can be rearranged as (19).

Problem 7 [16]. Suppose that the systems $y_{1,k} = \mathcal{S}_1 w_k$ and $y_{2,k} = \mathcal{S}_2 w_k$ have the state-space realisations (84) – (86). Derive the filter gain (95) which minimises the output error covariance.