Continuous-Time Minimum-Variance Filtering

This book describes the classical smoothing, filtering and prediction techniques together with some more recently developed embellishments for improving performance within applications. It aims to present the subject in an accessible way, so that it can serve as a practical guide for undergraduates and newcomers to the field.

The material is organised as a ten-lecture course. The foundations are laid in Chapters 1 and 2, which explain minimum-mean-square-error solution construction and asymptotic behaviour. Chapters 3 and 4 introduce continuous-time and discrete-time minimum-variance filtering. Generalisations for missing data, deterministic inputs, correlated noises, direct feedthrough terms, output estimation and equalisation are described. Chapter 5 simplifies the minimum-variance filtering results for steady-state problems. Observability, Riccati equation solution convergence, asymptotic stability and Wiener filter equivalence are discussed. Chapters 6 and 7 cover the subject of continuous-time and discrete-time smoothing. The main fixed-lag, fixed-point and fixed-interval smoother results are derived. It is shown that the minimum-variance fixed-interval smoother attains the best performance. Chapter 8 attends to parameter estimation. As the above-mentioned approaches all rely on knowledge of the underlying model parameters, maximum-likelihood techniques within expectation-maximisation algorithms for joint state and parameter estimation are described. Chapter 9 is concerned with robust techniques that accommodate uncertainties within problem specifications. An extra term within Riccati equations enables designers to trade off average error and peak error performance. Chapter 10 rounds off the course by applying the aforementioned linear techniques to nonlinear estimation problems. It is demonstrated that step-wise linearisations can be used within predictors, filters and smoothers, albeit by forsaking optimal performance guarantees.

Compared to the Wiener filter, Kalman's state-space approach has the following advantages:
• It is applicable to time-varying problems.
• As noted in [7], [8], the state-space parameters can be linearisations of nonlinear models.
• The burdens of spectral factorisation and pole-zero cancellation are replaced by the easier task of solving a Riccati equation.
• It is a more intuitive model-based approach in which the estimated states correspond to those within the signal generation process.
Kalman's research at the RIAS, concerned with estimation and control for aerospace systems, was funded by the Air Force Office of Scientific Research. His explanation of why the dynamics-based Kalman filter is more important than the purely stochastic Wiener filter is that "Newton is more important than Gauss" [1]. The continuous-time Kalman filter produces state estimates x̂(t) from the solution of a simple differential equation, in which it is tacitly assumed that the model is correct and that the noises are zero-mean, white and uncorrelated. It is straightforward to include nonzero means and coloured or correlated noises. In practice, the true model can be elusive, but a simple (low-order) solution may still return a cost benefit.
The Kalman filter can be derived in many different ways. In an early account [3], a quadratic cost function was minimised using orthogonal projections. Other derivation methods include maximum a posteriori estimation, Itô calculus, the calculus of variations, dynamic programming, invariant imbedding and the Wiener-Hopf equation [6] - [17]. This chapter provides a brief derivation of the optimal filter using a conditional mean (or, equivalently, a least mean square error) approach.
The developments begin by introducing a time-varying state-space model. Next, the state transition matrix is defined and used to derive a Lyapunov differential equation. The Kalman filter follows immediately from a conditional mean formula. Its filter gain is obtained by solving a Riccati differential equation corresponding to the estimation error system. Generalisations for problems possessing deterministic inputs, correlated process and measurement noises, and direct feedthrough terms are described subsequently. Finally, it is shown that the Kalman filter reverts to the Wiener filter when the problems are time-invariant.

Figure 1. The continuous-time system 𝒮 operates on the input signal w(t) ∈ ℝ^m and produces the output signal y(t) ∈ ℝ^p.

The Time-varying Signal Model
"A great deal of my work is just playing with equations and seeing what they give." Paul Adrien Maurice Dirac

The focus initially is on time-varying problems over a finite time interval t ∈ [0, T]. A system 𝒮 : ℝ^m → ℝ^p, shown in Fig. 1, is assumed to have the state-space representation

ẋ(t) = A(t)x(t) + B(t)w(t),    (1)
y(t) = C(t)x(t) + D(t)w(t),    (2)

where A(t) ∈ ℝ^(n×n), B(t) ∈ ℝ^(n×m), C(t) ∈ ℝ^(p×n), D(t) ∈ ℝ^(p×p) and w(t) is a zero-mean white process noise with E{w(t)w^T(τ)} = Q(t)δ(t - τ), in which δ(t) is the Dirac delta function. In many problems of interest, signals are band-limited, that is, the direct feedthrough matrix, D(t), is zero. Therefore, the simpler case of D(t) = 0 is addressed first and the inclusion of a nonzero D(t) is considered afterwards.
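As a concrete illustration of the model (1) - (2) with D(t) = 0, the following sketch simulates a time-invariant instance by a simple Euler discretisation; all numerical values are assumed for the purpose of the example.

```python
import numpy as np

def simulate_state_space(A, B, C, Q, x0, dt, n_steps, rng):
    """Euler simulation of dx/dt = A x + B w, y = C x, where the white
    noise w(t) has covariance Q * delta(t - tau)."""
    x = x0.copy()
    X, Y = [], []
    L = np.linalg.cholesky(Q)
    for _ in range(n_steps):
        # Discretised continuous white noise has covariance Q/dt per sample,
        # so that the state increment noise covariance scales as B Q B^T dt.
        w = L @ rng.standard_normal(Q.shape[0]) / np.sqrt(dt)
        x = x + (A @ x + B @ w) * dt
        X.append(x.copy())
        Y.append(C @ x)
    return np.array(X), np.array(Y)

rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # assumed example dynamics
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.array([[0.1]])
X, Y = simulate_state_space(A, B, C, Q, np.zeros(2), 1e-3, 10000, rng)
```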

The State Transition Matrix
The state transition matrix, which concerns the linear differential equation (1), is introduced below.

Lemma 1: The equation (1) has the solution

x(t) = Φ(t, t₀)x(t₀) + ∫_{t₀}^{t} Φ(t, τ)B(τ)w(τ) dτ,    (3)

where the state transition matrix Φ(t, τ) satisfies

∂Φ(t, τ)/∂t = A(t)Φ(t, τ),    (4)

with boundary condition Φ(t, t) = I.

Proof: Differentiating both sides of (3) and using Leibniz's rule, that is,

d/dt ∫_{t₀}^{t} f(t, τ) dτ = f(t, t) + ∫_{t₀}^{t} ∂f(t, τ)/∂t dτ,    (5)

gives

ẋ(t) = (∂Φ(t, t₀)/∂t)x(t₀) + Φ(t, t)B(t)w(t) + ∫_{t₀}^{t} (∂Φ(t, τ)/∂t)B(τ)w(τ) dτ.    (6)

Substituting (4) and the boundary condition into the right-hand side of (6) results in ẋ(t) = A(t)x(t) + B(t)w(t), which is (1). □
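For a time-invariant state matrix, the transition matrix reduces to the matrix exponential, Φ(t, τ) = e^{A(t - τ)}, and the property (4) together with the boundary condition can be checked numerically; the matrix A below is an assumed example.

```python
import numpy as np
from scipy.linalg import expm

# For constant A, the state transition matrix is Phi(t, tau) = expm(A (t - tau)).
A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # assumed example dynamics
t, tau, h = 2.0, 0.5, 1e-6

Phi = expm(A * (t - tau))

# Boundary condition Phi(t, t) = I
assert np.allclose(expm(A * 0.0), np.eye(2))

# Differential property dPhi/dt = A Phi, checked by a finite difference
dPhi = (expm(A * (t + h - tau)) - Phi) / h
assert np.allclose(dPhi, A @ Phi, atol=1e-4)
```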

The Lyapunov Differential Equation
"It is a mathematical fact that the casting of this pebble from my hand alters the centre of gravity of the universe." Thomas Carlyle

The development below makes use of two standard facts. First, expectation commutes with integration, that is, E{∫_{t₀}^{t} x(τ) dτ} = ∫_{t₀}^{t} E{x(τ)} dτ, which follows from Fubini's theorem. Second, the Dirac delta function has the sifting property ∫ f(τ)δ(t - τ) dτ = f(t).

Lemma 2: In respect of equation (1), assume that w(t) is a zero-mean white process with E{w(t)w^T(τ)} = Q(t)δ(t - τ) that is uncorrelated with x(t₀), namely, E{w(t)x^T(t₀)} = 0. Then the covariances P(t, τ) = E{x(t)x^T(τ)} and P(t) = P(t, t) satisfy the Lyapunov differential equation

Ṗ(t) = A(t)P(t) + P(t)A^T(t) + B(t)Q(t)B^T(t).

Proof: Using the solution (3) together with the assumptions E{w(t)x^T(t₀)} = 0 and E{w(t)w^T(τ)} = Q(t)δ(t - τ), the cross terms vanish and

P(t) = Φ(t, t₀)P(t₀)Φ^T(t, t₀) + ∫_{t₀}^{t} Φ(t, τ)B(τ)Q(τ)B^T(τ)Φ^T(t, τ) dτ.

Differentiating with the aid of Leibniz's rule (5) and the property (4) yields the above Lyapunov differential equation. □

When the state-space parameters are constant, the corresponding Lyapunov differential equation is written as Ṗ(t) = AP(t) + P(t)A^T + BQB^T.
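The Lyapunov differential equation can be integrated directly with a general-purpose ODE solver; the sketch below does so for an assumed time-invariant model.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumed time-invariant example parameters.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.array([[0.1]])
P0 = np.zeros((2, 2))

def lyapunov_rhs(t, p_flat):
    """Right-hand side of dP/dt = A P + P A^T + B Q B^T."""
    P = p_flat.reshape(2, 2)
    dP = A @ P + P @ A.T + B @ Q @ B.T
    return dP.ravel()

sol = solve_ivp(lyapunov_rhs, (0.0, 20.0), P0.ravel(), rtol=1e-8)
P_final = sol.y[:, -1].reshape(2, 2)
# For stable A, P(t) approaches the solution of A P + P A^T + B Q B^T = 0.
```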

Conditional Expectations
"As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality." Albert Einstein

The minimum-variance filter derivation that follows employs a conditional expectation formula, which is set out as follows. Consider a stochastic vector [x^T(t) y^T(t)]^T having means

E{[x^T(t) y^T(t)]^T} = [μ_x^T μ_y^T]^T    (18)

and covariances

E{([x(t); y(t)] - [μ_x; μ_y])([x(t); y(t)] - [μ_x; μ_y])^T} = [Σ_xx Σ_xy; Σ_yx Σ_yy],    (19)

respectively, where Σ_yx = Σ_xy^T. Suppose that it is desired to obtain an estimate of x(t) given y(t). A standard approach (e.g., see [18]) is to assume that the solution has the form

x̂(t) = Ay(t) + b,    (20)

where A and b are unknowns to be found. It follows from (20) that the error covariance is

E{(x(t) - Ay(t) - b)(x(t) - Ay(t) - b)^T}.    (21)

Substituting E{y(t)y^T(t)} = μ_y μ_y^T + Σ_yy into (21) and completing the square shows that the minimising choices are A = Σ_xy Σ_yy^{-1} and b = μ_x - Σ_xy Σ_yy^{-1} μ_y, which yield

x̂(t) = E{x(t) | y(t)} = μ_x + Σ_xy Σ_yy^{-1}(y(t) - μ_y).    (23)

The conditional mean estimate (23) is also known as the linear least mean square estimate [18]. An important property of the conditional mean estimate is established below.

Lemma 3 (Orthogonal projections): In respect of the conditional mean estimate (23), in which the means and covariances are respectively defined in (18) and (19), the error vector

x̃(t) = x(t) - x̂(t)    (25)

satisfies E{x̃(t)} = 0 and E{x̃(t)(y(t) - μ_y)^T} = 0, that is, the estimate is unbiased and the error is orthogonal to the data.

Proof [8], [18]: From (23) and (25), E{x̃(t)} = μ_x - μ_x - Σ_xy Σ_yy^{-1} E{y(t) - μ_y} = 0 and E{x̃(t)(y(t) - μ_y)^T} = Σ_xy - Σ_xy Σ_yy^{-1} Σ_yy = 0. □

Sufficient background material has now been introduced for the finite-horizon filter (for time-varying systems) to be derived.
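A brief numerical illustration of the conditional mean formula (23) and the orthogonality property of Lemma 3 follows; the joint data model and all parameter values are assumed for the purpose of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Jointly distributed samples of x (2 states) and y (1 observation); assumed toy data.
n = 100000
x = rng.standard_normal((n, 2)) @ np.array([[1.0, 0.3], [0.0, 0.5]])
y = x @ np.array([[1.0], [0.5]]) + 0.2 * rng.standard_normal((n, 1))

mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
S = np.cov(np.hstack([x, y]).T)
S_xy, S_yy = S[:2, 2:], S[2:, 2:]

# Conditional mean estimate (23): x_hat = mu_x + S_xy S_yy^{-1} (y - mu_y)
x_hat = mu_x + (y - mu_y) @ np.linalg.solve(S_yy, S_xy.T)

err = x - x_hat
# Orthogonality (Lemma 3): the error is uncorrelated with the centred data.
print(np.abs(err.T @ (y - mu_y)).max() / n)   # close to zero
```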

Derivation of the Optimal Filter
Consider again the linear time-varying system 𝒮 having the realisation

ẋ(t) = A(t)x(t) + B(t)w(t),    (26)
y(t) = C(t)x(t),    (27)

where A(t), B(t) and C(t) are of appropriate dimensions and w(t) is a zero-mean white process with E{w(t)w^T(τ)} = Q(t)δ(t - τ). Suppose that observations

z(t) = y(t) + v(t)    (29)

are available, where v(t) is a zero-mean white measurement noise with E{v(t)v^T(τ)} = R(t)δ(t - τ) that is uncorrelated with w(t). The objective is to design a linear system ℋ that operates on the measurements z(t) and produces an estimate x̂(t|t). This output estimation problem is depicted in Fig. 2.

"Art has a double face, of expression and illusion, just like science has a double face: the reality of error and the phantom of truth." René Daumal

It is desired that the estimate minimise the error covariance. Since x̂(t|t) is a conditional mean estimate, from Lemma 3, this criterion will be met. The filter

dx̂(t|t)/dt = A(t)x̂(t|t) + K(t)[z(t) - C(t)x̂(t|t)]    (34)

is known as the continuous-time Kalman filter (or the Kalman-Bucy filter) and is depicted in Fig. 3. This filter employs the state matrix A(t) akin to the signal generating model 𝒮, which Kalman and Bucy call the message process [4]. The matrix K(t) is known as the filter gain; it operates on the error residual, namely the difference between the measurement z(t) and the estimated output C(t)x̂(t|t). The calculation of an optimal gain is addressed in the next section. The filter calculates conditional mean estimates x̂(t|t) from the measurements z(t).

The Riccati Differential Equation
"Somewhere, something incredible is waiting to be known." Carl Edward Sagan

Denote the state estimation error by x̃(t|t) = x(t) - x̂(t|t). Subtracting (34) from (26) yields the error system

dx̃(t|t)/dt = [A(t) - K(t)C(t)]x̃(t|t) + B(t)w(t) - K(t)v(t).    (39)

It is shown below that the filter (34) having the gain

K(t) = P(t)C^T(t)R^{-1}(t),    (35)

in which P(t) ≥ 0 is the solution of the Riccati differential equation

Ṗ(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t),    (36)

with Q(t) > 0 and R(t) > 0 for all t in the interval [0, T], minimises the error covariance P(t) = E{x̃(t|t)x̃^T(t|t)}. Applying Lemma 2 to the error system (39) gives

Ṗ(t) = [A(t) - K(t)C(t)]P(t) + P(t)[A(t) - K(t)C(t)]^T + B(t)Q(t)B^T(t) + K(t)R(t)K^T(t),

which can be rearranged by completing the square as

Ṗ(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t) + [K(t) - P(t)C^T(t)R^{-1}(t)]R(t)[K(t) - P(t)C^T(t)R^{-1}(t)]^T.

Setting the square-bracketed term equal to the zero matrix results in a stationary point at (35), which leads to (36). Since R(t) > 0, any other choice of gain can only increase Ṗ(t); it follows that P(t) ≥ 0 provided that the above assumptions hold, and that P(t) = E{x̃(t|t)x̃^T(t|t)} is minimised by the gain (35).

The above development is somewhat brief and not very rigorous; further discussions appear in [4] - [17]. It is tendered to show that the Kalman filter minimises the error covariance, provided of course that the problem assumptions are correct. In the case where it is desired to estimate an arbitrary linear combination y₁(t) = C₁(t)x(t) of the states, the optimal estimate is given by ŷ₁(t|t) = C₁(t)x̂(t|t), in which x̂(t|t) is produced by the filter (34) - (36). This filter minimises the error covariance C₁(t)P(t)C₁^T(t). The generalisation of the Kalman filter for problems possessing deterministic inputs, correlated noises, and a direct feedthrough term is developed below.
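Before turning to those generalisations, the following sketch integrates the filter (34), the gain (35) and the Riccati equation (36) as one joint ODE; the model parameters and the placeholder measurement signal are assumed.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumed toy parameters (not from the text).
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.array([[0.1]])
R = np.array([[0.01]])

def z_of_t(t):
    """Placeholder measurement signal; in practice z(t) comes from the sensor."""
    return np.array([np.sin(t)])

def kalman_bucy_rhs(t, s):
    """Joint ODE for the estimate (34) and the Riccati equation (36)."""
    x_hat, P = s[:2], s[2:].reshape(2, 2)
    K = P @ C.T @ np.linalg.inv(R)                      # gain (35)
    dx = A @ x_hat + K @ (z_of_t(t) - C @ x_hat)        # filter (34)
    dP = A @ P + P @ A.T - P @ C.T @ np.linalg.inv(R) @ C @ P + B @ Q @ B.T  # (36)
    return np.concatenate([dx, dP.ravel()])

s0 = np.concatenate([np.zeros(2), np.eye(2).ravel()])
sol = solve_ivp(kalman_bucy_rhs, (0.0, 10.0), s0, max_step=0.01)
x_hat_final = sol.y[:2, -1]
```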

Including Deterministic Inputs
Suppose that the signal model is described by

ẋ(t) = A(t)x(t) + B(t)w(t) + μ(t),    (45)
z(t) = C(t)x(t) + π(t) + v(t),    (46)

where μ(t) and π(t) are deterministic (or known) inputs. In this case, the filtered state estimate can be obtained by including the deterministic inputs as follows:

dx̂(t|t)/dt = A(t)x̂(t|t) + μ(t) + K(t)[z(t) - π(t) - C(t)x̂(t|t)].    (47)

It is easily verified that subtracting (47) from (45) yields the error system (39) and, therefore, the Kalman filter's Riccati differential equation remains unchanged.

Example 2. Suppose that an object is falling under the influence of a gravitational field and it is desired to estimate its position over [0, T] from noisy measurements. Denote the object's vertical position by x₁(t) and its velocity by x₂(t) = ẋ₁(t); the acceleration is ẍ₁(t) = -g. The model may then be written in the form (45) - (46) with

[ẋ₁(t); ẋ₂(t)] = [0 1; 0 0][x₁(t); x₂(t)] + [0; -g],   z(t) = [1 0][x₁(t); x₂(t)] + v(t),

where C = [1 0] is the output mapping. Thus, the Kalman filter has the form (47) with μ(t) = [0 -g]^T and π(t) = 0, where the gain K(t) is calculated from (35).
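A rough numerical rendering of this example is sketched below; the initial conditions, covariances and the deterministic stand-in for the measurement noise are assumptions made for the purpose of illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Falling object: state [position, velocity], deterministic input mu = [0, -g].
# All numerical values below are assumed for illustration.
g = 9.81
A = np.array([[0.0, 1.0], [0.0, 0.0]])
C = np.array([[1.0, 0.0]])
mu = np.array([0.0, -g])
R = np.array([[1.0]])               # assumed measurement noise covariance
BQBt = 1e-3 * np.eye(2)             # small assumed process noise term B Q B^T

x0 = np.array([100.0, 0.0])         # assumed true initial height and velocity

def true_position(t):
    return x0[0] + x0[1] * t - 0.5 * g * t * t

def measurement(t):
    # Deterministic stand-in for noisy data, so the ODE solver behaves well.
    return np.array([true_position(t) + np.sin(37.0 * t)])

def rhs(t, s):
    x_hat, P = s[:2], s[2:].reshape(2, 2)
    K = P @ C.T / R[0, 0]                                     # gain (35)
    dx = A @ x_hat + mu + K @ (measurement(t) - C @ x_hat)    # filter (47)
    dP = A @ P + P @ A.T - P @ C.T @ C @ P / R[0, 0] + BQBt   # Riccati (36)
    return np.concatenate([dx, dP.ravel()])

s0 = np.concatenate([np.array([90.0, 0.0]), (10.0 * np.eye(2)).ravel()])
sol = solve_ivp(rhs, (0.0, 4.0), s0, max_step=0.01)
print(sol.y[0, -1], true_position(4.0))   # estimated vs true final position
```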

Including Correlated Process and Measurement Noise
Suppose that the process and measurement noises are correlated, that is,

E{[w(t); v(t)][w^T(τ) v^T(τ)]} = [Q(t) S(t); S^T(t) R(t)]δ(t - τ).    (52)

The equation for calculating the optimal state estimate remains of the form (34); however, the Riccati differential equation and hence the filter gain are different. The generalisation of the optimal filter that takes (52) into account was published by Kalman in 1963 [5]. Kalman's approach was to first work out the corresponding discrete-time Riccati equation and then derive the continuous-time version.

The correlated noises can be accommodated by defining the signal model equivalently as

ẋ(t) = Ā(t)x(t) + B(t)w̄(t) + B(t)S(t)R^{-1}(t)z(t),    (53)

where

Ā(t) = A(t) - B(t)S(t)R^{-1}(t)C(t),    (54)

w̄(t) = w(t) - S(t)R^{-1}(t)v(t)    (55)

is a new stochastic input that is uncorrelated with v(t), and

B(t)S(t)R^{-1}(t)z(t)    (56)

is a deterministic (known) signal. It can easily be verified that the system (53) with the parameters (54) - (56) has the structure (26) with E{w̄(t)v^T(τ)} = 0. It is convenient to define the covariance of the new process noise,

Q̄(t) = Q(t) - S(t)R^{-1}(t)S^T(t),

so that E{w̄(t)w̄^T(τ)} = Q̄(t)δ(t - τ).

"I am tired of all this thing called science here. We have spent millions in that sort of thing for the last few years, and it is time it should be stopped." Simon Cameron

The corresponding Riccati differential equation is obtained by substituting Ā(t) for A(t) and Q̄(t) for Q(t) within (36), namely,

Ṗ(t) = Ā(t)P(t) + P(t)Ā^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q̄(t)B^T(t).    (58)

This can be rearranged to give

Ṗ(t) = A(t)P(t) + P(t)A^T(t) - K(t)R(t)K^T(t) + B(t)Q(t)B^T(t),    (59)

in which the gain is now calculated as

K(t) = [P(t)C^T(t) + B(t)S(t)]R^{-1}(t).    (60)
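The modified quantities (54), (56) and the gain (60) are straightforward to compute; the parameter values below, including the correlation term S, are assumed.

```python
import numpy as np

# Assumed example parameters with correlated process/measurement noise.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.array([[0.1]])
R = np.array([[0.01]])
S = np.array([[0.02]])          # E{w v^T} correlation term

Rinv = np.linalg.inv(R)
A_bar = A - B @ S @ Rinv @ C    # (54)
Q_bar = Q - S @ Rinv @ S.T      # covariance of the new process noise

def riccati_rate(P):
    """P-dot from (58), equivalently (59) with the gain (60)."""
    return A_bar @ P + P @ A_bar.T - P @ C.T @ Rinv @ C @ P + B @ Q_bar @ B.T

def gain(P):
    """Correlated-noise filter gain (60)."""
    return (P @ C.T + B @ S) @ Rinv
```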

Including a Direct Feedthrough Matrix
The approach of the previous section can be used to address signal models that possess a direct feedthrough matrix, namely,

ẋ(t) = A(t)x(t) + B(t)w(t),
z(t) = C(t)x(t) + D(t)w(t) + v(t).

As before, the optimal state estimate is given by the filter (34), where the gain is obtained by substituting S(t) = Q(t)D^T(t) into (60), in which P(t) is the solution of the Riccati differential equation

Ṗ(t) = [A(t) - B(t)Q(t)D^T(t)R^{-1}(t)C(t)]P(t) + P(t)[A(t) - B(t)Q(t)D^T(t)R^{-1}(t)C(t)]^T - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)[Q(t) - Q(t)D^T(t)R^{-1}(t)D(t)Q(t)]B^T(t).

Note that the above Riccati equation simplifies to (36) when the direct feedthrough matrix D(t) is zero.
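A small sketch of the feedthrough substitution follows; the matrices are assumed example values, and R is taken here to denote the covariance of the total measurement noise.

```python
import numpy as np

# Direct feedthrough z = C x + D w + v induces the correlation S = Q D^T
# between the process noise and the measurement noise (assumed example values).
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.5]])
Q = np.array([[0.1]])
R = np.array([[0.05]])          # covariance of the total measurement noise

S = Q @ D.T                     # substitution used in the gain formula (60)
Rinv = np.linalg.inv(R)

def gain(P):
    """Filter gain (60) with S = Q D^T."""
    return (P @ C.T + B @ S) @ Rinv
```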

Riccati Differential Equation Monotonicity
This section sets out the simplifications that arise when the signal model is stationary (or time-invariant). In this situation the structure of the Kalman filter is unchanged, but the gain is fixed and can be pre-calculated. Consider the linear time-invariant system

ẋ(t) = Ax(t) + Bw(t),    (66)
y(t) = Cx(t),    (67)

together with the observations z(t) = y(t) + v(t), where E{w(t)w^T(τ)} = Qδ(t - τ), E{v(t)v^T(τ)} = Rδ(t - τ) and E{w(t)v^T(τ)} = 0. It follows from the approach of Section 3 that the Riccati differential equation for the corresponding Kalman filter is given by

Ṗ(t) = AP(t) + P(t)A^T - P(t)C^TR^{-1}CP(t) + BQB^T.    (69)

It will be shown that the solution P(t) monotonically approaches a steady-state asymptote, in which case the filter gain can be calculated before running the filter. The following result is required to establish that the solutions of the above Riccati differential equation are monotonic.

over an interval t  [0, T]. Then the existence of a solution X(t 0 ) ≥ 0 implies X(t) ≥ 0 for all t  [0, T].
Proof: Denote the transition matrix of ( )
"Today's scientists have substituted mathematics for experiments, and they wander off through equation after equation, and eventually build a structure which has no relation to reality." Nikola Tesla www.intechopen.com Smoothing, Filtering and Prediction: Estimating the Past, Present and Future 62 Lemma 6 [19], [20]: Suppose for a t ≥ 0 and a δ t > 0 there exist solutions P(t) ≥ 0 and P(t + δ t ) ≥ 0 of the Riccati differential equations

Ṗ(t) = AP(t) + P(t)A^T - P(t)C^TR^{-1}CP(t) + BQB^T    (71)

and

Ṗ(t + δ_t) = AP(t + δ_t) + P(t + δ_t)A^T - P(t + δ_t)C^TR^{-1}CP(t + δ_t) + BQB^T,    (72)

respectively, such that P(t) - P(t + δ_t) ≥ 0. Then the solutions P(t) are monotonically nonincreasing, that is,

P(t) ≥ P(t + δ_t) for all t ≥ 0.    (73)

Proof: The conditions of the Lemma are the initial step of an induction argument. For the induction step, denote X(t) = P(t) - P(t + δ_t) and subtract (72) from (71) to obtain

Ẋ(t) = F(t)X(t) + X(t)F^T(t),

where F(t) = A - 0.5[P(t) + P(t + δ_t)]C^TR^{-1}C, which is of the form (70) with G(t) = 0, and so the result (73) follows. □
A monotonic nondecreasing case can be established similarly; see [20].
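The monotonic behaviour asserted by Lemma 6 can be observed numerically; in the sketch below, an assumed model is initialised with a large covariance and the trace of P(t) is checked to be nonincreasing.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Numerical check of Lemma 6: with a suitably large initial covariance,
# the Riccati solution P(t) decreases monotonically toward its asymptote.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.array([[0.1]])
R = np.array([[0.01]])

def riccati(t, p_flat):
    P = p_flat.reshape(2, 2)
    dP = A @ P + P @ A.T - P @ C.T @ C @ P / R[0, 0] + B @ Q @ B.T
    return dP.ravel()

sol = solve_ivp(riccati, (0.0, 10.0), (10.0 * np.eye(2)).ravel(),
                t_eval=np.linspace(0.0, 10.0, 101))
traces = [sol.y[:, k].reshape(2, 2).trace() for k in range(sol.y.shape[1])]
assert all(a >= b - 1e-6 for a, b in zip(traces, traces[1:]))  # nonincreasing
```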

Observability
The continuous-time system (66) - (67) is termed completely observable if the initial states, x(t₀), can be uniquely determined from the inputs and outputs, w(t) and y(t), respectively, over an interval [0, T]. A simple test for observability is given by the following lemma.
Lemma 7 [10], [21]: The pair (A, C) is completely observable if and only if the observability matrix

O = [C^T (CA)^T (CA²)^T … (CA^{n-1})^T]^T

has full rank n.

Proof: Since the input signal w(t) within (66) is known, it suffices to consider the unforced system ẋ(t) = Ax(t) and y(t) = Cx(t), that is, Bw(t) = 0, which leads to

y(t) = Ce^{A(t - t₀)}x(t₀).

The exponential matrix is defined as

e^{At} = Σ_{k=0}^{∞} (At)^k / k!.

From the Cayley-Hamilton theorem [22], A^N may be written as a linear combination of I, A, …, A^{n-1} for all N ≥ n, so the expansion may be truncated after n terms. Thus, y(t) uniquely determines x(t₀) if and only if O has full rank n. □
A system that does not satisfy the above criterion is said to be unobservable. An alternative proof of the above lemma is provided in [10]. If a signal model is not observable, then a Kalman filter cannot estimate all the states from the measurements.
"Who will observe the observers ?" Arthur Stanley Eddington Smoothing, Filtering and Prediction: Estimating the Past, Present and Future 64

Example 3. Consider a pair (A, C) in which one of the two states appears as a system output whereas the other is hidden and does not drive the observed state. Such a pair is expected to be unobservable and, by inspection, its observability matrix has rank 1. Conversely, when measurements of both states are available, the observability matrix is of rank 2 and the pair (A, C) is observable, that is, the states can be uniquely reconstructed from the measurements.
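The rank test of Lemma 7 is easy to mechanise, as the following sketch shows; the diagonal A used to exhibit the two cases is an assumed example.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^{n-1} (Lemma 7)."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

# Assumed example: a diagonal A with only the first state measured is unobservable.
A = np.diag([-1.0, -2.0])
C_one = np.array([[1.0, 0.0]])      # only state 1 is output
C_both = np.eye(2)                  # both states measured

print(np.linalg.matrix_rank(observability_matrix(A, C_one)))   # 1: unobservable
print(np.linalg.matrix_rank(observability_matrix(A, C_both)))  # 2: observable
```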

The Algebraic Riccati Equation
Some pertinent facts concerning the Riccati differential equation (69) are:
• Its solutions correspond to the covariance of the state estimation error.
• From Lemma 6, if it is suitably initialised then its solutions will be monotonically nonincreasing.
• If the pair (A, C) is observable then the states can be uniquely determined from the outputs.
In view of the above, it is not surprising that, if the states can be estimated uniquely, then in the limit as t approaches infinity the Riccati differential equation possesses a unique steady-state solution.

Lemma 8 [20], [23], [24]: Suppose that Re{λ_i(A)} < 0 and that the pair (A, C) is observable. Then the solution of the Riccati differential equation (69) satisfies

lim_{t→∞} P(t) = P,

where P is the solution of the algebraic Riccati equation

0 = AP + PA^T - PC^TR^{-1}CP + BQB^T.    (80)

A proof that the solution P is in fact unique appears in [24]. A standard way of calculating solutions to (80) is to find an appropriate set of Schur vectors of the associated Hamiltonian matrix; see [25] and the Hamiltonian solver within Matlab™.
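In Python, the algebraic Riccati equation (80) can be solved with SciPy by exploiting the duality between filtering and control; the model parameters below are assumed.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Steady-state filter design via the algebraic Riccati equation (80).
# solve_continuous_are(a, b, q, r) solves a^T X + X a - X b r^{-1} b^T X + q = 0,
# so passing the dual (filtering) arguments below recovers (80).
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.array([[0.1]])
R = np.array([[0.01]])

P = solve_continuous_are(A.T, C.T, B @ Q @ B.T, R)   # filter ARE via duality
K = P @ C.T @ np.linalg.inv(R)                       # steady-state gain

# Residual check of (80):
res = A @ P + P @ A.T - P @ C.T @ np.linalg.inv(R) @ C @ P + B @ Q @ B.T
assert np.allclose(res, 0.0, atol=1e-8)
```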
The optimal filter for estimating y(t) from noisy measurements (29) is obtained by using the above state-space parameters within (81) - (83). It has the structure depicted in Figs. 3 and 4. These figures illustrate two features of interest. First, the filter's model matches that within the signal generating process. Second, designing the filter is tantamount to finding an optimal gain.

Figure 4. The optimal filter for Example 5.

Equivalence of the Wiener and Kalman Filters
When the model parameters and noise statistics are time-invariant, the Kalman filter reverts to the Wiener filter. The equivalence of the Wiener and Kalman filters implies that spectral factorisation is the same as solving a Riccati equation. This observation is known as the Kalman-Yakubovich-Popov Lemma (or Positive Real Lemma) [15], [26], which assumes familiarity with the following Schur complement formula.
For any matrices Σ₁₁, Σ₁₂ and Σ₂₂, where Σ₁₁ and Σ₂₂ are symmetric, the following are equivalent:

(i) [Σ₁₁ Σ₁₂; Σ₁₂^T Σ₂₂] ≥ 0;
(ii) Σ₂₂ ≥ 0 and Σ₁₁ - Σ₁₂Σ₂₂^{-1}Σ₁₂^T ≥ 0.

"Mathematics is the queen of sciences and arithmetic is the queen of mathematics." Carl Friedrich Gauss

The Kalman-Yakubovich-Popov Lemma is set out below. Further details appear in [15] and a historical perspective is provided in [26]. A proof of this Lemma makes use of the identity

(sI - A)P + P(-sI - A^T) = -(AP + PA^T).    (85)

Lemma 9 [15], [26]: Consider the spectral density matrix

Δ(s) = C(sI - A)^{-1}BQB^T(-sI - A^T)^{-1}C^T + R.    (86)

Then the following statements are equivalent:
(i) Δ(jω) ≥ 0 for all real ω;
(ii) there exists a P ≥ 0 such that [AP + PA^T + BQB^T  PC^T; CP  R] ≥ 0;
(iii) there exists a nonnegative solution P of the algebraic Riccati equation (80).

Proof: To establish equivalence between (i) and (iii), use (80) within (86), that is, substitute BQB^T = -(AP + PA^T) + PC^TR^{-1}CP together with the identity (85), to obtain the spectral factorisation

Δ(s) = [I + C(sI - A)^{-1}K] R [I + K^T(-sI - A^T)^{-1}C^T] = Δ̄(s)Δ̄^H(s),    (92)

where Δ̄(s) = [I + C(sI - A)^{-1}K]R^{1/2}. Hence, Δ(jω) = Δ̄(jω)Δ̄^H(jω) ≥ 0 for all real ω. The Schur complement formula can be used to verify the equivalence of (ii) and (iii). □

In Chapter 1, it is shown that the transfer function matrix of the optimal Wiener solution for output estimation is given by

H_OE(s) = I - R^{1/2}Δ̄^{-1}(s),    (90)

where Δ̄(s) is the spectral factor of Δ(s), the spectral density matrix of the measurements. The Wiener filter (90) requires the spectral factor inverse, Δ̄^{-1}(s), which can be found from (92) using the identity [I + C(sI - A)^{-1}K]^{-1} = I - C(sI - A + KC)^{-1}K to obtain

Δ̄^{-1}(s) = R^{-1/2}[I - C(sI - A + KC)^{-1}K].    (93)

Substituting (93) into (90) yields

H_OE(s) = C(sI - A + KC)^{-1}K,    (94)

which is identical to the minimum-variance output estimator (84).
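The matrix inversion identity used in (93) is easy to spot-check numerically at a sample frequency; the gain K below is an arbitrary assumed value, since the identity holds for any K.

```python
import numpy as np

# Numerical spot-check (at s = 1j) of the identity used above:
# [I + C (sI - A)^{-1} K]^{-1} = I - C (sI - A + K C)^{-1} K.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
C = np.array([[1.0, 0.0]])
K = np.array([[2.0], [1.0]])    # arbitrary assumed gain
s = 1.0j
I2, I1 = np.eye(2), np.eye(1)

lhs = np.linalg.inv(I1 + C @ np.linalg.inv(s * I2 - A) @ K)
rhs = I1 - C @ np.linalg.inv(s * I2 - A + K @ C) @ K
assert np.allclose(lhs, rhs)
```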

Conclusion
The Kalman-Bucy filter, which produces state estimates x̂(t|t) and output estimates ŷ(t|t) from the measurements z(t) = y(t) + v(t) at time t, is summarised in Table 2. When the model parameters and noise covariances are time-invariant, the gain is also time-invariant and can be precalculated. The time-invariant filtering results are summarised in Table 3. In this stationary case, spectral factorisation is equivalent to solving a Riccati equation and the transfer function of the output estimation filter, H_OE(s) = C(sI - A + KC)^{-1}K, is identical to that of the Wiener filter. It is not surprising that the Wiener and Kalman filters are equivalent, since they are both derived by completing the square of the error covariance.

ASSUMPTIONS | MAIN RESULTS

E{w(t)w^T(τ)} = Q(t)δ(t - τ) and E{v(t)v^T(τ)} = R(t)δ(t - τ) are known. A(t), B(t) and C(t) are known. | Signal model and measurements: ẋ(t) = A(t)x(t) + B(t)w(t), y(t) = C(t)x(t), z(t) = y(t) + v(t). State and output estimates: dx̂(t|t)/dt = A(t)x̂(t|t) + K(t)[z(t) - C(t)x̂(t|t)], ŷ(t|t) = C(t)x̂(t|t).

Q(t) > 0 and R(t) > 0. | Filter gain and Riccati differential equation: K(t) = P(t)C^T(t)R^{-1}(t), where Ṗ(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t).

Table 2. Main results for time-varying output estimation.

ASSUMPTIONS | MAIN RESULTS

E{w(t)w^T(τ)} = Qδ(t - τ) and E{v(t)v^T(τ)} = Rδ(t - τ) are known. A, B and C are known, with Re{λ_i(A)} < 0 and the pair (A, C) observable. | Signal model and measurements: ẋ(t) = Ax(t) + Bw(t), y(t) = Cx(t), z(t) = y(t) + v(t). State and output estimates: dx̂(t|t)/dt = Ax̂(t|t) + K[z(t) - Cx̂(t|t)], ŷ(t|t) = Cx̂(t|t).

Q > 0 and R > 0. | Filter gain and algebraic Riccati equation: K = PC^TR^{-1}, where 0 = AP + PA^T - PC^TR^{-1}CP + BQB^T.

Table 3. Main results for time-invariant output estimation.
"There are two ways to do great mathematics. The first is to be smarter than everybody else. The second way is to be stupider than everybody else -but persistent." Raoul Bott

Glossary
In addition to the terms listed in Section 1.6, the following have been used herein.
𝒮 : ℝ^p → ℝ^q
A linear system that operates on a p-element input signal and produces a q-element output signal.

A(t), B(t), C(t), D(t)
Time-varying state-space matrices of appropriate dimension. The system 𝒮 is assumed to have the realisation ẋ(t) = A(t)x(t) + B(t)w(t), y(t) = C(t)x(t) + D(t)w(t).

Q(t) and R(t)
Covariance matrices of the nonstationary stochastic signals w(t) and v(t), respectively.

x̂(t|t)
Conditional mean estimate of the state x(t) given data at time t.

x̃(t|t)
State estimation error, which is defined by x̃(t|t) = x(t) - x̂(t|t).

K(t)
Time-varying filter gain matrix.

P(t)
Time-varying error covariance, i.e., E{x̃(t|t)x̃^T(t|t)}, which is the solution of a Riccati differential equation.

A, B, C, D
Time-invariant state space matrices of appropriate dimension.
Q and R
Time-invariant covariance matrices of the stationary stochastic signals w(t) and v(t), respectively.

O
Observability matrix.

SNR
Signal-to-noise ratio.

H_OE(s)
Transfer function matrix of the minimum-variance solution specialised for output estimation.