Continuous-Time Minimum-Mean-Square-Error Filtering

This book describes the classical smoothing, filtering and prediction techniques together with some more recently developed embellishments for improving performance within applications. It aims to present the subject in an accessible way, so that it can serve as a practical guide for undergraduates and newcomers to the field. The material is organised as a ten-lecture course. The foundations are laid in Chapters 1 and 2, which explain minimum-mean-square-error solution construction and asymptotic behaviour. Chapters 3 and 4 introduce continuous-time and discrete-time minimum-variance filtering. Generalisations for missing data, deterministic inputs, correlated noises, direct feedthrough terms, output estimation and equalisation are described. Chapter 5 simplifies the minimum-variance filtering results for steady-state problems. Observability, Riccati equation solution convergence, asymptotic stability and Wiener filter equivalence are discussed. Chapters 6 and 7 cover the subject of continuous-time and discrete-time smoothing. The main fixed-lag, fixed-point and fixed-interval smoother results are derived. It is shown that the minimum-variance fixed-interval smoother attains the best performance. Chapter 8 attends to parameter estimation. As the above-mentioned approaches all rely on knowledge of the underlying model parameters, maximum-likelihood techniques within expectation-maximisation algorithms for joint state and parameter estimation are described. Chapter 9 is concerned with robust techniques that accommodate uncertainties within problem specifications. An extra term within the Riccati equations enables designers to trade off average-error and peak-error performance. Chapter 10 rounds off the course by applying the aforementioned linear techniques to nonlinear estimation problems. It is demonstrated that step-wise linearisations can be used within predictors, filters and smoothers, albeit forsaking optimal performance guarantees.


Introduction
Optimal filtering is concerned with designing the best linear system for recovering data from noisy measurements. It is a model-based approach requiring knowledge of the signal generating system. The signal models, together with the noise statistics, are factored into the design in such a way as to satisfy an optimality criterion, namely, minimising the mean of the squared error.
A prerequisite technique, the method of least-squares, has its origin in curve fitting. Amid some controversy, Kepler claimed in 1609 that the planets move around the Sun in elliptical orbits [1]. Carl Friedrich Gauss arrived at a better-performing method for fitting curves to astronomical observations and predicting planetary trajectories in 1799 [1]. He formally published a least-squares approximation method in 1809 [2], which had been developed independently by Adrien-Marie Legendre in 1806 [1]. The technique was famously used to predict the trajectory of the asteroid Ceres, which Giuseppe Piazzi had discovered, since a least-squares analysis was easier than solving Kepler's complicated nonlinear equations of planetary motion [1]. Andrey N. Kolmogorov refined Gauss's theory of least-squares and applied it to the prediction of discrete-time stationary stochastic processes in 1939 [3]. Norbert Wiener, a faculty member at MIT, independently solved analogous continuous-time estimation problems. He worked on defence applications during the Second World War and produced a report entitled Extrapolation, Interpolation and Smoothing of Stationary Time Series in 1943. The report was later published as a book in 1949 [4].
Wiener derived two important results, namely, the optimum (non-causal) minimum-mean-square-error solution and the optimum causal minimum-mean-square-error solution [4] - [6]. The optimum causal solution has since become known as the Wiener filter and, in the time-invariant case, is equivalent to the Kalman filter that was developed subsequently. Wiener pursued practical outcomes and used the term "unrealisable filter" for the optimal non-causal solution because "it is not in fact realisable with a finite network of resistances, capacities, and inductances" [4]. Wiener's unrealisable filter is actually the optimum linear smoother.
First, the optimal minimum-mean-square-error solution to the general estimation problem is stated. Second, the general estimation results are specialised to output estimation. The optimal input estimation or equalisation solution is then described. An example, demonstrating the recovery of a desired signal from noisy measurements, completes the chapter.

Signals
Consider two continuous-time, real-valued stochastic (or random) signals $v(t) = \begin{bmatrix} v_1(t) & \cdots & v_n(t) \end{bmatrix}^T$ and $w(t) = \begin{bmatrix} w_1(t) & \cdots & w_n(t) \end{bmatrix}^T$, which are said to belong to the space $\mathbb{R}^n$, or more concisely $v(t), w(t) \in \mathbb{R}^n$. Let $w$ denote the set of $w(t)$ over all time $t$, that is, $w = \{w(t),\ t \in (-\infty, \infty)\}$.

Elementary Functions Defined on Signals
The inner product $\langle v, w \rangle$ of two continuous-time signals $v$ and $w$ is defined by

$\langle v, w \rangle = \int_{-\infty}^{\infty} v^T(t)\, w(t)\, dt$.  (1)

The 2-norm or Euclidean norm of a continuous-time signal $w$, $\|w\|_2$, is defined as $\|w\|_2 = \sqrt{\langle w, w \rangle}$. The square of the 2-norm, that is, $\|w\|_2^2 = \int_{-\infty}^{\infty} w^T(t)\, w(t)\, dt$, is commonly known as the energy of the signal $w$.

Spaces
The Lebesgue 2-space, defined as the set of continuous-time signals having finite 2-norm, is denoted by $\mathcal{L}_2$. Thus, $w \in \mathcal{L}_2$ means that the energy of $w$ is bounded. The following properties hold for 2-norms:
(i) $\|w\|_2 \ge 0$;
(ii) $\|w\|_2 = 0$ if and only if $w = 0$;
(iii) $\|v + w\|_2 \le \|v\|_2 + \|w\|_2$, which is known as the triangle inequality;
(iv) $\|\alpha w\|_2 = |\alpha|\,\|w\|_2$, where $\alpha \in \mathbb{R}$.

Linear Systems
A linear system $\mathcal{A}$ is a mapping from an input signal $w$ to an output signal $y = \mathcal{A}w$. For two linear systems $\mathcal{A}$ and $\mathcal{B}$ and a scalar $\alpha \in \mathbb{R}$, define

$(\mathcal{A} + \mathcal{B})w = \mathcal{A}w + \mathcal{B}w$,  (2)
$(\mathcal{A}\mathcal{B})w = \mathcal{A}(\mathcal{B}w)$,  (3)
$(\alpha\mathcal{A})w = \alpha(\mathcal{A}w)$.  (4)

An interpretation of (2) is that a parallel combination of $\mathcal{A}$ and $\mathcal{B}$ is equivalent to the system $\mathcal{A} + \mathcal{B}$. From (3), a series combination of $\mathcal{A}$ and $\mathcal{B}$ is equivalent to the system $\mathcal{A}\mathcal{B}$. Equation (4) states that scalar amplification of a system is equivalent to scalar amplification of the system's output.

Polynomial Fraction Systems
The Wiener filtering results [4] - [6] were originally developed for polynomial fraction descriptions of systems, which are described below. Consider an nth-order linear, time-invariant system $\mathcal{A}$ that operates on an input $w(t) \in \mathbb{R}$ and produces an output $y(t) \in \mathbb{R}$, that is, $\mathcal{A}: \mathbb{R} \to \mathbb{R}$, described by

$a_n \frac{d^n y(t)}{dt^n} + a_{n-1} \frac{d^{n-1} y(t)}{dt^{n-1}} + \cdots + a_0 y(t) = b_m \frac{d^m w(t)}{dt^m} + b_{m-1} \frac{d^{m-1} w(t)}{dt^{m-1}} + \cdots + b_0 w(t)$,  (5)

where $a_0, \dots, a_n$ and $b_0, \dots, b_m$ are real-valued constant coefficients, $a_n \neq 0$, with zero initial conditions. This differential equation can be written in the more compact form

$\sum_{i=0}^{n} a_i \frac{d^i y(t)}{dt^i} = \sum_{j=0}^{m} b_j \frac{d^j w(t)}{dt^j}$.  (6)

Theorem 1 (Parseval): The energy of a signal in the time domain equals its energy in the frequency domain, that is,

$\int_{-\infty}^{\infty} y^T(t)\, y(t)\, dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} Y^H(j\omega)\, Y(j\omega)\, d\omega$,  (9)

where $Y(j\omega)$ denotes the Fourier transform of $y(t)$. The above theorem is attributed to Parseval, whose original work [7] concerned the sums of trigonometric series. An interpretation of (9) is that the energy in the time domain equals the energy in the frequency domain.

Polynomial Fraction Transfer Functions
The steady-state response $y(t) = Y(s)e^{st}$ can be found by applying the complex-exponential input $w(t) = W(s)e^{st}$ to the terms of (6), which results in

$\left(\sum_{i=0}^{n} a_i s^i\right) Y(s) = \left(\sum_{j=0}^{m} b_j s^j\right) W(s)$.

Therefore,

$G(s) = \frac{Y(s)}{W(s)} = \frac{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_0}{a_n s^n + a_{n-1} s^{n-1} + \cdots + a_0}$  (12)

is known as the transfer function of the system. It can be seen from (6) and (12) that the polynomial transfer function coefficients correspond to the system's differential equation coefficients. Thus, knowledge of a system's differential equation is sufficient to identify its transfer function.

Poles and Zeros
The numerator and denominator polynomials of (12) can be factored into m and n linear factors, respectively, to give

$G(s) = \frac{b_m (s - \beta_1)(s - \beta_2) \cdots (s - \beta_m)}{a_n (s - \alpha_1)(s - \alpha_2) \cdots (s - \alpha_n)}$.  (13)

The roots $\beta_1, \dots, \beta_m$ of the numerator polynomial are known as the zeros of the system, and the roots $\alpha_1, \dots, \alpha_n$ of the denominator polynomial are known as the poles.

Example 1. Consider a system described by the differential equation $\dot{y}(t) = -y(t) + w(t)$, in which $y(t)$ is the output arising from the input $w(t)$. From (6) and (12), it follows that the corresponding transfer function is given by $G(s) = (s + 1)^{-1}$, which possesses a pole at $s = -1$.
The system in Example 1 operates on a single input and produces a single output, which is known as a single-input-single-output (SISO) system. Systems operating on multiple inputs and producing multiple outputs, for example, $\mathcal{A}: \mathbb{R}^p \to \mathbb{R}^q$, are known as multiple-input-multiple-output (MIMO) systems. The corresponding transfer function matrices can be written as

$G(s) = \begin{bmatrix} G_{11}(s) & \cdots & G_{1p}(s) \\ \vdots & \ddots & \vdots \\ G_{q1}(s) & \cdots & G_{qp}(s) \end{bmatrix}$,  (14)

where the components $G_{ij}(s)$ have the polynomial transfer function form within (12) or (13).

State-Space Systems
A system $\mathcal{A}: \mathbb{R}^p \to \mathbb{R}^q$ having a state-space realisation is written in the form

$\dot{x}(t) = A x(t) + B w(t)$,  (15)
$y(t) = C x(t) + D w(t)$,  (16)

where $x \in \mathbb{R}^n$ is a state vector and $y \in \mathbb{R}^q$ is an output. A is known as the state matrix and D is known as the direct feed-through matrix. The matrices B and C are known as the input mapping and the output mapping, respectively. This system is depicted in Fig. 1.

Euler's Method for Numerical Integration
Differential equations of the form (15) could be implemented directly by analog circuits. Digital or software implementations require a method for numerical integration. A first-order numerical integration technique, known as Euler's method, is now derived. Suppose that $x(t)$ is infinitely differentiable and consider its Taylor series expansion in the neighbourhood of $t_0$,

$x(t) = x(t_0) + \dot{x}(t_0)(t - t_0) + \frac{\ddot{x}(t_0)}{2!}(t - t_0)^2 + \cdots$  (17)

Truncating the series after the first-order term yields the approximation

$x(t) \approx x(t_0) + \dot{x}(t_0)(t - t_0)$.  (18)

Thus, the continuous-time linear system (15) could be approximated in discrete time by iterating

$\dot{x}(t_k) = A x(t_k) + B w(t_k)$  (19)

and (18), with $t = t_{k+1}$, $t_0 = t_k$ and $\delta_t = t_{k+1} - t_k$, provided that $\delta_t$ is chosen to be suitably small. Applications of (18) - (19) appear in [9] and in the following example.
"It is important that students bring a certain ragamuffin, barefoot irreverence to their studies; they are not here to worship what is known, but to question it." Jacob Bronowski

Example 2. In respect of the continuous-time state evolution (15), consider A = −1, B = 1, together with the deterministic input w(t) = sin(t) + cos(t). The states can be calculated from the known w(t) using (19) and the difference equation (18). Since the exact solution from zero initial conditions is x(t) = sin(t), the state error is given by $e(t_k) = \sin(t_k) - x(t_k)$. In particular, root-mean-square errors of 0.34, 0.031, 0.0025 and 0.00024 were observed for $\delta_t$ = 1, 0.1, 0.01 and 0.001, respectively. This demonstrates that the first-order approximation (18) can be reasonable when $\delta_t$ is sufficiently small.
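The Euler iteration (18) - (19) is straightforward to code. The following Python sketch reproduces the trend reported in Example 2; the horizon T = 100 s and zero initial state are assumptions, so the exact error values may differ slightly.

```python
import numpy as np

def euler_sim(A, B, w, dt, T):
    """Propagate dx/dt = A*x + B*w(t) using the Euler iteration (18)-(19)."""
    n = int(T / dt)
    t = np.arange(n) * dt
    x = np.zeros(n)
    for k in range(n - 1):
        dx = A * x[k] + B * w(t[k])   # (19): state derivative at t_k
        x[k + 1] = x[k] + dt * dx     # (18): first-order update
    return t, x

w = lambda t: np.sin(t) + np.cos(t)   # deterministic input of Example 2
for dt in (1.0, 0.1, 0.01, 0.001):
    t, x = euler_sim(-1.0, 1.0, w, dt, T=100.0)
    rmse = np.sqrt(np.mean((np.sin(t) - x) ** 2))  # exact solution is sin(t)
    print(f"dt = {dt:6.3f}  RMSE = {rmse:.1e}")
```

Consistent with the figures quoted above, the error falls roughly tenfold for each tenfold reduction in $\delta_t$, as expected of a first-order method.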

State-Space Transfer Function Matrix
The transfer function matrix of the state-space system (15) - (16) is defined by

$G(s) = C(sI - A)^{-1}B + D$,  (20)

in which s again denotes the Laplace transform variable.

Canonical Realisations
The mapping of a polynomial fraction transfer function (12) to a state-space representation (20) is not unique. Two standard state-space realisations of polynomial fraction transfer functions are described below. Assume that: the transfer function has been expanded into the sum of a direct feed-through term plus a strictly proper transfer function, in which the order of the numerator polynomial is less than the order of the denominator polynomial; and the strictly proper transfer function has been normalised so that $a_n = 1$. Under these assumptions, the system can be realised in the controllable canonical form, which is parameterised by [10]

$A = \begin{bmatrix} -a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} b_{n-1} & b_{n-2} & \cdots & b_0 \end{bmatrix}$.

The system can also be realised in the observable canonical form, which is parameterised by the transposed arrangement, namely the state matrix $A^T$, the input mapping $B = \begin{bmatrix} b_{n-1} & \cdots & b_0 \end{bmatrix}^T$ and the output mapping $C = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}$.
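Numerical packages implement this mapping. The sketch below uses scipy.signal.tf2ss, which returns a controller canonical realisation; the transfer function G(s) = (s + 3)/(s² + 3s + 2) is an illustrative choice, not an example from the text.

```python
import numpy as np
from scipy.signal import tf2ss

num = [1.0, 3.0]        # s + 3
den = [1.0, 3.0, 2.0]   # s^2 + 3s + 2, i.e. poles at s = -1 and s = -2

# tf2ss performs the mapping from polynomial fraction to state space
A, B, C, D = tf2ss(num, den)
print(A)  # [[-3. -2.]
          #  [ 1.  0.]]
print(B)  # [[1.] [0.]]
print(C)  # [[1. 3.]]
print(D)  # [[0.]]

# the eigenvalues of the state matrix recover the poles of G(s)
print(np.linalg.eigvals(A))  # -1 and -2 (in some order)
```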

Asymptotic Stability
Consider a continuous-time, linear, time-invariant nth-order system $\mathcal{A}$ that operates on an input w and produces an output y. The system $\mathcal{A}$ is said to be asymptotically stable if the output remains bounded, that is, $y \in \mathcal{L}_2$, for any $w \in \mathcal{L}_2$. This is also known as bounded-input-bounded-output stability. Two equivalent conditions for $\mathcal{A}$ to be asymptotically stable are:
• the real parts of the eigenvalues of the system's state matrix lie in the left half-plane, that is, for A of (20), $\mathrm{Re}\{\lambda_i(A)\} < 0$, i = 1, …, n;
• the real parts of the poles of the system's transfer function lie in the left half-plane, that is, for $\alpha_i$ of (13), $\mathrm{Re}\{\alpha_i\} < 0$, i = 1, …, n.

Example 6.
A state-space system having A = −1, B = C = 1 and D = 0 is stable, since λ(A) = −1 is in the left half-plane. Equivalently, the corresponding transfer function $G(s) = (s + 1)^{-1}$ has a pole at s = −1, which is in the left half-plane, and so the system is stable. Conversely, the transfer function $G^T(-s) = (1 - s)^{-1}$ is unstable because it has a pole at s = 1, which is in the right half of the complex plane. $G^T(-s)$ is known as the adjoint of G(s), which is discussed below.

Adjoint Systems
An important concept in the ensuing development of filters and smoothers is the adjoint of a system. Let $\mathcal{A}: \mathbb{R}^p \to \mathbb{R}^q$ be a linear system operating on the interval [0, T]. Then $\mathcal{A}^H: \mathbb{R}^q \to \mathbb{R}^p$, the adjoint of $\mathcal{A}$, is the unique linear system such that $\langle y, \mathcal{A}w \rangle = \langle \mathcal{A}^H y, w \rangle$ for all $y \in \mathbb{R}^q$ and $w \in \mathbb{R}^p$. The following derivation is a simplification of the time-varying version that appears in [11].

Lemma 1 (State-space representation of an adjoint system): Suppose that a continuous-time linear time-invariant system $\mathcal{A}$ is described by

$\dot{x}(t) = A x(t) + B w(t)$, with $x(0) = 0$,  (21)
$y(t) = C x(t) + D w(t)$.  (22)

Then the adjoint system $\mathcal{A}^H$ is given by

$\dot{\zeta}(t) = -A^T \zeta(t) - C^T u(t)$,  (23)
$z(t) = B^T \zeta(t) + D^T u(t)$,  (24)

with ζ(T) = 0.

Proof: For the system (21) - (22), $\langle u, \mathcal{A}w \rangle = \int_0^T u^T(t)\big(Cx(t) + Dw(t)\big)\,dt$. From (23), $C^T u(t) = -\dot{\zeta}(t) - A^T\zeta(t)$, so that $\int_0^T u^T C x\, dt = -\int_0^T \dot{\zeta}^T x\, dt - \int_0^T \zeta^T A x\, dt$. Integrating the first term by parts, with x(0) = 0, ζ(T) = 0 and (21), gives $-\int_0^T \dot{\zeta}^T x\, dt = \int_0^T \zeta^T \dot{x}\, dt = \int_0^T \zeta^T (Ax + Bw)\, dt$. Hence $\langle u, \mathcal{A}w \rangle = \int_0^T (B^T\zeta + D^T u)^T w\, dt = \langle \mathcal{A}^H u, w \rangle$, where $\mathcal{A}^H$ is given by (23) - (24).
□ "If you thought that science was certain-well, that is just an error on your part." Richard Phillips Feynman www.intechopen.com Smoothing, Filtering and Prediction: Estimating the Past, Present and Future 10 Thus, the adjoint of a system having the parameters Adjoint

Causal and Noncausal Systems
A causal system is a system that depends exclusively on past and current inputs.

Example 8. The state evolution $\dot{x}(t) = A x(t) + B w(t)$ (29), initialised at x(0), proceeds forward in time. The positive sign of $\dot{x}(t)$ within (29) denotes a system that proceeds forward in time. This is called a causal system because it depends only on past and current inputs.

Example 9. The negative differential of ζ(t) with respect to t appears within $-\dot{\zeta}(t) = A^T \zeta(t) + C^T u(t)$ (30), which evolves from the terminal condition ζ(T). The negative sign of $\dot{\zeta}(t)$ within (30) denotes a system that proceeds backwards in time. Since this system depends on future inputs, it is termed noncausal. Note that $\mathrm{Re}\{\lambda_i(-A^T)\} = -\mathrm{Re}\{\lambda_i(A)\}$. Hence, if the causal system (21) - (22) is stable, then its adjoint (23) - (24), when operated forward in time, is unstable.

Realising Unstable System Components
Unstable systems are termed unrealisable because their outputs are not in $\mathcal{L}_2$, that is, they are unbounded. In other words, they cannot be implemented as forward-going systems. It follows from the above discussion that an unstable system component can instead be realised as a stable noncausal or backwards system. Suppose that the time-domain system $\mathcal{A}$ is stable. The adjoint system $z = \mathcal{A}^H u$ can be realised by the following three-step procedure.
• Time-reverse the input signal u(t), that is, construct u(τ), where τ = T − t is a time-to-go variable (see [12]).
• Realise the stable system $\mathcal{A}^T$ operating on the time-reversed input to obtain z(τ).
• Time-reverse the output signal z(τ), that is, construct z(t).
The above procedure is known as noncausal filtering or smoothing; see the discrete-time case described in [13]. Thus, a combination of causal and non-causal system components can be used to implement an otherwise unrealisable system. This approach will be exploited in the realisation of smoothers within subsequent sections.

Example 10.
Suppose that it is required to realise the unstable system $G^T(-s) = (1 - s)^{-1}$ of Example 6. Following the above procedure, the stable system $G(s) = (s + 1)^{-1}$ is realised on the time-reversed input and the result is then time-reversed, as sketched below.
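A minimal sketch of the three-step procedure for this example: the stable system $G(s) = (s + 1)^{-1}$ is implemented with Euler's method (18) and applied between two time reversals, thereby realising $G^T(-s) = (1 - s)^{-1}$ as a backwards system. The input signal and interval are illustrative assumptions.

```python
import numpy as np

def stable_first_order(u, dt):
    """Euler realisation of x' = -x + u, y = x, i.e. G(s) = (s + 1)^(-1)."""
    x, y = 0.0, np.empty_like(u)
    for k, uk in enumerate(u):
        x = x + dt * (-x + uk)
        y[k] = x
    return y

dt = 1e-3
t = np.arange(0.0, 10.0, dt)
u = np.sin(t)  # illustrative input defined over [0, T]

# step 1: time-reverse the input; step 2: run the stable system;
# step 3: time-reverse the output
z = stable_first_order(u[::-1], dt)[::-1]
```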

Power Spectral Density
The power of a voltage signal applied to a 1-ohm load is defined as the squared value of the signal and is expressed in watts. The power spectral density is expressed as power per unit bandwidth, that is, W/Hz. Consider again a linear, time-invariant system $y = \mathcal{A}w$ and its corresponding transfer function matrix G(s). Assume that w is a zero-mean, stationary, white noise process with $E\{w(t)w^T(\tau)\} = Q\,\delta(t - \tau)$, in which δ denotes the Dirac delta function. Then $\Phi_{yy}(s)$, the power spectral density of y, is given by

$\Phi_{yy}(s) = G(s)\,Q\,G^H(s)$,

which has the property $\Phi_{yy}(s) = \Phi_{yy}^H(s)$. The total energy of a signal is the integral of the power of the signal over time and is expressed in watt-seconds or joules. From Parseval's theorem (9), the average total energy of y(t) is

$E\left\{\int_{-\infty}^{\infty} y^T(t)\, y(t)\, dt\right\} = \frac{1}{2\pi}\int_{-\infty}^{\infty} \mathrm{trace}\,\Phi_{yy}(j\omega)\, d\omega$,

which is equal to the area under the power spectral density curve.
"Time is what prevents everything from happening at once." John Archibald Wheeler Smoothing, Filtering and Prediction: Estimating the Past, Present and Future 12

Spectral Factorisation
Suppose that noisy measurements

$z(t) = y(t) + v(t)$

of a linear, time-invariant system $\mathcal{A}$, described by (21) - (22), are available, where $v(t) \in \mathbb{R}^q$ is an independent, zero-mean, stationary white noise process with $E\{v(t)v^T(\tau)\} = R\,\delta(t - \tau)$. Let

$\Phi_{zz}(s) = G(s)\,Q\,G^H(s) + R$  (36)

denote the spectral density matrix of the measurements z(t). Spectral factorisation was pioneered by Wiener (see [4] and [5]). It refers to the problem of decomposing a spectral density matrix into a product of a stable, minimum-phase matrix transfer function and its adjoint. In the case of the output power spectral density (36), a spectral factor $\Delta(s)$ satisfies

$\Delta(s)\,\Delta^H(s) = \Phi_{zz}(s)$,

where $\Delta(s)$ and $\Delta^{-1}(s)$ have stable, left half-plane poles. The problem of spectral factorisation within continuous-time Wiener filtering problems is studied in [14]. The roots of the transfer function polynomials need to be sorted into those within the left half-plane and those within the right half-plane. This is an eigenvalue decomposition problem - see the survey of spectral factorisation methods detailed in [11].
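For a scalar example, the factorisation can be carried out by hand and verified numerically. Assuming $G(s) = (s + 1)^{-1}$ with illustrative covariances Q and R, equating $\Delta(s)\Delta^H(s) = \Phi_{zz}(s)$ for a factor of the form $\Delta(s) = \sqrt{R}\,(s + \gamma)/(s + 1)$ yields $\gamma = \sqrt{(Q + R)/R}$, as the following sketch confirms on a frequency grid.

```python
import numpy as np

Q, R = 2.0, 1.0                     # illustrative noise covariances
gamma = np.sqrt((Q + R) / R)        # left half-plane zero of the factor

w = np.linspace(-10.0, 10.0, 2001)  # frequency grid
s = 1j * w
G = 1.0 / (s + 1.0)
Phi_zz = G * Q * np.conj(G) + R     # (36): G(s) Q G^H(s) + R

Delta = np.sqrt(R) * (s + gamma) / (s + 1.0)  # stable, minimum-phase factor
assert np.allclose(Delta * np.conj(Delta), Phi_zz)
print("max factorisation error:",
      np.max(np.abs(Delta * np.conj(Delta) - Phi_zz)))
```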

Filter Derivation
Now that some underlying frequency-domain concepts have been introduced, the Wiener filter [4] - [6] can be described. A Wiener-Hopf derivation of the Wiener filter appears in [4], [6]. This section describes a simpler completing-the-square approach (see [14], [16]). Consider a stable linear time-invariant system having a transfer function matrix $G_2(s) = C_2(sI - A)^{-1}B + D_2$. Let $Y_2(s)$, $W(s)$, $V(s)$ and $Z(s)$ denote the Laplace transforms of the system output, process noise, measurement noise and observations, respectively, so that

$Z(s) = Y_2(s) + V(s) = G_2(s)W(s) + V(s)$.

Consider also a fictitious reference system having the transfer function $G_1(s) = C_1(sI - A)^{-1}B + D_1$ and output $Y_1(s) = G_1(s)W(s)$, as shown in Fig. 3. The problem is to design a filter transfer function H(s) operating on Z(s) to calculate estimates $\hat{Y}_1(s) = H(s)Z(s)$ of $Y_1(s)$, such that the energy of the estimation error $E(s) = Y_1(s) - \hat{Y}_1(s)$ is minimised. It follows from Fig. 3 that E(s) is generated by

$E(s) = \big(G_1(s) - H(s)G_2(s)\big)W(s) - H(s)V(s)$.

The error power spectral density matrix is denoted by $\Phi_{ee}(s)$ and given by the covariance of E(s), that is,

$\Phi_{ee}(s) = \big(G_1(s) - H(s)G_2(s)\big)\,Q\,\big(G_1(s) - H(s)G_2(s)\big)^H + H(s)\,R\,H^H(s)$.

Completing the square using the spectral factor $\Delta(s)\Delta^H(s) = G_2(s)QG_2^H(s) + R = \Phi_{zz}(s)$ yields

$\Phi_{ee} = G_1 Q G_1^H - G_1 Q G_2^H \Delta^{-H}\Delta^{-1} G_2 Q G_1^H + \big(H\Delta - G_1 Q G_2^H \Delta^{-H}\big)\big(H\Delta - G_1 Q G_2^H \Delta^{-H}\big)^H$,  (43)

where the argument (s) has been omitted for brevity. It follows from Parseval's theorem that the total energy of the error signal is given by $\frac{1}{2\pi}\int_{-\infty}^{\infty} \mathrm{trace}\,\Phi_{ee}(j\omega)\,d\omega$. The first term on the right-hand side of (43) is independent of H(s) and represents a lower bound of the error energy. The second term on the right-hand side of (43) may be minimised by a judicious choice for H(s).

Theorem 2 (Optimal noncausal solution): The error energy is minimised by

$H(s) = G_1(s)\,Q\,G_2^H(s)\,\Delta^{-H}(s)\,\Delta^{-1}(s) = G_1(s)\,Q\,G_2^H(s)\,\Phi_{zz}^{-1}(s)$,  (44)

which sets the second term of (43) to zero. The solution (44) possesses both left half-plane and right half-plane poles. This optimal noncausal solution is actually a smoother, which can be realised by a combination of forward and backward processes. Wiener called (44) the optimal unrealisable solution because it cannot be realised by a finite network of capacitors, inductors and resistors [4].
The transfer function matrix of a realisable filter is given by

$H(s) = \left\{G_1(s)\,Q\,G_2^H(s)\,\Delta^{-H}(s)\right\}_+ \Delta^{-1}(s)$,  (45)

in which $\{\,\cdot\,\}_+$ denotes the causal part. A procedure for finding the causal part of a transfer function is described below.

Finding the Causal Part of a Transfer Function
The causal part of a transfer function can be found by carrying out the following three steps.
• If the transfer function is not strictly proper, that is, if the degree of the numerator is not less than the degree of the denominator, then perform synthetic division to isolate the constant term.
• Expand the (strictly proper) transfer function into the sum of stable and unstable partial fractions.
• The causal part is the sum of the constant term and the stable partial fractions.
Incidentally, the noncausal part is what remains, namely the sum of the unstable partial fractions.
For example, consider $G(s) = \dfrac{s^2 + 4s + 5}{(s + 1)(s - 2)}$. Since G(s) possesses equal-degree numerator and denominator polynomials, synthetic division is required, which yields $G(s) = 1 + \dfrac{5s + 7}{(s + 1)(s - 2)}$. A partial fraction expansion then results in $G(s) = 1 - \dfrac{2/3}{s + 1} + \dfrac{17/3}{s - 2}$. Thus, the causal part of G(s) is $\{G(s)\}_+ = 1 - \dfrac{2/3}{s + 1}$, and the noncausal part is $\dfrac{17/3}{s - 2}$.
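The three steps can also be carried out numerically. The sketch below applies scipy.signal.residue, which performs the synthetic division and partial fraction expansion, to the illustrative G(s) above, and retains the fractions whose poles lie in the left half-plane.

```python
import numpy as np
from scipy.signal import residue

num = [1.0, 4.0, 5.0]                      # s^2 + 4s + 5
den = np.polymul([1.0, 1.0], [1.0, -2.0])  # (s + 1)(s - 2)

# residue returns G(s) = k + sum_i r[i] / (s - p[i])
r, p, k = residue(num, den)

# the causal part keeps the constant term and the stable fractions
causal = [(ri, pi) for ri, pi in zip(r, p) if pi.real < 0]
print("constant term:", k)          # [1.]
print("stable fractions:", causal)  # residue -2/3 at the pole s = -1
print("unstable fractions:",
      [(ri, pi) for ri, pi in zip(r, p) if pi.real >= 0])  # 17/3 at s = 2
```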

Minimum-Mean-Square-Error Output Estimation
In output estimation, the reference system is the same as the generating system, as depicted in Fig. 4. The simplification of the optimal noncausal solution (44) of Theorem 2 for the case $G_1(s) = G_2(s)$ can be expressed as

$H(s) = G_2(s)\,Q\,G_2^H(s)\,\Phi_{zz}^{-1}(s) = I - R\,\Phi_{zz}^{-1}(s)$,

from which the low measurement noise asymptote

$\lim_{R \to 0} H(s) = I$  (48)

follows, that is, the estimator approaches a short circuit. Some sample trajectories from a simulation conducted with $\delta_t$ = 0.001 s are shown in Fig. 5. The input measurements are shown in Fig. 5(a). It can be seen that the filtered signal (the solid line of Fig. 5(b)) estimates the system output (the dotted line of Fig. 5(b)).
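The asymptote (48) is easy to visualise for a scalar system. Assuming $G_2(s) = (s + 1)^{-1}$ and Q = 1 (illustrative choices), the sketch below evaluates $H(j\omega) = 1 - R\,\Phi_{zz}^{-1}(j\omega)$ on a frequency grid; the gain approaches unity, a short circuit, as R decreases.

```python
import numpy as np

Q = 1.0
w = np.linspace(-20.0, 20.0, 4001)
G2 = 1.0 / (1j * w + 1.0)

for R in (1.0, 0.1, 0.001):
    Phi_zz = G2 * Q * np.conj(G2) + R
    H = 1.0 - R / Phi_zz            # optimal noncausal output estimator
    print(f"R = {R:5.3f}  max |H| = {np.max(np.abs(H)):.4f}")
# max |H| -> 1 as R -> 0, i.e. the estimator approaches a short circuit
```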

Minimum-Mean-Square-Error Input Estimation
In input estimation problems, it is desired to estimate the input process w(t), as depicted in Fig. 6. This is commonly known as an equalisation problem, in which it is desired to mitigate the distortion introduced by a communication channel $G_2(s)$. The simplification of the general noncausal solution (44) of Theorem 2 for the case $G_1(s) = I$ results in

$H(s) = Q\,G_2^H(s)\,\Phi_{zz}^{-1}(s) = Q\,G_2^H(s)\,\big(G_2(s)\,Q\,G_2^H(s) + R\big)^{-1}$.

Figure 6. The s-domain input estimation problem.
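The limiting behaviour of the equaliser can be illustrated the same way. Assuming the scalar channel $G_2(s) = (s + 1)^{-1}$ with Q = 1, the sketch below shows the equaliser gain collapsing towards an open circuit when measurement noise dominates, and approaching the channel inverse when it is negligible.

```python
import numpy as np

Q = 1.0
w = np.linspace(-20.0, 20.0, 4001)
G2 = 1.0 / (1j * w + 1.0)

for R in (100.0, 1.0, 1e-6):
    H = Q * np.conj(G2) / (G2 * Q * np.conj(G2) + R)  # input estimator
    print(f"R = {R:8.1e}  |H(j0)| = {np.abs(H[w.size // 2]):.4f}")
# R large -> gain ~ 0 (open circuit); R small -> H ~ G2^{-1} = s + 1
```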

Conclusion
Continuous-time, linear, time-invariant systems can be described via either a differential equation model or a state-space model. Signal models can be written in the time domain as $\dot{x}(t) = A x(t) + B w(t)$, $y(t) = C x(t) + D w(t)$. Under the time-invariance assumption, the system transfer function matrices exist, and can be written as polynomial fractions in the Laplace transform variable s, or equivalently as $G(s) = C(sI - A)^{-1}B + D$. Thus, knowledge of a system's differential equation is sufficient to identify its transfer function. If the poles of a system's transfer function are all in the left half-plane, then the system is asymptotically stable. That is, if the input to the system is bounded, then the output of the system will be bounded.
The optimal solution minimises the energy of the error in the time domain; by Parseval's theorem, it is found in the frequency domain by minimising the mean-square error. The main results are summarised in Table 1. The optimal noncausal solution has unstable factors, so it can only be realised by a combination of forward and backward processes, which is known as smoothing. The optimal causal solution is also known as the Wiener filter.
In output estimation problems, $C_1 = C_2$ and $D_1 = D_2$, that is, $G_1(s) = G_2(s)$; when the measurement noise becomes negligible, the solution approaches a short circuit. In input estimation or equalisation, $C_1 = 0$ and $D_1 = I$, that is, $G_1(s) = I$; when the measurement noise becomes negligible, the optimal equaliser approaches the channel inverse, provided that the inverse exists. Conversely, when the problem is dominated by measurement noise, the equaliser approaches an open circuit.
"Read Euler, read Euler, he is our master in everything." Pierre-Simon Laplace MAIN RESULTS

E{w(t)} = E{W(s)} = E{v(t)} = E{V(s)} = 0. E{w(t)w T (t)} = E{W(s)W T (s)} = Q > 0 and E{v(t)v T (t)} = E{V(s)V T (s)} =
R > 0 are known. A, B, C 1 , C 2 , D 1 and D 2 are known. G 1 (s) and G 2 (s) are stable, i.e., Re{λ i (A)} < 0. This book describes the classical smoothing, filtering and prediction techniques together with some more recently developed embellishments for improving performance within applications. It aims to present the subject in an accessible way, so that it can serve as a practical guide for undergraduates and newcomers to the field. The material is organised as a ten-lecture course. The foundations are laid in Chapters 1 and 2, which explain minimum-mean-square-error solution construction and asymptotic behaviour. Chapters 3 and 4 introduce continuous-time and discrete-time minimum-variance filtering. Generalisations for missing data, deterministic inputs, correlated noises, direct feedthrough terms, output estimation and equalisation are described. Chapter 5 simplifies the minimum-variance filtering results for steady-state problems. Observability, Riccati equation solution convergence, asymptotic stability and Wiener filter equivalence are discussed. Chapters 6 and 7 cover the subject of continuous-time and discrete-time smoothing. The main fixed-lag, fixedpoint and fixed-interval smoother results are derived. It is shown that the minimum-variance fixed-interval smoother attains the best performance. Chapter 8 attends to parameter estimation. As the above-mentioned approaches all rely on knowledge of the underlying model parameters, maximum-likelihood techniques within expectation-maximisation algorithms for joint state and parameter estimation are described. Chapter 9 is concerned with robust techniques that accommodate uncertainties within problem specifications. An extra term within Riccati equations enables designers to trade-off average error and peak error performance. Chapter 10 rounds off the course by applying the afore-mentioned linear techniques to nonlinear estimation problems. It is demonstrated that step-wise linearisations can be used within predictors, filters and smoothers, albeit by forsaking optimal performance guarantees.
