10. Nonlinear Prediction, Filtering and Smoothing

Introduction
The Kalman filter is widely used for linear estimation problems, where its behaviour is well understood. Under prescribed conditions, the estimated states are unbiased and stability is guaranteed. Many real-world problems are nonlinear, which requires amendments to the linear solutions. If the nonlinear models can be expressed in a state-space setting then the Kalman filter may find utility by applying linearisations at each time step. Linearising means finding tangents to the curves of interest about the current estimates, so that the standard filter recursions can be employed to produce predictions for the next step. This approach is known as extended Kalman filtering; see [1] - [5].
Extended Kalman filters (EKFs) revert to optimal Kalman filters when the problems become linear. Thus, EKFs can yield approximate minimum-variance estimates. However, there are no accompanying performance guarantees and they fall into the try-at-your-own-risk category. Indeed, Anderson and Moore [3] caution that the EKF "can be satisfactory on occasions". A number of compounding factors can cause performance degradation. The approximate linearisations may be crude and are carried out about estimated states (as opposed to true states). Observability problems occur when the variables do not map onto each other, giving rise to discontinuities within estimated state trajectories. Singularities within functions can result in non-positive solutions to the design Riccati equations and lead to instabilities.
The discussion includes suggestions for performance improvement and is organised as follows. The next section begins with Taylor series expansions, which are prerequisites for linearisation. First, second and third-order EKFs are then derived. EKFs tend to be prone to instability, and one way of enforcing stability is to masquerade the design Riccati equation with a faux version. This faux algebraic Riccati equation technique [6] - [10] is presented in Section 10.3. In Section 10.4, the higher-order terms discarded by an EKF are treated as uncertainties. It is shown that a robust EKF arises by solving a scaled H∞ problem in lieu of one possessing uncertainties. Nonlinear smoother procedures can be designed similarly. The use of fixed-lag and Rauch-Tung-Striebel smoothers may be preferable from a complexity perspective. However, the approximate minimum-variance and robust smoothers, which are presented in Section 10.5, revert to optimal solutions when the nonlinearities and uncertainties diminish. Another way of guaranteeing stability is by imposing constraints; one such approach is discussed in Section 10.6.

Taylor Series Expansion
A nonlinear function a k (.) having n continuous derivatives may be expanded as a Taylor series about a point x 0 , namely,

a k (x) = a k (x 0 ) + ∇a k (x 0 )(x − x 0 ) + ½(x − x 0 ) T ∇ 2 a k (x 0 )(x − x 0 ) + …, (1)

where ∇a k (x 0 ) is known as the gradient of a k (.) and ∇ 2 a k (x 0 ) is called a Hessian matrix.
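The leading terms of the expansion can be checked numerically. The sketch below (illustrative only; the example function and the evaluation points are assumptions, not taken from the text) compares a second-order truncation of (1) against the exact value of a scalar nonlinearity a(x) = sin(x):

```python
import math

def taylor2(a, grad, hess, x0, x):
    """Second-order truncation of the Taylor series (1) about x0."""
    dx = x - x0
    return a(x0) + grad(x0) * dx + 0.5 * hess(x0) * dx ** 2

# a(x) = sin(x): gradient cos(x), Hessian (second derivative) -sin(x)
x0, x = 0.5, 0.6
approx = taylor2(math.sin, math.cos, lambda v: -math.sin(v), x0, x)
err = abs(math.sin(x) - approx)   # truncation error is O(|x - x0|^3)
```

The error shrinks cubically as x approaches x0, which is one reason EKF accuracy degrades when state excursions between updates are large.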

Nonlinear Signal Models
Consider nonlinear systems having state-space representations of the form

x k+1 = a k (x k ) + b k (w k ), (2)
y k = c k (x k ), (3)

where a k (.), b k (.) and c k (.) are continuously differentiable functions. Similarly, Taylor series for b k (.) and c k (.) may be written about the points w 0 and x 0 , respectively.

First-Order Extended Kalman Filter
Suppose that filtered estimates x̂ k/k of x k are desired given observations z k = c k (x k ) + v k , where v k is a measurement noise sequence. A first-order EKF for the above problem is developed below. Following the approach within [3], the nonlinear system (2) - (3) is approximated by a linearised model in which A k , B k , C k and the bias terms are found from suitable truncations of the Taylor series for each nonlinearity. From Chapter 4, a filter for the above model is obtained (see [3], [5]) by linearising about the current conditional mean estimate and retaining up to first-order terms within the corresponding Taylor series. Substituting the resulting bias terms into the filter recursions (10) - (11) gives the state correction (14) and prediction (15). Note that the nonlinearities enter into the state correction (14) and prediction (15), whereas the linearised matrices A k , B k and C k are employed in the Riccati equation and gain calculations.
In the case of scalar states, the linearisations are simply the derivatives of the nonlinearities evaluated at the current estimates, that is, A k = a' k (x̂ k/k ) and C k = c' k (x̂ k/k−1 ). In texts on optimal filtering, the recursions (14) - (15) are either called a first-order EKF or simply an EKF; see [1] - [5]. Two higher-order versions are developed below.

Second-Order Extended Kalman Filter
Truncating the series (1) after the second-order term introduces Hessian correction terms within the state prediction, and similarly for the system output. Substituting the resulting bias terms into the filtering and prediction recursions (10) - (11) yields the second-order EKF. The above form is described in [2]; further simplifications appear in [4], [5].
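As a sketch of the idea in the scalar case, the second-order correction augments the predicted output with half the second derivative weighted by the prior error variance. The symbols and values below are illustrative assumptions, not the chapter's equations:

```python
import math

def second_order_innovation(z, c, d2c, x_pred, P_pred):
    """Scalar second-order innovation: the predicted output c(x_pred) is
    corrected by half the second derivative times the prior variance."""
    return z - c(x_pred) - 0.5 * d2c(x_pred) * P_pred

# With c(x) = sin(x), the second derivative is -sin(x)
nu = second_order_innovation(0.6, math.sin, lambda v: -math.sin(v), 0.5, 0.1)
```

When the prior variance P_pred is zero, the correction vanishes and the first-order innovation is recovered.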

Third-Order Extended Kalman Filter
Higher-order EKFs can be realised in the same way as their predecessors. Retaining up to third-order terms within (1) introduces additional correction terms within the linearisations. The resulting third-order EKF is defined by (18) - (19), in which the gain is now calculated using (21) and (23).
Example 1. Consider a linear state evolution x k+1 = Ax k + w k , with A = 0.5 and w k a zero-mean white noise sequence of variance Q = 0.05, a nonlinear output mapping y k = sin(x k ) and noisy observations z k = y k + v k , where v k is a zero-mean white measurement noise sequence. The first-order EKF for this problem employs the output linearisation C k = cos(x̂ k/k−1 ). The filtering step within the second-order EKF is amended by a Hessian correction term, and the output linearisation is modified again for the third-order EKF. Simulations were conducted in which the signal-to-noise ratio was varied from 20 dB to 40 dB for N = 200,000 realisations of Gaussian noise sequences. The mean-square errors exhibited by the first, second and third-order EKFs are plotted in Fig. 1. The figure demonstrates that including higher-order Taylor series terms within the filter can provide small performance improvements, but the benefit diminishes with increasing measurement noise.
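A minimal simulation of the first-order EKF for this example can be sketched as follows. The measurement noise variance R, horizon and seed are assumptions for illustration, not the values used to produce Fig. 1:

```python
import math, random

def ekf_demo(A=0.5, Q=0.05, R=0.1, n=2000, seed=0):
    """First-order EKF for x_{k+1} = A x_k + w_k, z_k = sin(x_k) + v_k.
    The output map is linearised as C_k = cos(x_hat) at each step."""
    rng = random.Random(seed)
    x, xh, P = 0.0, 0.0, 1.0   # true state, predicted estimate, variance
    sse = 0.0
    for _ in range(n):
        z = math.sin(x) + rng.gauss(0.0, math.sqrt(R))
        C = math.cos(xh)                    # linearised output matrix
        K = P * C / (C * C * P + R)         # filter gain
        xf = xh + K * (z - math.sin(xh))    # state correction (14)
        Pf = (1.0 - K * C) * P
        sse += (xf - x) ** 2
        x = A * x + rng.gauss(0.0, math.sqrt(Q))   # true state evolution
        xh = A * xf                         # state prediction (15)
        P = A * A * Pf + Q
    return sse / n

mse = ekf_demo()
```

Since the states remain small here, sin(x) is nearly linear and the filter MSE falls well below the prior state variance Q/(1 − A²).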

A Nonlinear Observer
The previously-described extended Kalman filters arise by linearising the signal model about the current state estimate and using the linear Kalman filter to predict the next estimate. This attempts to produce a locally optimal filter; however, it is not necessarily stable because the solutions of the underlying Riccati equations are not guaranteed to be positive definite. The faux algebraic Riccati technique [6] - [10] seeks to improve on EKF performance by trading off approximate optimality for stability. The familiar structure of the EKF is retained but stability is achieved by selecting a positive definite solution to a faux Riccati equation for the gain design.
Assume that data is generated by the following signal model, comprising a stable, linear state evolution together with a nonlinear output mapping, where the components of c k (.) are assumed to be continuously differentiable functions. Suppose that it is desired to calculate estimates of the states from the measurements. A nonlinear observer may be constructed having the form of a state predictor driven by a gain function g k (.) that is to be designed. From (24) - (26), the state prediction error obeys (27). Expanding c k (.) in a Taylor series and retaining first-order terms leads to the residual ε k being approximately a linear function of the prediction error x̃ k/k−1 . It will be shown that, for certain classes of problems, this objective can be achieved by a suitable choice of a bounded matrix function of the states, D k , resulting in the time-varying gain function g k (ε k ) = K k D k ε k , where K k is a gain matrix of appropriate dimension. If the locally linearised error system (27) - (28) is completely observable, then the asymptotic stability of (28) can be guaranteed by a suitable gain selection. A method for selecting the gain is described below.

Gain Selection
From (28), an approximate recursion for the error covariance P k/k−1 may be written as a Riccati difference equation (30). In an EKF for the above problem, the gain is obtained by solving this Riccati difference equation and then calculating the gain from (31) - (32). The faux algebraic Riccati equation approach [6] - [10] is motivated by connections between Riccati difference equation and algebraic Riccati equation solutions. Indeed, it is noted for some nonlinear problems that the gains can converge to a steady-state matrix [3]. This technique is also known as 'covariance setting'. Following the approach of [10], the Riccati difference equation (30) may be masqueraded by a faux algebraic Riccati equation. That is, rather than solve (30), an arbitrary positive definite solution Σ k is assumed instead and the gain at each time k is calculated from (31) - (32) using Σ k in place of P k/k−1 .
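In the scalar case the covariance-setting idea can be sketched as follows: the gain is computed from a fixed positive value Σ (the numbers below are hypothetical) rather than from a propagated Riccati solution, so the gain stays bounded however the linearisation C k wanders:

```python
import math

def faux_gain(Sigma, C, A, R):
    """Gain computed from a fixed positive Sigma (covariance setting),
    in place of the Riccati difference equation solution."""
    return A * Sigma * C / (C * Sigma * C + R)

# The linearisation C_k = cos(x_hat) varies with the estimate,
# but Sigma stays fixed, so the gain remains bounded.
gains = [faux_gain(0.5, math.cos(x), 0.9, 0.2) for x in (-1.0, 0.0, 1.0)]
```

Because cos(.) is even, symmetric estimates produce identical gains, and the gain magnitude is bounded by A·Σ/R regardless of the trajectory.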

Tracking Multiple Signals
Consider the problem of tracking two frequency or phase modulated signals, which may be modelled by equation (34) in terms of instantaneous amplitude, frequency and phase components, respectively.
Expanding the prediction error to linear terms suggests the choice of D k described above. In the multiple-signal case, the linearisation C̄ k = D k C k does not result in perfect decoupling. While the diagonal blocks reduce to the single-signal forms, a symmetric positive definite solution Σ k to (33) may nevertheless be assumed for the gain design.

Stability Conditions
In order to establish conditions for the error system (28) to be asymptotically stable, the problem is recast in a passivity framework as follows. Consider the configuration of Fig. 2, in which there is a cascade of a stable linear system and a nonlinear function matrix γ(.) acting on the error e. It follows from the figure that γ(.) satisfies some sector conditions, which may be interpreted as bounds on the slope of the components of γ(.); see Theorem 14, p. 7 of [11].
Consider the first term on the right-hand side of (39). Using the approach of [11], the Schwarz inequality and the triangle inequality, it can be shown that this term is finite. If G(z) is stable and bounded on the unit circle, then the test condition (38) becomes the condition (41); see pp. 175 and 194 of [11].

Applications
Example 2 [10]. Consider a unity-amplitude frequency modulated (FM) signal. A nonlinear observer for an FM demodulator may be written in terms of gains K 1 , K 2 to be designed. In view of the form (36), the observer error system is reformatted as (43), where γ(x) = x − sin(x). The z-transform of the linear part of (43) is G(z) = (K 2 z + K 2 + λK 1 )(z 2 + (K 2 − 1 − λ)z + K 1 + 1 − λK 2 ) −1 , in which λ denotes the model parameter. The nonlinearity satisfies the sector condition (37) for a sector bound of 1.22. Candidate gains may be assessed by checking that G(z) is stable and that the test condition (41) holds. The stable gain space calculated for the case of λ = 0.9 is plotted in Fig. 3.
The gains are required to lie within the shaded region of the plot for the error system (42) to be asymptotically stable. A speech utterance, namely the phrase "Matlab is number one", was sampled at 8 kHz and used to synthesize a unity-amplitude FM signal. An EKF demodulator was constructed for the above model with σ w 2 = 0.02. In the nonlinear observer design, suitable parameter choices were found to be Σ k = diag(0.001, 0.08). The nonlinear observer gains were censored at each time k according to the stable gain space of Fig. 3. The results of a simulation study using 100 realisations of Gaussian measurement noise sequences are shown in Fig. 4. The figure demonstrates that enforcing stability can be beneficial at low SNR, at the cost of degraded high-SNR performance.
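The gain screening step can be sketched as follows. The denominator coefficients follow the form of G(z) quoted above, with the model parameter written here as lam (the value 0.9 from the text is assumed); a candidate (K1, K2) pair is admissible only if both poles lie strictly inside the unit circle:

```python
import cmath

def poles_inside_unit_circle(K1, K2, lam=0.9):
    """Poles of G(z) are the roots of z^2 + (K2 - 1 - lam) z + (K1 + 1 - lam*K2).
    Asymptotic stability of the linear part requires both inside the unit circle."""
    b = K2 - 1.0 - lam
    c = K1 + 1.0 - lam * K2
    disc = cmath.sqrt(b * b - 4.0 * c)
    return all(abs(r) < 1.0 for r in ((-b + disc) / 2.0, (-b - disc) / 2.0))
```

Gains passing this root test would still need to satisfy the sector-based condition (41) before being admitted to the shaded region of Fig. 3.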
Example 3 [10]. Suppose that there are two superimposed FM signals present in the same frequency channel. Neglecting observation noise, a suitable approximation of the demodulator error system in the form (36) is given by (44). The linear part of (44) may be evaluated at each time k for the above parameter values with β = 1.2, q = 0.001 and δ = 0.82, and used to censor the gains. The resulting co-channel demodulation performance is shown in Fig. 5. It can be seen that the nonlinear observer significantly outperforms the EKF at high SNR.
Two mechanisms have been observed for the occurrence of outliers or faults within the co-channel demodulators. Firstly, errors can occur in the state attribution; that is, some component speech message segments are tracked correctly but the tracks are inconsistently associated with the individual signals. This is illustrated by the example frequency estimate tracks shown in Figs. 6 and 7. The solid and dashed lines in the figures indicate two sample co-channel frequency tracks. Secondly, the phase unwrapping can be erroneous, so that the frequency tracks bear no resemblance to the underlying messages. These faults can occur without any significant deterioration in the error residual. The EKF demodulator is observed to be increasingly fault-prone at higher SNR. This arises because lower-SNR designs possess narrower bandwidths and so are less sensitive to nearby frequency components. The figures also illustrate the trade-off between stability and optimality. In particular, it can be seen from Fig. 6 that the sample EKF speech estimates exhibit faults in the state attribution. This contrasts with Fig. 7, where the nonlinear observer's estimates exhibit stable state attribution at the cost of degraded speech fidelity.

Nonlinear Problem Statement
Consider again the nonlinear, discrete-time signal model (2), (7). It is shown below that the H ∞ techniques of Chapter 9 can be used to recast nonlinear filtering problems into a model uncertainty setting. The following discussion attends to state estimation, that is, C 1,k = I is assumed within the problem and solution presented in Section 9.3.2.
The Taylor series expansions of the nonlinear functions a k (.), b k (.) and c k (.) about the filtered and predicted estimates x̂ k/k and x̂ k/k−1 may be written as (45) - (47), where Δ 1 (.), Δ 2 (.) and Δ 3 (.) are uncertainties that account for the higher-order terms. Substituting (45) - (47) into the nonlinear system (2), (7) gives the linearised system (48) - (49). Note that the first-order EKF for the above system arises by setting the uncertainties Δ 1 (.), Δ 2 (.) and Δ 3 (.) to zero.

Robust Solution
Following the approach in Chapter 9, instead of addressing the problem (48) - (49), which possesses uncertainties, an auxiliary H ∞ problem (55) - (57) is defined. A sufficient solution to the auxiliary H ∞ problem (55) - (57) can be obtained by solving another problem in which w k and v k are scaled in lieu of the additional inputs s k and r k . The scaled H ∞ problem is defined by (60) - (62), where the scalars c w and c v are to be found.

Lemma 2 [12]: The solution of the H ∞ problem (60) - (62), in which v k is scaled by c v and w k is scaled by c w , is sufficient for the solution of the auxiliary H ∞ problem (55) - (57).

Proof: If the H ∞ problem (50) - (52) has been solved then there exists a γ > 0 such that the required performance bound holds.

The robust first-order extended Kalman filter for state estimation is given by (50) - (52) and (54). As discussed in Chapter 9, a search is required for a minimum γ such that the solution of the design Riccati difference equation remains positive definite.

Example 4 [12]. Suppose that an FM signal is generated by the above model. The objective is to construct an FM demodulator that produces estimates of the frequency message ω k from the noisy in-phase and quadrature measurements. When σ w 2 is small, the problem is only mildly nonlinear and the robust design does not improve on the EKF. However, when σ w 2 = 1, the problem is substantially nonlinear and a performance benefit can be observed. A robust EKF demodulator was designed with δ 1 = 0.1, δ 2 = 4.5 and δ 3 = 0.001. It was found that γ = 1.38 was sufficient for P k/k−1 of the above Riccati difference equation to always be positive definite. A histogram of the observed frequency estimation error is shown in Fig. 8, which demonstrates that the robust demodulator provides improved mean-square-error performance. For sufficiently large σ w 2 , the output of the above model will resemble a digital signal, in which case a detector may outperform a demodulator.

Approximate Minimum-Variance Smoother
Consider again a nonlinear estimation problem where x k+1 = a k (x k ) + B k w k and z k = c k (x k ) + v k , in which the nonlinearities a k (.) and c k (.) are assumed to be smooth, differentiable functions of appropriate dimension. The linearisations akin to extended Kalman filtering may be applied within the smoothers described in Chapter 7 in the pursuit of performance improvement. The fixed-lag, Fraser-Potter and Rauch-Tung-Striebel smoother recursions are easier to apply as they are less complex. However, the application of the minimum-variance smoother can yield approximately optimal estimates when the problem becomes linear, provided that the underlying assumptions are correct.
Step 2. Operate (69) - (71) on the time-reversed transpose of α k . Then take the time-reversed transpose of the result to obtain β k .
Step 3. Calculate the smoothed output estimate from
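For the simpler Rauch-Tung-Striebel alternative mentioned above, a scalar sketch is given below: a forward first-order EKF pass stores the linearised A k together with the predictions and corrections, and a backward pass applies the standard RTS recursion. The model, noise levels and seed are illustrative assumptions:

```python
import math, random

def ekf_rts(zs, a, da, c, dc, Q, R, x0=0.0, P0=1.0):
    """Forward first-order EKF pass followed by the RTS backward recursion,
    with A_k and C_k taken as derivatives evaluated at the estimates."""
    xf, Pf, xp, Pp, Ak = [], [], [], [], []
    xh, P = x0, P0
    for z in zs:
        xp.append(xh); Pp.append(P)
        C = dc(xh)
        K = P * C / (C * C * P + R)
        xh = xh + K * (z - c(xh))           # filtered estimate
        P = (1.0 - K * C) * P
        xf.append(xh); Pf.append(P)
        A = da(xh)
        Ak.append(A)
        xh, P = a(xh), A * A * P + Q        # one-step prediction
    xs = list(xf)
    for k in range(len(zs) - 2, -1, -1):
        G = Pf[k] * Ak[k] / Pp[k + 1]       # RTS smoother gain
        xs[k] = xf[k] + G * (xs[k + 1] - xp[k + 1])
    return xf, xs

# Synthetic near-linear trial: x_{k+1} = 0.9 x_k + w_k, z_k = sin(x_k) + v_k
rng = random.Random(1)
A, Q, R = 0.9, 0.04, 0.09
truth, zs, x = [], [], 0.0
for _ in range(400):
    truth.append(x)
    zs.append(math.sin(x) + rng.gauss(0.0, math.sqrt(R)))
    x = A * x + rng.gauss(0.0, math.sqrt(Q))
xf, xs = ekf_rts(zs, lambda v: A * v, lambda v: A, math.sin, math.cos, Q, R)
mse_f = sum((e - t) ** 2 for e, t in zip(xf, truth)) / 400
mse_s = sum((e - t) ** 2 for e, t in zip(xs, truth)) / 400
```

Because the backward pass exploits all of the data, the smoothed MSE is typically below the filtered MSE, consistent with the smoother-versus-filter comparisons reported later in the chapter.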

Robust Smoother
From the arguments within Chapter 9, a smoother that is robust to uncertain w k and v k can be realised by replacing the error covariance correction (72) within Procedure 1 by an H ∞ version. As discussed in Chapter 9, a search for a minimum γ is required. A unity-amplitude FM signal was synthesized using μ = 0.99 and the SNR was varied in 1.5 dB steps from 3 dB to 15 dB. The mean-square errors were calculated over 200 realisations of Gaussian measurement noise and are shown in Fig. 9. It can be seen from the figure that, at 7.5 dB SNR, the first-order EKF improves on the FM discriminator MSE by about 12 dB. The improvement arises because the EKF demodulator exploits the signal model whereas the FM discriminator does not. The figure shows that the approximate minimum-variance smoother further reduces the MSE by about 2 dB, which illustrates the advantage of exploiting all the data in the time interval. In the robust designs, searches for minimum values of γ were conducted such that the corresponding Riccati difference equation solutions were positive definite over each noise realisation. It can be seen from the figure at 7.5 dB SNR that the robust EKF provides about a 1 dB performance improvement compared to the EKF, whereas the approximate minimum-variance smoother and the robust smoother performance are indistinguishable.
This nonlinear example illustrates once again that smoothers can outperform filters. Since a first-order speech model is used and the Taylor series are truncated after the first-order terms, some model uncertainty is present, and so the robust designs demonstrate a marginal improvement over the EKF.

Constraints are not easily described within state-space frameworks, and so many techniques for constrained filtering and smoothing are reported in the literature. An early technique for constrained filtering involves augmenting the measurement vector with perfect observations [14]. The application of the perfect-measurement approach to filtering and fixed-interval smoothing is described in [15].
Constraints can be applied to state estimates; see [16], where a positivity constraint is used within a Kalman filter and a fixed-lag smoother. Three different state equality constraint approaches, namely maximum-probability, mean-square and projection methods, are described in [17]. Under prescribed conditions, the perfect-measurement and projection approaches are equivalent [5], [18], which is identical to applying linear constraints within a form of recursive least squares.
In the state-equality-constrained methods [5], [16] - [18], a constrained estimate can be calculated from a Kalman filter's unconstrained estimate at each time step. Constraint information could also be embedded within nonlinear models for use with EKFs. A simpler, low-computation-cost technique that avoids EKF stability problems and suits real-time implementation is described in [19]. In particular, an on-line procedure is proposed that involves using nonlinear functions to censor the measurements and subsequently applying the minimum-variance filter recursions. An off-line procedure for retrospective analyses is also described, where the minimum-variance fixed-interval smoother recursions are applied to the censored measurements. In contrast to the aforementioned techniques, which employ constraint matrices and vectors, here constraint information is represented by an exogenous input process. This approach uses the Bounded Real Lemma, which enables the nonlinearities to be designed so that the filtered and smoothed estimates satisfy a performance criterion.

Problem Statement
The ensuing discussion concerns odd and even functions, which are defined as follows. Problems are considered where stochastic random variables are subjected to inequality constraints. Therefore, nonlinear censoring functions are introduced whose outputs are constrained to lie within prescribed bounds. Let β and g o denote a constraint vector and an odd function of a random variable X about its expected value E{X}, respectively. The censoring function g(X) is defined by (75) - (76). By inspection of (75) - (76), g(X) is constrained within E{X} ± β. Suppose that the probability density function f X (X) of X about E{X} is even, that is, symmetric about E{X}. Under these conditions, the expected value of g(X) is given by (77). Thus, a constraining process can be modelled by a nonlinear function. Equation (77) states that g(X) is unbiased, provided that g o (X,β) and f X (X) are odd and even functions about E{X}, respectively. In the analysis and examples that follow, attention is confined to systems having zero-mean inputs, states and outputs, in which case the censoring functions are also centred on zero, that is, E{X} = 0. Let w k represent a stochastic white input process having an even probability density function. Suppose that the states of a system are realised by x k+1 = A k x k + B k w k , where A k and B k are matrices of appropriate dimension. Since w k is zero-mean, it follows that linear combinations of the states are also zero-mean. Suppose also that the system outputs y j,k = g o j,k (C j,k x k ), j = 1, …, p, are generated by odd censoring functions centred on zero, where C j,k is the j-th row of C k . The outputs y j,k are therefore constrained to lie within the prescribed bounds. For example, if the system outputs represent the trajectories of pedestrians within a building then the constraint process could include knowledge about wall, floor and ceiling positions.
Similarly, a vehicle trajectory constraint process could include information about building and road boundaries.
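A scalar censoring function of the form (75) - (76) is a saturation about the mean, and the unbiasedness property (77) under an even density can be checked by simulation. The bound of 0.8 and the sample count below are illustrative assumptions:

```python
import random

def censor(x, beta, mean=0.0):
    """Censoring function: output constrained within mean +/- beta and
    odd about the mean, as required by (75) - (76)."""
    return mean + min(max(x - mean, -beta), beta)

rng = random.Random(0)
samples = [censor(rng.gauss(0.0, 1.0), 0.8) for _ in range(100000)]
avg = sum(samples) / len(samples)   # near zero: g(X) is unbiased
```

With a skewed density the same saturation would bias the output, which is why (77) requires f X to be even about E{X}.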
Assume that observations z k = y k + v k are available, where v k is a stochastic, white measurement noise process having an even probability density function. Thus, the energy of the system's output is bounded from above by the energy of the constraint process. The minimum-variance filter and smoother, which produce estimates of a linear system's output, minimise the mean square error. Here, it is desired to calculate estimates that trade off minimum mean-square-error performance against satisfaction of the constraints. Censoring the measurements is suggested as a low-implementation-cost approach to constrained filtering. Design constraints are sought for the measurement censoring functions so that the outputs of a subsequent filter satisfy the performance objective (82). The recursions akin to the minimum-variance filter are applied to calculate predicted and filtered state estimates from the constrained measurements z̄ k at time k. That is, the output mapping C k is retained within the linear filter design even though nonlinearities are present within (83). The predicted states, filtered states and output estimates are respectively obtained from (85) - (87). Nonzero-mean sequences can be accommodated using deterministic inputs as described in Chapter 4. Since a nonlinear system output (79) and a nonlinear measurement (83) are assumed, the estimates calculated from (85) - (87) are not optimal. Some properties that are exhibited by these estimates are described below.

Lemma 3 [19]: In respect of the filter (85) - (87), which operates on the constrained measurements (83), suppose the following: (i) the probability density functions associated with w k and v k are even; (ii) the nonlinear functions within (79) and (83) are odd. Subtracting (88) from (78) gives (89). From the above assumptions, the second and third terms on the right-hand side of (89) are zero.
The first term on the right-hand side of (89) pertains to the unconstrained Kalman filter and is zero by induction. Condition (iii) again serves as an induction assumption, and the corresponding property for the filtered estimates follows from (86). Recall that the Bounded Real Lemma (see Lemma 7 of Chapter 9) specifies a bound for the ratio of a system's output and input energies. This lemma is used to find a design for γ within (83), as described below. The proof follows mutatis mutandis from the approach within the proofs of Lemma 5 of Chapter 7 and Lemma 3. An analogous result to Lemma 5 is now stated.
Lemma 7 [19]: Define the smoother output estimation error as ỹ = y − ŷ. Under the conditions of Lemma 3, ỹ lies in ℓ 2 .
The proof follows mutatis mutandis from that of Lemma 5. Two illustrative examples are set out below. A GPS and inertial navigation system integration application is detailed in [19].
Example 5 [19]. Consider a saturating nonlinearity of the form (75) - (76). The results of operating the minimum-variance filter recursions on the raw data z k = y k + v k are indicated by the outer black region of Fig. 11. It can be seen that the filter outputs do not satisfy the performance objective (82), which motivates the pursuit of constrained techniques. A minimum value of γ 2 = 1.24 was found for the solutions of the Riccati difference equation specified within Lemma 4 to be positive definite. The filter (85) - (87) was applied to the censored measurements z̄ k obtained using (91). The limits of the observed distribution of the constrained filter estimates are indicated by the inner white region of Fig. 11. The figure shows that the constrained filter estimates satisfy (82), which illustrates Lemma 5.
Example 6 [19]. Measurements were similarly synthesized using the parameters of Example 5 to demonstrate constrained fixed-interval smoother performance. A minimum value of γ 2 = 5.6 was found for the solutions of the Riccati difference equation mentioned within Lemma 4 to be positive definite. The superimposed distributions of the unconstrained and constrained smoothers are respectively indicated by the inner and outer black regions of Fig. 12. It can be seen by inspection of the figure that the constrained smoother estimates meet (80), whereas those produced by the standard smoother do not.

The above examples involved searching for the minimum value of γ 2 for which positive definite solutions of the Riccati equation alluded to within Lemma 4 exist. The need for a search may not be apparent, as stability is guaranteed whenever a positive definite solution for the associated Riccati equation exists. Searching for a minimum γ 2 is advocated because the use of an excessively large value can lead to a nonlinearity design that is conservative and exhibits poor mean-square-error performance. If a design is still too conservative then an empirical value of γ 2 , based on the ratio of the output and measurement energies, may need to be considered instead.

Conclusion
In this chapter it is assumed that nonlinear systems are of the form x k+1 = a k (x k ) + b k (w k ), y k = c k (x k ), where a k (.), b k (.) and c k (.) are continuously differentiable functions. The EKF arises by linearising the model about conditional mean estimates and applying the standard filter recursions. The first, second and third-order EKFs, simplified for the case of scalar x k , are summarised in Table 1.
The EKF attempts to produce locally optimal estimates. However, it is not necessarily stable because the solutions of the underlying Riccati equations are not guaranteed to be positive definite. The faux algebraic Riccati technique trades off approximate optimality for stability. The familiar structure of the EKF is retained but stability is achieved by selecting a positive definite solution to a faux Riccati equation for the gain design.
H ∞ techniques can be used to recast nonlinear filtering applications into a model uncertainty problem. It is demonstrated with the aid of an example that a robust EKF can reduce the mean square error when the problem is sufficiently nonlinear.
Linearised models may be applied within the previously-described smoothers in the pursuit of performance improvement. Nonlinear versions of the fixed-lag, Fraser-Potter and Rauch-Tung-Striebel smoothers are easier to implement as they are less complex. However, the application of the minimum-variance smoother can yield approximately optimal estimates when the problem becomes linear, provided that the underlying assumptions are correct. A smoother that is robust to input uncertainty is obtained by replacing the approximate error covariance correction with an H ∞ version. The resulting robust nonlinear smoother can exhibit performance benefits when uncertainty is present.
In some applications, it may be possible to censor a system's inputs, states or outputs, rather than proceed with an EKF design. It has been shown that the use of a nonlinear censoring function to constrain input measurements leads to bounded filter and smoother estimation errors.