Adaptive Filter as Efficient Tool for Data Assimilation under Uncertainties

In this contribution, the problem of data assimilation, posed as state estimation for dynamical systems under uncertainties, is addressed. The emphasis is put on the context of high-dimensional systems. The major difficulties in the design of data assimilation algorithms concern computational resources (computational power and memory) and uncertainties (system parameters, statistics of model, and observational errors). The idea of the adaptive filter (AF) is presented in detail to show how uncertainties can be overcome, and the main principles and tools for implementing the adaptive filter for complex dynamical systems are explained. Simple numerical examples illustrate the principal differences between the AF and the Kalman filter and other methods. Simulation results are presented to compare the performance of the adaptive filter with that of the Kalman filter.


Introduction
In this chapter, the adaptive filter (AF) is considered as a computational device that yields estimates of the system state by recursively (in time) minimizing the error between the predicted output of the device and the observed signal in real time. As the main objective of the AF is to produce estimates of the state in high-dimensional systems (HdSs), we focus on the mathematical form of the AF in the state-space form used in the Kalman filter (KF) [1]. In this chapter, an HdS is a system whose state dimension is of order $O(10^7)$ to $O(10^8)$. The assimilation problem is formulated as a standard filtering problem. For simplicity, let the dynamical system be described by the equation

$$x(t+1) = \Phi x(t) + w(t), \quad t = 0, 1, \ldots \tag{1}$$

where $x(t)$ is the system state at time instant $t$. At each time instant $t$, we are given the observation of the system output

$$z(t+1) = H x(t+1) + v(t+1). \tag{2}$$

In (1) and (2), $w(t)$ is the model error (ME), $v(t)$ is the observation error (ObE), and $\Phi$ represents the system dynamics. In general, the system (1) and (2) may be nonlinear with $\Phi x = f(x)$, $Hx = h(x)$. The filtering problem for the partially observed dynamical system (1) and (2) is to obtain the best possible estimate of the state $x(t)$ at each instant $t$, given the set of observations $Z(1:t) = [z(1), \ldots, z(t)]$. There exist different techniques for solving estimation problems. The simplest approach is the linear estimator [2], since it requires only the first two moments. Linear estimation is frequently used in practice when computational complexity is limited. Other widely used methods include maximum likelihood, least squares, the method of moments, Bayesian estimation, and minimum mean square error (MMSE) estimation. For more details, see [3].
Optimal filters have limitations. In practice, the difficulties are numerous: the statistics of the signals may not be available or cannot be accurately estimated; there may be no time available for statistical estimation (real-time operation); the signals and systems may be non-stationary; and the required memory and computational load may be prohibitive. All these difficulties become insurmountable, especially for HdSs.
In order to deal with real-time applications, the AFs appear to be a valuable tool in solving estimation problems when there is no time for statistical estimation and when we are dealing with non-stationary signals and/or systems environment. They can operate satisfactorily in unknown and possibly time-varying environments without user intervention. They improve their performance during operation by learning statistics from current signal observations. Finally, they can track variations in the signal operating environment [4].
It is well known that the MMSE estimator in the class of Borel measurable functions (with respect to (wrt) $Z(1:t)$) is given by the conditional mean

$$\hat{x}(t) = E[x(t) \mid Z(1:t)]. \tag{3}$$

Under standard conditions on the noise sequences $w(t)$, $v(t)$ (Gaussian, independent and identically distributed (i.i.d.) in time), the estimate (3) $\hat{x}(t)$ of $x(t)$ can be obtained from the KF in the recursive form

$$\hat{x}(t+1) = \hat{x}(t+1/t) + K\zeta(t+1), \tag{4}$$

$$\hat{x}(t+1/t) = \Phi\hat{x}(t), \quad \zeta(t+1) = z(t+1) - H\hat{x}(t+1/t), \tag{5}$$

$$K = M H^T [H M H^T + R]^{-1}, \tag{6}$$

$$M(t+1) = \Phi P(t) \Phi^T + Q, \tag{7}$$

$$P(t) = M(t) - K(M(t)) H M(t). \tag{8}$$

In (4)-(8), $Q$ and $R$ are the covariance matrices of $w(t)$ and $v(t)$, respectively. One sees that the gain matrix $K = K(M)$ is a function of $M := M(t+1)$, the error covariance matrix (ECM) of the state prediction error (PE) $e(t+1/t)$, defined as $e(t+1/t) := \hat{x}(t+1/t) - x(t+1)$, and $\zeta(t+1)$ is known as the innovation vector. Note from (7) and (8) that the ECM $M(t+1)$ can be found by solving the nonlinear matrix algebraic Riccati equation (ARE). Generally speaking, a solution of the ARE is not unique. Conditions must be introduced to ensure the existence of a unique non-negative definite solution [5]. It is remarkable that the ECMs $P$, $M$ in (7) and (8) do not depend on the observations; therefore, they can be computed in advance, offline, given the system matrices and the noise covariances. The same remark is valid for the gain matrix $K$ in (6). In contrast, the gain in the AF is observation-dependent [6] (see Section 2).
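To make the remark about offline computation concrete, the following minimal Python sketch iterates a scalar version of the covariance and gain recursion of the KF. The function names `kalman_gain_sequence` and `stationary_gain` and the values of `phi`, `h`, `q`, `r`, `p0` are illustrative assumptions, not taken from the chapter. The only point is that no observation enters the loop, so the whole gain sequence can be tabulated before any data arrive.

```python
import math


def kalman_gain_sequence(phi, h, q, r, p0, steps):
    """Scalar covariance/gain recursion of the KF; uses no observations."""
    gains = []
    p = p0  # filtered error variance P(t)
    for _ in range(steps):
        m = phi * p * phi + q        # prediction error variance M(t+1)
        k = m * h / (h * m * h + r)  # gain K(M)
        p = (1.0 - k * h) * m        # filtered error variance P(t+1)
        gains.append(k)
    return gains


def stationary_gain(q, r):
    """Stationary gain for phi = h = 1: M solves M^2 - q*M - q*r = 0."""
    m_inf = 0.5 * (q + math.sqrt(q * q + 4.0 * q * r))
    return m_inf / (m_inf + r)


# Assumed illustrative values (not from the chapter).
gains = kalman_gain_sequence(phi=1.0, h=1.0, q=0.1, r=1.0, p0=1.0, steps=50)
k_inf = stationary_gain(q=0.1, r=1.0)
print(gains[0], gains[-1], k_inf)
```

In this toy setting the gain settles on its stationary value within a few tens of steps, which is the behavior the stationary-gain argument in Section 2 relies on.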
Even under the most favorable conditions (perfect knowledge of all system parameters and noise statistics), for a dynamical system with dimension of order $10^7$ to $10^8$ it is impossible to solve the ARE because of the computational burden, let alone to store $M$ and $P$. The AF is proposed to overcome these difficulties. Note that the KF is also an MMSE filter in the complete Hilbert space of random variables. For nonlinear models, there are KF variants, among them the extended KF (EKF) [7], the unscented KF (UKF) [8], and the ensemble Kalman filter (EnKF) [9]. In the EnKF, the ECM of the KF is replaced by a sample ECM computed from an ensemble of samples of the state variable. For an example of the application of the EnKF to geophysical data assimilation with a high-dimensional model, see [10]. Another class of ensemble filtering techniques is the class of particle filters (PF) [11]. The basic idea of the PF (and also of the EnKF) is to use a discrete set of $n$ weighted particles to represent the distribution of $x(t)$, where the distribution is updated at each time by changing the particle weights according to their likelihoods.
Even where the KF variants can be implemented, they may still be seriously biased, because the accuracy of the KF update requires linearity of the observation function and Gaussianity of the distribution of the system state $x(t)$. In reality, the KF (4)-(8) may be biased and unstable, even divergent [12]. At present, the PF algorithms are ineffective for HdS data assimilation.
In this chapter, we show how the AF can deal efficiently with the uncertainties existing in the filtering problem (1) and (2). In Section 2, a brief outline of the AF is given. The main features of the AF that differ from those of the KF are presented: the optimality criterion, the stabilizing gain structure, and the optimization algorithms. Section 3 shows in detail how the AF is capable of dealing with uncertainties in the specification of the ME covariance. The hypothesis on a subspace of the ME is presented in Section 4, from which one sees how order reduction can be made for representing the bias and the ME covariance. Simple numerical examples on one- and two-dimensional systems are given in Section 5 to illustrate in detail the differences between the AF and the traditional KF. Numerical experiments on low- and high-dimensional systems are given in Section 6 to demonstrate how the AF algorithm works. The performance comparison between the AF and KF, both with perfect knowledge of the ME statistics and with ME uncertainties, is also presented. Conclusions and perspectives of the AF are summarized in Section 7.

Adaptive filter
The AF originates from [13]. It is constructed for estimating the state of a dynamical system based on partially observed quantities related in some way to the system state. As noted before, for linear systems contaminated by Gaussian noise, the MMSE estimate can be obtained by the KF. Since the publication of [1] in 1960, a vast number of works have applied the KF to engineering problems in all engineering fields, and many modifications have been proposed. The reasons for modifying the KF are numerous, but they are mostly related to nonlinear dynamics, uncertainties in the specification of system parameters, bias of the ME, unknown statistics of the ME, and model reduction. With the rapid progress of computer technology (computer power, memory, and so on), various simplified versions of the KF have been suggested for solving filtering (or data assimilation) problems for HdSs, in particular in meteorology and oceanography.
Direct application of the KF to HdSs is impossible because of the limits on computer power, memory, and computational time. In particular, the KF requires solving the matrix AREs (7) and (8) to compute the ECMs $M(t)$, $P(t)$. Storing such matrices is impossible, to say nothing of the computational time.
Different simplified approaches have been proposed for overcoming the difficulties in the application of the KF. One example of a successful tool for solving data assimilation problems in HdSs is the EnKF [9]. In the EnKF, an ensemble of error samples of small size is generated on the basis of model states to approximate the ECMs. In the practice of data assimilation for HdSs, it is possible to generate only ensembles of moderate size (of order $O(100)$) by model integrations over the assimilation window (the time interval between two successive arrivals of observations), since one such integration takes several hours. Another approach, the PF, is based on sampling from conditional distributions. Theoretically, this approach is more appropriate for nonlinear problems because no linearization is required, in contrast to the EKF, which is based on a linearization technique. However, even for filtering problems with state dimension of order $O(10)$, relatively large ensembles of size $O(10000)$ would be required in order to produce reasonably good performance. The AF in [13] is based on a different idea. Here, no linearization is required for nonlinear filtering problems. For the problem (1) and (2), the filter is of the form (4) and (5), but the gain $K = K(\theta)$ is assumed to be of a given stabilizing structure [6]. This means that $K$ is parametrized by some vector of unknown parameters $\theta \in \Theta$ so that the filter (4) and (5) with the gain $K(\theta)$ is stable for all $\theta \in \Theta$. It is well known that, under mild conditions, the solution of the ARE tends (quickly) to a stationary solution $M_\infty$, and so the gain (6) tends to the stationary gain $K_\infty$. Moreover, the innovation $\zeta(t+1) = z(t+1) - \hat{z}(t+1/t)$, representing the error of the output prediction $\hat{z}(t+1/t) := H\hat{x}(t+1/t) = H\Phi\hat{x}(t)$, is unbiased and of minimum variance. This fact leads to the idea of seeking the optimal vector $\theta$ by minimizing

$$J(\theta) = E[\Psi(\zeta(t+1))], \quad \Psi(\zeta(t+1)) := \|\zeta(t+1)\|^2, \tag{9}$$

where $E[\cdot]$ denotes the average in a probabilistic sense.
For stationary systems (1) and (2), if we assume the validity of the ergodic hypothesis, the average value in a probabilistic sense, expressed in (9), is almost everywhere equivalent to the time average (for a long running time of the dynamical system). The optimal $\theta^*$ can be found by solving the equation

$$\nabla_\theta J(\theta) = E[\nabla_\theta \Psi(\zeta(t+1))] = 0. \tag{10}$$

A stochastic approximation (SA) algorithm for solving (10) can be written as

$$\theta(t+1) = \theta(t) - \gamma(t+1)\,\nabla_\theta \Psi(\zeta(t+1)). \tag{11}$$

The conditions on the sequence of positive scalars $\gamma(t)$ that ensure the convergence of $\{\theta(t)\}$ in the procedure (11) are

$$\gamma(t) > 0, \quad \sum_{t=1}^{\infty} \gamma(t) = \infty, \quad \sum_{t=1}^{\infty} \gamma^2(t) < \infty. \tag{12}$$

One of the main advantages of the SA algorithm (11) is that, instead of computing the gradient of the cost function (9) (which requires knowledge of the probability distribution), the algorithm (11) is based only on the gradient of the sample cost function $\Psi$ (wrt $\theta$), which can easily be evaluated numerically.

Comment 2.1. Generally speaking, the convergence rate of the algorithm (11) and (12) is $O(1/t)$. It is possible to improve the convergence rate of the SA by averaging the iterates, $\bar{\theta}(t) = \frac{1}{t}\sum_{k=1}^{t}\theta(k)$. For more details, see [14].

Comment 2.2. For HdSs, even with $\theta$ of moderate dimension, the SPSA (Simultaneous Perturbation Stochastic Approximation) algorithms in [15,16] are preferable to the algorithm (11) and (12). This is because integration of the HdS over the assimilation window is very expensive. These algorithms generate a stochastic perturbation $\delta\theta = (\delta\theta_1, \ldots)^T$ in which all components of $\theta$ are perturbed simultaneously. This allows the gradient-like vector to be evaluated with only two or three time integrations of the numerical model.
For details on the SPSA algorithm and its convergence rate, see [15,16].
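The simultaneous-perturbation idea of Comment 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quadratic `toy_cost` stands in for the innovation cost (which in the chapter would require integrating the numerical model), the perturbation components are drawn as independent ±1 values, and the names `spsa_gradient`, `spsa_minimize` and the constants `a`, `c` are hypothetical choices rather than the settings of [15,16].

```python
import random


def spsa_gradient(cost, theta, c, rng):
    """Gradient-like vector from two cost evaluations, whatever len(theta) is."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # simultaneous +/-1 perturbation
    plus = [th + c * d for th, d in zip(theta, delta)]
    minus = [th - c * d for th, d in zip(theta, delta)]
    diff = (cost(plus) - cost(minus)) / (2.0 * c)
    return [diff / d for d in delta]


def spsa_minimize(cost, theta0, iterations=1000, a=0.25, c=0.1, seed=0):
    """SA iteration theta <- theta - gamma * g with the SPSA estimate g."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, iterations + 1):
        gamma = a / k ** 0.602  # sum of gamma diverges, sum of gamma^2 converges
        grad = spsa_gradient(cost, theta, c, rng)
        theta = [th - gamma * g for th, g in zip(theta, grad)]
    return theta


# Toy quadratic cost standing in for the (expensive) innovation cost.
target = [1.0, -0.5, 0.3]


def toy_cost(theta):
    return sum((th - tg) ** 2 for th, tg in zip(theta, target))


theta_hat = spsa_minimize(toy_cost, theta0=[0.0, 0.0, 0.0])
print(theta_hat)
```

Each iteration costs two evaluations of the cost regardless of the dimension of $\theta$, whereas a component-wise finite-difference gradient would need two evaluations per component; this is the saving that matters when each evaluation is a model integration.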

Adjoint approach
As seen from (11) and (12), SA algorithms are much simpler to implement for searching optimal gain parameters than other optimization methods. They require only the numerical computation of derivatives wrt $\theta$ evaluated at $\theta(t)$, and $\gamma(t)$ is a scalar sequence that can be chosen a priori, for example, as $\gamma(t) = 1/t$. This is possible owing to the ergodic hypothesis on the system (1) and (2), under which there exists an asymptotic optimal gain. First, consider the situation when the vector of parameters consists of all elements of $K$,

$$\theta = K. \tag{13}$$

Having computed the innovation vector $\zeta(t+1)$, let us compute the derivatives of the sample cost function $\Psi$ wrt the elements $K_{ij}$ of the gain $K$. To do so, one needs to integrate the adjoint operator $\Phi^T$ subject to (s.t.) the forcing $H^T\zeta(t+1)$:

$$\frac{\partial \Psi}{\partial K_{ij}} = -2\,\psi_i\,\zeta(t, j), \quad \psi := \Phi^T H^T \zeta(t+1), \tag{14}$$

where $\psi_i$ is the $i$th component of $\psi$ and $\zeta(t, j)$ is the $j$th component of $\zeta(t)$. The AF now takes the form (15)-(17): the state estimate is corrected by the innovation, the forecast (16) is produced by the direct model, and the gain is updated by an SA step along the gradient vector whose components are computed by (14). In the AF (15)-(17), no matrix ARE (see (7) and (8) in the KF) is involved. The AF (15)-(17) is quite realizable for HdSs, since at each assimilation instant we need to integrate only the direct model to produce the forecast (16) and (possibly) an adjoint model over the assimilation window for computing the gradient.

Simultaneous perturbation stochastic approximation (SPSA) approach
Note that in the form (14) the adjoint operator $\Phi^T$ must be available in order to implement the AF. It is well known that constructing a numerical code for $\Phi^T$ is a very difficult and heavy task, especially for meteorological and oceanic numerical models, which are high-dimensional and nonlinear (so that linearization is required).
A comparison study of the AF with other assimilation methods is carried out in [17]. In contrast to the AF, the widely used variational method (VM) minimizes the distance between the available observations (for example, the whole set $Z[1:T]$) and the outputs of the dynamical system. This optimization problem is posed in the phase space and hence is very difficult and expensive. Theoretically, a simplification is possible subject to (s.t.) the condition of linearity of the dynamical system: in this case, one can reformulate the VM minimization problem as a search for the best estimate of the initial system state $x(0)$. For HdSs, to ensure a reasonably high-quality estimate of $x(0)$, it is necessary (i) to take the observation window as large as possible and (ii) to parameterize the initial state by some parameters (using a slow manifold, for example). Iterative minimization procedures usually require $O(10)$ iterations, each involving integration of the direct and adjoint models over the window $[1, T]$. For unstable dynamics, integration of the direct and adjoint equations over a long period naturally amplifies the initial errors during the assimilation process. For a more detailed comparison between the AF and VM, see [17].
Thus, if the ergodic conditions hold, there exists an optimal stationary gain, and in the limit the AF approaches the optimal filter in the given class of stable filters. It is important to emphasize that up to this point no covariance matrices $Q$, $R$ have been specified. This means that the AF in the form (12) is robust to uncertainties in the specification of the covariances of the ME and ObE.

Stability of the AF
One of the main features of the AF is related to its stability. For simplicity of presentation, in the previous section the AF algorithm was written out under the assumption (13). In practice, application of the AF in the form (13) is not recommended, since instability may occur. It is easy to see that the transition matrix of the filter is given by

$$L = (I - KH)\Phi. \tag{18}$$

It is evident that if no care is taken over the structure of $K$, stochastically varying all elements of $K$ can lead to instability of $L$, and the filter will blow up. Moreover, for HdSs, the number of elements of $K$ is very large. It is therefore essential to choose a parametrized stabilizing structure for $K$ (depending on $\theta$) that ensures the stability of $L$ and reduces the number of tuning parameters. This question is addressed in [6]. One possible structure for $K$ is of the form

$$K = P_r \Theta K_e, \quad K_e = M_e H_e^T [H_e M_e H_e^T + R]^{-1}, \quad H_e = H P_r, \tag{19}$$

where $P_r \in R^{n \times r}$ is a matrix of dimensions $n \times r$; $r$ is the dimension of the reduced space (equal to the number of unstable eigenvectors (EiVecs) of $\Phi$); the matrix $M_e$ is symmetric positive definite and plays the role of the ECM in the reduced space; and $\Theta$ is a diagonal matrix with diagonal elements $\theta_i \in (\epsilon, 2 - \epsilon)$, where $\epsilon \in (0, 1)$ depends on the modulus of the first stable eigenvalue (EiV) [6]. We will refer to the filter s.t. (19) with $\Theta = Id$ ($Id$ is the identity matrix of appropriate dimension) as a nonadaptive filter (NAF). In the AF, the parameters $\theta_i$ are adjusted each time a new observation arrives, so as to minimize the cost function (9). Thus $\theta_i$ is a time-varying function. As to the matrix $P_r$, its choice is important for ensuring filter stability. One simple and efficient procedure for generating $P_r$ (called the Prediction Error Sampling Procedure, PeSP) is to use the power orthogonal iteration method [18], which computes the real leading Schur vectors (SchVecs) of $\Phi$.
The advantage of using the SchVecs rather than the EiVecs is that they are real and their computation is stable. The optimal AF is thus sought in a class of filters that remain stable even for an unstable numerical model. In the VM, by contrast, the optimal trajectory is found on the basis of the numerical model alone, with the initial state as a control vector. This means that for unstable dynamics, the errors in the forcing or the numerical errors arising during computations are amplified and lead to large growth of the estimation error. More seriously, the VM requires a large set of observations and a large number of iterations (i.e., many forward and backward integrations of the direct and adjoint models), which naturally increases the estimation error as well.

On improving the initial gain
Consider the gain structure (19). Suppose that $M_e$ has been chosen in agreement with the required stability conditions. Before tuning the parameters $\theta_i$ to minimize the cost function (9), note that the stability of the filter is still ensured for a gain of the form (20), parametrized by $\Lambda$. Writing the equation for the filtered error (FE) $e_f(t) := \hat{x}(t) - x(t)$, one sees that the matrix $L$ in (18) also represents the transition of the FE $e_f(t)$. This means that it is possible to choose a better initial gain by solving, for example, the minimization problem (21) wrt $\Lambda$. The problem (21) is solved without using the observations; hence it is solved offline. Once the optimal $\Lambda^*$ has been found, the standard AF is implemented s.t. the filter gain (22). With the structure (22), this optimization procedure does not require information on the ME statistics.

Joint estimation of state and model error in AF
The previous section shows how the AF is designed to deal with the difficulty of specifying the covariances of the ME and ObE. This is done without exploiting the possibility of determining, more or less correctly, a subspace for the ME. If such a subspace can be determined without major difficulty, it is beneficial for better estimating the AF gain and improving the filter performance. In [19], a hypothesis on the structure of the ME has been introduced, and a number of experiments have been successfully conducted.
There is a long history of joint estimation of state and ME in filtering algorithms, in particular of bias and covariance estimation. One of the most original approaches to the treatment of bias in recursive filtering (known as bias-separated estimation, BSE) is due to Friedland [20]. He showed that the MMSE state estimator for a linear dynamical system augmented with bias states can be decomposed into three parts: (1) a bias-free state estimator; (2) a bias estimator; and (3) a blender. This BSE approach has the advantage that it requires fewer numerical operations than the traditional augmented-state implementation and avoids numerical ill-conditioning compared to the case of bias-separated estimation by filtering technique.
It is common to treat the bias as part of the system state and then to estimate the bias together with the system state. There are two types of ME: deterministic (DME) and stochastic (SME). Generally speaking, a suitable equation can be introduced for the ME. In the presence of bias, under the assumption of a constant $b$, the system equation (1) is replaced by (23). To introduce a subspace for the SME $w(t)$ and the DME $b(t)$ in (23), these variables are expressed through matrices $G_w$ and $G_b$. Generally speaking, $G_w$, $G_b$ are unknown, and finding reasonable hypotheses for them is desirable but not self-evident. In [19], one hypothesis for $G_w$, $G_b$ has been introduced (it will be referred to as the hypothesis on model error, HME).
The information on $G_w$, $G_b$ given in (25) allows a better estimate of the DME $b$ and SME $w$, which improves the filter performance, especially for $n_b < n$, $n_w < n$ in an HdS setting. The difficulty encountered in the practice of operational forecasting systems is that (practically) nothing is known a priori about the space of the ME values. To overcome this difficulty, one simple hypothesis has been introduced in [19]. This hypothesis is based on the fact that for a large number of data assimilation problems in HdSs, the model time step $\delta t$ (chosen to ensure the stability of the numerical scheme and to guarantee a high precision of the discrete solution) is much smaller than $\Delta t$, the assimilation window (the time interval between two successive observation arrivals).
Suppose that $\Delta t = n_a \delta t$, where $n_a$ is a positive integer. Hypothesis (on the subspace of ME, HME) [19]. Under the condition that $n_a$ is relatively large, the ME belongs to the subspace spanned by all unstable and neutral EiVecs (or SchVecs) of the system dynamics $\Phi$.

One-dimensional system
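To see the difference between the AF and the KF in dealing with ME uncertainties, introduce the one-dimensional system (25), that is, the system (1) and (2) with scalar $\Phi$ and scalar observation coefficient $h$. In (25), $\Phi$ is the unique eigenvalue (also the singular value) of the system dynamics.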
i. For simplicity, let $\Phi = 1$, $h = 1$. This corresponds to the situation when the system is neutrally stable. The filter fundamental matrix (18) is then $L = 1 - K$. For the KF gain (4)-(8), $K_{kf}(t) = M_{kf}(t)/(M_{kf}(t) + R)$, where $M_{kf}(t)$ is the solution of (7), so that $0 \le K_{kf}(t) < 1$ for any $M_{kf}(t) \ge 0$, $R > 0$. It means that the KF is stable. For the AF, we have in this case $P_r = 1$ and $K_{af}(\theta) = \theta K_e$, where $K_e$ is the gain of the form $K_e = M_e/(M_e + R)$, $M_e > 0$, $R > 0$, with $M_e$ constant. We then have for the NAF ($\theta = 1$) $0 < K_e < 1$ and $K_{naf} = K_e$. For the AF, the transition matrix (18) reads $L_{af}(\theta) = 1 - \theta K_e$. For $\theta \in (0, 2)$, $|L_{af}(\theta)| < 1$, $K_{af}(\theta) \in (0, 2)$, and the AF is stable. It is evident that there is a larger margin for varying the gain in the AF than in the KF, since $K_{kf}(t) \in (0, 1)$. One sees that the stationary KF is a member of the class of stable AFs (19). The performance of the AF is optimized by solving the problem (9) using the procedure (11) and (12) or the SPSA algorithms (Comment 2.2).
ii. Let $|\Phi| < 1$, i.e., the system (1) is stable. The results in (i) remain valid for the AF structure. In this situation, the filter is stable even for $K_{af} = 0$.

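iii. Let $\Phi > 1$, i.e., the system (1) is unstable. The stability condition $|(1 - \theta K_e)\Phi| < 1$ then gives $\theta \in \left(\frac{\Phi - 1}{K_e \Phi}, \frac{\Phi + 1}{K_e \Phi}\right)$. As $\Phi$ grows, $\frac{\Phi - 1}{K_e \Phi} \to \frac{1}{K_e}$ (left-hand limit) and $\frac{\Phi + 1}{K_e \Phi} \to \frac{1}{K_e}$ (right-hand limit), so there remains no margin for varying $\theta$ (or $Q \gg R$) and $K_{af} \to 1$. It is important to emphasize that, as $K_e$ is chosen by the designer, we can define the interval for varying $\theta$ if the amplitude of $\Phi$ is more or less known. In practice, one can vary $\theta \in [\epsilon, 2 - \epsilon]$ with small $\epsilon > 0$ for $\Phi$ close to 1, and with $\epsilon$ close to 1 for large $\Phi$.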
It is seen from (27) that when $\Phi \to -1$, approximately $\theta \in (0, 2/K_e)$. It is important to stress that the KF gain is computed on the basis of $Q$ and $R$ (under the condition that the statistics of the initial state are forgotten as $t$ becomes large), whereas the gain of the AF is updated on the basis of samples of the innovation vector. This means that the KF is optimal in the MMSE sense (under the condition of exact knowledge of the required statistics), whereas the AF is optimized during the assimilation process using PE realizations of the system output (the innovation vector). The KF gain can be computed offline, whereas the AF gain is a function of the observations and is computed online.
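The admissible range of $\theta$ in the one-dimensional case can be checked numerically. The sketch below assumes $h = 1$ and the scalar transition $L_{af}(\theta) = (1 - \theta K_e)\Phi$, and solves $|L_{af}(\theta)| < 1$ for $\theta$; the function names and the sample values of `phi` and `k_e` are hypothetical choices made only for illustration.

```python
def theta_stability_interval(phi, k_e):
    """Open interval of theta for which |(1 - theta * k_e) * phi| < 1 (h = 1)."""
    a = abs(phi)
    return (a - 1.0) / (k_e * a), (a + 1.0) / (k_e * a)


def af_transition(phi, k_e, theta):
    """Scalar filter transition L = (1 - theta * K_e) * phi."""
    return (1.0 - theta * k_e) * phi


# Assumed sample values: neutral, mildly unstable and strongly unstable dynamics.
for phi in (1.0, 1.5, 10.0):
    lo, hi = theta_stability_interval(phi, k_e=0.5)
    mid = 0.5 * (lo + hi)  # equals 1 / k_e, where L vanishes
    print(phi, lo, hi, abs(af_transition(phi, 0.5, mid)) < 1.0)
```

The interval is centered at $1/K_e$ and has width $2/(K_e|\Phi|)$, so it shrinks as $|\Phi|$ grows, which is the loss of margin described in item iii; for $|\Phi| < 1$ the lower bound is negative, consistent with the filter being stable even for a zero gain.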

Stable filter
To see the role of the correction subspace $R[P_r]$ in ensuring the stability of the AF, let us consider the system (1) and (2) s.t. the conditions (29).
The filter transition matrix (30) is obtained on the basis of $L_{naf} = (I - K_{naf}H)\Phi$ and the assumption (29). It is easy to see that $L_{naf}$ has two EiVs, $l_{11}$ and $l_{22}$.

Stability of the filter depends on the conditions $|l_{ii}| < 1$, $i = 1, 2$. These conditions should be taken into account when the EiVs of $\Phi$ are large.
For the AF gain (19) ($P_r = I$), the conditions for $|l_{ii}| < 1$ are obtained from (31) in the same way as in Section 5.1 for the one-dimensional system, since $l_{ii}$, $i = 1, 2$, are independent of one another. The length of the interval $I_i$ for varying $\theta_i$ depends on the value of $\Phi_{ii}$ (see (26)).
This example shows that for $P_r = Id$, it is always possible to construct a stable AF whatever the EiVs of $\Phi$ are (stable or unstable). There are some constraints on $M_{ii}$ (they are positive) and on $R_i$ (small positive). Optimality of the AF is obtained by searching recursively (in time) for the optimal $\theta_i$ during the assimilation process. Thus, in the AF, a correct specification of the (second-order) ME and ObE statistics is not as important as it is in the KF.

Unstable filter
Consider the situation when $P_r$ is constructed from only one vector. Let $P_r = (1, 0)^T$, the EiVec associated with $\Phi_{11}$ (the results remain the same if we choose the other EiVec). We now show that the filter with the gain (19) is unstable. For $\Theta = Id$, the EiVs of the filter transition matrix are given by (34). As $\frac{\alpha}{m_e + \alpha}$ can be made as small as desired by choosing a small $\alpha > 0$, the first EiV $l_{11} = \frac{\alpha\Phi_{11}}{m_e + \alpha}$ can be made stable. However, the second EiV in (34), $l_{22} = \Phi_{22} > 1$, is unstable. This implies that the filter with the gain (19) s.t. $P_r = (1, 0)^T$ is unstable. This happens even for $\Theta \ne Id$. It means that when the projection subspace $R[P_r]$ does not contain all unstable and neutral EiVecs of the system dynamics, it is impossible to guarantee the stability of the filter.
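The instability can be reproduced with a small numerical check. The sketch below assumes a diagonal $\Phi$, $H = I$ and $R = \alpha I$, which are illustrative assumptions chosen to be consistent with the eigenvalues quoted above, and it builds the gain for $P_r = (1, 0)^T$ and $\Theta = Id$. The function names and numerical values are hypothetical.

```python
def matmul(a, b):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transition_with_one_vector(phi11, phi22, m_e, alpha):
    """L = (I - K H) Phi for Phi = diag(phi11, phi22), H = I, R = alpha * I,
    P_r = (1, 0)^T and Theta = Id (all of these are illustrative assumptions)."""
    phi = [[phi11, 0.0], [0.0, phi22]]
    k11 = m_e / (m_e + alpha)        # reduced-space gain along P_r
    gain = [[k11, 0.0], [0.0, 0.0]]  # no correction along (0, 1)^T
    i_minus_kh = [[1.0 - gain[0][0], -gain[0][1]],
                  [-gain[1][0], 1.0 - gain[1][1]]]
    return matmul(i_minus_kh, phi)


# Both eigenvalues of Phi are unstable; only the first direction is corrected.
L = transition_with_one_vector(phi11=1.5, phi22=1.2, m_e=1.0, alpha=0.01)
print(L[0][0], L[1][1])
```

Because $L$ is diagonal here, its EiVs are its diagonal entries: the first is damped to $\alpha\Phi_{11}/(m_e + \alpha)$, while the second stays equal to $\Phi_{22}$ however $m_e$, $\alpha$ or $\theta_1$ are chosen.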

Two-dimensional system: estimation of ME
Consider the filtering problem (1) and (2). The dynamical system (1) describes a sequence of system states at time instants $t = 0, 1, \ldots$ when the observations are available. This means that $\Phi$ represents the transition of the system state over the (observation) time window $\Delta t$ separating the arrivals of two successive observations. In practice, the interval $\Delta t$ is much larger than the model time step $\delta t$, which is the step size used in approximating the temporal derivative. The choice of $\delta t$ is important for guaranteeing the stability of the discretized scheme and a high precision of the discretized solution (wrt the continuous solution). We then have $\Delta t = n_a \delta t$, where $n_a$ is a relatively large positive integer. For example, in the HYCOM model at SHOM (French marine) for the Bay of Biscay configuration, the interval $\Delta t$ between two observation arrivals is 7 days, which is equivalent to integrating 1200 model time steps $\delta t$; that is, $n_a = 1200$. Symbolically, we then have the equations (35) for the model time step integration. In (35), $\Phi^0$ represents the integration of the numerical model over one model time step $\delta t$. Hence $\Phi = (\Phi^0)^{n_a}$. The contribution of $\psi^0(\tau)$ over the assimilation window $[t-1, t]$ (for simplicity and without loss of generality, one supposes $t - 1 := 0$, $t := n_a$) is given by (37). The HME in Section 4 says that the SME $w(t)$ and DME $b(t)$, as functions of $n_a$, belong to the subspace spanned by the leading EiVecs (or SchVecs) of $\Phi$ for a relatively large $n_a$. The initial filtering problem now has the form (1) and (2). To illustrate this HME, continue the two-dimensional system in Section 5.2.2 and suppose that $|\Phi^0_{11}| > 1$, $|\Phi^0_{22}| < 1$. Applying the HME in this case is equivalent to saying that the values of the MEs $b(t)$, $w(t)$ belong, approximately, to the subspace $R[u_1]$ spanned by the first EiVec $u_1 = col(1, 0)$, associated with the EiV $\Phi^0_{11}$.
Here $y = col(y_1, \ldots, y_n)$ denotes the column vector with components $y_1, \ldots, y_n$. It follows that the covariance matrix of $w(t)$ is assumed to be of the form $Q = \sigma_w^2 u_1 u_1^T$ and that $b(t)$ has the structure $b(t) = c u_1$, where $c$ is a scalar to be estimated. For the algorithm of joint estimation of state and bias (in terms of $c$), see [19].

One-dimensional system
In this section, the filtering problem (25) in Section 5.1 is considered s.t.
The true system states and observations are simulated using the initial state $x(0) = 1$, and $w(t)$, $v(t)$ are zero-mean Gaussian sequences that are mutually uncorrelated and uncorrelated in time.
To assess the performance of the AF, the unknown system states are estimated by the AF algorithm. To obtain a reference, the standard KF is also implemented for solving this filtering problem. In the filtering algorithms, the estimate of the initial state is $\hat{x}(0) = 2$. The gain $K_{naf}$ in the NAF is taken as that of the KF at $t = 0$, i.e., $K_{naf} = K_{kf}(0)$. Figure 1 shows the temporal evolution of the parameter $\theta_m(t)$ during the assimilation process.
The gains in the KF and AF during the assimilation process are displayed in Figure 2. Note that the KF gain is computed s.t. the true statistics $Q$, $R$. In the AF, $\theta_m(t)$ has been used for the computation of the AF gain, i.e., $K_{af} = \theta_m(t)K$. From Figure 2, one sees that, although initialized with the same value, the two gains become different during the assimilation process. The KF gain reaches a stationary regime very quickly. The mean temporal RMS (root mean square) of the innovation is shown in Figure 3. It is interesting to remark that no significant difference is observed between the two curves, with a slightly better performance produced by the KF.
In Figure 4, we show the RMS of the state FE produced by the KF and AF under the condition that the variance $Q$ is known exactly. One sees that the KF, as expected, produces the best results. Figure 5 shows the RMS of the FE as a function of the variance $Q$. Here, the value of $Q$ varies from 0.1 to 1.9. Note that the true value of $Q$ is 0.1. The red curve represents the RMS of the FE produced by the KF at the end of the assimilation period (as a function of $Q$). The green curve has the same meaning, but for the FE produced by the AF. It is interesting to note that when $Q$ is correctly specified, the KF behaves better than the AF, but misspecification of $Q$ leads to a growing error in the KF. The AF is robust wrt the error in the specification of $Q$. This fact speaks in favor of the AF as an efficient tool for overcoming uncertainties in the ME.

Figure 4. RMS of the state FE produced by the KF and AF under the condition that the variance of the ME is known exactly. It is seen that when the ME is correctly specified, the KF behaves better than the AF.
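A setup of this kind can be sketched in Python as follows. This is not the chapter's experiment: the values of `phi`, `h`, `r` and `M_e` are assumed because they are not recoverable from the text, the AF gradient is a simplified one-step approximation of the derivative of the squared innovation wrt $\theta$, and $\theta$ is clipped to a fixed interval to keep the filter stable. Only $x(0) = 1$, $\hat{x}(0) = 2$, the true $Q = 0.1$ and the range of assumed $Q$ values follow the description above.

```python
import math
import random


def simulate(phi, h, q, r, steps, seed):
    """Generate true states and noisy observations of a scalar system."""
    rng = random.Random(seed)
    x, states, obs = 1.0, [], []  # true initial state x(0) = 1
    for _ in range(steps):
        x = phi * x + rng.gauss(0.0, math.sqrt(q))
        states.append(x)
        obs.append(h * x + rng.gauss(0.0, math.sqrt(r)))
    return states, obs


def run_kf(obs, phi, h, q_assumed, r, x_hat=2.0, p=1.0):
    """Scalar KF whose gain is computed from the assumed ME variance."""
    estimates = []
    for z in obs:
        m = phi * p * phi + q_assumed
        k = m * h / (h * m * h + r)
        x_pred = phi * x_hat
        x_hat = x_pred + k * (z - h * x_pred)
        p = (1.0 - k * h) * m
        estimates.append(x_hat)
    return estimates


def run_af(obs, phi, h, k_e, x_hat=2.0, theta=1.0, bounds=(0.05, 1.95)):
    """Scalar AF with gain theta * k_e; theta is tuned from innovation samples."""
    estimates, thetas, prev_innov = [], [], 0.0
    for t, z in enumerate(obs, start=1):
        x_pred = phi * x_hat
        innov = z - h * x_pred
        # Approximate d(innov^2)/d(theta): only the last correction is differentiated.
        grad = -2.0 * innov * h * phi * k_e * prev_innov
        theta = min(max(theta - grad / t, bounds[0]), bounds[1])  # gamma(t) = 1/t
        x_hat = x_pred + theta * k_e * innov
        prev_innov = innov
        estimates.append(x_hat)
        thetas.append(theta)
    return estimates, thetas


def rms(estimates, states):
    return math.sqrt(sum((e - s) ** 2 for e, s in zip(estimates, states)) / len(states))


phi, h, q_true, r = 0.9, 1.0, 0.1, 1.0  # phi, h and r are assumed values
states, obs = simulate(phi, h, q_true, r, steps=2000, seed=1)
k_e = 1.0 / (1.0 + r)                    # K_e = M_e / (M_e + R) with assumed M_e = 1
af_est, thetas = run_af(obs, phi, h, k_e)
for q_assumed in (0.1, 1.0, 1.9):
    print(q_assumed, rms(run_kf(obs, phi, h, q_assumed, r), states), rms(af_est, states))
```

The AF run does not use $Q$ at all, so its RMS is the same in every row, while the KF column changes with the assumed $Q$. How the two columns compare depends on the assumed parameters and on the simplified gradient, so the sketch should not be read as reproducing Figure 5.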

Illustration of hypothesis HME
Following the notation of Section 5.3, consider the two-dimensional system (1) with the true DME $b^0(\tau) = col(0.1, 0.1)$. The first EiV is unstable and the second is stable [19]. Numerically one finds that the first SchVec is equal to $u_1 = (-1, -7.0 \times 10^{-7})^T$. Figure 6 [19] shows the simulation results obtained on the basis of (37). One sees that, for $n_a > 10$, the second component of $w(t)$ is close to 0, whereas the first component becomes larger and larger (in absolute value) as $n_a$ increases. Here, $w^0(\tau)$ is a sequence of independent two-dimensional Gaussian random vectors of zero mean and variance 1. This means that the values of $w(t)$ come closer and closer to the subspace $R[u_1]$ spanned by $u_1$; hence the HME is practically valid for $n_a > 10$ in this example. Note that, as a rule, in ocean numerical models $n_a$ is of order $O(100)$ ($n_a = 800$ for the MICOM model in the experiment in Section 6.3). See also [22].
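The mechanism behind the HME can be illustrated with a short Monte Carlo sketch. It assumes a diagonal one-step operator with one unstable and one stable eigenvalue and accumulates unit-variance Gaussian errors over $n_a$ model steps through the recursion $w \leftarrow \Phi^0 w + w^0$; the eigenvalues `(1.2, 0.5)`, the recursion itself and the function names are hypothetical choices for illustration, not the system used in [19].

```python
import random


def accumulated_error(phi0_diag, n_a, rng):
    """Accumulate w <- Phi0 * w + w0 over n_a model steps (diagonal Phi0)."""
    w = [0.0 for _ in phi0_diag]
    for _ in range(n_a):
        w = [p * wi + rng.gauss(0.0, 1.0) for p, wi in zip(phi0_diag, w)]
    return w


def mean_abs_components(phi0_diag, n_a, trials=200, seed=0):
    """Average absolute value of each component over independent realizations."""
    rng = random.Random(seed)
    sums = [0.0 for _ in phi0_diag]
    for _ in range(trials):
        w = accumulated_error(phi0_diag, n_a, rng)
        sums = [s + abs(wi) for s, wi in zip(sums, w)]
    return [s / trials for s in sums]


phi0 = (1.2, 0.5)  # hypothetical: one unstable and one stable eigenvalue
for n_a in (1, 10, 50):
    m1, m2 = mean_abs_components(phi0, n_a)
    print(n_a, m1, m2, m2 / m1)
```

In this toy setting the stable component stays bounded while the unstable one grows geometrically with $n_a$, so the accumulated error aligns, in relative terms, with the unstable direction as $n_a$ increases.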

Figure 5. RMS of the FE as a function of Q. The true value of Q is equal to 0.1. It is noted that the KF behaves better than the AF at the true Q but degrades more and more as the ME becomes greater. At the same time, the FE of the AF remains very robust.
In terms of $x(t)$, the filtering problem is then of the form (1) and (2). Figure 7 depicts the time evolution of the KF and AF gains. One sees here, as in the experiment with the 1D system (Figure 2), that the KF gain stabilizes very quickly compared to the AF gain. Figure 8 (from [19]) shows the sample time-averaged RMS of the state FE produced by the three filters NAF, KF, and AF. One sees that the AF outperforms the NAF and KF.

Data assimilation in the high-dimensional ocean model
To illustrate the effectiveness of the AF in dealing with uncertainties in HdSs, this section presents the results on data assimilation in the oceanic numerical model MICOM (Miami Isopycnic Coordinate Ocean Model) [19]. The model describes the oceanic circulation in the North Atlantic. It has four vertical layers with the state consisting of three variables  [22]. The experiment is carried out on estimating the oceanic circulation using sea surface height (SSH) measurements. The SSH observation is available every 7 days (ds) (hence the observation window $\Delta T = 7$ ds). Note that simulating the circulation over 7 ds requires integration over 800 model time steps ($\delta t$).

AF with optimal initial gain
First, in order to examine whether the method of optimal gain initialization, described in Section 3.2, is really useful for improving the filter performance, the optimization problem (21) has been solved. Symbolically, in the gain (20), $P_r = \dots$ The optimal parameters $\lambda_i$, $i = 1, \dots, 4$, are found by solving the minimization problem (21) using the SPSA algorithm. Figure 9 shows the averaged values (see the optimization process in Figure 9). The performances of these two NAFs are shown in Figure 10. One sees here that the NAFOI has considerably improved the quality of the estimates of the velocity u-component compared with the NAFI. This result shows that the offline optimization (21) is a worthwhile strategy for finding the optimal initial gain in the NAF.
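For readers unfamiliar with SPSA (simultaneous perturbation stochastic approximation), a minimal sketch is given here. It is an illustration under assumptions: the cost `toy_cost` is a hypothetical quadratic stand-in for the objective in (21), which in practice would require running the filter over an assimilation period, and the gain-sequence constants (`a`, `c`, `stability`) are illustrative choices, not the settings used in the experiment. The exponents 0.602 and 0.101 are the values commonly recommended for SPSA.

```python
import numpy as np

def spsa_minimize(cost, theta0, n_iter=1000, a=0.2, c=0.1,
                  stability=10.0, alpha=0.602, gamma=0.101, seed=0):
    """Minimize cost(theta) using two cost evaluations per iteration."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iter):
        a_k = a / (k + 1 + stability) ** alpha   # step-size sequence
        c_k = c / (k + 1) ** gamma               # perturbation-size sequence
        # Perturb all components at once with random +/-1 (Bernoulli) signs
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        diff = cost(theta + c_k * delta) - cost(theta - c_k * delta)
        # Simultaneous-perturbation estimate of the gradient
        grad_est = diff / (2.0 * c_k * delta)
        theta -= a_k * grad_est
    return theta

# Hypothetical optimum for four tuning parameters lambda_1, ..., lambda_4
lam_target = np.array([1.0, 0.5, -0.3, 2.0])

def toy_cost(lam):
    # Stand-in for the (unknown here) objective of problem (21)
    return float(np.sum((lam - lam_target) ** 2))

lam_opt = spsa_minimize(toy_cost, np.zeros(4))
```

The appeal of SPSA for high-dimensional filtering is that each iteration needs only two evaluations of the cost, regardless of the number of parameters, whereas a finite-difference gradient would need two evaluations per parameter.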

Estimating the ECM of ME
In practice, for real operational systems, information on the space of the ME is not available or is very poorly known. Usually, there is a big difference between the model and the real physical process; if the ME statistics are accounted for reasonably well in the filtering algorithm, one can improve the filter performance and reduce the estimation error.
This idea is tested here by applying the HME in Section 4. We carry out the procedure for estimating the ECM of the ME by first constructing the subspace for the ME. For more details on the structure of the ECM $M$ in the AF, see [23]. According to [23], the ECM $M$ is assumed to have the structure $M = M_v \otimes M_h$, the Kronecker product of $M_v$ and $M_h$, where $M_h$ is the ECM of the horizontal variable and $M_v$ is the ECM of the vertical variable. Figure 11 displays the RMS of the FE for the u velocity component at the surface resulting from two AFs. The curve AF0U corresponds to the AF whose nonadaptive version has the gain computed on the basis of the ECM $M$ using an ensemble of PE samples (generated by the PeSP in [18]). The curve AF3U shows the performance of the AF with the modified ECM (obtained by adding the vertical ME covariance $Q_v$ to the vertical ECM $M_v$). More precisely, $Q_v$ is assumed to belong to the subspace spanned by the three leading EiVs of $M_v$. This choice is justified by the fact that the eigenvalue decomposition of $M_v$ has the first three EiVs with explained variances of 67, 17, and 15%, respectively. As the fourth EiVec has an explained variance of only 0.7E-07%, it is dropped from the subspace constructed for the vertical ME. The better performance of the AF3U, in comparison with that of the AF0U, is clearly seen in Figure 11.
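The construction described above (separable ECM, eigenvalue decomposition of the vertical part, and a model-error covariance confined to the leading subspace) can be sketched with small matrices. All numerical values below are hypothetical: `m_v`, `m_h`, and the coefficients `q_coeff` are made up for illustration, and only the structure $M = M_v \otimes M_h$, the four vertical layers, and the choice of three retained directions follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vertical ECM (4 layers) with one negligible eigenvalue
basis, _ = np.linalg.qr(rng.standard_normal((4, 4)))
m_v = basis @ np.diag([6.0, 2.0, 1.5, 1e-8]) @ basis.T
m_v = 0.5 * (m_v + m_v.T)   # enforce exact symmetry

# Hypothetical horizontal ECM on a tiny 3-point grid
m_h = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.5],
                [0.2, 0.5, 1.0]])

# Separable ECM: Kronecker product of vertical and horizontal parts
m = np.kron(m_v, m_h)       # shape (12, 12)

# Eigenvalue decomposition of M_v, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(m_v)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()   # explained-variance fractions

# Vertical ME covariance restricted to the three leading directions
n_keep = 3
u_lead = eigvecs[:, :n_keep]
q_coeff = np.array([0.3, 0.2, 0.1])   # hypothetical variances
q_v = u_lead @ np.diag(q_coeff) @ u_lead.T

# Modified ECM: add Q_v to the vertical part, keep the horizontal part
m_modified = np.kron(m_v + q_v, m_h)
```

The separable form means that only the small factors `m_v` and `m_h` need to be stored and decomposed; the full matrix `m` is formed here only because the toy dimensions are tiny.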
The above experiment shows in detail how, on the basis of the HME, the subspace for the ME can be constructed and how one estimates the ECM of the model error. The superior performance of the AF3U over that of the AF0U validates the usefulness of the HME, which can serve as an important tool for estimating the ME and improving the performance of the AF in solving data assimilation problems with HdSs.

Conclusions
One of the key assumptions ensuring the optimal performance of the KF is that a priori knowledge of the system model is given without any uncertainty. This assumption, however, is never valid in practice for the dynamical systems under consideration. Uncertainties arise everywhere in modeling a real process: structural uncertainty, model parameterization, model resolution, model bias, and ME statistics. For HdSs, order reduction, introduced either in the original numerical model or in the filtering algorithms, inevitably leads to uncertainty in the ME, especially in geophysical numerical models.
Our focus in this chapter is to show how the AF efficiently solves filtering problems for systems operating in an uncertain environment.

Figure 11. Performance of the AF: (i) AF0U, no ME ECM has been taken into account; (ii) AF3U, with ME ECM computed in accordance with the HME.
As seen from this chapter, the AF has proven to be efficient in dealing with uncertainties in the specification of the ME statistics, system bias, or model reduction. The reasons for the success of the AF are that (i) it belongs to the class of parametrized stable filters; (ii) it is defined as the best member minimizing the mean PE for the system outputs; (iii) the tuning parameters are chosen as elements of the stabilizing gain and have no physical meaning.
The results of this chapter indicate that the performance of the AF is comparable with that of the KF when perfect knowledge of all ME statistics is given, and that it outperforms the KF in the presence of uncertainties. This happens because the AF acquires knowledge during the assimilation process, regardless of the uncertainties existing in the filtering problem. From the computational point of view, implementation of the AF consumes much less memory and computational time than the KF or other assimilation methods.
Simple numerical examples and simulation results, presented in Sections 5 and 6, clearly demonstrate the advantages gained through application of the AF in dealing with uncertainties. These positive results encourage a wide application of the AF in different fields of technology and applied sciences, such as automatic control, finance, aerospace, space exploration, meteorology, and oceanography. More in-depth research on the capacity of the AF to deal with uncertainties is surely a challenge for the near future.