## 1. Introduction

Consider the following data assimilation problem: Given the dynamical system

*x*_{k+1} = Φ*x*_{k} + *w*_{k}, (1.1)

and the observations

*z*_{k} = *Hx*_{k} + *v*_{k}. (1.2)

Here, *x*_{k} ∈ R^{n} is the system state at the *k*^{th} time instant, *z*_{k} ∈ R^{p} is the observation vector, Φ is the system transition matrix, *H* is the observation matrix, and *w*_{k}, *v*_{k} are the model and observation noise sequences, assumed to be zero mean and mutually uncorrelated, with covariances

*E*[*w*_{k}*w*_{l}^{T}] = *Q*δ_{kl}, *E*[*v*_{k}*v*_{l}^{T}] = *R*δ_{kl}. (1.3)

The problem we consider here is to estimate the system state *x*_{k} under the conditions that the dimension *n* of *x*_{k} is of order 10^{6}–10^{8} and that there are uncertainties in the statistics of the model and observation noises. Due to the very large *n*, it is impossible to apply traditional estimation algorithms for producing the estimate of *x*_{k} on the basis of the observations *Z*[1, *N*] := {*z*_{1}, …, *z*_{N}}: the filtered estimate for *N* = *k* and the smoothed estimate for *N* > *k* [1, 2]. For the linear dynamical system Eqs. (1.1) and (1.2), the Kalman filter (KF) provides an efficient algorithm for computing the filtered estimate of *x*_{k}, and the Kalman smoother for computing the smoothed estimate based on *Z*[1, *N*]. The KF approach, however, is inappropriate for solving estimation problems in high-dimensional systems; in this chapter, a high-dimensional system means one whose state dimension is of order 10^{6}–10^{8}. At present, and in the near future, computer capacity, in both computational power and memory, remains very far from sufficient to implement the KF in real time to produce the filtered estimate and to make the corresponding forecast. For suboptimal schemes for atmospheric data assimilation based on the KF, see Ref. [4].

In this chapter, the emphasis is principally on comparing the AF with the VM. For a review of data assimilation methods in meteorology and oceanography, see Ref. [5]. To further demonstrate the advantages of the AF, we implement the extended KF (EKF) [6] in Section 6 and compare its performance with that of the AF (in the experiment with the Lorenz system). The Cooper-Haines filter (CHF) [7], widely used in oceanographic data assimilation, is also applied in Section 7 to produce an estimate of the ocean state; it serves as a reference against which the estimate produced by the AF in the high-dimensional setting is compared.

In the next section, the variational method (VM), which is widely used in data assimilation for high-dimensional systems in meteorology and oceanography, is outlined. Section 3 presents the recently developed AF approach to data assimilation. The main idea of the AF is to take the innovation representation of the input-output system as the departure point for formulating the optimization problem, with the parameters of the filter gain as control variables. Section 4 presents the tools for implementing the AF in a simple and efficient way adapted to the high-dimensional setting. This includes the objective function, filter stability, the structure of the error covariance matrix (ECM), gain parameterization, and the optimization algorithm known as simultaneous perturbation stochastic approximation (SPSA). It is shown how the ECM is estimated using an ensemble of samples of the prediction error (PE) and the hypothesis of separation of vertical and horizontal variables structure (SeVHS) in the ECM. A computational comparison between the VM and the AF is also given here. Section 5 contains a simple numerical experiment showing in detail how the VM and AF work, the difficulties the VM faces in searching for the optimal solution, and why no similar difficulties are encountered in the AF. A more complicated experiment with the chaotic Lorenz system is carried out in Section 6; the difficulty encountered there is the extreme sensitivity of the solution to small errors in the initial condition. Section 7 presents the performance of different filters in a data assimilation experiment with the high-dimensional ocean model MICOM in a North Atlantic configuration. Conclusions are given in Section 8.

*Notation*: In this chapter, *A*^{T} denotes the transpose of the matrix *A*; *E*[.] and *E*[.|.] denote the expectation and conditional expectation, respectively.

## 2. Variational method (VM)

Consider the problem of estimating {*x*_{k}} in Eqs. (1.1)–(1.3). The VM consists of minimizing the following objective function

Thus, in the VM, we seek optimal solutions in a functional space (the space of sequences {*x*_{k}}). For systems of high dimension, this task is impossible to perform, and a simplification is required. Suppose the system Eq. (1.1) is linear and perfect, that is, the model noise is absent:

Expressing each *x*_{k} as a function of the initial state *x*_{0},

*x*_{k} = Φ^{k}*x*_{0},

and substituting this expression into the observation Eq. (1.2) at the *k*^{th} observation instant, the following set of observations is available for *x*_{0}:
Under the perfect-model assumption, the optimization problem Eqs. (2.1)–(2.3) simplifies to

We now have the unconstrained optimization problem Eqs. (2.7) and (2.8) with the vector of unknown parameters *θ* = *x*_{0}.

It is not hard to write out a closed-form solution to the problem Eqs. (2.7) and (2.8). For high-dimensional systems, however, there are no computational and memory resources to handle such an implementation. In practice, the solution to the problem Eqs. (2.7) and (2.8) is found by iteratively solving the equation

*Comment 2.1*. Usually, finding a solution to Eqs. (2.7) and (2.8) is a heavy task: in addition to storing the model solution produced by the direct model, minimization requires 20–30 iterations to reach a relatively good approximate solution.

*Comment 2.2*. Writing out

for some *y*. As

*Comment 2.3*. As *θ* (the initial state) has a physical meaning, it is important to impose constraints ensuring a physically realistic structure of the correction to *θ* during the estimation process. A physically poor structure of the guess for the initial state can lead to large estimation errors.
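To make Comments 2.1–2.3 concrete, here is a minimal sketch of this iterative minimization on a toy scalar problem. Everything in it (the model coefficient `phi`, the noise level, the Newton-type step) is an illustrative assumption, not the chapter's configuration; it only shows the mechanics of descending an Eq.-(2.7)-type cost over the initial state.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, r, N = 0.95, 0.01, 20            # model coefficient, obs-noise variance, window
x0_true = 1.0
ks = np.arange(1, N + 1)
z = phi ** ks * x0_true + rng.normal(0.0, np.sqrt(r), N)   # synthetic observations

def cost_and_grad(x0):
    resid = z - phi ** ks * x0                 # data misfits entering the cost
    J = 0.5 * np.sum(resid ** 2) / r           # quadratic cost over the initial state
    dJ = -np.sum(resid * phi ** ks) / r        # dJ/dx0
    return J, dJ

hess = np.sum(phi ** (2 * ks)) / r             # constant Hessian of the quadratic cost
x0 = 0.0                                       # first guess for the initial state
for _ in range(5):                             # iterative minimization, cf. Comment 2.1
    J, g = cost_and_grad(x0)
    x0 -= g / hess                             # Newton step (exact for a quadratic)
```

For a quadratic cost the Newton step converges essentially in one iteration; for a realistic nonlinear model, each iteration instead requires one forward model run plus one adjoint run, which is what makes the 20–30 iterations mentioned in Comment 2.1 so heavy.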

## 3. Adaptive filtering (AF)

To overcome the difficulties listed in Comments 2.1–2.3, an adaptive filtering (AF) approach has been proposed in Ref. [10]. The main difference between the AF and the VM lies in the choice of the innovation representation of the original input-output system Eqs. (1.1) and (1.2) as the departure point for formulating the optimization problem. It is well known that under standard conditions, the estimate optimal in the mean-squared-error (MSE) sense is

Under standard conditions (Gaussianity, uncorrelated noise sequences, …),

where

where

Due to the very expensive computational burden of time stepping the ECM, the optimal AF is sought within some class of parameterized filters of a given structure. The following objective function is introduced:

In Ref. [12], different classes of parameterized filters are found that belong to the class of stable reduced-order filters (ROFs) [10, 13].

As an example for one class of ROFs, consider

where

One class of parameterized filters is (Section 5.2.2 in Ref. [12])

if
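As a hedged illustration of this idea, the sketch below runs a one-parameter constant-gain filter on an assumed scalar model and scores each candidate gain by the sample mean of squared innovations, in the spirit of the objective Eq. (3.6). The model, the gain parameterization, and all numerical values are hypothetical, chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, q, r, N = 0.9, 0.05, 0.1, 2000
x = np.zeros(N)
for k in range(1, N):
    x[k] = phi * x[k - 1] + rng.normal(0.0, np.sqrt(q))   # hidden state
z = x + rng.normal(0.0, np.sqrt(r), N)                    # observations

def innovation_mse(g):
    """Time-averaged squared innovation for a constant-gain filter."""
    xhat, s = 0.0, 0.0                   # one-step-ahead prediction, accumulator
    for k in range(N):
        innov = z[k] - xhat              # innovation at instant k
        s += innov ** 2
        xhat = phi * (xhat + g * innov)  # predict after the gain correction
    return s / N

gains = np.linspace(0.0, 0.9, 10)        # candidate gain parameters
scores = [innovation_mse(g) for g in gains]
best_g = float(gains[int(np.argmin(scores))])
```

The point is that the control variable is the gain parameter `g`, not the initial state: any stable gain keeps the innovation statistics informative, and a good `g` can be found by directly comparing sample costs.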

## 4. Differences between VM and AF

### 4.1. Batch data formulation

We now list the main differences between the two approaches, VM and AF, from which the advantages of the AF over the VM become clear.

To ease the comparison between the two approaches, let us write out the objective function Eq. (3.6) using a representation in sample space:

Note that in practical implementations of the AF, the optimization algorithm is constructed not on the basis of Eq. (4.1) but on that of Eq. (3.6). This is because Eq. (4.1) is written in batch form, which requires optimization over the whole time interval

(D1) Dynamical system (DS): in Eq. (2.1), the DS is the original system Eq. (1.1); in Eq. (4.1), the DS is the filtering Eq. (3.3). This difference has an interesting consequence: in practice, very little is known about the statistics of

(D2) The system noise

(D3) Control variable

This difference has an important consequence: as

(D4) Suppose the DS Eq. (1.1) is unstable. This implies that the error in estimating

As shown in Ref. [12], the filter Eq. (4.2) is stable under the conditions Eqs. (3.9) and (3.10). This means that the filtering error remains bounded during model integration, since the parameters

(D5) Return to the objective function Eqs. (2.1)–(2.3). First, taking the derivative of the objective function Eqs. (2.1)–(2.3) wrt (with respect to)

(4.3)

One sees that for a batch of

### 4.2. Implementation of AF

#### 4.2.1. The choice of the criterion Eq. (3.6)

The choice of Eq. (3.6) is important in many respects for obtaining a simple and efficient data assimilation algorithm. The idea behind the criterion Eq. (3.6) is to select some pertinent parameters as control variables for minimizing the mean of the cost function

The solution to the problem Eq. (3.6) can be found iteratively using a stochastic approximation (SA) algorithm

where

The algorithm Eq. (4.4) is much simpler [compared to the computation of Eq. (4.3)] since it requires, at the

Thus, minimization of Eq. (3.6) by a gradient-based SA algorithm requires only one integration of the direct model and one backward integration of the AE code: direct integration of

A smaller computational burden can be achieved by measuring only the sample objective function (rather than using a gradient formula): instead of computing the gradient by Eq. (4.6) based on the AE, one can approximate the gradient using values of the cost function, on the basis of a finite-difference scheme (finite-difference stochastic approximation, FDSA). Traditionally, the

where

It is seen that FDSA algorithms do not require a formula for the gradient. However, for high-dimensional systems (

In order to overcome the difficulties with very high dimension of

(4.8)

It is seen that in the SPSA, all the directions are perturbed at the same time (the numerator is identical in all

For the SPSA algorithm, the conditions for
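A minimal SPSA sketch follows, run on an assumed quadratic cost: the cost function, the dimension, and the gain sequences `a_k`, `c_k` are illustrative choices in the usual SPSA style, not the chapter's assimilation setup. Note that each iteration uses only two measurements of the noisy cost, regardless of the dimension of the parameter vector.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 10
theta_star = rng.normal(size=dim)              # unknown minimizer (for the demo)

def sample_cost(theta):
    # noisy measurement of the objective, since only sample values are available
    return float(np.sum((theta - theta_star) ** 2) + rng.normal(0.0, 0.01))

theta = np.zeros(dim)
for k in range(1, 1001):
    a_k = 0.05 / k ** 0.602                    # step-size sequence
    c_k = 0.1 / k ** 0.101                     # perturbation-size sequence
    delta = rng.choice([-1.0, 1.0], size=dim)  # Bernoulli +/-1 simultaneous perturbation
    # two cost evaluations, whatever dim(theta); the numerator is the same
    # in every direction, only the perturbation sign differs
    diff = sample_cost(theta + c_k * delta) - sample_cost(theta - c_k * delta)
    theta = theta - a_k * diff / (2.0 * c_k * delta)
theta_err = float(np.linalg.norm(theta - theta_star) / np.sqrt(dim))
```

Compared with FDSA, which needs two evaluations per component of the parameter vector, SPSA's cost per iteration is independent of the dimension, which is what makes it attractive here.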

#### 4.2.2. On the operator *P*_{r}

As shown in Ref. [12],
*S-PE samples*)

which are results of integration of leading ScVs (columns of

#### 4.2.3. On separation of vertical and horizontal variables structure in ECM [18]

Let us consider the situation when the DS is described by PDEs. The state vector at the time instant

where

As the number of vertical layers in today's numerical models is of order

Using dominant real Schur vectors (ScVs) has the advantage that they are real and their computation is stable [12], whereas the computation of eigenvectors may be unstable and the eigenvectors may be complex.
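As a small illustration (the matrix below is synthetic, with one planted unstable direction), the ordered real Schur decomposition available in SciPy returns exactly such a real, orthonormal basis of the dominant subspace:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(3)
n = 8
A = 0.05 * rng.normal(size=(n, n))   # weakly coupled, stable background dynamics
A[0, 0] = 2.0                        # plant one strongly unstable direction

# real Schur form A = Z T Z^T, ordered so that eigenvalues outside the unit
# circle ('ouc') come first; the leading sdim columns of Z are the dominant ScVs
T, Z, sdim = schur(A, output='real', sort='ouc')
ScV = Z[:, :sdim]                    # real, orthonormal basis of the unstable subspace
```

The factors `Z` and `T` are always real, and the leading columns of `Z` span the invariant subspace of the fastest-growing directions, even when the underlying eigenvalues come in complex pairs.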

### 4.3. Computational comparison between VM and AF

We give a brief comparison of the computational burden of the VM and AF algorithms.
**Table 1**
shows the number of elementary arithmetic operations required to implement the VM and AF algorithms based on the AE tool. Fewer operations are required for the AF if the SPSA method is used (no need of

In the VM algorithm, the computation of
**Table 1**
also gives the number of operations required for the AROF, when the ECM

To get an idea of how the VM and AF work in practice, we give examples of experiments with two numerical models: MICOM (see Section 7) and HYCOM (Hybrid Coordinate Ocean Model) [20], developed at SHOM, Toulouse, France. The first experiment is performed with the MICOM model. The observations are available every 10 days (

*Comment 4.1*. Looking at
**Table 1**
, one sees that the dominant numbers

## 5. Simple numerical examples: scalar case

### 5.1. Estimation problem

Consider the simple scalar dynamical process

In Eq. (5.1),

where

#### 5.1.1. Experiment: cost functions

In the experiment,

**Figures 1**
and
**2**
display the curves (time-averaged variance of the distances between the true trajectory and the trajectories resulting from varying the control variables) as functions of the tuning parameters in the two methods: the initial state

From
**Figure 1**
, it is seen that the “noise-free” curve equals 0 when the initial guess coincides with the true initial state. Although the curves in
**Figure 1**
are quadratic, it is impossible to find the true initial state in the noisy situation, since almost the same curves are obtained for the cost function Eq. (2.1).

**Figure 2**
presents the same curves as those in
**Figure 1**
resulting from application of the filter by letting the parameter
**Figure 2**
shows that for both the “noise-free” and “noisy” curves, the minimal values are attained at

The two “noisy” curves in
**Figures 1**
and
**2**
show that when the model is noisy, the minimal value of the “noisy” curve for the VM in
**Figure 1**
(equal to 0.121) is much higher than that for the filter in
**Figure 2**
(0.009). This fact argues in favour of choosing a short window when assimilating observations by the VM.
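A rough reconstruction of this comparison can be scripted as follows; the scalar model and all numerical values are assumptions for illustration, not the experiment's actual settings. The VM-style estimate is the best free model run over a grid of initial states, while the AF-style estimate is the best constant-gain filter.

```python
import numpy as np

rng = np.random.default_rng(4)
phi, q, r, N = 0.98, 0.02, 0.05, 400
x = np.zeros(N)
x[0] = 1.0
for k in range(1, N):
    x[k] = phi * x[k - 1] + rng.normal(0.0, np.sqrt(q))   # noisy "true" process
z = x + rng.normal(0.0, np.sqrt(r), N)                    # noisy observations

def vm_error(x0):
    # VM-style: free model run from initial state x0 (perfect-model assumption)
    traj = x0 * phi ** np.arange(N)
    return float(np.mean((traj - x) ** 2))

def filter_error(g):
    # AF-style: constant-gain filter correcting toward the observations
    xhat = np.empty(N)
    xhat[0] = 0.0
    for k in range(1, N):
        pred = phi * xhat[k - 1]
        xhat[k] = pred + g * (z[k] - pred)
    return float(np.mean((xhat - x) ** 2))

best_vm = min(vm_error(x0) for x0 in np.linspace(-2.0, 2.0, 81))
best_af = min(filter_error(g) for g in np.linspace(0.05, 0.9, 18))
```

With a noisy model, the free run cannot track the stochastic forcing no matter how well the initial state is tuned, whereas the filter corrects toward the observations at every step; this reproduces qualitatively the gap between the two “noisy” minima discussed above.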

## 6. Numerical experiment: Lorenz system

### 6.1. Lorenz equations

The Lorenz attractor is the attractor of a chaotic dynamical system, noted for its butterfly shape. It shows how the state of the system evolves over time in a complex, non-repeating pattern [21].

The attractor itself and the equations from which it is derived were introduced by Edward Lorenz [21], who obtained them from a simplified model of convection rolls in the atmosphere.

The equations that govern the Lorenz attractor are:

where

### 6.2. Numerical model

In the experiments, the parameters

The numerical model is obtained by applying the Euler method (a first-order accurate method) to approximate Eq. (6.1). Symbolically, we have

where
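A minimal sketch of this Euler discretization follows; the classical chaotic parameter values are used, while the step size, initial points, and perturbation size are illustrative choices.

```python
import numpy as np

def lorenz_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # right-hand side of the Lorenz equations (6.1)
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def euler_step(s, dt=0.01):
    # first-order Euler approximation of Eq. (6.1)
    return s + dt * lorenz_rhs(s)

s1 = np.array([1.0, 1.0, 1.0])
s2 = s1 + np.array([1e-6, 0.0, 0.0])   # tiny initial perturbation
for _ in range(2000):                  # integrate 20 time units
    s1 = euler_step(s1)
    s2 = euler_step(s2)
sep = float(np.linalg.norm(s1 - s2))
```

Two trajectories starting only 10^{-6} apart separate by many orders of magnitude within a few tens of time units, which is exactly the sensitivity to the initial condition that makes initial-state control so delicate in this experiment.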

### 6.3. Observations: assimilation

The corresponding

In Eq. (6.3),

where the operator

The true system state

The problem considered in this experiment is to apply the extended KF (EKF), a nonadaptive filter (NAF), and the adaptive filter (AF) to estimate the true system state using the observations

Here, the NAF is in fact the prediction error filter (PEF). Note that the PEF is developed in Ref. [23], in which the prediction-error ECM is estimated on the basis of an ensemble of PE samples, that is,

where

The filter gain is taken in the following form

which is time invariant. At the same time, for the comparison purpose, the EKF is also used for assimilating the observations.
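The PE-ensemble construction of a time-invariant gain can be sketched in a few lines; the dimensions, the "true" covariance, and the observation setup below are illustrative assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_ens, r = 3, 200, 0.1
true_M = np.array([[1.0, 0.5, 0.2],
                   [0.5, 0.8, 0.1],
                   [0.2, 0.1, 0.5]])           # "true" PE covariance (illustrative)
L = np.linalg.cholesky(true_M)
pe = L @ rng.normal(size=(n, n_ens))           # ensemble of PE samples

M_hat = pe @ pe.T / n_ens                      # sample estimate of the ECM
H = np.eye(n)                                  # all components observed
K = M_hat @ H.T @ np.linalg.inv(H @ M_hat @ H.T + r * np.eye(n))  # constant gain
```

Because the gain is computed once from the PE ensemble and then frozen, no Riccati time stepping of the ECM is needed during assimilation.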

**Figure 3**
shows the evolution of the prediction errors resulting from the three filters: NAF, AF, and EKF. It is no surprise that the NAF produces the estimates with the largest estimation error. By adaptation, however, it is possible to obtain the AF, which improves significantly on the PEF and even performs better than the EKF.

Note that the VM is much less appropriate for assimilating observations in the Lorenz model, due to its choice of the initial state as the control vector. For simplicity, we simulate the situation when all three components of the system state are observed in additive noise, i.e. with
**Figure 4**
displays the time-averaged variances of the difference between the true trajectory and the model trajectory (denoted as
**Figure 5**
: here, the sample cost function Eq. (4.1) is computed over the whole assimilation period, by varying the third parameter

## 7. Assimilation in high-dimensional model

### 7.1. MICOM model and assimilation problem

In this section, we show how the AF can be designed in a simple way to produce the high performance estimates for the ocean state in the high-dimensional ocean model MICOM. For details on the Miami Isopycnal Coordinate Ocean Model (MICOM) used here, see Ref. [24]. The model configuration is a domain situated in the North Atlantic from 30°N to 60°N and 80°W to 44°W; for the exact model domain and some main features of the ocean current (mean, variability of the SSH, velocity) produced by the model, see Ref. [24]. The grid spacing is about 0.2° in longitude and in latitude, requiring
*climatology*. Over a period of 2 years (model years 19 and 20), every 10 days (10 ds), we calculate the sea surface height (SSH) from the layer thickness

### 7.2. Different filters

The filter used for assimilating SSH observations is of the form

where

#### 7.2.1. PEF: computation of ECM

In the experiment, two assimilation methods are implemented. First, the PEF is designed. To do that, the data ECM

Let

where

(7.3)

The main advantage of the separability hypothesis is that the number of parameters to be estimated in the covariance matrix is reduced drastically. As a consequence, even an ensemble of PE samples of small size can serve as a large data set for estimating the unknown parameters, which results in fast convergence of the estimation procedure. In addition, introducing the SeVHS hypothesis makes it possible to avoid the rank-deficiency problem in the estimation of the ECM. In fact, as only a small number of ScVs can be computed in very high-dimensional systems, the approximation of the ECM

Suppose we are given the ensemble of S-PE samples

For the problem Eqs. (4.3)–(4.4) let us define the vector of unknown parameters in the ECM

where

Considering

Note that the problem Eq. (7.4) is closely related to the Nearest Kronecker Product (NKP) problem [25]. In the experiment, the correlation length
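To see why the separable (SeVHS) structure is so economical, note that a Kronecker-factored ECM of size (n_v n_h) × (n_v n_h) carries only n_v^2 + n_h^2 free entries instead of (n_v n_h)^2. The sketch below, with illustrative sizes, builds such an ECM and recovers its factors by the rearrangement-plus-SVD construction used for the NKP problem:

```python
import numpy as np

rng = np.random.default_rng(6)
nv, nh = 3, 4                                   # vertical layers, horizontal points
Av = rng.normal(size=(nv, nv)); Mv = Av @ Av.T  # vertical covariance factor
Ah = rng.normal(size=(nh, nh)); Mh = Ah @ Ah.T  # horizontal covariance factor
C = np.kron(Mv, Mh)                             # separable ECM, (nv*nh) x (nv*nh)

# rearrangement: block (i, j) of C equals Mv[i, j] * Mh, so stacking the
# vectorized blocks gives the rank-1 matrix vec(Mv) vec(Mh)^T
Rmat = np.stack([C[i * nh:(i + 1) * nh, j * nh:(j + 1) * nh].ravel()
                 for i in range(nv) for j in range(nv)])
U, s, Vt = np.linalg.svd(Rmat, full_matrices=False)
Mv_hat = (np.sqrt(s[0]) * U[:, 0]).reshape(nv, nv)
Mh_hat = (np.sqrt(s[0]) * Vt[0]).reshape(nh, nh)
C_hat = np.kron(Mv_hat, Mh_hat)                 # reconstructed separable ECM
```

When the given ECM is only approximately separable, the same dominant SVD pair yields its nearest Kronecker-product approximation in the Frobenius norm.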

#### 7.2.2. PEF: computation of gain

As to the computation of the gain, introduce the following notation: at the instant *k*, let *x*^{(l)} denote the vector composed of the values of *x* at all the horizontal grid points (ordered in some way) at the *l*^{th} vertical layer.

Consider the ECM Eq. (3.10) and the observation equation (1.2). Represent the observation matrix *H* in a block-matrix form

which corresponds to the vector representation

Compute the gain according to Eq. (2.4). We have

(7.10)

As proved in Ref. [18], in this case, the gain has the following form

where
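For a gain of this block form, the following small sketch (illustrative sizes; the state is assumed ordered layer by layer, with only the surface block observed) shows how the full gain formula collapses into a Kronecker-structured expression when the ECM is separable:

```python
import numpy as np

rng = np.random.default_rng(7)
nv, nh, r = 3, 5, 0.01                                # layers, horizontal points, obs noise
Av = rng.normal(size=(nv, nv)); Mv = Av @ Av.T / nv   # vertical covariance
Ah = rng.normal(size=(nh, nh)); Mh = Ah @ Ah.T / nh   # horizontal covariance
M = np.kron(Mv, Mh)                                   # separable prediction-error ECM

H = np.zeros((nh, nv * nh)); H[:, :nh] = np.eye(nh)   # observe the surface layer only
R = r * np.eye(nh)
K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)          # full gain formula

# separable shortcut: only an nh x nh inverse is needed, and the vertical
# structure enters through the first column of Mv
K_sep = np.kron(Mv[:, :1], Mh) @ np.linalg.inv(Mv[0, 0] * Mh + R)
```

The shortcut replaces the inversion of an (n_v n_h) × (n_v n_h) innovation covariance by an n_h × n_h one, which is what makes this gain structure tractable in the high-dimensional setting.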

#### 7.2.3. Cooper-Haines filter: CHF

To assess the performance of the AF, we also implement the so-called CHF. The CHF [7] is obtained from Eqs. (7.9) and (7.10) under three hypotheses [24]: (**H1**) the analysis error of the system output is canceled in the case of noise-free observations; (**H2**) the linear potential vorticity (PV) is conserved; (**H3**) there is no correction for the velocity at the bottom layer. The AF in Ref. [24] is obtained by relaxing one or several of the hypotheses (**H1**)–(**H3**). From the filtering-theory point of view, the difference between the PEF and the CHF lies in the way the elements of the ECM are estimated.

For the choice of the tuning parameters in the PEF, see Ref. [24].

### 7.3. Numerical results

First, we run the model initialized by the climatology. This run differs from the one used to model the sequence of true ocean states only in its initial state, which is replaced by the climatology. This run is denoted as *model*.
**Figure 6**
shows the spatially averaged variance of the difference between the SSH observations and the SSH produced by the *model* run. We see that the error grows as time progresses, indicating instability of the numerical model wrt the perturbed initial condition. This signifies that the VM will have difficulty producing high-performance estimates if the assimilation window is long.

Next the two filters, CHF and PEF, are implemented under the same initial condition as those carried out with the experiment *model*. It is seen from
**Figure 6**
that, initialized by the same initial condition, the CHF is much more efficient than the *model* run in reducing the estimation error. The performance comparison between the CHF and the PEF is presented in
**Figure 7**
. Here, the spatially averaged variances of the SSH prediction errors resulting from the two filters, CHF and PEF, are displayed. The superiority of the PEF over the CHF is beyond doubt. It is clear from
**Figure 7**
that the PEF is capable of producing better estimates, with a lower error level, over the whole assimilation period. On the other hand, while the estimation error of the CHF decreases continuously at the beginning of the assimilation and more or less stabilizes afterward (during the interval

To see the effect of adaptation,
**Figure 8**
displays the filtered errors of the *u*-velocity component estimates at the surface, produced by the PEF and the APEF, respectively. The APEF is the adaptive version of the PEF; here, its tuning parameters are optimized by the SPSA method. From the computational point of view, the SPSA requires much less integration time and memory storage than the traditional AE method: at each assimilation instant, only two integrations of the MICOM are needed to approximate the gradient vector. From
**Figure 8**
, one sees that adaptation significantly reduces the estimation errors produced by the PEF.

## 8. Conclusions

In this chapter, a comparative study of the performance of the AF against other existing methods has been presented in the context of data assimilation problems in high-dimensional numerical models. As has been seen, in comparison with the standard VM, the AF is much simpler to implement and produces better estimates. The advantages of the AF over other methods, such as the EKF and the CHF, have also been demonstrated. The principal reason for the high performance of the AF lies in the choice of the innovation representation of the initial input-output system and in the selection of pertinent gain parameters as control variables for minimizing the MSE of the innovation process. Whereas in the VM the choice of the structure for the initial state is the most important task (and even that is insufficient to guarantee high performance), in the AF the initial state has little impact on performance. This is because the AF is selected as optimal within a class of stable filters, and as a consequence the error in the initial estimate is attenuated during the assimilation process. By contrast, in the VM, the error in the specification of the initial guess is amplified during assimilation if the numerical model is unstable.

In conclusion, it is important to emphasize that the AF approach presented in this chapter rests on the following road map: (i) generate the data ECM from S-PE samples, which grow in the directions of the dominant Schur vectors; (ii) select a parameterized structure for the ECM under the SeVHS hypothesis; (iii) choose the tuning parameters in the gain by minimizing the distance between the data ECM and the ECM having the SeVHS structure; (iv) adjust the unknown parameters in the gain so as to minimize the PE of the system output by applying the SPSA algorithm.

There is a wide variety of engineering problems to which the AF is applicable and which would be worthy of further study. Depending on the particular problem, other modifications would undoubtedly help to improve the filter performance or to simplify its implementation. But the main features of the AF presented in this chapter remain the key points to follow in order to preserve its high performance.