Open access peer-reviewed chapter

# On Optimal and Simultaneous Stochastic Perturbations with Application to Estimation of High-Dimensional Matrix and Data Assimilation in High-Dimensional Systems

Written By

Hong Son Hoang and Remy Baraille

Submitted: November 21st, 2017 Reviewed: April 16th, 2018 Published: November 5th, 2018

DOI: 10.5772/intechopen.77273

From the Edited Volume

## Perturbation Methods with Applications in Science and Engineering

Edited by İlkay Bakırtaş


## Abstract

This chapter is devoted to different types of optimal perturbations (OPs): deterministic, stochastic, and OPs in an invariant subspace, as well as to simultaneous stochastic perturbations (SSPs). The definitions of the OPs are given. It will be shown how important the OPs are for studying the predictability of the behavior of system dynamics and for generating ensemble forecasts, as well as in the design of a stable filter. A variety of SSP-based algorithms for the estimation and decomposition of very high-dimensional (Hd) matrices are presented. Numerical experiments are presented to illustrate the efficiency and benefit of the perturbation technique.

### Keywords

• predictability
• optimal perturbation
• invariant subspace
• simultaneous stochastic perturbation
• dynamical system
• filter stability
• estimation of high-dimensional matrix

## 1. Introduction

The study of high-dimensional systems (HdS) today constitutes one of the most important research subjects, thanks to the exponential increase in the power and speed of computers: according to Moore's law, the number of transistors in a dense integrated circuit doubles approximately every 2 years (see Myhrvold [1]). However, this exponential increase is still far from sufficient to meet the great demand on computational and memory resources for implementing optimal data assimilation algorithms (like the Kalman filter (KF) [2], for example) in operational forecasting systems (OFS).

This chapter is devoted to the role of perturbations as an efficient tool for studying the predictability of dynamical systems and ensemble forecasting, and for overcoming the difficulties in the design of data assimilation algorithms, in particular of optimal adaptive filtering for extremely HdS. In [3, 4], Lorenz studied the problem of the predictability of the atmosphere. He found that the atmosphere is a chaotic system and that the predictability limit of a numerical forecast is about 2 weeks. This predictability barrier has to be overcome in order to extend the forecast period further. The fact that estimates of the current state are inaccurate and that numerical models have inadequacies leads to forecast errors that grow with increasing forecast lead time. Ensemble forecasting aims at quantifying this flow-dependent forecast uncertainty. Today, a medium-range forecast has become a standard product. In the 1990s, the ensemble forecasting (EnF) technique was introduced in operational centers such as the European Centre for Medium-Range Weather Forecasts (ECMWF) (see Palmer et al. [5]) and the NCEP (US National Centers for Environmental Prediction) (Toth and Kalnay [6]). It is found that a single forecast can depart rapidly from the real atmosphere. The idea of ensemble forecasting is to add perturbations around the control forecast to produce a collection of forecasts that try to better simulate the possible uncertainties in a numerical forecast. The ensemble mean can then act as a nonlinear filter whose skill is higher than that of individual members in a statistical sense (Toth and Kalnay [6]).

The chapter is organized as follows. Section 2 first outlines optimal perturbation (OP) theory and shows how the OP plays an important role in seeking the fastest-growing direction of the prediction error (PE). Both the predictability theory of dynamical systems and the stability of the filtering algorithm are developed on the basis of OPs. The definition of the optimal deterministic perturbation (ODP) and some theoretical results on the ODP are introduced. It is found that the ODP is associated with the right singular vector (SV) of the system dynamics. In Section 3, two other classes of ODPs are presented: the leading eigenvector (EV) and the real Schur vector (SchV) of the system dynamics. Mention that the first EV is the ODP in the eigen-invariant subspace (EI-InS) of the system dynamics. As to the leading SchV, it is the ODP in the Schur invariant subspace (Sch-InS), which is closely related to the EI-InS in the sense that the subspace of leading SchVs, generated by the sampling procedure (Sampling-P, Section 3), converges to the EI-InS. In Section 4, we present another type of OP, called the optimal stochastic perturbation (OSP). Mention that the OSP is a natural extension of the ODP which gives insight into what the fastest-growing PE represents and how one can produce it by stochastically perturbing the initial state. One important class of perturbations, known as simultaneous stochastic perturbation (SSP), is presented in Section 5. It will be shown that the SSP is very efficient for solving optimization problems in high-dimensional (Hd) settings. Different algorithms for estimating and decomposing Hd matrices are also presented here. Numerical examples are given in Section 6 to illustrate the theoretical results and the efficiency of the OPs in solving data assimilation problems. The experiment on data assimilation in the Hd ocean model MICOM with filters constructed on the basis of the Schur ODPs and SSPs is presented in Section 7.
The concluding remarks are presented in Section 8.

## 2. Optimal perturbations: predictability and filter stability

### 2.1. Stability of filter

The behavior of the atmosphere or ocean is recognized as highly sensitive to initial conditions: a small change in an initial condition can strongly alter the trajectory of the system. It is therefore important to be able to identify the directions of rapid growth of the system state. The research on OPs is aimed precisely at finding methods to better capture these rapidly growing directions of the system dynamics and thereby to optimize the predictability of the physical process under consideration.

To explain this phenomenon more clearly, consider a standard linear filtering problem

$$x_{k+1} = \Phi x_k + w_{k+1}, \qquad z_{k+1} = H x_{k+1} + v_{k+1}, \tag{1}$$

where $\Phi \in R^{n\times n}$ is the state transition matrix and $H \in R^{p\times n}$ is an observation matrix. Under standard conditions on the model and observation noises $w_k, v_k$, the minimum mean squared (MMS) estimate $\hat x_k$ can be obtained by the well-known KF [2]:

$$\hat x_{k+1} = \hat x_{k+1/k} + K\zeta_{k+1}, \qquad \hat x_{k+1/k} = \Phi\hat x_k, \tag{2}$$

where $\zeta_{k+1} = z_{k+1} - H\hat x_{k+1/k}$ is the innovation vector, $\hat x_{k+1}$ is the filtered (or analysis) estimate, and $\hat x_{k+1/k}$ is the one-step-ahead prediction of $x_{k+1}$. The KF gain $K$ is given by

$$K = MH^T\left(HMH^T + R\right)^{-1}, \tag{3}$$

where $M$ is the prediction error covariance matrix and $R$ is the observation noise covariance.

From Eq. (2), it can be shown that the transition matrix of the filtered estimate equation is $L = (I - KH)\Phi$.

For HdS, the KF gain (3) is impossible to compute. In a study by Hoang et al. [7], it is suggested to find the gain with the structure

$$K = P_r K_e, \tag{4}$$

with $P_r \in R^{n\times n_e}$ an operator projecting a vector from the reduced space $R^{n_e}$ into the full system space $R^n$, and $K_e \in R^{n_e\times p}$ the gain of the reduced filter. One very important question arising here is how one can choose the subspace of projection and the structure of $K_e$ so that $L$ is stable. It is found in the work of Hoang et al. [8] that detectability of the input-output system (1) is sufficient for the existence of a stabilizing gain $K$, and that this gain can be constructed with $P_r$ consisting of all unstable EVs (or unstable SVs or SchVs; see Section 3) of the system dynamics.

### 2.2. Singular value decomposition and optimal perturbations

Consider the singular value decomposition (SVD) of Φ [9],

$$\Phi = UDV^T, \qquad D = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_n), \quad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n, \qquad U = (U_1, U_2), \quad D = \mathrm{blockdiag}(D_1, D_2), \quad V = (V_1, V_2), \tag{5}$$

where $U_1, V_1 \in R^{n\times n_1}$, $D_1 \in R^{n_1\times n_1}$, and $n_1$ is the number of all unstable and neutral SVs of $\Phi$. In what follows, for simplicity and unless otherwise stated, the set of all unstable SVs is understood to include the neutral SVs as well.

Suppose the system is s-detectable (detectability of all the columns of $U_1$ or $V_1$). From the research by Hoang et al. [10], there exists a stabilizing gain $K_s$ (a sufficient but not necessary condition) with $P_r = U_1$ (see Eq. (4)) such that the transition matrix $L = (I - K_sH)\Phi$ is stable. It signifies that the filter is stable and the estimation error remains bounded. The columns of $U_1$, i.e., the left unstable SVs of $\Phi$, serve as a basis for seeking the appropriate correction in the filtering algorithm.

On the other hand, in practice, for extreme HdS one cannot compute all elements of $U_1$ but only some subset $U_1' \subset U_1$. Using $U_1'$ instead of $U_1$ cannot guarantee filter stability. Ensemble forecasting has been proposed as an approach to prevent possibly large forecast errors, and it requires knowledge of the rapidly growing directions of the PE. In this context, the OPs become important: they allow searching for the rapidly growing directions of the PE by model integration of the OPs. The OPs are in fact the unstable right SVs (RSVs).

### 2.3. Optimal perturbation

Let $\delta x$ be a given perturbation representing an error (uncertainty, deterministic or stochastic) around the true system state $x$, i.e., $\hat x^f = x + \delta x$ (at some instant $k$). The prediction $\hat x^p$ of the system state can be obtained by forwarding the numerical model (1) from the filtered estimate $\hat x^f$, i.e., $\hat x^p = \Phi\hat x^f$. We have then

$$\hat x^p = \Phi\hat x^f = \Phi(x + \delta x) = \Phi x + \Phi\,\delta x. \tag{6}$$

One sees that the perturbation $\delta x$ in the initial system state "grows" into $\Phi\,\delta x$, which represents the uncertainty in the forecast.

In general, the perturbation $\delta x$ may be any element of the $n$-dimensional space $R^n$, i.e., $\delta x \in R^n$. For $e_f^l \equiv \delta x_f^l$, a sample of the filtered error (FE) $e_f$, integrating the model from $e_f^l$ results in $e_p^l \equiv \Phi\,\delta x_f^l$, a sample of the PE $e_p$. By generating an ensemble of perturbations $E_f^L = \{e_f^l\}_{l=1}^{L}$ according to the distribution of $e_f$, one can produce the ensemble of PE samples $E_p^L = \{e_p^l\}_{l=1}^{L}$ and use it to estimate the distribution of the PE $e_p$. This serves as a basis for particle filtering [11].

The ensemble-based filtering (EnBF) algorithm is simplified for standard linear filtering problems with $e_f$ of zero mean and with error covariance matrix (ECM) $P$. This technique aims to approximate the ECM without solving the matrix Riccati equation (ensemble KF, EnKF [12]). Mention that at present one can generate only about O(100) samples at each assimilation instant. This ensemble size is very small compared to the system dimension. That is why it is important to have a good strategy for selecting the "optimal" samples (perturbations) to better approximate the ECM in the filtering algorithm.

#### 2.3.1. Optimal deterministic perturbation

Introduce

$$S(\delta x) := \left\{\delta x : \|\delta x\|_2^2 = \langle\delta x, \delta x\rangle = 1\right\}, \tag{7}$$

where $\|\cdot\|_2$ denotes the Euclidean vector norm and $\langle\cdot,\cdot\rangle$ the dot product. Let $\Phi$ have the SVD (5).

Definition 2.1. The ODP δxo is the solution of the extremal problem

$$J(\delta x) = \|\Phi\,\delta x\|_2 \to \max_{\delta x} \tag{8}$$

under the constraint (7). One can prove

Lemma 2.1. The optimal perturbation in the sense (8) and (7) is

$$\delta x^o = \pm v_1,$$

where v1 is the first right SV of Φ.
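As a quick numerical check of Lemma 2.1 (ours, not part of the original chapter), the following NumPy sketch verifies that the first right SV attains the maximal growth $\sigma_1$ and that no other unit perturbation exceeds it; the matrix `Phi` is an arbitrary stand-in for the transition matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((4, 4))   # stand-in transition matrix

# SVD Phi = U diag(sigma) V^T; the rows of Vt are the right singular vectors
U, sigma, Vt = np.linalg.svd(Phi)
v1 = Vt[0]                          # ODP of Lemma 2.1, defined up to sign

# the ODP attains the maximal amplification sigma_1 ...
assert np.isclose(np.linalg.norm(Phi @ v1), sigma[0])

# ... and no random unit perturbation can grow more
for _ in range(1000):
    dx = rng.standard_normal(4)
    dx /= np.linalg.norm(dx)
    assert np.linalg.norm(Phi @ dx) <= sigma[0] + 1e-12
```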

#### 2.3.2. Subspaces of ODPs

Introduce

$$\Phi_1 := \Phi - \sigma_1 u_1 v_1^T$$

Consider the optimization problem

$$J_1(\delta x) = \|\Phi_1\,\delta x\|_2 \to \max_{\delta x} \tag{9}$$

under the constraint (7). Similar to the proof of Lemma 2.1, one can prove

Lemma 2.2. The optimal perturbation in the sense (9) and (7) is

$$\delta x^o = \pm v_2,$$

where v2 is the second right SV of Φ.

By iteration, for

$$\Phi_i := \Phi_{i-1} - \sigma_i u_i v_i^T, \quad i = 1, \dots, n-1; \qquad \Phi_0 := \Phi. \tag{10}$$

applying Lemma 2.2 with slight modifications, one finds that the OP for $\Phi_i$, $i = 0, 1, \dots, n-1$, is $\pm v_{i+1}$.

Theorem 2.1. The optimal perturbation for the matrix $\Phi_{i-1}$, $i = 1, \dots, n$, is $\pm v_i$, where $v_i$ is the $i$th right SV of $\Phi$ (equivalently, the leading right SV of $\Phi_{i-1}$).

The OP for $\Phi_{i-1}$ will be called the $i$th OP for $\Phi$ (or the $i$th SOP, singular OP).
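A short NumPy sketch of the deflation in Eq. (10) (our illustration, not the authors' code): the leading right SV of each deflated matrix $\Phi_i$ coincides, up to sign, with the $(i+1)$th right SV of $\Phi$:

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((5, 5))
_, s, Vt = np.linalg.svd(Phi)

Phi_i = Phi.copy()                  # Phi_0 = Phi
for i in range(5):
    Ui, si, Vti = np.linalg.svd(Phi_i)
    # leading right SV of Phi_i equals the (i+1)th right SV of Phi, up to sign
    assert min(np.linalg.norm(Vti[0] - Vt[i]),
               np.linalg.norm(Vti[0] + Vt[i])) < 1e-8
    # deflation step of Eq. (10): Phi_{i+1} = Phi_i - sigma u v^T
    Phi_i = Phi_i - si[0] * np.outer(Ui[:, 0], Vti[0])
```

After the $n$ deflation steps the residual matrix is zero, since the full SVD has been peeled off rank-1 term by rank-1 term.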

Comment 2.2. The OPs presented above are optimal in the sense of the Euclidean norm $\|\cdot\|_2$. In practice, there is a need to normalize the state vector (using the inverse of the covariance matrix $M$). The normalization is done by the change of variables $\delta x' = M^{-1/2}\delta x$, and all the results presented above remain valid with respect to the new variable $\delta x'$,

$$\|\delta x'\|_2^2 = \langle M^{-1/2}\delta x, M^{-1/2}\delta x\rangle = \langle\delta x, M^{-1}\delta x\rangle =: \|\delta x\|_{M^{-1}}^2.$$

The weighted norm $\|\delta x\|_{M^{-1}}$ is known as the Mahalanobis norm.

As to the PE, a normalization is also applied in order to be able to compare different variables such as density, temperature, and velocity. In this situation, the norm for $y = \Phi\,\delta x$ may be a seminorm [5].

### 2.4. Ensemble forecasting

The idea of ensemble forecasting is that instead of performing “deterministic” forecasts, stochastic forecasts should be made: several model forecasts are performed by introducing perturbations in the filtered estimate or in the models.

Since 1994, NCEP (National Centers for Environmental Prediction, USA) has been running 17 global forecasts per day, with the perturbations obtained using the method of breeding growing perturbations. This ensures that the perturbations contain growing dynamical perturbations. The length of the forecasts allows the generation of an outlook for the second week. At the ECMWF, the perturbation method is based on the use of SVs, which grow even faster than the bred or Lyapunov vector perturbations. The ECMWF ensemble contains 50 members [13].

## 3. Perturbations based on leading EVs and SchVs

The idea underlying the adaptive filter (AF) is to construct a filter that uses feedback in the form of the PE signal (innovation) to adjust the free parameters in the gain so as to optimize the filter performance. If in the KF optimality is defined as a minimum mean squared (MMS) error, in the AF optimality is understood in the sense of MMS for the predicted output error (innovation). This definition allows defining the optimality of the filter in the realization space, not in the probability space as done in the KF.

The optimal gain can thus be determined by solving an optimization problem in which all elements of the filter gain are adjusted. There are two major difficulties:

1. Instability: As the filter gain is estimated stochastically during the optimization process, the filter may become unstable due to the stochastic character of the filter gain.

2. Reduction of tuning parameters: For extreme HdS, the number of elements in the filter gain is still very high, and a reduction of the number of tuning gain elements is necessary.

### 3.2. Leading EVs and SchVs as optimal perturbations

Interest in the stability of the AF arose soon after the AF was introduced. The study of filter stability shows that it is possible to ensure stability when the system is detectable [8]. For different parameterized stabilizing gain structures based on a subspace of unstable and neutral EVs, see [8]. As the EVs may be complex and their computation is unstable (Lanczos [14]), it is proved in [8] that one can also ensure stability of the filter if the space of projection is constructed from a set of unstable and neutral SchVs of the system dynamics. The unstable and neutral real SchVs are the SchVs associated with the unstable and neutral eigenvalues of the system dynamics. The advantage of the real SchVs is that they are real and orthonormal, and their computation is stable. Moreover, the algorithm for estimating the dominant SchVs is simple, being based on the power iteration procedure (Sampling-P, see [15]). As to the unstable SVs, although they are real and orthonormal, their computation requires an adjoint operator (the transpose matrix $\Phi^T$). Construction of an adjoint code (AC) is a time-consuming and tedious process. Approximating the leading SVs without an AC can be done on the basis of Algorithm 5.2.

### 3.3. EVs as optimal perturbations in the invariant subspace

Let Φ be diagonalizable. Introduce the set

$$EV_1 := \left\{(x, \lambda) : x \in C^n, \|x\|_2 = 1, \lambda \in C^1 : \Phi x = \lambda x\right\}. \tag{11}$$

The subspace of $x \in C^n$ satisfying $\Phi x = \lambda x$ for some $\lambda \in C^1$ is known as an invariant subspace of $\Phi$: the matrix $\Phi$ acts to stretch the vector $x$ while conserving its direction. Consider the optimization problem

$$J(\delta x) = \|\Phi\,\delta x\|_2 \to \max_{\delta x}, \qquad (\delta x, \lambda) \in EV_1. \tag{12}$$

It is seen that the optimal solution is the first EV $x_{ei}(1)$ of $\Phi$, associated with the eigenvalue $\lambda_1$ of largest magnitude $|\lambda_1|$. We will call $x_{ei}(1)$ the first optimal EV perturbation (denoted as EI-OP).

For a symmetric matrix, the EI-OP coincides with the SOP. The EI-OP is not unique.

By solving the optimization problem (8) s.t.

$$EV_2 := \left\{(x, \lambda) : x \in C^n, \|x\|_2 = 1 : \Phi x = \lambda x, \ \lambda \in C^1, \ |\lambda| < |\lambda_1|\right\}, \tag{13}$$

one finds the second EI-OP $x_{ei}(2)$. In a similar way, by defining

$$EV_i := \left\{(x, \lambda) : x \in C^n, \|x\|_2 = 1 : \Phi x = \lambda x, \ \lambda \in C^1, \ |\lambda| < |\lambda_{i-1}|\right\} \tag{14}$$

for $i = 2, \dots, n$, we obtain a sequence of EI-OPs $x_{ei}(i)$, $i = 1, 2, \dots, n$. The first $n_e$ EI-OPs are the unstable EVs.

In general, for a defective case (not diagonalizable), Φ does not have n linearly independent EVs and the independent generalized EVs can serve as “optimal” perturbations to construct a subspace of projection in the AF.

To summarize, let the EV decomposition be

$$X_{ei}\,J\,X_{ei}^{-1} = \Phi, \tag{15}$$

where $J$ is a matrix in Jordan canonical form and $X_{ei}^{-1}$ is the inverse of $X_{ei}$ (see Golub and Van Loan [9]). The columns of $X_{ei}$ are the EVs of $\Phi$, and $J$ is block diagonal with diagonal blocks of dimension 1 or 2. The rank-$k$ decomposition is $X_{ei,1}J_1\tilde X_{ei,1}$, where

$$EV_k := X_{ei,1}J_1\tilde X_{ei,1}, \qquad X_{ei} = (X_{ei,1}, X_{ei,2}), \quad J = \mathrm{blockdiag}(J_1, J_2), \quad \tilde X_{ei} := X_{ei}^{-1} = (\tilde X_{ei,1}^T, \tilde X_{ei,2}^T)^T, \tag{16}$$

with $X_{ei,1} \in R^{n\times k}$, $X_{ei,2} \in R^{n\times(n-k)}$. Multiplying $EV_k$ on the right by $X_{ei,1}$ yields $X_{ei,1}J_1$, i.e., we obtain the $k$ largest (in modulus) perturbations in the eigen (invariant) space of $\Phi$. The perturbations given by the column vectors of $X_{ei,1}$ (i.e., the first $k$ EVs of $\Phi$) are the first $k$ OPs of $\Phi$ in the eigen-invariant subspace (EI-InS).

### 3.4. Dominant SchVs as OPs in the Schur invariant subspace

The study of Hoang et al. [8] shows that the subspace of projection of the stable filter can be constructed on the basis of all unstable EVs or SchVs of Φ.

Compared to the EVs, the approach based on the real Schur decomposition is preferable in practice since the SchVs are real and orthonormal. Moreover, there exists a simple power iterative algorithm for approaching the set of real leading SchVs. According to Theorem 7.3.1 in Golub and Van Loan [9], the subspace $R(X_{s,1})$ spanned by the $n_u$ leading SchVs converges to the unique invariant subspace $D_{n_u}(\Phi)$ (called a dominant invariant subspace) associated with the eigenvalues $\lambda_1, \dots, \lambda_{n_u}$ if $|\lambda_{n_u}| > |\lambda_{n_u+1}|$. In this sense, we consider the leading SchVs as OPs (denoted as Sch-OPs) in the Schur invariant subspace (Sch-InS).
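The power-type iteration mentioned above can be sketched as an orthogonal (subspace) iteration; this is our illustration, not the chapter's Sampling-P code, and the test matrix with a known real spectrum is a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
# build a test matrix with a known spectrum; eigenvalues 3 and 2 dominate
X = rng.standard_normal((n, n))
A = X @ np.diag([3.0, 2.0, 0.9, 0.5, 0.3, 0.1]) @ np.linalg.inv(X)

# orthogonal iteration: Q <- qr(A Q); only products A @ Q are required,
# so no adjoint (transpose) code is needed
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
for _ in range(200):
    Q, _ = np.linalg.qr(A @ Q)

# span(Q) approximates the dominant invariant subspace: (I - Q Q^T) A Q ~ 0
residual = np.linalg.norm(A @ Q - Q @ (Q.T @ A @ Q))
assert residual < 1e-8
```

The convergence rate is governed by the gap $|\lambda_{k+1}|/|\lambda_k|$ (here $0.9/2$), in line with the cited Theorem 7.3.1.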

## 4. Optimal stochastic perturbation (OSP)

In Section 2, the perturbation $\delta x$ is deterministic (see Definition 2.1). In practice, it happens that $\delta x$ is of a stochastic nature. For example, the a priori information on the FE is a zero-mean random vector (RV) with the ECM $P$. The questions arising here are how one defines the OP in such a situation and how one finds it.

We will consider now δx as an element of the Hilbert space H of RVs. This space H is a complete normed linear vector space equipped with the inner product

$$\langle x, y\rangle_H = E[\langle x, y\rangle], \tag{17}$$

where $E[\cdot]$ denotes the mathematical expectation. The norm in $H$ is defined as

$$\|x\|_H = \sqrt{E[\langle x, x\rangle]}. \tag{18}$$

All elements of H are of finite variance and for simplicity, we assume they all have zero mean value.

Introduce the set of RVs

$$S_s(\delta x) := \left\{\delta x : \|\delta x\|_H = 1\right\}. \tag{19}$$

Definition 4.1. The optimal stochastic perturbation (OSP) δxo is the solution of the extremal problem

$$J(\delta x) = \|\Phi\,\delta x\|_H \to \max_{\delta x}, \tag{20}$$

under the constraint (19). One can prove

Lemma 4.1. For $\delta x \in S_s(\delta x)$, there exists $\delta y = (\delta y_1, \dots, \delta y_n)^T \in S_s(\delta x)$ such that $\delta x = \sum_{k=1}^{n} v_k\,\delta y_k$.

Lemma 4.2. The optimal perturbation in the sense (20) and (19) is

$$\delta x^o = \psi v_1,$$

where $\psi$ is an RV with zero mean and unit variance, and $v_1$ is the first right SV of $\Phi$.

Comment 4.1. Comparing the ODP with the OSP shows that while the ODP is the first right SV (defined up to sign), the OSP is an ensemble of vectors lying in the subspace of the first right SV, with lengths that are samples of an RV of zero mean and unit variance.

Introduce

$$\Phi_1 := \Phi - \sigma_1 u_1 v_1^T$$

and consider the objective function

$$J(\delta x) = \|\Phi_1\,\delta x\|_H \to \max_{\delta x}. \tag{21}$$

Lemma 4.3. The optimal perturbation in the sense of (21) and (19) is $\delta x^o = \psi v_2$, where $\psi$ is an RV with zero mean and unit variance, and $v_2$ is the second right SV of $\Phi$.

By iteration, for

$$\Phi_i := \Phi_{i-1} - \sigma_i u_i v_i^T, \quad i = 1, \dots, n-1; \qquad \Phi_0 := \Phi. \tag{22}$$

applying Lemma 4.3 with slight modifications, one finds that the OSP for $\Phi_k$, $k = 0, 1, \dots, n-1$, is $\psi v_{k+1}$, where $\psi$ is an RV with zero mean and unit variance and $v_{k+1}$ is the $(k+1)$th right SV of $\Phi$.

Theorem 4.1. The optimal perturbation for the matrix $\Phi_{i-1}$, $i = 1, \dots, n$, is $\psi v_i$, where $\psi$ is an RV with zero mean and unit variance and $v_i$ is the $i$th right SV of $\Phi$.
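As an illustration (ours, with an arbitrary stand-in matrix `Phi`): taking a Bernoulli ±1 variable for $\psi$, every OSP sample $\psi v_1$ grows under $\Phi$ by exactly $\sigma_1$, so the empirical H-norm growth matches the first singular value:

```python
import numpy as np

rng = np.random.default_rng(3)
Phi = rng.standard_normal((4, 4))   # stand-in transition matrix
_, s, Vt = np.linalg.svd(Phi)
v1 = Vt[0]

# OSP samples delta_x = psi * v1, psi a zero-mean, unit-variance RV
L = 10000
psi = rng.choice([-1.0, 1.0], size=L)   # Bernoulli realization of psi
dX = psi[:, None] * v1                  # L samples of the OSP

# empirical H-norm of Phi delta_x: sqrt(E ||Phi dx||^2) -> sigma_1
growth = np.sqrt(np.mean(np.sum((dX @ Phi.T) ** 2, axis=1)))
assert abs(growth - s[0]) < 1e-10
```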

## 5. Simultaneous stochastic perturbations (SSP)

In [16], Spall proposed a simultaneous perturbation stochastic approximation (SPSA) algorithm for finding optimal unknown parameters by minimizing some objective function. The main feature of the simultaneous perturbation gradient approximation (SPGA) resides in the way the gradient vector is approximated (on average): a sample gradient vector is estimated by perturbing all components of the unknown vector simultaneously, in a stochastic way. This method requires only two or three measurements of the objective function, regardless of the dimension of the vector of unknown parameters. In a study by Hoang and Baraille [17], the idea of the SPGA is described in detail, with a wide variety of applications in engineering domains. The application to the estimation of the ECM in the filtering problem is given in the work of Hoang and Baraille [18]. In the research by Hoang and Baraille [19], a simple algorithm was proposed for estimating the elements of an unknown matrix, as well as a way to decompose the estimated matrix into a product of two matrices, under the condition that only the matrix-vector product is accessible.

### 5.1. Theoretical background of SPGA: gradient approximation

The component-wise perturbation is a method for the numerical computation of the gradient of the cost function with respect to the vector of unknown parameters. It is based on the idea of perturbing each component of the parameter vector separately. For very HdS, this technique is impossible to implement. An alternative to the component-wise perturbation is the SSP approach.

Let

$$DJ(\theta_0) = \left(\frac{\partial J(\theta_0)}{\partial\theta_1}, \dots, \frac{\partial J(\theta_0)}{\partial\theta_{n_\theta}}\right)^T$$

denote the gradient of $J(\theta)$ computed at $\theta = \theta_0$. Suppose $\Delta_j$, $j = 1, \dots, n_\theta$, are RVs independent and identically distributed (i.i.d.) according to the Bernoulli law, assuming the two values +1 and −1 with equal probabilities 1/2. It implies that

$$E[\Delta_j] = 0, \quad E[\Delta_j^2] = 1, \quad E[\Delta_j^{-1}] = 0, \quad E[\Delta_j^{-2}] = 1, \quad j = 1, 2, \dots, n_\theta. \tag{23}$$

Suppose $J(\theta)$ is infinitely differentiable at $\theta = \theta_0$. Using a Taylor series expansion,

$$\Delta J := J(\theta_0 + \bar\delta\theta) - J(\theta_0) = \bar\delta\theta^T DJ(\theta_0) + \tfrac{1}{2}\,\bar\delta\theta^T D^2J(\theta_0)\,\bar\delta\theta + \dots \tag{24}$$

where $D^2J(\theta_0)$ is the Hessian matrix computed at $\theta = \theta_0$. For the choice

$$\bar\delta\theta := (\delta\theta_1, \dots, \delta\theta_{n_\theta})^T = c\bar\Delta, \qquad \bar\Delta = (\Delta_1, \Delta_2, \dots, \Delta_{n_\theta})^T, \tag{25}$$

where $c$ is a small positive value, Eq. (24) gives

$$\Delta J(\theta_0) = c\,\bar\Delta^T DJ(\theta_0) + (c^2/2)\,\bar\Delta^T D^2J(\theta_0)\,\bar\Delta + \dots$$

Dividing both sides of the last equality by $\delta\theta_k = c\Delta_k$ implies

$$\Delta J(\theta_0)/\delta\theta_k = DJ(\theta_0)^T\bar\Delta(k) + (c/2)\,\bar\Delta(k)^T D^2J(\theta_0)\,\bar\Delta + \dots, \qquad \bar\Delta(k) := (\Delta_1\Delta_k^{-1}, \dots, \Delta_{k-1}\Delta_k^{-1}, 1, \Delta_{k+1}\Delta_k^{-1}, \dots, \Delta_{n_\theta}\Delta_k^{-1})^T. \tag{26}$$

Taking the mathematical expectation for both sides of the last equation yields

$$E[\Delta J(\theta_0)\,\delta\theta_k^{-1}] = DJ(\theta_0)^T E[\bar\Delta(k)] + (c/2)\,E[\bar\Delta(k)^T D^2J(\theta_0)\,\bar\Delta] + \dots \tag{27}$$

From the assumptions on $\bar\Delta$, one sees that $E[\bar\Delta(k)] = (0, \dots, 0, 1, 0, \dots, 0)^T$ (with 1 in the $k$th position), so that $DJ(\theta_0)^T E[\bar\Delta(k)] = \partial J(\theta_0)/\partial\theta_k$. Moreover, as all the moments of the Bernoulli variables $\Delta_i$ and $\Delta_i^{-1}$ are finite and $D^2J(\theta_0)$ is finite, $E[\bar\Delta(k)^T D^2J(\theta_0)\,\bar\Delta] = 0$. One concludes that

$$E[\Delta J(\theta_0)\,\delta\theta_k^{-1}] = \partial J(\theta_0)/\partial\theta_k + O(c^2). \tag{28}$$

The result expressed by Eq. (28) constitutes a basis for approximating the gradient vector by simultaneous perturbation. The left-hand side of Eq. (28) can be easily approximated by noticing that from an ensemble of $L$ i.i.d. samples $\bar\Delta^1, \dots, \bar\Delta^L$, we can generate the corresponding ensemble of $L$ i.i.d. sample estimates of the gradient vector at $\theta = \theta_0$,

$$DJ_l(\theta_0) = \left(\frac{\Delta J_l(\theta_0)}{c\Delta_1^l}, \dots, \frac{\Delta J_l(\theta_0)}{c\Delta_{n_\theta}^l}\right)^T, \quad l = 1, 2, \dots, L, \tag{29}$$

where $\Delta_k^l$ is the $k$th component of the $l$th sample $\bar\Delta^l$. The left-hand side of Eq. (28) is then well approximated by averaging the $L$ sample gradients in Eq. (29),

$$E[\Delta J(\theta_0)/\delta\theta_k] \approx \frac{1}{L}\sum_{l=1}^{L}\eta_k^l, \qquad \eta_k^l := \frac{\Delta J_l(\theta_0)}{c\Delta_k^l}. \tag{30}$$

Introduce the notations

$$\bar m_k := E[\Delta J(\theta_0)/\delta\theta_k], \qquad m_k^L := \frac{1}{L}\sum_{l=1}^{L}\eta_k^l, \qquad e_k := m_k^L - \bar m_k. \tag{31}$$

Theorem 1 of Hoang and Baraille [19] states that the estimate $m^L = (m_1^L, \dots, m_{n_\theta}^L)^T$ converges to the gradient vector $DJ(\theta_0)$ as $L \to \infty$ and $c \to 0$, with order $O(1/L)$, where

$$m^L := \frac{1}{L}\sum_{l=1}^{L}\eta^l.$$
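The SPGA estimate (29)-(30) can be sketched as follows (our illustration; the function and parameter names are ours, and the quadratic test cost is hypothetical). For a quadratic cost, the averaged sample gradients converge to the true gradient:

```python
import numpy as np

def spsa_gradient(J, theta0, c=1e-4, L=40000, seed=0):
    """Average of L simultaneous-perturbation gradient samples, Eqs. (29)-(30)."""
    rng = np.random.default_rng(seed)
    J0 = J(theta0)
    g = np.zeros_like(theta0)
    for _ in range(L):
        delta = rng.choice([-1.0, 1.0], size=theta0.size)  # Bernoulli perturbation
        dJ = J(theta0 + c * delta) - J0                    # one extra cost evaluation
        g += dJ / (c * delta)                              # sample gradient, Eq. (29)
    return g / L                                           # average, Eq. (30)

# quadratic test cost: J(theta) = theta^T A theta has gradient 2 A theta
A = np.diag([1.0, 2.0, 3.0])
J = lambda th: th @ A @ th
theta0 = np.array([1.0, -1.0, 0.5])
g_hat = spsa_gradient(J, theta0)
assert np.linalg.norm(g_hat - 2 * A @ theta0) < 0.15
```

Note that each sample costs only one extra evaluation of $J$, independently of the dimension of $\theta$.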

### 5.2. Algorithm for estimation of an unknown matrix

Let $\bar\Delta := (\Delta_1, \dots, \Delta_n)^T$, where $\Delta_i$, $i = 1, \dots, n$, are Bernoulli independent and identically distributed (i.i.d.) variables assuming the two values ±1 with equal probability 1/2. Introduce $\bar\Delta^{-1} := (1/\Delta_1, \dots, 1/\Delta_n)^T$ and $\bar\Delta_c := c\bar\Delta$, where $c > 0$ is a small positive value.

Algorithm 5.1. Suppose it is possible to compute the product $\Phi x = b(x)$ for any given $x$. At the beginning, let $l = 1$, let the value $u$ be assigned to the vector $x$ (i.e., $x \leftarrow u$), and let $L$ be a (large) fixed integer.

Step 1. Generate $\bar\Delta^l$, whose components are the $l$th samples of the Bernoulli i.i.d. variables assuming the two values ±1 with equal probabilities 1/2;

Step 2. Compute $\delta b^l = \Phi(u + \bar\Delta_c^l) - \Phi u$, where $\bar\Delta_c^l = c\bar\Delta^l$.

Step 3. Compute $g_i^l = \delta b_i^l\,(\bar\Delta_c^l)^{-1}$, where $\delta b_i^l$ is the $i$th component of $\delta b^l$ and $g_i^l$ is the column vector of estimated derivatives of $b_i(u)$ w.r.t. $u$, $i = 1, \dots, m$.

Step 4. If $l < L$, put $l \leftarrow l + 1$ and go to Step 1. Otherwise, go to Step 5.

Step 5. Compute

$$\hat g_i = \frac{1}{L}\sum_{l=1}^{L} g_i^l, \quad i = 1, \dots, m; \qquad \hat\Phi_L \approx D_x b = (\hat g_1, \dots, \hat g_m)^T.$$
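A minimal NumPy sketch of Algorithm 5.1 (the function name and parameters are ours), assuming only a black-box matrix-vector product `matvec` is available:

```python
import numpy as np

def estimate_matrix(matvec, n, m, c=1e-6, L=20000, seed=0):
    """Algorithm 5.1: estimate an m x n matrix from matrix-vector products only."""
    rng = np.random.default_rng(seed)
    u = np.zeros(n)                 # the point x <- u at which Phi is probed
    bu = matvec(u)
    Phi_hat = np.zeros((m, n))
    for _ in range(L):
        delta = rng.choice([-1.0, 1.0], size=n)       # Step 1: Bernoulli sample
        db = matvec(u + c * delta) - bu               # Step 2: delta b^l
        Phi_hat += np.outer(db, 1.0 / (c * delta))    # Step 3: rows g_i^l
    return Phi_hat / L                                # Step 5: average over L

# check on the 2 x 2 example of Section 6
Phi = np.array([[5.0, 7.0], [-2.0, -4.0]])
Phi_hat = estimate_matrix(lambda x: Phi @ x, n=2, m=2)
assert np.abs(Phi_hat - Phi).max() < 0.2
```

Each iteration needs only one extra product $\Phi(u + c\bar\Delta)$, so the scheme never requires explicit access to the elements of $\Phi$.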

### 5.3. Operations with Φ and its transpose

Algorithm 5.1 allows storing $\hat\Phi_L$ in the form of two ensembles of vectors:

$$En_L(\delta x) := (\delta x^1, \dots, \delta x^L), \quad \delta x^l = c\bar\Delta^l; \qquad En_L(\delta b) := (\delta b^1, \dots, \delta b^L), \quad \delta b^l = (\delta b_1^l, \dots, \delta b_m^l)^T, \quad l = 1, \dots, L. \tag{32}$$

The product $z = \hat\Phi_L y$, $y \in R^n$, can be performed as $z_i = \sum_{k=1}^{n}\hat\phi_{ik}y_k = \sum_{k=1}^{n}\frac{1}{L}\sum_{l=1}^{L}\frac{\delta b_i^l}{\delta x_k^l}\,y_k$, or in the more compact form

$$z = \frac{1}{L}\sum_{l=1}^{L}\alpha_l\,\delta b^l, \qquad \alpha_l := \sum_{k=1}^{n}\frac{y_k}{\delta x_k^l}. \tag{33}$$

Eq. (33) allows performing $z = \hat\Phi_L y$ with about $L(m + 2n + 1)$ elementary operations.

Similarly, the $i$th component of $z = \hat\Phi_L^T y$, $y \in R^m$, is computed as

$$z_i = \frac{1}{L}\sum_{l=1}^{L}\frac{1}{\delta x_i^l}\sum_{k=1}^{m}\delta b_k^l\,y_k, \quad i = 1, \dots, n. \tag{34}$$

### 5.4. Estimation of decomposition of Φ

Let $\Phi$ be a matrix of dimensions $m \times n$. For definiteness, let $m \le n$ with $\mathrm{rank}(\Phi) = m$. We want to find the best approximation of $\Phi$ among members of the class of matrices

$$\Phi_e = AB^T, \qquad A \in R^{m\times r}, \quad B \in R^{n\times r}, \tag{35}$$

under the constraint

Condition (C): $A$ and $B$ are matrices of dimensions $m \times r$ and $n \times r$, $r \le m$, and $\mathrm{rank}(AB^T) = r$.

Under the condition (C), the optimization problem is formulated as

$$J(A, B) = \|\Phi - \Phi_e\|_F^2 = \|\Phi - AB^T\|_F^2 \to \min_{A,B}, \tag{36}$$

where $\|\cdot\|_F$ denotes the Frobenius matrix norm. Let $U\Sigma V^T$ be the SVD of $\Phi$ (cf. Eq. (5)). Let $\tilde\Phi = \Phi + \Delta\Phi$ with SVD $\tilde\Phi = \tilde U\tilde\Sigma\tilde V^T$ and $\tilde\sigma_1 \ge \tilde\sigma_2 \ge \dots \ge \tilde\sigma_m$, where $\tilde\sigma_k$ is the $k$th singular value of $\tilde\Phi$. Then we have

$$J(A^o, B^o) = \sum_{k=r+1}^{m}\sigma_k^2, \tag{37}$$

where $A^oB^{oT}$ is a solution to problem (36) subject to Condition (C) (Theorem 3.1 of Hoang and Baraille [19]).

Theorem 3.1 of Hoang and Baraille [19] implies that $\Phi_e^o \equiv A^oB^{oT}$ is equal to the matrix formed by truncating the SVD of $\Phi$ to its first $r$ singular vectors and singular values. This allows avoiding the storage of all elements of the estimate $\hat\Phi_L$ of $\Phi$ (whose number is of order $O(mn)$).

#### 5.4.1. Decomposition algorithms

Let the elements of $\Phi$ (or $\hat\Phi$) be available (possibly in algorithmic form). By stochastically and simultaneously perturbing all the elements of $A$ and $B$, one can write an iterative algorithm for estimating them. For more detail, see Hoang and Baraille [19].

#### 5.4.2. Iterative decomposition algorithm

Another way to decompose the matrix Φ is to solve iteratively the following optimization problems

Algorithm 5.2

At the beginning let i=1.

Step 1. For i=1, solve the minimization problem

$$J_1 = \|\Phi_1 - ab^T\|_F^2 \to \min_{a,b}, \quad a \in R^m, \ b \in R^n; \qquad \Phi_1 := \Phi, \quad \mathrm{rank}(ab^T) = 1.$$

Its solution is denoted as $\hat a_1, \hat b_1$.

Step 2. For $i < r$, put $i \leftarrow i + 1$ and solve the problem

$$J_i = \|\Phi_i - ab^T\|_F^2 \to \min_{a,b}, \quad a \in R^m, \ b \in R^n; \qquad \Phi_i := \Phi - \sum_{k=1}^{i-1}\hat a_k\hat b_k^T, \quad \mathrm{rank}(ab^T) = 1,$$

with solution denoted as $\hat a_i, \hat b_i$.

Step 3. If i=r, compute

$$\hat\Phi = \hat A(r)\hat B(r)^T, \qquad \hat A(r) = (\hat a_1, \dots, \hat a_r), \quad \hat B(r) = (\hat b_1, \dots, \hat b_r),$$

and stop. Otherwise, go to Step 2.
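A NumPy sketch of Algorithm 5.2 (ours): each rank-1 subproblem is solved here by the leading singular triplet of the residual, whereas in the chapter the subproblems are themselves solved by SSP-based optimization:

```python
import numpy as np

def iterative_decomposition(Phi, r):
    """Algorithm 5.2: greedy rank-1 fits to successive residuals."""
    m, n = Phi.shape
    A_hat = np.zeros((m, r))
    B_hat = np.zeros((n, r))
    residual = Phi.copy()                               # Phi_1 = Phi
    for i in range(r):
        U, s, Vt = np.linalg.svd(residual)
        A_hat[:, i] = s[0] * U[:, 0]                    # a_i
        B_hat[:, i] = Vt[0]                             # b_i
        residual -= np.outer(A_hat[:, i], B_hat[:, i])  # Phi_{i+1}
    return A_hat, B_hat

# the 2 x 2 example of Section 6
Phi = np.array([[5.0, 7.0], [-2.0, -4.0]])
A1, B1 = iterative_decomposition(Phi, r=1)
# rank-1 truncation error equals sigma_2^2, in line with Eq. (37)
s = np.linalg.svd(Phi, compute_uv=False)
err = np.linalg.norm(Phi - A1 @ B1.T, "fro") ** 2
assert np.isclose(err, s[1] ** 2)
```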

From Theorem 3.2 of Hoang and Baraille [19], the couple $(\hat A(r), \hat B(r))$ is a solution of problem (36) subject to Condition (C).

## 6. Numerical example

Consider the matrix $\Phi \in R^{2\times 2}$ with elements

$$\phi_{11} = 5, \quad \phi_{12} = 7, \quad \phi_{21} = -2, \quad \phi_{22} = -4.$$

The singular values and the right SVs of $\Phi$, obtained by solving the classical eigenvalue equations for $\Phi^T\Phi$, are displayed in Table 2.
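The quantities quoted in this section can be reproduced with a few lines of NumPy (our check, not part of the chapter):

```python
import numpy as np

Phi = np.array([[5.0, 7.0], [-2.0, -4.0]])

# singular values from the eigenvalues of Phi^T Phi: about 9.676 and 0.620
lam = np.linalg.eigvalsh(Phi.T @ Phi)      # ascending eigenvalues
sig = np.sqrt(lam)[::-1]
assert np.allclose(sig, [9.676, 0.620], atol=1e-3)

# eigenvalues of Phi itself, used for the EI-OPs: 3 and -2
ev = np.sort(np.linalg.eigvals(Phi).real)
assert np.allclose(ev, [-2.0, 3.0])
```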

| Element | $\phi_{ij}$ | $\hat\phi_{ij}$ | $\hat\phi_{ij}^{(1)}$ | $\hat\phi_{ij}^{(2)}$ |
|---|---|---|---|---|
| (1,1) | 5.00 | 4.677 | 4.914 | −0.237 |
| (1,2) | 7.00 | 7.07 | 7.662 | −0.592 |
| (2,1) | −2.00 | −2.041 | −1.08 | −0.962 |
| (2,2) | −4.00 | −4.081 | −1.683 | −2.398 |

### Table 1.

Estimates of Φ obtained by Algorithm 5.2.

First, we apply Algorithm 5.1 to estimate the matrix $\Phi$. Figure 1 shows the estimates produced by Algorithm 5.1; it is seen that they converge quickly to the true elements of $\Phi$.

Next, Algorithm 5.2 (the iterative decomposition algorithm) has been applied to estimate the decomposition of the matrix $\Phi$. After the $i$th iteration, the algorithm yields $\hat b_i, \hat c_i$.

The different OPs are shown in Table 2, where $x_{svr}(i)$, $x_{ei}(i)$, and $x_{sch}(i)$ are the theoretical SV-OPs, EI-OPs, and Sch-OPs. The vectors $\hat x_{sch}(i)$ are the components of $X_t$ computed by Algorithm 3.1 (Sampling-P). As to $\hat c_n(i)$, they are the results of normalizing $\hat c_i$ (to unit Euclidean norm).

| Perturbation | Vector | Predictor | Amplification |
|---|---|---|---|
| $x_{svr}(1)$ | $(0.554, 0.833)^T$ | $(8.597, -4.438)^T$ | 9.676 |
| $x_{svr}(2)$ | $(0.833, -0.554)^T$ | $(0.284, 0.551)^T$ | 0.62 |
| $x_{ei}(1)$ | $(0.962, -0.275)^T$ | $(2.885, -0.824)^T$ | 3 |
| $x_{ei}(2)$ | $(0.707, -0.707)^T$ | $(-1.414, 1.414)^T$ | 2 |
| $x_{sch}(1)$ | $(0.707, 0.707)^T$ | $(8.485, -4.243)^T$ | 9.487 |
| $x_{sch}(2)$ | $(0.707, -0.707)^T$ | $(-1.414, 1.414)^T$ | 2 |
| $\hat x_{sch}(1)$ | $(0.275, 0.962)^T$ | $(8.1, -4.4)^T$ | 9.22 |
| $\hat x_{sch}(2)$ | $(0.962, -0.275)^T$ | $(2.885, -0.824)^T$ | 3 |
| $\hat c_n(1)$ | $(0.54, 0.842)^T$ | $(8.592, -4.447)^T$ | 9.674 |
| $\hat c_n(2)$ | $(0.372, 0.928)^T$ | $(8.358, -4.457)^T$ | 9.472 |

### Table 2.

Different OPs.

Looking at the first OPs, one sees that $x_{svr}(1)$, $x_{sch}(1)$, $\hat x_{sch}(1)$, and $\hat c_n(1)$ produce almost the same amplification. The first $x_{ei}(1)$ has an amplification three times smaller than those of $x_{svr}(1)$ and $x_{sch}(1)$. The second $x_{sch}(2)$ is much less optimal than $x_{svr}(2)$ and $\hat c_n(2)$. Comparing $x_{svr}(i)$ with $\hat c_n(i)$ for $i = 1, 2$, one concludes that the obtained results justify the correctness of Theorem 3.1 of Hoang and Baraille [19]. Mention that only $\hat x_{sch}(i)$ and $\hat c_n(i)$ can be calculated for HdS.

In Table 1, we show the results obtained by Algorithm 5.2 after two consecutive iterations (matrix estimation in a rank-1 subspace). The elements of the true $\Phi = (\phi_{ij})$ are displayed in the second column, and their estimates in the third column,

$$\hat\Phi = \hat\Phi_1 + \hat\Phi_2 = \sum_{i=1}^{2}\hat b_i\hat c_i^T, \qquad \hat\Phi_i = (\hat\phi_{ij}^{(i)}) = \hat b_i\hat c_i^T.$$

The estimates resulting from the first iteration are the elements of $\hat\Phi_1$ (Table 1, column 4). After the first iteration, $\Phi_2 \equiv \Phi - \hat\Phi_1 \approx \hat b_2\hat c_2^T$, and the optimization yields the estimates $\hat\phi_{ij}^{(2)}$ displayed in column 5. From columns 4-5, one sees that the first iteration allows estimating well the two biggest elements $\phi_{11} = 5$ and $\phi_{12} = 7$. In a similar way, the second iteration captures the two biggest elements of $\Phi_2$.

## 7. Assimilation in high-dimensional ocean model MICOM

### 7.1. Ocean model MICOM

To see the impact of optimal SchVs in the design of filtering algorithms for HdS, in this section we present the results of an experiment with the Hd ocean model MICOM (Miami Isopycnic Coordinate Ocean Model). This numerical experiment is identical to that described in Hoang and Baraille [15]. The model configuration is a domain situated in the North Atlantic from 30°N to 60°N and 80°W to 44°W; for the exact model domain and some main features of the ocean current produced by the model, see Hoang and Baraille [15]. The system state is $x = (h, u, v)$, where $h = (h_{ijk})$ is the layer thickness and $u = (u_{ijk})$, $v = (v_{ijk})$ are the two velocity components. Mention that after discretization the dimension of the system state is $n = 302{,}400$. The observations available at each assimilation instant are the sea surface height (SSH), of dimension $p = 221$.

#### 7.1.1. Data matrix based on dominant Sch-Ops

The filter is a reduced-order filter (ROF) with the variable $h$ as the reduced state; $u, v$ are calculated on the basis of the geostrophy hypothesis. To obtain the gain of the ROF, first Algorithm 3.1 was implemented to generate an ensemble of dominant SchVs (in total 72 SchVs, denoted as EnSCH). The sample ECM $M_d^{SCH}$ is computed on the basis of EnSCH. Due to rank deficiency, the sample $M_d^{SCH}$ is considered only as a data matrix. The optimization procedure is applied to minimize the distance between the data matrix $M_d^{SCH}$ and the structured parametrized ECM $M^{SCH}$, written in the form of the Schur (element-wise) product of two matrices, $M^{SCH} = M_v(\theta)\circ M_h(\theta)$. Here, $M_v$ is the vertical ECM, $M_h$ is the horizontal ECM [18], and $\theta$ is a vector of unknown parameters. Mention that the hypothesis of separability of the vertical and horizontal variables in the ECM is not new in meteorology [20]. The gain is computed according to Eq. (3) with $R = \alpha I$, where $\alpha > 0$ is a small positive value. The resulting ROF is denoted as PEF (SCH).

#### 7.1.2. Data matrix based on SSP approach

The second data matrix $M_d^{SSP}$ is obtained by perturbing the system state according to the SSP method. The SSP samples are simulated in a way similar to that described above for generating EnSCH, with the difference that the perturbation components $\delta h_{ijk}^l$ are i.i.d. Bernoulli random variables assuming the two values ±1 with equal probability 1/2. The same optimization procedure has been applied to estimate $M^{SSP}$. The obtained ROF is denoted as PEF (SSP).

Figure 2 shows the evolution, during model integration, of the estimates of the gain coefficient k1 computed from the estimated coefficients ĉkl of MdSCH and MdSSP on the basis of EnSCH (curve "schur") and EnSSP (curve "random"). It is seen that the two coefficients evolve in nearly the same manner, with nearly the same magnitude as that of k1 in the CHF (Cooper-Haines filter, Cooper and Haines [21]). Note that the CHF is a filter widely used in oceanic data assimilation, which projects the surface height prediction error onto the subsurface layers by lifting or lowering water columns.

Similar behavior is observed for the estimates ĉk, k = 2, 3, 4. Note that in the CHF, c2 = c3 = 0.

### 7.2. Performance of different filters

In Table 3, the performances of the three filters are displayed. The errors are the spatially and temporally averaged rms of the prediction error (PE) for the SSH and for the two velocity components u and v.

| rms | CHF | PEF (SCH) | PEF (SSP) |
| --- | --- | --- | --- |
| ssh(fcst) (cm) | 7.39 | 5.09 | 4.95 |
| u(fcst) (cm/s) | 7.59 | 5.36 | 5.29 |
| v(fcst) (cm/s) | 7.72 | 5.73 | 5.64 |

### Table 3.

rms of PE for the SSH and the u, v velocity components.
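The spatially and temporally averaged rms reported in Table 3 can be computed as in this small sketch; the array shapes and sample values are illustrative, not taken from the experiment.

```python
import numpy as np

def averaged_rms(pe):
    """Spatially and temporally averaged rms of the prediction error.

    pe has shape (n_instants, n_points): one spatial field of PE per
    assimilation instant; the rms is taken over all entries at once.
    """
    return float(np.sqrt(np.mean(np.asarray(pe, dtype=float) ** 2)))

# Toy usage: PE fields at two instants over two grid points
pe_ssh = [[3.0, 4.0], [0.0, 0.0]]
print(averaged_rms(pe_ssh))  # -> 2.5
```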

The results in Table 3 show that the two filters PEF (SCH) and PEF (SSP) have practically the same performance, and their estimates are much better than those of the CHF, with a slightly better performance for the PEF (SSP). We note that, as the PEF (SCH) is constructed on the basis of an ensemble of samples tending to the first dominant SchV, its performance should theoretically be better than that of the PEF (SSP). The slightly better performance of the PEF (SSP) may be explained by the fact that the best theoretical performance of the PEF (SCH) can be attained only if the model is linear and stationary and the number of PE samples in EnSCH at each iteration is large enough. The ensemble size of EnSCH in the present experiment is very small compared with the dimension of the MICOM model.

To illustrate the efficiency of adaptation, Figure 3 shows the cost functions (variances of innovation) resulting from the three filters PEF (SCH), PEF (SSP), and AF (i.e., the APEF based on PEF (SCH); the same performance is observed for the AF based on PEF (SSP)). Undoubtedly, the adaptation considerably improves the performance of the nonadaptive filters.

## 8. Concluding remarks

We have presented in this chapter the different types of OPs: deterministic, stochastic, and OPs in an invariant subspace of the system dynamics. The ODPs and OSPs play an important role in the study of the predictability of the system dynamics as well as in the construction of optimal OFS for environmental geophysical systems.

One other class of perturbation known as SSP is found to be a very efficient tool for solving optimization and estimation problems, especially with Hd matrices and in computing the optimal perturbations.

The numerical experiments presented in this chapter confirm the important role of the different types of OPs in the numerical study of Hd assimilation systems.

## References

1. Myhrvold N. Moore’s Law Corollary: Pixel Power. New York Times. June 7, 2006 [Retrieved: 2011-11-27]
2. Kalman RE. A new approach to linear filtering and prediction problems. Transactions of the ASME-Journal of Basic Engineering. 1960;82:35-45
3. Lorenz EN. Deterministic non-periodic flow. Journal of the Atmospheric Sciences. 1963;20:130-141
4. Lorenz EN. Atmospheric predictability experiments with a large numerical model. Tellus. 1982;34:505-513
5. Palmer TN, Barkmeijer J, Buizza R, Petroliagis T. The ECMWF ensemble prediction system. Quarterly Journal of the Royal Meteorological Society. 1997;4(4):301-304
6. Toth Z, Kalnay E. Ensemble forecasting at NMC: The generation of perturbations. Bulletin of the American Meteorological Society. 1993;74:2317-2330
7. Hoang HS, De Mey P, Talagrand O, Baraille R. A new reduced-order adaptive filter for state estimation in high dimensional systems. Automatica. 1997;33:1475-1498
8. Hoang HS, Talagrand O, Baraille R. On the design of a stable adaptive filter for high dimensional systems. Automatica. 2001;37:341-359
9. Golub GH, Van Loan CF. Matrix Computations. 2nd ed. Baltimore-London: Johns Hopkins University Press; 1993
10. Hoang HS, Baraille R, Talagrand O. On the stability of a reduced-order filter based on dominant singular value decomposition of the system's dynamics. Automatica. 2009;45(10):2400-2405. DOI: 10.1016/j.automatica.2009.06.032
11. Doucet A, Johansen AM. A tutorial on particle filtering and smoothing: Fifteen years later. Technical report. Department of Statistics, University of British Columbia; 2008
12. Evensen G. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dynamics. 2003;53:343-367
13. Kalnay E. Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press; 2002
14. Lanczos C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards. 1950;45:255-282
15. Hoang HS, Baraille R. Prediction error sampling procedure based on dominant Schur decomposition. Application to state estimation in high dimensional oceanic model. Applied Mathematics and Computation. 2011;218(7):3689-3709. DOI: 10.1016/j.amc.2011.09.012
16. Spall JC. Accelerated second-order stochastic optimization using only function measurements. In: Proceedings of 36th IEEE Conference on Decision and Control; 1997. pp. 1417-1424
17. Hoang HS, Baraille R. Stochastic simultaneous perturbation as powerful method for state and parameter estimation in high dimensional systems. In: Baswell AR, editor. Advances in Mathematics Research. Vol. 20. New York: Nova Science Publishers; 2015. pp. 117-148
18. Hoang HS, Baraille R. On the efficient low cost procedure for estimation of high-dimensional prediction error covariance matrices. Automatica. 2017;83:317-330
19. Hoang HS, Baraille R. A simple numerical method based simultaneous stochastic perturbation for estimation of high dimensional matrices. Multidimensional Systems and Signal Processing. 2018. DOI: 10.1007/s11045-018-0551-y
20. Daley R. The effect of serially correlated observation and model error on atmospheric data assimilation. Monthly Weather Review. 1992;120:165-177
21. Cooper M, Haines K. Altimetric assimilation with water property conservation. Journal of Geophysical Research. 1996;101:1059-1077
