Open access peer-reviewed chapter

# An Introduction to Ensemble-Based Data Assimilation Method in the Earth Sciences

By Youmin Tang, Zheqi Shen and Yanqiu Gao

Submitted: October 14th 2015Reviewed: June 27th 2016Published: October 19th 2016

DOI: 10.5772/64718

## Abstract

In this chapter, the ensemble-based data assimilation methods are introduced, including their developments, applications and existing concerns. These methods include both traditional methods such as Kalman filter and its derivatives and some advanced algorithms such as sigma-point Kalman filters and particle filters. Emphasis is placed on the challenges of applying these methods onto high-dimensional systems in the earth sciences.

### Keywords

• data assimilation
• Kalman filter
• EnOI
• EnKF
• particle filter

## 1. Introduction

In this chapter, we will talk about the modelling and simulation using both observed data and numerical models, that is, the observations will be incorporated into numerical models for optimal modelling and simulation. In statistics, this is called state-space estimation. In the earth science, it is called data assimilation. For example, a strict definition of data assimilation in atmospheric and oceanic sciences is the process to estimate the state of a dynamic system such as atmospheric and oceanic flow by combining the observational and model forecast data [1].

In general, assimilation methods can be classified into two categories: variational and sequential. This chapter is a tutorial on the sequential data assimilation methods such as ensemble Kalman filter (EnKF) and its variants. A brief introduction of the particle filter (PF) is also provided in this chapter.

This tutorial places emphasis on the rationale behind each method, including: (i) the principle for deriving the algorithm; (ii) the basic assumptions of each method; (iii) the connection and relation between different methods (e.g. extended Kalman filter (EKF) and EnKF, EnKF and sigma-point Kalman filters (SPKF), etc.); and (iv)the advantage and deficiency of each method.

This chapter has been written and organized through teaching for under-/graduatestudents in earth science courses. It can also be a good reference to researchers in the field of modelling and data assimilation.

## 2. The general framework of several assimilation approaches

Intuitionally, one might think that an optimal simulation scheme is to directly replace model variables by observations during numerical integrations. Such a direct replacement is usually not correct since observations are not perfect and contain errors. A simple replacement will introduce observation errors into models, and ignore possible impact of observation errors on model behaviours, easily resulting in imbalance of model dynamics and physics. Thus, the application of observations into numerical models must consider both model and observation errors that play a critical role in the assimilation process.

We will start to display the assimilation concept by a simple example. A detail introduction can be found in [2].

For an unknown true state value, denoted by Tt, there are two samples, denoted by T1(e.g. model simulation) and T2(observation), which have the errors

1 and
2, respectively. Thus, we have

T1=Tt+ϵ1,E1
T2=Tt+ϵ2.E2

We assume the measurement or observation is unbiased, and the variances of errors are known, i.e. E(

1) = E(
2) = 0, Var(
1) = σ12, Var(
2) = σ22. The question here is to seek an optimal estimate, denoted by Ta(called analysis in the assimilation field), for Ttusing T1and T2. This optimal estimate is the central issue of data assimilation.

There are several methods for this solution, as demonstrated below.

### 2.1. Least-squares method

Assume the analysis is a linear combination of both T1and T2, that is, Ta=a1T1+a2T2. Due to the assumption that both T1and T2are unbiased, Tashould be unbiased, i.e. E(Ta)=E(Tt), so a1E(T1)+a2E(T2)=E(Tt), and then a1+a2=1. The best (optimal) estimate should minimize the variance of Taas below:

σa2=E[TaTt]2=E[a1T1+a2T2Tt]2=E[a1(T1Tt)+a2(T2Tt)]2=E(a12ϵ12+a22ϵ22+2a1a2ϵ1ϵ2)=a12σ12+(1a1)2σ22,E3

here, we assumed that the errors of T1and T2are uncorrelated, i.e. E(

1
2)=0. To minimize σa2, let σa2/a1=0, thus

a1=σ22σ12+σ22E4

Namely,

Ta=a1T1+(1a1)T2=T1+σ22σ12+σ22(T2T1).E5

Using Eq. (5), the variance of Ta could be minimized.

### 2.2. Variational approach

In general, assimilation methods can be classified into two categories: variational and sequential. Variational methods such as three-dimensional variational (3D-Var) method and four-dimensional variational (4D-Var) method [3, 4] are batch methods, whereas sequential methods such as Kalman filter (KF) [5] belong to the estimation theory.

They both have had great success. The European Centre for Medium-Range Weather Forecasts (ECMWF) introduced the first 4D-Var method into the operational global analysis system in November 1997 [68]. The ensemble Kalman filter (EnKF) was first introduced into the operational ensemble prediction system by Canadian Meteorological Centre (CMC) in January 2005 [9].

This chapter is a tutorial of the ensemble-based sequential data assimilation methods, such as EnKF and its variants. However, we will briefly demonstrate the idea of variational assimilation by the above example.

First, a cost function should be defined for variational assimilation approach. For this simple example, we define the cost function as below:

J(T)=12[(TT1)2σ12+(TT2)2σ22]E6
T=a1T1+a2T2.E7

The solution is to seek an analysis Ta, determined by a1and a2, leading to the cost function minimum, i.e. J(Ta)=min{J(T)}. Obviously, we have J(T)/a1=0and J(T)/a2=0. Substitute with (6), it is

J(T)a1=TT1σ12Ta1+TT2σ22Ta1E8

Eq. (7) leads to Ta1=T1. Thus, the solution of (8), denoted by Ta, satisfies

Ta=σ22σ12+σ22T1+σ12σ12+σ22T2.E9

The above is a simple example of the 3D variational assimilation approach, where we only consider the analysis error (cost function) for a single time. However, in many cases, we need to consider the error growth during a period, i.e. the sum of errors during the period, in the cost function Eq. (6). For example, the cost function of 4D-Var is defined as below:

J(T)=12t=1N[(T(t)T1(t))2σ12+(T(t)T2(t))2σ22].E10

Meanwhile T(tn)follows a dynamical model, saying T(tn)=t0tnF(T(t))dt=Mn(T(t0)), where Fis a nonlinear dynamical model, Mn is the integral operator and t0is the initial time. Thus, the cost function value of (10) is only determined by the initial condition. Namely, the objective here is to seek optimal initial condition T(t0)that enables (10) minimum, i.e. minimizing (10) subject to dynamical model F. This is a standard conditional extreme problem that can be solved by Lagrange multiplier approach. However, the complexity of dynamical model excludes the possibility to get the analytical solution. We have to solve the minimum problem with aid of numerical methods, e.g. Newton conjugate gradient method. All of numerical methods require the gradient value JT0for solution.

Again, it is almost impossible for obtaining analytical solution of JT0due to the complexity of F. Usually researchers get the gradient value numerically using an approach of tangent linear and adjoint models. The details on tangent linear and adjoint models can be found in relevant references as cited above. It should be noticed that it is very difficult, even intractable sometimes, to construct tangent linear and adjoint models in some cases. Thus, more and more researchers have started to apply sequential assimilation methods instead of 4D-Var in recent years. Next, we will introduce the concept of the sequential assimilation method using the above example.

### 2.3. Bayesian approach

Assume T1and σ1are the mean value and standard deviation of the model prediction that implies a prior probability distribution of truth T,

p(T)=12πσ1e(T1T)22σ12E11

Obviously, this is a Gaussian distribution function, which can be denoted by N(T1, σ1) Given the observation T2and its standard deviation σ2, the posterior distribution of the truth can be expressed by Bayes’ theorem:

p(T|T2)=p(T2|T)p(T)p(T2)12πσ2e(T2T)22σ2212πσ1e(T1T)22σ12. E12

p(T2)was ignored in (12) since it is independent of T, and usually plays as a normalization factor. The likelihood function p(T2|T)describes the probability that the observation becomes T2given an estimation of T. It is commonly assumed to be Gaussian N(T,σ2). The object here is to estimate the truth by maximizing the posterior probability p(T|T2)(namely, we ask the truth to occur as much as possible—maximum probability). Maximizing p(T|T2)is equivalent to maximizing the logarithm of the right item of (12), i.e.

log(p(T|T2))=log(12πσ2)(T2T)22σ22+log(12πσ1)(T1T)22σ12=const12[(TT2)2σ22+(TT1)2σ12].E13

Obviously, the maximum of p(T|T2)occurs at the minimum of the second item on the right-hand side of (13), i.e. the minimum of the cost function Jof (6). Thus, under the assumption of Gaussian distribution, maximizing a posterior probability (Bayesian approach) is equivalent to minimizing cost function (variational assimilation approach). Further, if the model Fis linear and the probability distribution is Gaussian, it can be further proved that the Kalman filter is equivalent to 4D-Var adjoint assimilation method.

## 3. Optimal interpolation (OI) and Kalman filter (KF)

### 3.1. Optimal interpolation

The most special case in data assimilation is that the forecast model is linear and the errors are Gaussian. The solution among sequential methods to this case is provided by Kalman filter. Typically, the Kalman filter applies to the below state-space model:

xt+1=Mxt+ηt,E14
yt=Hxt+ζt,E15

where Mand Hare linear operators of model and measurement, respectively. xis model state and yis the observation, and the subscript implies the time step. ηtand ζtare the model errors and observational errors, respectively, which have variance:

. The objective here is to estimate model state xusing y, making it close to true state (unknown) as much as possible.

Assuming the estimate of model state xaat a time step is a linear combination of model forecast xband observation yo, i.e. the filter itself is linear, so

xa=xb+K[yoHxb].E16

Eq. (16) is the standard expression of Kalman filter. Kis called Kalman gain that determines the optimal estimate and yoHxbis called the innovation. An analysis step is essentially to determine the increment to the forecast by combining the Kalman gain and the innovation. Before deriving K, we denote the covariance matrix of the analysis error

aby Pa, i.e. Pa= <
a,(
a)T>, where
a=xaxtrand xtris the true value of model state. Similarly, observed errors and forecast errors are defined by
o=yoHxtrand
b=xbxtr, respectively. It should be noticed that the forecast error
bis different from the model error ζtthat is a systematic bias. Also, we denote B= <
b, (
b)T> as the background (forecast) error covariance and R= <
o, (
o)T> as the observational error covariance. It is also assumed that the observation error is not related to forecast error, so <
b, (
o)T> = <
o, (
b)T> = 0.

Clearly, we are seeking for Kthat can lead to Paminimum. Subtracting xtron both sides of Eq. (16) leads to the below equation:

xaxtr=xbxtr+K[yoHxb+HxtrHxtr].E17

Namely,

ϵa=ϵb+K(ϵoHϵb),E18

And

Pa=E[ϵb+K(ϵoHϵb)][ϵb+K(ϵoHϵb)]T=E[ϵb(ϵb)T+ϵb(ϵoHϵb)TKT+K(ϵoHϵb)(ϵb)T+K(ϵoHϵb)(ϵoHϵb)TKT]=BBHTKTKHB+K(R+HBHT)KT.E19

Here, we used B=BT. The optimal estimate asks the trace of Paminimum, namely, [trace(Pa)]/K=0. It can be computed that

K=BHT(HBHT+R)1.E20

Substitute into (13)

Pa=BBHTKTKHB+BHT(HBHT+R)(1)(R+HBHT)KT=(IKH)B.E21

Here, we invoked the below properties:

AxxT=xTAx=AE22
xTAxx=xT(A+AT)E23
ATxx=xTATxT=ATE24
(trace[xAxT])x=2xAE25
(trace[xA])x=(trace[AxT])x=AE26

Thus, we have the optimal estimate filter:

xa=xb+K[yoHxb],E27
K=BHT(HBHT+R)1,E28
Pa=(IKH)B.E29

In the estimate (27)–(29), if the background error covariance Bis prescribed, the estimate is called optimal interpolation. The OI does not involve state equation (14) and Bis unchanged during the entire assimilation process.

### 3.2. Kalman filter

Now, we consider that Bin (28) changes with the assimilation cycle. This is more realistic since the model prediction errors should be expected to decrease with the assimilation.

From Eq. (14), we have

xt+1tr=Mxttr+ηt,E30
xt+1b=E(Mxta+ηt)=MxtaE31

Eq. (30) indicates that even the true value is input at a time step, model cannot get a true value for next step due to model bias ηt. Eq. (31) shows a standard procedure for the model prediction of next step starting from the analysis of previous step.

Subtracting (30) from (31) produces

ϵt+1b=Mϵtaηt,E32
Bt+1=E(ϵt+1b(ϵt+1b)T)=E[(Mϵta+ηt)(Mϵta+ηt)T]=MPtaMT+QE33

where Pta = <

ta, (
ta)T> represents the analysis error covariance for time step t. The above equation considers the evolution of the background (prediction) error covariance with the time controlled by the dynamical model operator M. The above equations constitute the framework of Kalman filter (Table 1), namely

 Analysis step xta=xtb+K[yo−Hxtb],K=BtHT(HBt HT+R)−1,Pta=(I−KH)Bt, Prediction step xt+1b=Mxta,Bt+1=MPtaMT+Q

### Table 1.

The Kalman filter.

One Kalman filter cycle consists of two parts, namely, one analysis step (Eqs. (27)–(29)) and one prediction step (Eqs. (31) and (33)). The analysis state xtaand covariance Ptaare treated as initial conditions for the prediction step, until the next observation is available. Sometimes, Btis denoted by Ptfin Kalman filter literatures.

### 3.3. Extended Kalman filter (EKF)

In deriving the Kalman filter, we assume the state model Mand measurement model Hare both linear. Further, we also assume the error has Gaussian distribution. Therefore, classic KF only works for linear models and Gaussian distribution. If the dynamical model and measurement model are not linear, we cannot directly apply KF. Instead, linearization must be performed prior to apply KF. The linearized version of KF is called extended KF (EKF), which solves the below state-space estimate problem:

xt+1=f(xt)+ηt,E34
yt=h(xt)+ζt,E35

where fand hare nonlinear models, and ηtand ζtare additive noises.

The filter is still assumed to be ‘linear’, i.e.

xa=xb+K[yoh(xb)]E36

Actually, it is not a linear combination of the forecast xband observation yoif his not linear. However, we just extend the formulation of Eq. (16), and apply it intuitively in nonlinear cases. Ignoring high-order terms, the following holds approximately

h(x+δx)=h(x)+hxδx=h(x)+HδxE37

where His the linearization of hand Hi,j=hixj. So,

yoh(xb)=yoh(xtr+xbxtr)=yoh(xtr)H(xbxtr)=ϵoHϵbE38
xa=xb+K(ϵoHϵb)E39

Eq. (39) is identical to Eq. (16). Similarly, subtracting xtron both sides of Eq. (47) leads to the below equation:

ϵa=ϵb+K(ϵoHϵb)E40

which is the same as Eq. (18). Following the same derivation as that for Eq. (18), we can obtain the equations similar to (27)–(29). Therefore, if the measurement model his nonlinear, the KF can be still applied with a linearization of h.

Similar to Eqs. (30) and (31), the state model is as below:

xt+1tr=f(xttr)+ηtE41
xt+1f=E(f(xta)+ηt)=f(xta).E42

Subtracting Eq. (41) from Eq. (42) produces

ϵt+1f=f(xta)f(xttr)ηt=f(xta)f(xttrxta+xta)ηt=f(xta)f(xtaϵta)ηt=MϵtaηtE43

where Mi,j=fixj.

Comparing Eq. (31) with Eq. (33), it reveals that Eq. (33) still works. Thus, the EKF can be summarized as below (Table 2).

The procedure to perform EKF is similar to that for KF, as listed above. The disparities and similarities between EKF and KF include

1. Kalman gain K has the same form for both, especially the linear or linearized measurement model should be used;

2. the update equation of model error covariance has the same form, with linear and linearized state model used;

3. forecast model is different, with KF using linear Eq. (14) and EKF using nonlinear model Eq. (34); and

4. the filtering algorithm is different, linear measurement model Hused in KF and nonlinear model hin EKF.

It should be noticed that EKF is only an approximate of KF for nonlinear state model.

 Analysis step xta=xtb+K[yo−h(xtb)],K=BtHT(HBtHT+R)−1,Pta=(I−KH)Bt,Hi,j=∂hi∂xj. Prediction step xt+1f=f(xta),Bt+1=MPtaMT+QMi,j=∂fi∂xj,

### Table 2.

The extended Kalman filter.

## 4. Ensemble Kalman filter (EnKF)

### 4.1. Basics of EnKF

A challenge in EKF is to update background (prediction) error covariance, which requires the linearization of nonlinear model. The linearization of nonlinear model is often difficult technically, and even intractable in some cases, e.g. non-continuous functions existing in models. Another drawback of EKF is to neglect the contributions from higher-order statistical moments in calculating the error covariance.

To avoid the linearization of nonlinear model, the ensemble Kalman filter (EnKF) was introduced by Evensen [10, 11], in which the prediction error covariance B of Eq. (33) are estimated approximately using an ensemble of model forecasts. The main concept behind the formulation of the EnKF is that if the dynamical model is expressed as a stochastic differential equation, the prediction error statistics, which are described by the Fokker-Flank equation, can be estimated using ensemble integrations ( [10, 12]; thus, the error covariance matrix B can be calculated by integrating the ensemble of model states. The EnKF can overcome the EKF drawback that neglects the contributions from higher-order statistical moments in calculating the error covariance. The major strengths of the EnKF include the following:

1. there is no need to calculate the tangent linear model or Jacobian of nonlinear models, which is extremely difficult for ocean (or atmosphere) general circulation models (GCMs);

2. the covariance matrix is propagated in time via fully nonlinear model equations (no linear approximation as in the EKF); and

3. it is well suited to modern parallel computers (cluster computing) [13].

EnKF has attracted a broad attention and been widely used in atmospheric and oceanic data assimilation.

Simply saying, EnKF avoids the computation and evolution of the error covariance B as in Eq. (33), and computes Busing below formula as soon as it is required.

B=1N1i=1N(xibxb¯)(xibxb¯)TE44

where xibrepresents the i-th member of the forecast ensemble of system state vector at step t, and Nis the ensemble size. The use of Eq. (44) avoids processing M, the linearized operator of nonlinear model. However, the measurement function His still linear or linearized while computing the Kalman gain K, which causes concern. To avoid the linearization of nonlinear measurement function, Houtekamer and Mitchell [14] wrote Kalman gain by

K=BHT(HBHT+R)1,E45
BHT1N1i=1N[xibxb¯][h(xib)h(xb)¯]T,E46
HBHT1N1i=1N[h(xib)h(xb)¯][h(xib)h(xb)¯]T,E47

where h(xb)¯=1Ni=1Nh(xib). Eqs. (46) and (47) allow direct evaluation of the nonlinear measurement function hin calculating Kalman gain. However, Eqs. (46) and (47) have not been proven mathematically, and only were given intuitionally. Tang and Ambadan argued that Eqs. (46) and (47) approximately hold if and only if h(xb)¯=h(xb¯)and xibxb¯is small for i=1,2,...,N[15]. Under these conditions, Tang et al. argued Eqs. (46) and (47) actually linearize the nonlinear measurement functions hto H[16]. Therefore, direct application of the nonlinear measurement function in Eqs. (46) and (47), in fact, imposes an implicit linearization process using ensemble members. In next section, we will see that Eqs. (46) and (47) can be modified under a rigorous framework.

Thus, the procedures of EnKF are summarized as below (Table 3):

1. Imposing perturbations on initial conditions and integrate the model, i.e. xi,1=f(x0+γi),where i=1,2...,N(ensemble size) and x0is the initial condition.

2. Using K=BHT(HBHT+R)1and Eqs. (46) and (47) to calculate Kalman gain K.

3. Calculating analysis using

xia=xib+K[yo+εih(xib)],E48

after Kis obtained. It should be noted that each ensemble member produces an analysis; the average of all (N) analyses can be obtained.

4. Using xi,t+1b=f(xia)to obtain new ensemble members for next round analysis.

5. Repeating Steps 2–4 until the end of assimilation period.

 Analysis step xia=xib+K[yo+εi−h(xib)],i=1,…,NK=BHT(HBHT+R)−1,BHT=1N−1∑i=1N[xib−xb¯][h(xib)−h(xb)¯]T,HBHT=1N−1∑i=1N[h(xib)−h(xb)¯][h(xib)−h(xb)¯]T Prediction step xi,t+1b=f(xia+γi), i=1,…,N

### Table 3.

The ensemble Kalman filter.

It should be noted that the observation should be treated as a random variable with the mean equal to yo and covariance equal to R. This is why there is εiin Eq. (48). Simply, εiis often drawn from a normal distribution εiN(0,R).

From the above procedure, we find that Eq. (44) is not directly applied here. Instead, we use Eqs. (46) and (47) to calculate K. This is because Eqs. (46) and (47) avoid the linearization of nonlinear model and also avoid the explicit expression of matrix B, which is often very large and cannot be written in current computer sources in many realistic problems. The measurement function, h, projecting model space (dimension) to observation space (dimension), greatly reduces the number of dimension.

### 4.2. Some remarks on EnKF with large dimensional problems

#### 4.2.1. Initial perturbation

The success of EnKF highly depends on the quality of ensemble members produced by initial perturbations. It is impractical to represent all possible types of errors within the ensemble because of the computational cost, the method of generating initial perturbations must be chosen judiciously.

The first issue is the amplitude of initial perturbations. Usually, the following two factors are considered when selecting the amplitude of initial perturbations: the amplitude of observation error and the amplitude of model errors induced by model parameters and uncertainty in model physics. If a model is perfect, the amplitude of the perturbations should be the same as the amplitude of observation errors. This combined error is used to determine the amplitude of perturbations.

When the perturbation amplitude is determined, the practical initial perturbation field generating each ensemble member could be constructed by a normalized pseudorandom field multiplied by the prescribed amplitude. Considering the spatial coherence, the pseudorandom field is red noise as proposed by Evensen [17], summarized as below:

1. Calculate the statistical characteristics for the pseudorandom field to meet its variance of 1 and mean of 0 by solving the following nonlinear equation:

e1=l,pe2(kl2+rp2)/σ2cos(klrh)l,pe2(kl2+rp2)/σ2,E49

where kl=2πlxn=2πlNxΔx, rp=2πpym=2πpNyΔy, and Nxand Nyare the number of grid points in the x-axis (lon.) and the y-axis (lat.). The land pare wavenumbers, changing from 1 to the maximum value of N/2and M/2. Δxand Δyare the intervals of two adjacent points, often set to 1, and rhis the decorrelation length. The purpose of Eq. (49) is to derive σ2for the other feature:

c2=1Δkl,pe2(kl2+rp2)/σ2E50

2. After cand σ2are obtained, we can construct a two-dimensional pseudorandom field:

W(xn,ym)=l,pcΔke(kl2+rp2)σ2e2πiφ(l,p)ei(klxn+rpym)Δk.E51

3. While xn, ymcover the whole domain, Eq. (51) produces a Nx*Nytwo-dimensional random field with spatial coherence structure and the variance of 1 and mean of 0. If the realistic uncertainty (error) has an amplitude β, the perturbation should be βW. Similarly, Eq. (51) is often used for the error perturbation γiused in the fourth step of the EnKF procedure.

Sometimes, we need to consider the vertical coherence of pseudorandom fields between adjacent levels in oceanic models. A simple method for this purpose is to construct the pseudorandom field at the kth level εkby following equation:

εk=αεk1+1α2Wk,E52

where Wk(k=1,...,Nz)is the pseudorandom field at the kth level without considering vertical coherence, constructed using the above method. Initially, for the surface perturbation (k=1), the vertical coherence is not considered, i.e. equals to zero since εk1does not exist. Eq. (52) indicates that a practical pseudorandom at the kth level (εk) is composed of Wkand εk1. As such the εkis correlated with εk1, i.e. the practical pseudorandom fields of two adjacent levels (εk1and εk) are coherent with each other. Their correlation or coherent structure is determined by the coefficient α[0,1]. Eq. (52) generates a sequence that is white in the vertical direction if α=0(i.e. εk=Wk), but a sequence that is perfect correlated in vertical if α=1(i.e. εk=εk1). Eq. (52) is also often used to construct random field that is temporally coherent, for example, a continuous random noise that has coherence in time, as used for γiin the forecast model [17]. The random noise γiin the EnKF procedure can also be replaced by the random noise imposed in model forcing. For example, the random noise is continuously added to wind forcing for oceanic models. Even for some atmospheric models with transition processes, there are inherent random noises making γinot necessary. One important criteria for γiand the amplitude βis to examine ensemble spread by some sensitivity experiments.

#### 4.2.2. The computational cost of Kalman gain

The Kalman gain Khas dimension of L*m, where Lis the number of model variables and mis the number of observational variables. In many realistic problems, Land mare very large numbers (mN, the ensemble size), making the inversion very expensive.

A simple procedure is to rewrite the Kalman gain K, as below:

K=x˜x˜THT(Hx˜x˜THT+εεT)1,E53

where x˜indicates that the model ensemble predictions removed the ensemble mean (x˜i=[xibxb¯], for i=1,2,...,N). R=1NεεTwas invoked here. If we assume the ensemble prediction error (xbxtr¯xbxb¯=x˜) is not correlated to observation error, i.e. x˜εT=0, the following is valid [17]:

(Hx˜x˜THT+εεT)=(Hx˜+ε)(Hx˜+ε)T,E54

where (Hx˜+ε)has dimension m*N. Usually, ensemble size Nis much less than m. Using the singular-value decomposition (SVD) technique, we have

(Hx˜+ε)=UΣVTE55

Eq. (54) then becomes

(Hx˜x˜THT+εεT)=UΣVTVΣTU=UΣΣTUT=UΛUTE56

So,

(Hx˜x˜THT+εεT)1=UΛ1UTE57

where

and Λare the eigenvector and the square of eigenvalues of (Hx˜+ε). There are Nnon-zero eigenvalues for (Hx˜+ε), therefore the dimension is not large, allowing us to efficiently compute the inversion for a global analysis in most practical situations.

#### 4.2.3. Stochastic EnKF and deterministic EnKF

In EnKF introduced in the previous section, the observation assimilated into dynamical model should be treated to be stochastic variable, as expressed by yo+εiin Eq. (48). It is a must if the classic EnKF algorithm is used. It has been proven that if the EnKF assimilates deterministic observations (i.e., observation yonot changed at each ensemble member), the analysis error covariance will be systematically underestimated, typically leading to filter divergence, as indicated by below [11, 18]:

Pa*=(IKH)B(IKH)TE58

Eq. (58) gives the analysis error covariance if the observed is not perturbed. Comparing Eq. (58) with Eq. (29), a theoretically unbiased estimate, Pa*is always less than Pa.

However, the perturbed observation approach (i.e. yo+εi) introduces an additional source of sampling error that reduces analysis error covariance accuracy and increases the probability of understanding analysis error covariance [19, 20]. Thus, an approach that only uses a single observation realization but avoids systematical underestimation of analysis error covariance was pursued. There are several approaches to implement this goal, as summarized by Tippettet al. [20]. Below, we will introduce an approach developed by Whitaker and Hamill [19], called Ensemble squareroot filter (EnSRF).

Denote the deviation of analysis from the analysis mean by x˜a=xax¯a, it is easy to write

x˜a=x˜b+K˜[y˜oHx˜b]E59

where y˜o=yoy¯o. If a single observation realization is assimilated in all ensemble members, y˜o=0and

x˜a=x˜bK˜Hx˜b=(IK˜H)x˜b,E60
Pa*=(IK˜H)B(IK˜H)T.E61

We seek a definition for K˜that will result in an ensemble whose analysis error covariance equals to (IKH)B, i.e.

(IK˜H)B(IK˜H)T=(IKH)B.E62

The solution of Eq. (62) is

K˜=(1+RHBHT+R)1K.E63

Therefore, EnSRF is summarized as below (Table 4):

 x¯a=xb¯+K[yo−Hxb¯] x˜a=x˜b−K˜Hx˜b xa=x¯a+x˜a K=BHT(HBHT+R)−1,, [BHT]=1N−1∑​i=1N[xib−xb¯][h(xib)−h(xb)¯]T HBHT=1N−1∑​i=1N[h(xib)−h(xb)¯][h(xib)−h(xb)¯]T K˜=(1+RHBHT+R)−1K

### Table 4.

The analysis scheme of EnSRF.

It should be noted that there are two Kalman gains used in EnSRF, the original Kfor updating ensemble mean and a new K˜for updating the anomalies. It indicates that one single observation realization of classic EnKF has the same ensemble analysis mean as stochastic observations.

Initially, the term EnKF refers, in particular, to the stochastic ensemble Kalman filter that requires perturbing the observations. Subsequently, several deterministic EnKFs that avoid the use of perturbed observations were developed, e.g. the ETKF [21], the EAKF [22] and the EnSRF. These filter designs are labelled as variants of the EnKF because they are also based on the Kalman filtering formula and ensemble representations.

#### 4.2.4. Inflation approach

The forecast error covariance is defined by (44)

B=1N1i=1N(xibxb¯)(xibxb¯)T=1N1X˜*X˜T.E64

Eq. (64) is an approximation to Busing forecast ensemble. Due to limited computational source, the ensemble size Nis often restricted to a small value for many realistic issues. A small ensemble size may cause a very small ensemble spread, causing the approximation of Bby Eq. (64), which is seriously underestimated. To solve this problem, Bis multiplied by an inflator factor λ(slightly greater than 1). λis empirically determined, such as some sensitivity experiments, with the typical value of 1.01. λBis used to replace Bin EnKF formula. This approach is equivalent to the below approach:

xib=λ(xibxb¯)+xb¯E65

#### 4.2.5. Localization of EnKF

When EnKF is applied to high-dimensional atmospheric and oceanic models, the limited ensemble size will cause the estimated correlations to be noisy [11]. When the ensemble size is insufficient, it will produce spurious correlations between distant locations in the background covariance matrix B. Unless they are suppressed, these spurious correlations will cause observations from one location to affect the analysis in locations an arbitrarily large distance away, in an essentially random manner [23]. This needs to be remedied by the localization method.

Another reason for using localization is that the treatment of localization artificially reduces the spatial domain of influence of observations during the update. The localization dramatically reduces the necessary ensemble size, which is very important for operational systems. Two most common distance-based localization methods used in practice are local analysis and covariance localization.

Using local analysis, only measurements located within a certain distance from a grid point will impact the analysis in this grid point. This allows for an algorithm where the analysis is computed grid point by grid point. It was found that severe localization could lead to imbalance, but with large enough radius of influence (decorrelation length) for the measurements, this was not a problem. Hunt et al. use the local analysis method in their ETKF algorithm and developed a local ensemble transform Kalman filter (LETKF) [23].

To eliminate the small background error covariance associated with remote observations, Houtekamer and Mitchell uses a Schur (element-wise) product of a correlation function with local support and the covariance of the background error calculated from the ensemble [14]. That is, the matrix B in Eq. (48) is replaced by ρ

B, where “
” represents the element-wise product and the elements ρrelates to the distance rof the grid point to the observation ras below:

ρ(r)=(1+αr+α2r23)eαr.E66

Here, αis a scalar parameter. To the best of author’s knowledge, this is the first case that the covariance localization is used in EnKF.

Nowadays, a typical covariance localization approach is used to represent prior covariances using an element-wise product of ensemble covariance and a correlation function with compact support [24]. Anderson applied this approach to the Data Assimilation Research Testbed system [25], which has been used for realistic cases.

## 5. General form of ensemble-based filters for Gaussian models

In proceeding sections, we introduced Kalman-based filters. Originally Kalman filter applies linear model and linear measurement function. Further, EKF and EnKF were developed to address nonlinear models. However, the measurement functions are still assumed to be linear. Eqs. (46) and (47) can directly evaluate nonlinear measurement functions but they were proposed intuitionally and not proven yet. In this section, we will present a general form for nonlinear measurement function and further prove Eqs. (46) and (47) mathematically using the general form.

For generality, we assume the nonlinear model as Eqs. (34) and (35):

xt+1=f(xt)+ηt,E67
yt=h(xt)+ζt,E68

where fand hare nonlinear operators of model and measurement. xis model state and yis the observation. ηtand ζtare the model errors and observed errors, respectively, which have variance

. Assuming the estimate of model state xaat a time step is a linear combination of model forecast xband observation yo, i.e. the filter itself is linear, so

xa=xb+K[yoh(xb)]E69

Denoting x^a=xtxa, x^b=xtxb, y^=yoh(xb), we have

x^a=x^bKy^E70
Pa=E[x^a(x^a)T]=E[(x^bKy^)(x^bKy^)T]=E[x^b(x^b)Tx^by^TKTKy^(x^b)T+Ky^y^TKT=PbPx^y^KTKPy^x^+KPy^y^KTE71

The optimal estimate asks the trace of Paminimum, namely,

[trace(Pa)]K=Px^y^Py^x^+2KPy^y^=0,E72

where we invoked the below properties:

(trace[xAxT])x=x(A+AT)=2xA,E73
(trace[xAT])x=(trace[AxT])x=AT=A.,E74

Thus, we have the optimal estimate filter:

xta=xtb+K[yoh(xtb)]E75
K=Px^y^Py^y^1E76
Pa=PbKPx^y^, E77

Eqs. (75)–(77) give a general algorithm for Gaussian nonlinear model and nonlinear measurement function. The first term of Eq. (74) can be interpreted as the cross-covariance Px^y^between the state and observation errors, and the remaining expression can be interpreted as the error covariance Py^y^of the difference between model observation and observation itself. Here, y^is defined as the error between the noisy observation yoand its prediction h(xb).

If the model is linear, obviously,

xt+1b=Mxta+ηt,,E78
Bt+1=MPtaMT+Q.E79

If the measurement function is linear, i.e.

y^=yoh(xb)ζ=yoHxbζ=HxtrHxbζ=Hx^bζE80
Px^y^= <x^b,y^T> =PbHTE81
Py^y^= <y^,y^T> =HPbHT+RE82

So, Kalman gain

K=PbHT(HPbHT+R)1E83

Eq. (83) is identical to Eq. (28). Therefore, Eq. (28), or KF, EKF and EnKF, is a special case of Eq. (76) under the assumption of linear measurement function.

In the standard KF, the state error covariance is updated at each analysis cycle during the measurement update process. Updating the error covariance matrix is important because it represents the change in forecast error covariance when a measurement is performed. The EnKF implementation does not require the covariance update equation because it can directly calculate the updated error covariance matrix from a set of ensemble members. Evensen [17] has derived the analysis of covariance equation that is consistent with the standard KF error covariance to update Eq. (28). But the true representation of the updated error covariance requires a large ensemble size, which is often computationally infeasible.

The general form of the Kalman gain makes use of the reformulated error covariance. In a broad sense, the above algorithm implicitly uses the prior covariance update equation (or the analysis error covariance matrix) to calculate the forecast error covariance. Thus, the above algorithm is fully consistent with the time update and measurement update formulation of the Kalman filter algorithm. On this basis, one can develop a new type of Kalman filter that chooses the ensemble members deterministically in such a way that they can capture the statistical moments of the nonlinear model accurately. In the next subsection, we will discuss the new type of Kalman filter, called sigma-point Kalman filter, based on the above algorithm.

## 6. Sigma-point Kalman filters (SPKF)

### 6.1. Basics of SPKF

EnKF was developed in order to overcome the linearization of nonlinear models. As introduced earlier, the idea behind EnKF is to ‘integrate’ Fokker-Plank equation using ensemble technique to estimate the forecast error covariance. Theoretically, if the ensemble size is infinite, the estimate approaches the true value. However, in reality, we can only use finite ensemble size, even very small size for many problems, leading to truncation errors. Thus, some concerns exist such as how to wisely generate finite samples for the optimal estimate of prediction error covariance, how much the least ensemble size is for an efficient estimate of error covariance and how much the true error covariance can be taken into account in the EnKF, given an ensemble size. In this section, we will introduce a new ensemble technique for EnKF, which is called sigma-point Kalman filter (SPKF).

The so-called sigma-point approach is based on deterministic sampling of state distribution to calculate the approximate covariance matrices for the standard Kalman filter equations. The family of SPKF algorithms includes the unscented Kalman filter (UKF [26]), the central difference Kalman filter (CDKF [27]) and their square root versions [28]. Another interpretation of the sigma-point approach is that it implicitly performs a statistical linearization of the nonlinear model through a weighted statistical linear regression (WSLR) to calculate the covariance matrices [29]. In SPKF, the model linearization is done through a linear regression between a number of points (called sigma points) drawn from a prior distribution of a random variable rather than through a truncated Taylor series expansion at a single point. It has been found that this linearization is much more accurate than a truncated Taylor series linearization [28]. Eqs. (80)–(82) construct a core of SPKF. A central issue here is how to generate the optimal ensemble members for applying these equations. There are two basic approaches aforementioned, UKF and CDKF. For an L-dimensional dynamical system represented by a set of discretized state-space equations of (67), it has been proven that 2L+1ensemble members, constructed by UKF or CDKF, can precisely estimate the mean and covariance. We ignore the theoretical proof and only outline the UKF scheme as below.

Denote 2L+1sigma points at time kfor producing ensemble members by χk=[χk,0,χk,1+,...,χk,L+,χk,1,,χk,L], which that is defined according to the following expressions:

χk,0=X¯kaE84
χk,i+=X¯ka+[cPX,ka]iE85
χk,i+=X¯ka[cPX,ka]iE86

where L=Nx+Nη+Nζis the sum of the dimensions of model states, model noise and measurement noise. The augmented state vector X=[x;η;ζ]is a L-dimensional vector. PX,kais the covariance of the augmented state vector (analysis) at the previous step. [PX,ka]iis the ith row (column) of the weighted matrix square root of the covariance matrix (Ldimension). cis a scale parameter that will be specified later. The key point here is to produce (2L+1)ensemble members by integrating model with 2L+1initial conditions of Eqs. (84)–(86); by these ensemble members, the filter Eqs. (80)–(82) will be performed.

The procedure is summarized as below:

1. Initially, perturb a small amount, denoted by x˜0on initial condition x0, using Evensen method [17]; and also randomly generate perturbation for qand r, drawn from normal distributions of N(0,Q)and N(0,R). Thus, we can construct the augmented state vector and corresponding covariance (k=0)

X¯0a=[x0;0;0];E87
P0x=x˜0x˜0T;E88
PX,0=(P0x000Q000R).E89

2. From the above formula, we can calculate sigma points using Eqs. (84)–(86). Note that each set of sigma points, denoted by χk, has dimension L, e.g. the ith sigma point can be expressed by χk,i=[xk,i;ηk,i;ζk,i].

3. Using the 2L+1sigma points to integrate state-space model. For the ith sigma point, we have xk+1,if=f(xk,i,ηk,i). When ivaries from 1 to 2L+1, we produce 2L+1ensemble members, from which analysis mean and covariance will be obtained, which are in turn used to produce sigma points for next step (k+1), to form a recursive algorithm. Suppose we have 2L+1ensembles, the analysis mean and the covariance are calculated as follows:

x¯k+1f=i=02Lwi(m)xk+1,ifE90
(Pxxf)k+1=i=02Lwi(c)[xk+1,ifx¯k+1f][xk+1,ifx¯k+1f]TE91
yk+1,if=h(xk+1,if,ζk+1,i)E92
y¯k+1f=i=02Lwi(m)yk+1,ifE93
(Pyy)k+1=i=02Lwi(c)[yk+1,ify¯k+1f][yk+1,ify¯k+1f]TE94
(Pxy)k+1=i=02Lwi(c)[xk+1,ifx¯k+1f][yk+1,ify¯k+1f]TE95
Kk+1=PxyPyy1,E96
x¯k+1a=x¯k+1f+Kk+1[yk+1y¯k+1f]E97
Pk+1a=(Pxxf)k+1Kk+1PyyKk+1T,E98

where

c=L+λE99
w0(m)=λL+λE100
w0(c)=λL+λ+1α2+βE101
wi(m)=wi(c)=12(L+λ),i=1,2,...2LE102
λ=α2(L+κ)L,E103

αand κare tuning parameters. 0<α<1and κ0. Often κis chosen 0 as default value and β=2.

4. From Pk+1a, as well choosing random perturbation for model noise ηand observation noise ζ, drawn from Gaussian distribution of N(0,Q)and N(0,R), we calculate sigma points using Eqs. (84)–(86), and repeat Step 2 and Step 3 and so on until the assimilation is completed for the entire period.

### 6.2. Remarks of SPKF

SPKF was recently introduced into the earth sciences [15, 30]. The main differences between SPKF and EnKF include

1. SPKF chooses the ensemble members deterministically while EnKF uses random perturbation to generate ensemble members;

2. the number of sigma points is a fixed value as 2L+1, while the ensemble size in EnKF is pre-specified;

3. SPKF uses Eq. (98) to update the error covariance matrix, while EnKF does not update explicitly the error covariance matrix; and

4. Sigma points are calculated using Eqs. (84)–(86) every time when the observation is available, while the ensemble members in EnKF only perturbed in the initial time. Recent application of SPKF on a realistic oceanic model indicates that the SPKF is better than the EnKF in the similar level of computational cost [31].

In SPKF, the number of sigma points is 2L+1, here Lis the dimension of the augmented state vector X=[x;η;ζ], i.e. L=Nx+Nη+Nζis the sum of model state, model noise and observation noise. Usually, Lis the order 103–104, so the computational expense is a huge challenge in SPKF for realistic problems. A solution is to use the truncated singular-value decomposition (TSVD) to reduce the sigma points. As seen from Eqs. (84)–(86), the PX,kais a L*Lmatrix, thus the dimension of PX,kadetermines the ensemble size. Suppose that PX,kacan be expressed as

PX,ka=EX,kaΣk(EX,ka)TE104

where Σk=diag(σk1,σk2,...,σkL)is a diagonal matrix of eigenvalues that are sorted in descending order, i.e. σk1σk2...σkL, and EX,ka=[eX,k,1a,eX,k,2a,...,eX,k,La]. Truncating the first mmodes, so we can write the sigma points (84)–(86) as below:

χk,0=X¯kaE105
χk,i+=X¯ka+cσkieX,k,iaE106
χk,i=X¯kacσkieX,k,iaE107

i=1,2,...,m. Thus, the ensemble size becomes 2*m+1, where m<<L. Some fast SVD algorithms can be used here, such as Lanczos and block Lanczos [32]. The application of the truncated SVD was also found in [33, 34].

Further simplifying PX,kabased on its definition (or Cholesky decomposition), i.e. PX,ka=AX,ka*(AX,ka)T, where AX,kais the data that has subtracted the ensemble mean. Thus, Eqs.(82)–(84) can be written as follows:

χk,0=X¯kaE108
χk,i+=X¯ka+[cAX,ka]iE109
χk,i=X¯ka[cAX,ka]iE110

where [cAX,ka]i=[xka;ηk;ζk]i,i=1,2,...,L, (xka)i=(xkf)i+Kk[ykykf]. Eqs.(109) and (110) transfer the covariance matrix PX,kato data matrix AX,kain constructing sigma points. The largest advantage is to avoid explicit expression of PX,ka, which could be a very large matrix beyond memory of current computers. However, Eqs.(109) and (110) cannot reduce the ensemble size (2L+1). A solution is to decompose, such as principal component analysis, as used in [14]. Further discussions on optimal construction of sigma points should be conducted for a realistic application of SPKF.

Again, we look at sigma-point generation, i.e. Eqs. (106) and (107) or (109) and (110). As we defined, an augmented matrix is applied here [x;η;ζ]. Without losing the generality, rewrite them as below:

[xk,0ηk,0ζk,0]=[x¯k,000]E111
[xk,iηk,iζk,i]=[x¯k,000]+c[xk,iaηk,iζk,i]E112

Similarly, we can write Eq. (107) or (110) using individual variables. From Eqs. (111) and (112), we can draw

• Noise and model state analyses in constructing sigma points at kstep are independent. It should be noted that xkais from Eq. (97) and noise are draw from a Gaussian distribution. If we assume that noise is taken randomly each time, xkais only relevant to noise that is drawn at time step k, and independent with model noise and observation noise drawn for analysis of the time step k+1, thus, PX,kis a diagonal block matrix, i.e.

PX,k=(Pkx000Q000R)E113

• There are no update equations for noise, so they are randomly taken from Gaussian distribution, i.e. the index iin ηiand ζiactually does not have meaning. Thus, it should be a reasonable assumption that the ηiand ζi, used for constructing sigma points at time step k+1, are not related to PX,k(time step of k), as argued above. Thus, Eq. (108) always holds unless the noise is designed considering the temporal coherence such as red noise in time.

• Based on the above, the actual ensemble size is 2Nx+1, and not 2L+1. This is because neither model noise nor observation noise can produce ensemble alone. Model errors ηiand xk,ifmust be joined together to produce ensemble members with Nx. Let us see this in details: at the initial time, initial perturbation on model states plus drawn noise for model errors and measurement errors are with mean and variance as follows:

X¯0a=[x0;0;0],PX,0=(P0x000Q000R)E114

Theoretically, there are 2(Nx+Nη+Nζ)+1ensembles, denoted by the ith column of PX,0(i=1,...,Nx; Nx+1,...,Nx+Nη; Nx+Nη+1,...,Nx+Nη+Nζ) and formula (84)–(86). However, at the ith column, the elements of the row, indicating the model inputs (x,η,ζ), only have the non-zero values of Nx. Obviously, the sigma points of zero-values makes the update equation χk+1,i=f(χk,i)invalid, thus, the actual ensemble size is 2Nx+1.

When truncation technique is applied to reduce the ensemble size, the ensemble spread might be shrunk due to relatively small ensemble size. Like EnKF, an inflation approach of SPKF might be helpful. It is interested in developing such a scheme for SPKF. Also, we can localize SPKF, like localized EnKF, to solve memory and computation issues.

All of the remarks of SPKF are from the authors’ thinking and understanding. It is interesting to further test and validate these ideas and properties using simple models.

## 7. Beyond Kalman filters: particle filter and its derivatives

### 7.1. Standard particle filter

We have introduced the Kalman filter (KF), extended Kalman filter (EKF), ensemble Kalman filter (EnKF) and sigma-point Kalman filter (SPKF) in previous sections. All of those filters belong to the sequential data assimilation method, i.e. observation data is assimilated into the model system as soon as it is available. The Bayesian estimation theory provides a general framework of the sequential data assimilation methods. If we assume the state-space model is given by Eqs. (34) and (35), the analysis step of a Bayesian-based assimilation method is deduced by Bayes’ theorem:

p(xt|yt)=p(yt|xt)p(xt)p(yt),E115

where p(yt)plays as a normalization factor.

Recalling Section 2.3, Eq. (12) actually assumes that the prior probability density function p(xt)and the likelihood function p(yt|xt)are Gaussian distribution functions, and thus the posterior probability density function p(xt|yt)is also a Gaussian. Based on the Gaussian assumption, the cost function of 3D-Var (i.e. Eq. (6)) can be derived, and it is equivalent to the Kalman filter Eqs. (27)–(29). All the Kalman-based filters (e.g. EKF, EnKF, EnSRF, SPKF, etc.) contain the inherent Gaussian assumption, and they are derived and validated for Gaussian systems in theory. However, this Gaussian assumption is often not applicable for nonlinear systems. Even for an initial Gaussian error, it often becomes non-Gaussian while propagating forward with nonlinear models.

The particle filter (PF) is a sequential data assimilation method that is able to deal with the nonlinear and non-Gaussian state estimation problem. Like EnKF, PF also uses an ensemble, but it is used to approximately estimate the full probability density function rather than only the error covariance B. An ensemble member is also referred to as a particle in PF literatures. Suppose the prior probability density is the sum of Dirac delta functions

p(xt)=i=1Nδ(xtxti)E116

where {xti,i=1,2,...,N}are particles drawn from p(xt). The posterior probability density is derived by applying the Bayes’ theorem directly, that is

p(xt|yt)p(yt|xt)p(xt)=i=1Nwt,iδ(xtxti)E117

in which wt,ip(yt|xti), and a normalization step, is required to make {wt,i,i=1,2,...,N}sum up to 1. If we assume the likelihood function is Gaussian, wt,ican be computed by

p(yt|xti)=12πRexp{[yth(xti)]R1[yth(xti)]T}.E118

Or else we can use any specified probability density function of p(yt|xt)to compute the likelihood.

With the posterior probability density function p(xt|yt), the analysis value and covariance can be computed by

x¯t=x*p(x|yt)dx=i=1Nwt,ixtiE119
var(xt)=x2*p(x|yt)dxx¯t2=i=1Nwt,i(xti)2x¯t2E120

and higher-order moments of the posterior state can also be estimated.

Before stepping forward to next stage, a resampling step is required to make each particle with uniform weight. A typical resampling strategy is the sequential importance resampling (SIR) that removes particles with very small weights and duplicates those with large weights. A detailed algorithm of SIR can be found in [35]. The resampling algorithm gives the indices and number of copies of those particles that should be duplicated, i.e. computes s1,s2,,sNaccording to the weights, where each si1,2,,N. And then {xtsi,i=1,2,...,N}are regarded as new particles.

In summary, the algorithm of standard particle filter is given below:

1. generate the initial ensemble {x0i,i=1,2,...,N}as EnKF does;

2. integrate the model until the observation is available;

3. use Eq. (118) to compute the weight for each particle, and normalize them;

4. use Eq. (119) to obtain the analysis and Eq. (120) to obtain the covariance if necessary;

5. apply the resampling algorithm to derive the resampling indices, and derive the new ensemble {xtsi,i=1,2,...,N}; and

6. repeat Steps 2–5 until the end of assimilation period.

The standard particle filter [36] is also known as the bootstrap particle filter or SIR particle filter.

### 7.2. Variants of PF

The particle filter is a highly promising technique because it does not invoke any Gaussian assumptions. It has been widely used and studied in many other fields. The PF estimates the full probability density function of the forecasted state based on an ensemble of states with different weights. However, the PF suffers from the problem of filter degeneracy, i.e. the procedure collapses to a very small number of highly weighted particles among a horde of almost useless particles carrying a tiny proportion of the probability mass. Even if resampling techniques are used, the degeneracy cannot be completely avoided with limited ensemble size. The number of particles must grow substantially with the dimension of the system to avoid degeneracy [37, 38], a requirement that is apparently too costly for large models such as GCMs. Various efforts have been made to resolve this issue, as documented in an excellent overview [39].

Several strategies are often employed to address the problem of filter degeneracy in applications of the particle filter. For example, Papadakis et al. proposed a weighted ensemble Kalman filter (WEnKF) [40] that uses an ensemble-based Kalman filter as the proposal density from which the particles are drawn. Van Leeuwen et al. developed a fully nonlinear particle filter by exploiting the freedom of the proposal transition density, which ensures not only that all particles ultimately occupy high-probability regions of state-space but also that most of the particles have similar weights [41]. The implicit particle filter uses gradient descent minimization combined with random maps to find the region of high probability, avoiding the calculation of Hessians [42]. Luo et al. have proposed an efficient particle filter that uses residual nudging to prevent the residual norm of the state estimates from exceeding a pre-specified threshold [43]. These particle filters were very recently proposed and have attracted broad attention in the community of atmos./ocean. data assimilation. Below, we will briefly introduce the equivalent weights particle filter (EWPF) by Van Leeuwen [39, 41].

The equivalent weights particle filter is a fully nonlinear data assimilation method that works in a two-stage process. It uses the proposal density to ensure that the particles have almost equivalent weights, by which the filter degeneracy can be avoided.

In the standard PF, the particles at time step tare propagated by the original model, i.e. xt+1i=f(xti)+ηt, which implies that the particles at time step t+1are drawn from the transition density p(xt+1|xt). In that case, the weight of each xt+1ivaries greatly and filter degeneracy is very likely to happen.

In EWPF, another transition density, call the proposal density, is introduced. The proposal density depends on the future observation yt+1and all previous particles {xti,i=1,2,...,N}. With the help of proposal density, the particle xtiis propagated using a different model

xt+1i=g(xti,yt+1)+ηt.E121

The model gcan be anything, for instance, one can add a relaxation term and change random forcing:

xk+1i=f(xki)+ηki+A(yt+1H(xki)),k=1,...,p(k)E122

where p(k)is a function of the time between observations, and each kimplies each model step without observation. Ais a relaxation term that will ‘drag’ the particle towards future observation. In [44], it is given by

A=p(k)QHTR1,E123

where the matrices Qand Rcorrespond to the model error covariance and observation error covariance, respectively.

The second stage of EWPF involves updating each particle at the observation time t+1via the formula

xt+1i=f(xti)+αiQHT(HQHT+R)1(yt+1H(f(xti)))+ηtiE124

where αiare scalers computed so as to make the weights of the particles equal. Using the expression for weights and setting all weights equal to a target weight (e.g. 1/N)

wi=p(yt+1|xt+1i(αi))=wtargetE125

αican be solved by numerical methods.

Eqs. (122)–(125) show an example of how to construct the proposal model gin(121)), it can also be done by running 4D-var on each particle (implicit particle filter), or using the EnKF as proposal density. Those methods refer to Morzfeld et al. [42] and Papadakis et al. [40].

### 7.3. Remarks of PF

#### 7.3.1. Combined method of EnKF and PF

The ensemble Kalman particle filter (EnKPF) is a combination of the EnKF and the SIR particle filter. It was recently introduced to address non-Gaussian features in data assimilation for highly nonlinear systems, by providing a continuous interpolation between the EnKF and SIR-PF analysis schemes [45].

As stated above, both EnKF and PF methods are based on the Bayesian estimation theory, but they approximate the probability density function of the state in different ways. The EnKF only approximates the mean and covariance of the state through a series of equally weighted ensemble members. And the particle filter considers the weights of the ensemble members according to the likelihoods. The EnKF contains the Gaussian assumption but requires relatively small ensemble size to prevent filter degeneracy, which is in contrast with the PF.

The EnKPF takes advantage of both methods by combining the analysis schemes of the EnKF and the SIR-PF using a controllable index (i.e. tuning parameter). In contrast with both the EnKF and the SIR-PF, the analysis scheme of the EnKPF not only updates the ensemble members but also considers the weights.

Assume that the forecast ensemble {xif,i=1,2,,N}and the observation data yare available, and that the forecast covariance Pfcan be calculated using the ensemble, the analysis scheme of EnKPF is given below.

1. Choose γ[0,1]and apply the EnKF that is based on the inflated observation error covariance R/γas follows:

K1(γ)=PfHT(HPfHT+R/γ)1=γPfHT(γHPfHT+R)1E126
vi=xif+K1(γ)(yHxif)E127
Q=1γK1(γ)RK1(γ)TE128

2. Compute the weights wifor each updated member vias follows:

wi=ϕ(y;Hvi,R1γ+HQHT)E129

and normalize the weights by w^i=wi/i=1Nwi, in which ϕis the probability density function of a Gaussian.

3. Calculate the resampling index s(i)for each member viaccording to w^iusing the SIR algorithm, then set

xiu=vs(i)+K1(γ)ϵ1,iγE130

where

1, iis a random observation error drawn from the Gaussian N(0,R).

4. Compute K2(1γ)=(1γ)QHT[(1γ)HQHT+R]1, and generate

2, ifrom N(0,R)and EnKF with the inflated observation error again as follows:

xia=xiu+K2(1γ)[y+ϵ2,i1γHxiu]E131

γcan be determined recursively to match the optimal performance of EnKPF. More details of EnKPF can be found in [45, 46].

#### 7.3.2. Localization in PF

Previous sections have introduced the localization technique in EnKF, which greatly improves the performance of EnKF in high-dimensional models. The advantages of localization motivate the search for a localization procedure in particle filtering.

Van Leeuwen had a deep discussion on this topic [39]. He argued that one can calculate the weights locally, but it is not easy for resampling. In the resampling step low-weight particles are abandoned and high-weight particles are duplicated. However, with local weights, different particles are selected in different parts of the domain. The problem is that we have to have continuous (in space) model fields to propagate forward in time with the model. Just constructing a new particle that consists of one particle in one part of the model domain and another particle in another domain will lead to problems at the boundary between these two.

The problem of spatial discontinuity makes the localization in particle filter not feasible currently. Most of the advanced particle filters (e.g. EWPFand implicit particle filter) are using the idea of global weight, i.e. the weight for each member is a scalar.

However, there are still some attempts on the localization in particle filter. For example, Poterjoy developed the localized particle filter (LPF) that updates particles locally using ideas borrowed from EnKF [47]. The paper has demonstrated some advantages of the new filter over EnKF, especially when the observation networks consist of densely spaced measurements that relate nonlinearly to the model state. This is a very interesting work about the particle filter, it also has a potential to work with large atmos./ocean. data assimilation systems.

## 8. Remarks and conclusions

Data assimilation is the process by which observations of the actual system are incorporated into a numerical model to optimally estimate the system states. In this chapter, we introduced several ensemble-based data assimilation methods that are widely used in the earth sciences. One can read it as an introduction to ensemble-based data assimilation methods, but also can view it as a brief review of the application of these ensemble-based assimilation methods on the earth sciences. It is author’s effort to write such a ‘review’ chapter with introductory language, making it more readable. As found in the chapter, many discussions, derivations and analyses are actually very thoughtful, not only introducing these methods, but also deepening the understanding to them. This is emphasized by the analysis of the rationale behind each method, including: i). the principle for deriving the algorithm; ii) basic assumptions of each method; iii). the connection and relation of different methods (e.g., EKF and EnKF, EnKF and SPKF etc.); iv). the advantages and deficiencies of each method. Especially we put rather weights to discuss potential concerns, challenges and possible solutions when these methods are applied to high-dimensional systems in the earth sciences. This chapter can be a “textbook” for the beginners to learn these data assimilation algorithms, and also a good reference for researchers for better understanding and applying these methods.

## Acknowledgments

This work was supported by the NSERC (Natural Sciences and Engineering Research Council of Canada) Discovery Grant, the National Science Foundation of China (41276029, 41321004, 41530961,91528304), the National Programme on Global Change and Air-Sea Interaction (GASI-IPOVAI-06)and the National Basic Research Program (2013CB430302).

chapter PDF
Citations in RIS format
Citations in bibtex format

## More

© 2016 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## How to cite and reference

### Cite this chapter Copy to clipboard

Youmin Tang, Zheqi Shen and Yanqiu Gao (October 19th 2016). An Introduction to Ensemble-Based Data Assimilation Method in the Earth Sciences, Nonlinear Systems - Design, Analysis, Estimation and Control, Dongbin Lee, Tim Burg and Christos Volos, IntechOpen, DOI: 10.5772/64718. Available from:

### chapter statistics

2Crossref citations

### Related Content

Next chapter

#### Conditions for Optimality of Singular Controls in Dynamic Systems with Retarded Control

By Misir J. Mardanov and Telman K. Melikov

First chapter

#### Recent Advances in Fragment Molecular Orbital-Based Molecular Dynamics (FMO-MD) Simulations

By Yuto Komeiji, Yuji Mochizuki, Tatsuya Nakano and Hirotoshi Mori

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.