
Mathematics » "Nonlinear Systems - Design, Analysis, Estimation and Control", book edited by Dongbin Lee, Tim Burg and Christos Volos, ISBN 978-953-51-2715-4, Print ISBN 978-953-51-2714-7, Published: October 19, 2016 under CC BY 3.0 license. © The Author(s).

# An Introduction to Ensemble-Based Data Assimilation Method in the Earth Sciences

By Youmin Tang, Zheqi Shen and Yanqiu Gao
DOI: 10.5772/64718


## Abstract

In this chapter, ensemble-based data assimilation methods are introduced, including their development, applications and open issues. These methods include both traditional methods, such as the Kalman filter and its derivatives, and more advanced algorithms, such as sigma-point Kalman filters and particle filters. Emphasis is placed on the challenges of applying these methods to high-dimensional systems in the earth sciences.

Keywords: data assimilation, Kalman filter, EnOI, EnKF, particle filter

## 1. Introduction

In this chapter, we discuss modelling and simulation that use both observed data and numerical models; that is, observations are incorporated into numerical models to obtain an optimal simulation. In statistics, this is called state-space estimation. In the earth sciences, it is called data assimilation. For example, a strict definition of data assimilation in the atmospheric and oceanic sciences is the process of estimating the state of a dynamic system, such as atmospheric or oceanic flow, by combining observational and model forecast data [1].

In general, assimilation methods can be classified into two categories: variational and sequential. This chapter is a tutorial on sequential data assimilation methods such as the ensemble Kalman filter (EnKF) and its variants. A brief introduction to the particle filter (PF) is also provided.

This tutorial places emphasis on the rationale behind each method, including: (i) the principle for deriving the algorithm; (ii) the basic assumptions of each method; (iii) the connections between different methods (e.g. extended Kalman filter (EKF) and EnKF, EnKF and sigma-point Kalman filters (SPKF), etc.); and (iv) the advantages and deficiencies of each method.

This chapter was written and organized while teaching undergraduate and graduate students in earth science courses. It can also serve as a reference for researchers in the field of modelling and data assimilation.

## 2. The general framework of several assimilation approaches

Intuitively, one might think that an optimal simulation scheme is to directly replace model variables by observations during numerical integration. Such a direct replacement is usually not correct, since observations are imperfect and contain errors. A simple replacement introduces observation errors into the model and ignores their possible impact on model behaviour, easily resulting in an imbalance of model dynamics and physics. Thus, the application of observations to numerical models must consider both model and observation errors, which play a critical role in the assimilation process.

We start by illustrating the assimilation concept with a simple example. A detailed introduction can be found in [2].

For an unknown true state value, denoted by $T_t$, there are two samples, denoted by $T_1$ (e.g. model simulation) and $T_2$ (observation), which have the errors $\epsilon_1$ and $\epsilon_2$, respectively. Thus, we have

$$T_1 = T_t + \epsilon_1, \tag{1}$$

$$T_2 = T_t + \epsilon_2. \tag{2}$$

We assume the measurement or observation is unbiased, and the variances of the errors are known, i.e. $E(\epsilon_1) = E(\epsilon_2) = 0$, $\mathrm{Var}(\epsilon_1) = \sigma_1^2$, $\mathrm{Var}(\epsilon_2) = \sigma_2^2$. The question here is to seek an optimal estimate, denoted by $T_a$ (called the analysis in the assimilation field), for $T_t$ using $T_1$ and $T_2$. This optimal estimate is the central issue of data assimilation.

There are several methods for this solution, as demonstrated below.

### 2.1. Least-squares method

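Assume the analysis is a linear combination of $T_1$ and $T_2$, that is, $T_a = a_1 T_1 + a_2 T_2$. Because both $T_1$ and $T_2$ are unbiased, $T_a$ should also be unbiased, i.e. $E(T_a) = E(T_t)$, so $a_1 E(T_1) + a_2 E(T_2) = E(T_t)$ and hence $a_1 + a_2 = 1$. The best (optimal) estimate should minimize the variance of $T_a$:

$$\begin{aligned}
\sigma_a^2 &= E[(T_a - T_t)^2] = E[(a_1 T_1 + a_2 T_2 - T_t)^2] \\
&= E[(a_1 (T_1 - T_t) + a_2 (T_2 - T_t))^2] \\
&= E(a_1^2 \epsilon_1^2 + a_2^2 \epsilon_2^2 + 2 a_1 a_2 \epsilon_1 \epsilon_2) \\
&= a_1^2 \sigma_1^2 + (1 - a_1)^2 \sigma_2^2,
\end{aligned} \tag{3}$$

where we assumed that the errors of $T_1$ and $T_2$ are uncorrelated, i.e. $E(\epsilon_1 \epsilon_2) = 0$. To minimize $\sigma_a^2$, let $\partial \sigma_a^2 / \partial a_1 = 0$; thus

$$a_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}. \tag{4}$$

Namely,

$$T_a = a_1 T_1 + (1 - a_1) T_2 = T_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}(T_2 - T_1). \tag{5}$$

With the weights of Eq. (5), the variance of $T_a$ is minimized. Note that the weight on the increment $T_2 - T_1$ is $1 - a_1 = \sigma_1^2/(\sigma_1^2 + \sigma_2^2)$: the less accurate the model value $T_1$, the more the analysis is drawn towards the observation $T_2$.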
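As a quick numerical check of the least-squares result, the following minimal sketch evaluates the optimal weight, the analysis and its variance for one pair of values. The function name `combine` and the numbers are illustrative assumptions, not part of the original chapter.

```python
def combine(t1, var1, t2, var2):
    """Minimum-variance linear combination of two unbiased estimates."""
    a1 = var2 / (var1 + var2)                          # weight on t1, Eq. (4)
    ta = a1 * t1 + (1.0 - a1) * t2                     # analysis, Eq. (5)
    var_a = a1 ** 2 * var1 + (1.0 - a1) ** 2 * var2    # analysis variance, Eq. (3)
    return ta, var_a


# Hypothetical values: model gives 20.0 (error variance 4.0),
# observation gives 22.0 (error variance 1.0).
ta, var_a = combine(20.0, 4.0, 22.0, 1.0)
print(ta, var_a)
```

The analysis lies closer to the more accurate sample, and its variance is smaller than either input variance.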

### 2.2. Variational approach

In general, assimilation methods can be classified into two categories: variational and sequential. Variational methods such as three-dimensional variational (3D-Var) method and four-dimensional variational (4D-Var) method [3, 4] are batch methods, whereas sequential methods such as Kalman filter (KF) [5] belong to the estimation theory.

Both have had great success. The European Centre for Medium-Range Weather Forecasts (ECMWF) introduced the first 4D-Var method into its operational global analysis system in November 1997 [6–8]. The ensemble Kalman filter (EnKF) was first introduced into an operational ensemble prediction system by the Canadian Meteorological Centre (CMC) in January 2005 [9].

This chapter is a tutorial of the ensemble-based sequential data assimilation methods, such as EnKF and its variants. However, we will briefly demonstrate the idea of variational assimilation by the above example.

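First, a cost function must be defined for the variational assimilation approach. For this simple example, we define the cost function as below:

$$J(T) = \frac{1}{2}\left[\frac{(T - T_1)^2}{\sigma_1^2} + \frac{(T - T_2)^2}{\sigma_2^2}\right], \tag{6}$$

$$T = a_1 T_1 + a_2 T_2. \tag{7}$$

The solution is the analysis $T_a$, determined by $a_1$ and $a_2$, that minimizes the cost function, i.e. $J(T_a) = \min\{J(T)\}$. At the minimum, $\partial J(T)/\partial a_1 = 0$ and $\partial J(T)/\partial a_2 = 0$. Substituting Eq. (6) gives

$$\frac{\partial J(T)}{\partial a_1} = \frac{T - T_1}{\sigma_1^2}\frac{\partial T}{\partial a_1} + \frac{T - T_2}{\sigma_2^2}\frac{\partial T}{\partial a_1} = 0. \tag{8}$$

Eq. (7) leads to $\partial T/\partial a_1 = T_1$. Thus, the solution of Eq. (8), denoted by $T_a$, satisfies

$$T_a = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} T_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} T_2. \tag{9}$$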
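The variational view can also be checked numerically: minimizing the cost function (6) with a simple descent iteration should reproduce the closed-form analysis (9). The sketch below is illustrative only; the step size, iteration count and sample values are assumptions, and plain gradient descent stands in for the more capable minimizers used in practice.

```python
def cost(t, t1, var1, t2, var2):
    """Cost function J(T) of Eq. (6)."""
    return 0.5 * ((t - t1) ** 2 / var1 + (t - t2) ** 2 / var2)


def grad(t, t1, var1, t2, var2):
    """Derivative dJ/dT of the cost function."""
    return (t - t1) / var1 + (t - t2) / var2


def minimize(t1, var1, t2, var2, step=0.1, iters=500):
    """Plain gradient descent, started from the model value t1."""
    t = t1
    for _ in range(iters):
        t -= step * grad(t, t1, var1, t2, var2)
    return t


# Hypothetical values: T1 = 20.0 (variance 4.0), T2 = 22.0 (variance 1.0).
t_num = minimize(20.0, 4.0, 22.0, 1.0)
t_eq9 = (1.0 * 20.0 + 4.0 * 22.0) / (4.0 + 1.0)   # closed form, Eq. (9)
print(t_num, t_eq9)
```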

The above is a simple example of the 3D variational assimilation approach, where we only consider the analysis error (cost function) at a single time. However, in many cases, we need to consider the error growth over a period, i.e. the sum of errors during the period, in the cost function of Eq. (6). For example, the cost function of 4D-Var is defined as below:

$$J(T) = \frac{1}{2}\sum_{t=1}^{N}\left[\frac{(T(t) - T_1(t))^2}{\sigma_1^2} + \frac{(T(t) - T_2(t))^2}{\sigma_2^2}\right]. \tag{10}$$

Meanwhile, $T(t_n)$ follows a dynamical model, say $T(t_n) = \int_{t_0}^{t_n} F(T(t))\,dt = M_n(T(t_0))$, where $F$ is a nonlinear dynamical model, $M_n$ is the integral operator and $t_0$ is the initial time. Thus, the value of the cost function (10) is determined only by the initial condition. Namely, the objective here is to seek the optimal initial condition $T(t_0)$ that minimizes (10), i.e. minimizing (10) subject to the dynamical model $F$. This is a standard constrained extremum problem that can be solved by the Lagrange multiplier approach. However, the complexity of the dynamical model excludes the possibility of an analytical solution. We have to solve the minimization problem with the aid of numerical methods, e.g. the Newton conjugate gradient method. All of these numerical methods require the gradient $\partial J/\partial T_0$.

Again, it is almost impossible to obtain an analytical expression for $\partial J/\partial T_0$ because of the complexity of $F$. Researchers usually compute the gradient numerically using tangent linear and adjoint models. Details on tangent linear and adjoint models can be found in the references cited above. It should be noted that constructing tangent linear and adjoint models is very difficult, and sometimes intractable. Thus, more and more researchers have started to apply sequential assimilation methods instead of 4D-Var in recent years. Next, we will introduce the concept of the sequential assimilation method using the above example.

### 2.3. Bayesian approach

Assume $T_1$ and $\sigma_1$ are the mean value and standard deviation of the model prediction, which imply a prior probability distribution of the truth $T$,

$$p(T) = \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(T_1 - T)^2}{2\sigma_1^2}}. \tag{11}$$

This is a Gaussian distribution function, which can be denoted by $N(T_1, \sigma_1)$. Given the observation $T_2$ and its standard deviation $\sigma_2$, the posterior distribution of the truth can be expressed by Bayes' theorem:

$$p(T|T_2) = \frac{p(T_2|T)\,p(T)}{p(T_2)} \propto \frac{1}{\sqrt{2\pi}\sigma_2} e^{-\frac{(T_2 - T)^2}{2\sigma_2^2}} \, \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(T_1 - T)^2}{2\sigma_1^2}}. \tag{12}$$

$p(T_2)$ was dropped in (12) since it is independent of $T$ and only acts as a normalization factor. The likelihood function $p(T_2|T)$ describes the probability that the observation takes the value $T_2$ given an estimate of $T$. It is commonly assumed to be Gaussian, $N(T, \sigma_2)$. The objective here is to estimate the truth by maximizing the posterior probability $p(T|T_2)$ (namely, we choose the value of the truth that is most probable). Maximizing $p(T|T_2)$ is equivalent to maximizing the logarithm of the right-hand side of (12), i.e.

$$\begin{aligned}
\log p(T|T_2) &= \log\left(\frac{1}{\sqrt{2\pi}\sigma_2}\right) - \frac{(T_2 - T)^2}{2\sigma_2^2} + \log\left(\frac{1}{\sqrt{2\pi}\sigma_1}\right) - \frac{(T_1 - T)^2}{2\sigma_1^2} \\
&= \mathrm{const} - \frac{1}{2}\left[\frac{(T - T_2)^2}{\sigma_2^2} + \frac{(T - T_1)^2}{\sigma_1^2}\right].
\end{aligned} \tag{13}$$

The maximum of $p(T|T_2)$ occurs at the minimum of the second term on the right-hand side of (13), i.e. the minimum of the cost function $J$ of (6). Thus, under the assumption of Gaussian distributions, maximizing the posterior probability (Bayesian approach) is equivalent to minimizing the cost function (variational assimilation approach). Further, if the model $F$ is linear and the probability distribution is Gaussian, it can be proved that the Kalman filter is equivalent to the 4D-Var adjoint assimilation method.
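The equivalence between the Bayesian and variational views can be illustrated by locating the maximum of the log-posterior on a grid and comparing it with the analysis of Eq. (9). This is a rough sketch; the grid bounds, resolution and sample values are assumptions chosen for illustration.

```python
import math


def log_gauss(x, mean, std):
    """Logarithm of the Gaussian density N(mean, std) evaluated at x."""
    return -math.log(math.sqrt(2.0 * math.pi) * std) - (x - mean) ** 2 / (2.0 * std ** 2)


def map_estimate(t1, s1, t2, s2, lo, hi, n=20001):
    """Grid search for the T that maximizes log p(T2|T) + log p(T), cf. Eq. (13)."""
    best_t, best_lp = lo, -math.inf
    for i in range(n):
        t = lo + (hi - lo) * i / (n - 1)
        lp = log_gauss(t2, t, s2) + log_gauss(t, t1, s1)
        if lp > best_lp:
            best_t, best_lp = t, lp
    return best_t


# Hypothetical values: prior N(20.0, 2.0), observation 22.0 with standard deviation 1.0.
t_map = map_estimate(20.0, 2.0, 22.0, 1.0, lo=15.0, hi=25.0)
t_eq9 = (1.0 ** 2 * 20.0 + 2.0 ** 2 * 22.0) / (2.0 ** 2 + 1.0 ** 2)   # Eq. (9)
print(t_map, t_eq9)
```

Up to the grid spacing, the maximum a posteriori estimate coincides with the minimizer of the cost function.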

## 3. Optimal interpolation (OI) and Kalman filter (KF)

### 3.1. Optimal interpolation

The simplest special case in data assimilation is that the forecast model is linear and the errors are Gaussian. Among sequential methods, the solution to this case is provided by the Kalman filter. Typically, the Kalman filter applies to the following state-space model:

$$x_{t+1} = M x_t + \eta_t, \tag{14}$$

$$y_t = H x_t + \zeta_t, \tag{15}$$

where $M$ and $H$ are the linear operators of the model and the measurement, respectively, $x$ is the model state, $y$ is the observation, and the subscript denotes the time step. $\eta_t$ and $\zeta_t$ are the model errors and observational errors, with covariances $Q$ and $R$, respectively. The objective here is to estimate the model state $x$ using $y$, making it as close as possible to the (unknown) true state.

Assume the estimate of the model state $x^a$ at a time step is a linear combination of the model forecast $x^b$ and the observation $y^o$, i.e. the filter itself is linear, so

$$x^a = x^b + K[y^o - Hx^b]. \tag{16}$$

Eq. (16) is the standard expression of the Kalman filter. $K$ is called the Kalman gain, which determines the optimal estimate, and $y^o - Hx^b$ is called the innovation. An analysis step essentially determines the increment to the forecast by combining the Kalman gain and the innovation. Before deriving $K$, we denote the covariance matrix of the analysis error $\epsilon^a$ by $P^a$, i.e. $P^a = \langle \epsilon^a, (\epsilon^a)^T \rangle$, where $\epsilon^a = x^a - x^{tr}$ and $x^{tr}$ is the true value of the model state. Similarly, the observation errors and forecast errors are defined by $\epsilon^o = y^o - Hx^{tr}$ and $\epsilon^b = x^b - x^{tr}$, respectively. It should be noted that the forecast error $\epsilon^b$ is different from the model error $\eta_t$, which is a systematic bias. We also denote $B = \langle \epsilon^b, (\epsilon^b)^T \rangle$ as the background (forecast) error covariance and $R = \langle \epsilon^o, (\epsilon^o)^T \rangle$ as the observational error covariance. It is further assumed that the observation error is uncorrelated with the forecast error, so $\langle \epsilon^b, (\epsilon^o)^T \rangle = \langle \epsilon^o, (\epsilon^b)^T \rangle = 0$.

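Clearly, we seek the $K$ that minimizes $P^a$. Subtracting $x^{tr}$ from both sides of Eq. (16) leads to

$$x^a - x^{tr} = x^b - x^{tr} + K[y^o - Hx^b + Hx^{tr} - Hx^{tr}]. \tag{17}$$

Namely,

$$\epsilon^a = \epsilon^b + K(\epsilon^o - H\epsilon^b), \tag{18}$$

and

$$\begin{aligned}
P^a &= E\{[\epsilon^b + K(\epsilon^o - H\epsilon^b)][\epsilon^b + K(\epsilon^o - H\epsilon^b)]^T\} \\
&= E[\epsilon^b(\epsilon^b)^T + \epsilon^b(\epsilon^o - H\epsilon^b)^T K^T + K(\epsilon^o - H\epsilon^b)(\epsilon^b)^T + K(\epsilon^o - H\epsilon^b)(\epsilon^o - H\epsilon^b)^T K^T] \\
&= B - BH^TK^T - KHB + K(R + HBH^T)K^T.
\end{aligned} \tag{19}$$

Here, we used $B = B^T$. The optimal estimate requires the trace of $P^a$ to be a minimum, namely $\partial[\mathrm{trace}(P^a)]/\partial K = 0$. It can be computed that

$$K = BH^T(HBH^T + R)^{-1}. \tag{20}$$

Substituting Eq. (20) into Eq. (19) gives

$$P^a = B - BH^TK^T - KHB + BH^T(HBH^T + R)^{-1}(R + HBH^T)K^T = (I - KH)B. \tag{21}$$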

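Here, we invoked the following properties:

$$\frac{\partial Ax}{\partial x^T} = \frac{\partial x^TA}{\partial x} = A, \tag{22}$$

$$\frac{\partial x^TAx}{\partial x} = x^T(A + A^T), \tag{23}$$

$$\frac{\partial A^Tx}{\partial x} = \frac{\partial x^TA^T}{\partial x^T} = A^T, \tag{24}$$

$$\frac{\partial(\mathrm{trace}[xAx^T])}{\partial x} = 2xA, \tag{25}$$

$$\frac{\partial(\mathrm{trace}[xA])}{\partial x} = \frac{\partial(\mathrm{trace}[Ax^T])}{\partial x} = A. \tag{26}$$

Thus, we have the optimal estimate filter:

$$x^a = x^b + K[y^o - Hx^b], \tag{27}$$

$$K = BH^T(HBH^T + R)^{-1}, \tag{28}$$

$$P^a = (I - KH)B. \tag{29}$$

In the estimate (27)–(29), if the background error covariance $B$ is prescribed, the estimate is called optimal interpolation (OI). OI does not involve the state equation (14), and $B$ is unchanged during the entire assimilation process.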
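A single analysis step of the form (27)–(29) can be sketched in a few lines of NumPy. The function name `oi_analysis` and the two-variable example are assumptions for illustration; the state has two components and only the first one is observed.

```python
import numpy as np


def oi_analysis(xb, B, yo, H, R):
    """One analysis step following Eqs. (27)-(29)."""
    S = H @ B @ H.T + R                    # innovation covariance H B H^T + R
    K = np.linalg.solve(S, H @ B).T        # K = B H^T S^{-1} (B and S are symmetric)
    xa = xb + K @ (yo - H @ xb)            # Eq. (27)
    Pa = (np.eye(len(xb)) - K @ H) @ B     # Eq. (29)
    return xa, Pa, K


# Hypothetical example: two correlated state variables, first one observed.
xb = np.array([1.0, 2.0])
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
yo = np.array([2.0])

xa, Pa, K = oi_analysis(xb, B, yo, H, R)
print(xa)
print(Pa)
```

Because the two variables are correlated in $B$, the unobserved second component is also corrected, which is how the background covariance spreads observational information through the state.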

### 3.2. Kalman filter

Now, we consider that $B$ in (28) changes with the assimilation cycle. This is more realistic, since the model prediction errors should be expected to decrease as the assimilation proceeds.

From Eq. (14), we have

$$x_{t+1}^{tr} = Mx_t^{tr} + \eta_t, \tag{30}$$

$$x_{t+1}^b = E(Mx_t^a + \eta_t) = Mx_t^a. \tag{31}$$

Eq. (30) indicates that even if the true value is given at one time step, the model cannot produce the true value at the next step because of the model error $\eta_t$. Eq. (31) shows the standard procedure for the model prediction of the next step, starting from the analysis of the previous step.

Subtracting (30) from (31) produces

$$\epsilon_{t+1}^b = M\epsilon_t^a - \eta_t, \tag{32}$$

$$B_{t+1} = E(\epsilon_{t+1}^b(\epsilon_{t+1}^b)^T) = E[(M\epsilon_t^a - \eta_t)(M\epsilon_t^a - \eta_t)^T] = MP_t^aM^T + Q, \tag{33}$$

where $P_t^a = \langle \epsilon_t^a, (\epsilon_t^a)^T \rangle$ represents the analysis error covariance at time step $t$, $Q$ is the model error covariance, and the analysis error is assumed to be uncorrelated with the model error. Eq. (33) describes the evolution of the background (prediction) error covariance in time, controlled by the dynamical model operator $M$. The above equations constitute the framework of the Kalman filter (Table 1), namely

| Step | Equations |
| --- | --- |
| Analysis step | $x_t^a = x_t^b + K[y^o - Hx_t^b]$, $K = B_tH^T(HB_tH^T + R)^{-1}$, $P_t^a = (I - KH)B_t$ |
| Prediction step | $x_{t+1}^b = Mx_t^a$, $B_{t+1} = MP_t^aM^T + Q$ |

### Table 1.

The Kalman filter.

One Kalman filter cycle consists of two parts, namely, one analysis step (Eqs. (27)–(29)) and one prediction step (Eqs. (31) and (33)). The analysis state $x_t^a$ and covariance $P_t^a$ are treated as initial conditions for the prediction step, until the next observation is available. In the Kalman filter literature, $B_t$ is sometimes denoted by $P_t^f$.
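The cycle of Table 1 can be sketched as follows. The scalar persistence model, the noise levels and the synthetic observations are illustrative assumptions; the point is that the analysis error covariance evolves through the cycles and settles at a steady value that does not depend on the observed values themselves.

```python
import numpy as np


def kf_cycle(xb, B, yo, M, H, Q, R):
    """One Kalman filter cycle: analysis step followed by prediction step."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    xa = xb + K @ (yo - H @ xb)
    Pa = (np.eye(len(xb)) - K @ H) @ B
    xb_next = M @ xa                   # Eq. (31)
    B_next = M @ Pa @ M.T + Q          # Eq. (33)
    return xa, Pa, xb_next, B_next


# Hypothetical 1-D persistence model, observed directly at every step.
M = np.array([[1.0]])
H = np.array([[1.0]])
Q = np.array([[0.1]])
R = np.array([[1.0]])
xb, B = np.array([0.0]), np.array([[10.0]])

rng = np.random.default_rng(0)
truth = 1.0
pa_history = []
for _ in range(50):
    yo = np.array([truth + rng.normal(0.0, 1.0)])   # synthetic observation
    xa, Pa, xb, B = kf_cycle(xb, B, yo, M, H, Q, R)
    pa_history.append(Pa[0, 0])

print(pa_history[0], pa_history[-1])
```

For this scalar case the steady state satisfies $P^a = (P^a + Q)R/(P^a + Q + R)$, which gives $P^a \approx 0.27$ for the values chosen here.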

### 3.3. Extended Kalman filter (EKF)

In deriving the Kalman filter, we assumed that the state model $M$ and the measurement model $H$ are both linear, and that the errors have Gaussian distributions. Therefore, the classic KF only works for linear models and Gaussian distributions. If the dynamical model and the measurement model are not linear, we cannot apply the KF directly. Instead, a linearization must be performed before applying the KF. The linearized version of the KF is called the extended KF (EKF), which solves the following state-space estimation problem:

$$x_{t+1} = f(x_t) + \eta_t, \tag{34}$$

$$y_t = h(x_t) + \zeta_t, \tag{35}$$

where $f$ and $h$ are nonlinear models, and $\eta_t$ and $\zeta_t$ are additive noises.

The filter is still assumed to be 'linear', i.e.

$$x^a = x^b + K[y^o - h(x^b)]. \tag{36}$$

Strictly speaking, Eq. (36) is not a linear combination of the forecast $x^b$ and the observation $y^o$ if $h$ is not linear. However, we simply extend the formulation of Eq. (16) and apply it intuitively in nonlinear cases. Ignoring high-order terms, the following holds approximately:

$$h(x + \delta x) = h(x) + \frac{\partial h}{\partial x}\delta x = h(x) + H\delta x, \tag{37}$$

where $H$ is the linearization of $h$ and $H_{i,j} = \partial h_i/\partial x_j$. So,

$$y^o - h(x^b) = y^o - h(x^{tr} + x^b - x^{tr}) = y^o - h(x^{tr}) - H(x^b - x^{tr}) = \epsilon^o - H\epsilon^b, \tag{38}$$

$$x^a = x^b + K(\epsilon^o - H\epsilon^b). \tag{39}$$

Eq. (39) has the same form as Eq. (16). Similarly, subtracting $x^{tr}$ from both sides of Eq. (39) leads to

$$\epsilon^a = \epsilon^b + K(\epsilon^o - H\epsilon^b), \tag{40}$$

which is the same as Eq. (18). Following the same derivation as that after Eq. (18), we obtain equations similar to (27)–(29). Therefore, if the measurement model $h$ is nonlinear, the KF can still be applied with a linearization of $h$.

Similar to Eqs. (30) and (31), the state model is as below:

$$x_{t+1}^{tr} = f(x_t^{tr}) + \eta_t, \tag{41}$$

$$x_{t+1}^f = E(f(x_t^a) + \eta_t) = f(x_t^a). \tag{42}$$

Subtracting Eq. (41) from Eq. (42) produces

$$\epsilon_{t+1}^f = f(x_t^a) - f(x_t^{tr}) - \eta_t = f(x_t^a) - f(x_t^{tr} - x_t^a + x_t^a) - \eta_t = f(x_t^a) - f(x_t^a - \epsilon_t^a) - \eta_t = M\epsilon_t^a - \eta_t, \tag{43}$$

where $M_{i,j} = \partial f_i/\partial x_j$.

Comparing Eq. (43) with Eq. (32) shows that Eq. (33) still holds. Thus, the EKF can be summarized as below (Table 2).

The procedure to perform the EKF is similar to that for the KF, as listed above. The differences and similarities between the EKF and the KF are as follows:

1. the Kalman gain $K$ has the same form in both, using the linear measurement model in the KF and the linearized one in the EKF;

2. the update equation of the forecast error covariance has the same form, using the linear state model in the KF and the linearized one in the EKF;

3. the forecast model is different, with the KF using the linear model of Eq. (14) and the EKF using the nonlinear model of Eq. (34); and

4. the filtering equation is different, with the linear measurement model $H$ used in the KF and the nonlinear model $h$ in the EKF.

It should be noted that the EKF is only an approximation of the KF for nonlinear state models.

| Step | Equations |
| --- | --- |
| Analysis step | $x_t^a = x_t^b + K[y^o - h(x_t^b)]$, $K = B_tH^T(HB_tH^T + R)^{-1}$, $P_t^a = (I - KH)B_t$, $H_{i,j} = \partial h_i/\partial x_j$ |
| Prediction step | $x_{t+1}^f = f(x_t^a)$, $B_{t+1} = MP_t^aM^T + Q$, $M_{i,j} = \partial f_i/\partial x_j$ |

### Table 2.

The extended Kalman filter.
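One EKF cycle can be sketched as below, with the Jacobians supplied analytically. The scalar model $f(x) = x + 0.1\sin x$, the measurement function $h(x) = x^2$ and all numerical values are hypothetical choices made only to keep the example small; evaluating the Jacobian of $h$ at the forecast and that of $f$ at the analysis is the usual convention assumed here.

```python
import numpy as np


def ekf_cycle(xb, B, yo, f, h, jac_f, jac_h, Q, R):
    """One EKF cycle with user-supplied Jacobians of f and h."""
    H = jac_h(xb)                                   # H_ij = dh_i/dx_j at the forecast
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    xa = xb + K @ (yo - h(xb))                      # nonlinear h in the innovation
    Pa = (np.eye(len(xb)) - K @ H) @ B
    M = jac_f(xa)                                   # M_ij = df_i/dx_j at the analysis
    return xa, Pa, f(xa), M @ Pa @ M.T + Q


def f(x):
    return x + 0.1 * np.sin(x)                      # hypothetical nonlinear model


def h(x):
    return x ** 2                                   # hypothetical nonlinear measurement


def jac_f(x):
    return np.array([[1.0 + 0.1 * np.cos(x[0])]])


def jac_h(x):
    return np.array([[2.0 * x[0]]])


xa, Pa, xf, Bf = ekf_cycle(np.array([1.0]), np.array([[0.5]]), np.array([1.44]),
                           f, h, jac_f, jac_h, np.array([[0.01]]), np.array([[0.1]]))
print(xa, Pa, xf, Bf)
```

The observation 1.44 corresponds to a state of 1.2, and the analysis moves from the forecast value 1.0 towards it while the error variance drops well below its forecast value of 0.5.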

## 4. Ensemble Kalman filter (EnKF)

### 4.1. Basics of EnKF

A challenge in the EKF is updating the background (prediction) error covariance, which requires the linearization of the nonlinear model. This linearization is often technically difficult, and even intractable in some cases, e.g. when the model contains non-continuous functions. Another drawback of the EKF is that it neglects the contributions of higher-order statistical moments in calculating the error covariance.

To avoid the linearization of the nonlinear model, the ensemble Kalman filter (EnKF) was introduced by Evensen [10, 11], in which the prediction error covariance $B$ of Eq. (33) is estimated approximately using an ensemble of model forecasts. The main concept behind the formulation of the EnKF is that if the dynamical model is expressed as a stochastic differential equation, the prediction error statistics, which are described by the Fokker-Planck equation, can be estimated using ensemble integrations [10, 12]; thus, the error covariance matrix $B$ can be calculated by integrating the ensemble of model states. The EnKF thereby overcomes the EKF drawback of neglecting the contributions of higher-order statistical moments in calculating the error covariance. The major strengths of the EnKF include the following:

1. there is no need to calculate the tangent linear model or Jacobian of nonlinear models, which is extremely difficult for ocean (or atmosphere) general circulation models (GCMs);

2. the covariance matrix is propagated in time via the fully nonlinear model equations (no linear approximation as in the EKF); and

3. it is well suited to modern parallel computers (cluster computing) [13].

The EnKF has attracted broad attention and has been widely used in atmospheric and oceanic data assimilation.

Simply put, the EnKF avoids the computation and evolution of the error covariance $B$ as in Eq. (33), and computes $B$ from the following formula whenever it is required:

$$B = \frac{1}{N-1}\sum_{i=1}^{N}(x_i^b - \overline{x^b})(x_i^b - \overline{x^b})^T, \tag{44}$$

where $x_i^b$ represents the $i$-th member of the forecast ensemble of the system state vector at step $t$, and $N$ is the ensemble size. The use of Eq. (44) avoids processing $M$, the linearized operator of the nonlinear model. However, the measurement function $H$ is still linear or linearized when computing the Kalman gain $K$, which causes concern. To avoid the linearization of the nonlinear measurement function, Houtekamer and Mitchell [14] wrote the Kalman gain as

$$K = BH^T(HBH^T + R)^{-1}, \tag{45}$$

$$BH^T \equiv \frac{1}{N-1}\sum_{i=1}^{N}[x_i^b - \overline{x^b}][h(x_i^b) - \overline{h(x^b)}]^T, \tag{46}$$

$$HBH^T \equiv \frac{1}{N-1}\sum_{i=1}^{N}[h(x_i^b) - \overline{h(x^b)}][h(x_i^b) - \overline{h(x^b)}]^T, \tag{47}$$

where $\overline{h(x^b)} = \frac{1}{N}\sum_{i=1}^{N}h(x_i^b)$. Eqs. (46) and (47) allow direct evaluation of the nonlinear measurement function $h$ in calculating the Kalman gain. However, Eqs. (46) and (47) have not been proven mathematically and were only given intuitively. Tang and Ambadan argued that Eqs. (46) and (47) approximately hold if and only if $\overline{h(x^b)} = h(\overline{x^b})$ and $x_i^b - \overline{x^b}$ is small for $i = 1, 2, \ldots, N$ [15]. Under these conditions, Tang et al. argued that Eqs. (46) and (47) actually linearize the nonlinear measurement function $h$ to $H$ [16]. Therefore, direct application of the nonlinear measurement function in Eqs. (46) and (47) in fact imposes an implicit linearization using the ensemble members. In the next section, we will see that Eqs. (46) and (47) can be modified under a rigorous framework.

Thus, the procedure of the EnKF is summarized as below (Table 3):

1. Impose perturbations on the initial conditions and integrate the model, i.e. $x_{i,1} = f(x_0 + \gamma_i)$, where $i = 1, 2, \ldots, N$ ($N$ is the ensemble size) and $x_0$ is the initial condition.

2. Use $K = BH^T(HBH^T + R)^{-1}$ and Eqs. (46) and (47) to calculate the Kalman gain $K$.

3. After $K$ is obtained, calculate the analysis using

$$x_i^a = x_i^b + K[y^o + \varepsilon_i - h(x_i^b)]. \tag{48}$$

It should be noted that each ensemble member produces an analysis; the average of all $N$ analyses can then be obtained.

4. Use $x_{i,t+1}^b = f(x_i^a)$ to obtain new ensemble members for the next round of analysis.

5. Repeat Steps 2–4 until the end of the assimilation period.

| Step | Equations |
| --- | --- |
| Analysis step | $x_i^a = x_i^b + K[y^o + \varepsilon_i - h(x_i^b)]$, $i = 1, \ldots, N$; $K = BH^T(HBH^T + R)^{-1}$; $BH^T = \frac{1}{N-1}\sum_{i=1}^{N}[x_i^b - \overline{x^b}][h(x_i^b) - \overline{h(x^b)}]^T$; $HBH^T = \frac{1}{N-1}\sum_{i=1}^{N}[h(x_i^b) - \overline{h(x^b)}][h(x_i^b) - \overline{h(x^b)}]^T$ |
| Prediction step | $x_{i,t+1}^b = f(x_i^a + \gamma_i)$, $i = 1, \ldots, N$ |

### Table 3.

The ensemble Kalman filter.

It should be noted that the observation should be treated as a random variable with mean equal to $y^o$ and covariance equal to $R$. This is why $\varepsilon_i$ appears in Eq. (48). In practice, $\varepsilon_i$ is often drawn from a normal distribution, $\varepsilon_i \sim N(0, R)$.

From the above procedure, we find that Eq. (44) is not directly applied here. Instead, we use Eqs. (46) and (47) to calculate $K$. This is because Eqs. (46) and (47) avoid the linearization of the nonlinear measurement function and also avoid the explicit construction of the matrix $B$, which is often very large and cannot be stored with current computing resources in many realistic problems. The measurement function $h$, which projects from the model space to the observation space, greatly reduces the dimension.
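The analysis step of Table 3 can be sketched with NumPy as follows. The ensemble is stored with one member per column; the three-variable state, the cubic measurement function and the noise levels are hypothetical choices for illustration, and the explicit matrix inverse is only reasonable because the observation space here is tiny.

```python
import numpy as np


def enkf_analysis(Xb, yo, h, R, rng):
    """Stochastic EnKF analysis step with perturbed observations.

    Xb : (L, N) forecast ensemble, one member per column
    yo : (m,) observation vector
    h  : function mapping an (L,) state to an (m,) observation
    R  : (m, m) observation error covariance
    """
    N = Xb.shape[1]
    Yb = np.column_stack([h(Xb[:, i]) for i in range(N)])   # h(x_i^b)
    Xp = Xb - Xb.mean(axis=1, keepdims=True)                # state anomalies
    Yp = Yb - Yb.mean(axis=1, keepdims=True)                # observation-space anomalies
    BHt = Xp @ Yp.T / (N - 1)                               # Eq. (46)
    HBHt = Yp @ Yp.T / (N - 1)                              # Eq. (47)
    K = BHt @ np.linalg.inv(HBHt + R)                       # Eq. (45)
    eps = rng.multivariate_normal(np.zeros(len(yo)), R, size=N).T   # eps_i ~ N(0, R)
    return Xb + K @ (yo[:, None] + eps - Yb)                # Eq. (48)


rng = np.random.default_rng(42)
Xb = rng.normal(0.0, 1.0, size=(3, 200))            # hypothetical forecast ensemble


def h(x):
    return np.array([x[0] ** 3 + x[0]])             # hypothetical nonlinear measurement


Xa = enkf_analysis(Xb, np.array([2.0]), h, np.array([[0.1]]), rng)
print(Xb[0].var(ddof=1), Xa[0].var(ddof=1))
```

Note that $B$ itself is never formed: only the $L \times m$ and $m \times m$ products appear, which is the practical advantage discussed above.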

### 4.2. Some remarks on EnKF with large dimensional problems

#### 4.2.1. Initial perturbation

The success of the EnKF depends strongly on the quality of the ensemble members produced by the initial perturbations. Because it is impractical to represent all possible types of errors within the ensemble owing to the computational cost, the method of generating initial perturbations must be chosen judiciously.

The first issue is the amplitude of the initial perturbations. Usually, two factors are considered when selecting it: the amplitude of the observation error, and the amplitude of the model errors induced by model parameters and uncertainty in model physics. If the model is perfect, the amplitude of the perturbations should be the same as the amplitude of the observation errors; otherwise, the model error is considered as well, and the combined error is used to determine the amplitude of the perturbations.

Once the perturbation amplitude is determined, the initial perturbation field for each ensemble member can be constructed as a normalized pseudorandom field multiplied by the prescribed amplitude. To account for spatial coherence, the pseudorandom field is red noise, as proposed by Evensen [17] and summarized below:

1. Calculate the statistical characteristics of the pseudorandom field, such that its variance is 1 and its mean is 0, by solving the following nonlinear equation:

$$e^{-1} = \frac{\sum_{l,p} e^{-2(k_l^2 + r_p^2)/\sigma^2}\cos(k_l r_h)}{\sum_{l,p} e^{-2(k_l^2 + r_p^2)/\sigma^2}}, \tag{49}$$

where $k_l = \frac{2\pi l}{x_n} = \frac{2\pi l}{N_x \Delta x}$, $r_p = \frac{2\pi p}{y_m} = \frac{2\pi p}{N_y \Delta y}$, and $N_x$ and $N_y$ are the numbers of grid points along the $x$-axis (longitude) and the $y$-axis (latitude). $l$ and $p$ are wavenumbers, ranging from 1 to the maximum values of $N_x/2$ and $N_y/2$. $\Delta x$ and $\Delta y$ are the intervals between two adjacent points, often set to 1, and $r_h$ is the decorrelation length. The purpose of Eq. (49) is to derive $\sigma^2$, which is then used to obtain the other parameter, $c$:

$$c^2 = \frac{1}{\Delta k \sum_{l,p} e^{-2(k_l^2 + r_p^2)/\sigma^2}}. \tag{50}$$

2. After $c$ and $\sigma^2$ are obtained, we can construct a two-dimensional pseudorandom field:

$$W(x_n, y_m) = \sum_{l,p} \frac{c}{\sqrt{\Delta k}}\, e^{-(k_l^2 + r_p^2)/\sigma^2}\, e^{2\pi i \varphi(l,p)}\, e^{i(k_l x_n + r_p y_m)}\, \Delta k. \tag{51}$$

3. As $x_n$, $y_m$ cover the whole domain, Eq. (51) produces an $N_x \times N_y$ two-dimensional random field with a spatially coherent structure, a variance of 1 and a mean of 0. If the realistic uncertainty (error) has an amplitude $\beta$, the perturbation should be $\beta W$. Similarly, Eq. (51) is often used for the error perturbation $\gamma_i$ in the prediction step of the EnKF procedure.

Sometimes, we need to consider the vertical coherence of pseudorandom fields between adjacent levels in oceanic models. A simple method for this purpose is to construct the pseudorandom field at the $k$th level, $\varepsilon_k$, by the following equation:

$$\varepsilon_k = \alpha\varepsilon_{k-1} + \sqrt{1 - \alpha^2}\, W_k, \tag{52}$$

where $W_k$ ($k = 1, \ldots, N_z$) is the pseudorandom field at the $k$th level without considering vertical coherence, constructed using the above method. For the surface perturbation ($k = 1$), vertical coherence is not considered, i.e. the coherence term is set to zero, since $\varepsilon_{k-1}$ does not exist. Eq. (52) indicates that the practical pseudorandom field at the $k$th level ($\varepsilon_k$) is composed of $W_k$ and $\varepsilon_{k-1}$. As such, $\varepsilon_k$ is correlated with $\varepsilon_{k-1}$, i.e. the practical pseudorandom fields of two adjacent levels ($\varepsilon_{k-1}$ and $\varepsilon_k$) are coherent with each other. Their correlation is determined by the coefficient $\alpha \in [0, 1]$. Eq. (52) generates a sequence that is white in the vertical direction if $\alpha = 0$ (i.e. $\varepsilon_k = W_k$), and a sequence that is perfectly correlated in the vertical if $\alpha = 1$ (i.e. $\varepsilon_k = \varepsilon_{k-1}$).

Eq. (52) is also often used to construct a random field that is temporally coherent, for example, a continuous random noise that has coherence in time, as used for $\gamma_i$ in the forecast model [17]. The random noise $\gamma_i$ in the EnKF procedure can also be replaced by random noise imposed on the model forcing. For example, random noise is continuously added to the wind forcing of oceanic models. For some atmospheric models with transition processes, there is inherent random noise that makes $\gamma_i$ unnecessary. One important criterion for choosing $\gamma_i$ and the amplitude $\beta$ is to examine the ensemble spread through sensitivity experiments.
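The recursion of Eq. (52) is easy to sketch. In the example below, plain white noise stands in for the horizontally coherent fields $W_k$ of Eq. (51); this substitution, the grid size and the value of $\alpha$ are assumptions made to keep the example short.

```python
import numpy as np


def vertically_coherent_fields(W, alpha):
    """Apply Eq. (52) level by level.

    W     : (Nz, Ny, Nx) independent unit-variance fields, one per level
    alpha : vertical correlation coefficient in [0, 1]
    """
    eps = np.empty_like(W)
    eps[0] = W[0]                          # surface level: no level above it
    for k in range(1, W.shape[0]):
        eps[k] = alpha * eps[k - 1] + np.sqrt(1.0 - alpha ** 2) * W[k]
    return eps


rng = np.random.default_rng(1)
W = rng.standard_normal((5, 100, 100))     # stand-in for the fields of Eq. (51)
eps = vertically_coherent_fields(W, alpha=0.8)

corr = np.corrcoef(eps[2].ravel(), eps[3].ravel())[0, 1]
print(corr, eps[4].var())
```

The factor $\sqrt{1 - \alpha^2}$ keeps the variance at 1 on every level, while adjacent levels have a correlation close to $\alpha$.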

#### 4.2.2. The computational cost of Kalman gain

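The Kalman gain $K$ has dimension $L \times m$, where $L$ is the number of model variables and $m$ is the number of observational variables. In many realistic problems, $L$ and $m$ are very large numbers ($m \gg N$, the ensemble size), making the inversion very expensive.

A simple procedure is to rewrite the Kalman gain $K$ as below:

$$K = \tilde{x}\tilde{x}^TH^T(H\tilde{x}\tilde{x}^TH^T + \varepsilon\varepsilon^T)^{-1}, \tag{53}$$

where $\tilde{x}$ denotes the model ensemble predictions with the ensemble mean removed ($\tilde{x}_i = x_i^b - \overline{x^b}$, for $i = 1, 2, \ldots, N$), and $R = \frac{1}{N}\varepsilon\varepsilon^T$ was invoked. If we assume the ensemble prediction error ($x^b - x^{tr} \approx x^b - \overline{x^b} = \tilde{x}$) is not correlated with the observation error, i.e. $\tilde{x}\varepsilon^T = 0$, the following is valid [17]:

$$(H\tilde{x}\tilde{x}^TH^T + \varepsilon\varepsilon^T) = (H\tilde{x} + \varepsilon)(H\tilde{x} + \varepsilon)^T, \tag{54}$$

where $(H\tilde{x} + \varepsilon)$ has dimension $m \times N$. Usually, the ensemble size $N$ is much smaller than $m$. Using the singular-value decomposition (SVD), we have

$$(H\tilde{x} + \varepsilon) = U\Sigma V^T. \tag{55}$$

Eq. (54) then becomes

$$(H\tilde{x}\tilde{x}^TH^T + \varepsilon\varepsilon^T) = U\Sigma V^TV\Sigma^TU^T = U\Sigma\Sigma^TU^T = U\Lambda U^T. \tag{56}$$

So,

$$(H\tilde{x}\tilde{x}^TH^T + \varepsilon\varepsilon^T)^{-1} = U\Lambda^{-1}U^T, \tag{57}$$

where $U$ and $\Lambda$ contain the eigenvectors and eigenvalues of the matrix in Eq. (54); the eigenvalues are the squares of the singular values of $(H\tilde{x} + \varepsilon)$. There are at most $N$ non-zero singular values of $(H\tilde{x} + \varepsilon)$, so the dimension of the problem is not large, allowing us to compute the inversion efficiently for a global analysis in most practical situations.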
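The SVD route of Eqs. (55)–(57) can be sketched as follows. When $m > N$ the matrix in Eq. (54) is rank deficient, so what the sketch returns is a pseudo-inverse restricted to the non-zero singular values; the relative threshold `1e-10` and the random test matrix are assumptions for illustration.

```python
import numpy as np


def inverse_via_svd(A):
    """Return U Lambda^{-1} U^T for the matrix A A^T, where A is (m, N).

    This is the inverse of A A^T when it has full rank, and a
    pseudo-inverse otherwise (zero singular values are discarded).
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)   # U: (m, r), s: (r,), r = min(m, N)
    keep = s > s.max() * 1e-10                        # drop numerically zero singular values
    U, lam = U[:, keep], s[keep] ** 2                 # Lambda = Sigma Sigma^T
    return (U / lam) @ U.T                            # U Lambda^{-1} U^T


rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))    # stand-in for H x~ + eps with m = 50, N = 10
C = A @ A.T                          # the 50 x 50 matrix of Eq. (54)
P = inverse_via_svd(A)
print(np.allclose(C @ P @ C, C))
```

Only an SVD of the $m \times N$ matrix is needed, never a factorization of the full $m \times m$ matrix.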

#### 4.2.3. Stochastic EnKF and deterministic EnKF

In the EnKF introduced in the previous section, the observation assimilated into the dynamical model must be treated as a stochastic variable, as expressed by $y^o + \varepsilon_i$ in Eq. (48). This is required if the classic EnKF algorithm is used. It has been shown that if the EnKF assimilates deterministic observations (i.e. the observation $y^o$ is the same for every ensemble member), the analysis error covariance is systematically underestimated, typically leading to filter divergence, as indicated below [11, 18]:

$$P^{a*} = (I - KH)B(I - KH)^T. \tag{58}$$

Eq. (58) gives the analysis error covariance if the observation is not perturbed. Compared with Eq. (29), which is the theoretically unbiased estimate, $P^{a*}$ is always smaller than $P^a$.

However, the perturbed observation approach (i.e. $y^o + \varepsilon_i$) introduces an additional source of sampling error that reduces the accuracy of the analysis error covariance and increases the probability of underestimating it [19, 20]. Thus, an approach that uses only a single observation realization but avoids the systematic underestimation of the analysis error covariance was pursued. There are several approaches to achieve this goal, as summarized by Tippett et al. [20]. Below, we introduce an approach developed by Whitaker and Hamill [19], called the ensemble square root filter (EnSRF).

Denoting the deviation of the analysis from the analysis mean by $\tilde{x}^a = x^a - \bar{x}^a$, it is easy to write

$$\tilde{x}^a = \tilde{x}^b + \tilde{K}[\tilde{y}^o - H\tilde{x}^b], \tag{59}$$

where $\tilde{y}^o = y^o - \bar{y}^o$. If a single observation realization is assimilated in all ensemble members, $\tilde{y}^o = 0$ and

$$\tilde{x}^a = \tilde{x}^b - \tilde{K}H\tilde{x}^b = (I - \tilde{K}H)\tilde{x}^b, \tag{60}$$

$$P^{a*} = (I - \tilde{K}H)B(I - \tilde{K}H)^T. \tag{61}$$

We seek a definition of $\tilde{K}$ that results in an ensemble whose analysis error covariance equals $(I - KH)B$, i.e.

$$(I - \tilde{K}H)B(I - \tilde{K}H)^T = (I - KH)B. \tag{62}$$

For a single (scalar) observation, for which $R$ and $HBH^T$ are scalars, the solution of Eq. (62) is

$$\tilde{K} = \left(1 + \sqrt{\frac{R}{HBH^T + R}}\right)^{-1}K. \tag{63}$$

Therefore, the EnSRF is summarized as below (Table 4):

| Quantity | Equation |
| --- | --- |
| Ensemble mean update | $\bar{x}^a = \overline{x^b} + K[y^o - H\overline{x^b}]$ |
| Anomaly update | $\tilde{x}^a = \tilde{x}^b - \tilde{K}H\tilde{x}^b$ |
| Analysis ensemble | $x^a = \bar{x}^a + \tilde{x}^a$ |
| Kalman gain | $K = BH^T(HBH^T + R)^{-1}$ |
| | $BH^T = \frac{1}{N-1}\sum_{i=1}^{N}[x_i^b - \overline{x^b}][h(x_i^b) - \overline{h(x^b)}]^T$ |
| | $HBH^T = \frac{1}{N-1}\sum_{i=1}^{N}[h(x_i^b) - \overline{h(x^b)}][h(x_i^b) - \overline{h(x^b)}]^T$ |
| Reduced gain | $\tilde{K} = \left(1 + \sqrt{\frac{R}{HBH^T + R}}\right)^{-1}K$ |

### Table 4.

The analysis scheme of EnSRF.

It should be noted that two Kalman gains are used in the EnSRF: the original $K$ for updating the ensemble mean and a new $\tilde{K}$ for updating the anomalies. This means that assimilating a single observation realization yields the same ensemble analysis mean as the classic EnKF with stochastic observations.
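For one scalar observation, the EnSRF update can be sketched as below. The function name, the two-variable ensemble and the linear measurement function are hypothetical; as an assumption, the ensemble mean of $h(x_i^b)$ is used in place of $H\overline{x^b}$ in the mean update, which is exact for a linear $h$.

```python
import numpy as np


def ensrf_single_obs(Xb, yo, h, r):
    """EnSRF update for one scalar observation with error variance r.

    Xb : (L, N) forecast ensemble, one member per column
    h  : function mapping an (L,) state to a scalar
    """
    N = Xb.shape[1]
    yb = np.array([h(Xb[:, i]) for i in range(N)])     # h(x_i^b), shape (N,)
    xb_mean, yb_mean = Xb.mean(axis=1), yb.mean()
    Xp, yp = Xb - xb_mean[:, None], yb - yb_mean       # anomalies
    BHt = Xp @ yp / (N - 1)                            # Eq. (46), shape (L,)
    HBHt = yp @ yp / (N - 1)                           # Eq. (47), scalar
    K = BHt / (HBHt + r)                               # Kalman gain for the mean
    K_tilde = K / (1.0 + np.sqrt(r / (HBHt + r)))      # reduced gain, Eq. (63)
    xa_mean = xb_mean + K * (yo - yb_mean)             # mean update
    Xa_p = Xp - np.outer(K_tilde, yp)                  # anomaly update, no perturbed obs
    return xa_mean[:, None] + Xa_p


rng = np.random.default_rng(3)
Xb = rng.normal(size=(2, 50))                          # hypothetical ensemble
Xa = ensrf_single_obs(Xb, yo=1.0, h=lambda x: x[0], r=0.5)

v = Xb[0].var(ddof=1)
print(Xa[0].var(ddof=1), v * 0.5 / (v + 0.5))
```

For the observed variable, the analysis ensemble variance matches $(1 - KH)B$ exactly, without any observation perturbations.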

Initially, the term EnKF referred specifically to the stochastic ensemble Kalman filter, which requires perturbing the observations. Subsequently, several deterministic EnKFs that avoid the use of perturbed observations were developed, e.g. the ETKF [21], the EAKF [22] and the EnSRF. These filter designs are labelled as variants of the EnKF because they are also based on the Kalman filtering formula and ensemble representations.

#### 4.2.4. Inflation approach

The forecast error covariance is defined by Eq. (44):

$$B = \frac{1}{N-1}\sum_{i=1}^{N}(x_i^b - \overline{x^b})(x_i^b - \overline{x^b})^T = \frac{1}{N-1}\tilde{X}\tilde{X}^T. \tag{64}$$

Eq. (64) approximates $B$ using the forecast ensemble. Owing to limited computational resources, the ensemble size $N$ is often restricted to a small value in many realistic problems. A small ensemble size may produce a very small ensemble spread, so that $B$ estimated by Eq. (64) is seriously underestimated. To solve this problem, $B$ is multiplied by an inflation factor $\lambda$ (slightly greater than 1). $\lambda$ is determined empirically, for example through sensitivity experiments, with a typical value of 1.01. $\lambda B$ then replaces $B$ in the EnKF formulas. This is equivalent to rescaling each ensemble member about the ensemble mean, where a factor $\sqrt{\lambda}$ on the anomalies corresponds to a factor $\lambda$ on $B$:

$$x_i^b \leftarrow \sqrt{\lambda}\,(x_i^b - \overline{x^b}) + \overline{x^b}. \tag{65}$$
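Multiplicative inflation applied to the ensemble members can be sketched as follows. The function name and the random ensemble are illustrative assumptions; the sketch scales the anomalies by $\sqrt{\lambda}$, which multiplies the sample covariance of Eq. (64) by $\lambda$ while leaving the ensemble mean unchanged.

```python
import numpy as np


def inflate(Xb, lam):
    """Scale the ensemble anomalies by sqrt(lam), so that B becomes lam * B.

    Xb : (L, N) forecast ensemble, one member per column
    """
    mean = Xb.mean(axis=1, keepdims=True)
    return mean + np.sqrt(lam) * (Xb - mean)


rng = np.random.default_rng(7)
Xb = rng.normal(size=(4, 20))          # hypothetical ensemble: 4 variables, 20 members
Xinf = inflate(Xb, lam=1.05)

# np.cov uses the 1/(N-1) normalization of Eq. (64) with variables in rows.
print(np.allclose(np.cov(Xinf), 1.05 * np.cov(Xb)))
```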

#### 4.2.5. Localization of EnKF

When EnKF is applied to high-dimensional atmospheric and oceanic models, the limited ensemble size will cause the estimated correlations to be noisy [11]. When the ensemble size is insufficient, it will produce spurious correlations between distant locations in the background covariance matrix B. Unless they are suppressed, these spurious correlations will cause observations from one location to affect the analysis in locations an arbitrarily large distance away, in an essentially random manner [23]. This needs to be remedied by the localization method.

Another reason for using localization is that the treatment of localization artificially reduces the spatial domain of influence of observations during the update. The localization dramatically reduces the necessary ensemble size, which is very important for operational systems. Two most common distance-based localization methods used in practice are local analysis and covariance localization.

Using local analysis, only measurements located within a certain distance from a grid point will impact the analysis in this grid point. This allows for an algorithm where the analysis is computed grid point by grid point. It was found that severe localization could lead to imbalance, but with large enough radius of influence (decorrelation length) for the measurements, this was not a problem. Hunt et al. use the local analysis method in their ETKF algorithm and developed a local ensemble transform Kalman filter (LETKF) [23].

To eliminate the small background error covariances associated with remote observations, Houtekamer and Mitchell used a Schur (element-wise) product of a correlation function with local support and the background error covariance calculated from the ensemble [14]. That is, the matrix B in Eq. (48) is replaced by $\rho\circ B$, where $\circ$ denotes the element-wise product and the elements of ρ depend on the distance r between the grid point and the observation as follows:

$$\rho(r)=\left(1+\alpha r+\frac{\alpha^{2}r^{2}}{3}\right)e^{-\alpha r}. \tag{66}$$

Here, α is a scalar parameter. To the best of the authors’ knowledge, this was the first use of covariance localization in EnKF.

Nowadays, a typical covariance localization approach is used to represent prior covariances using an element-wise product of ensemble covariance and a correlation function with compact support [24]. Anderson applied this approach to the Data Assimilation Research Testbed system [25], which has been used for realistic cases.
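Covariance localization can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation of [14] or [25]: it localizes in state space on a hypothetical one-dimensional grid, uses pairwise grid-point distances for r, and the names `correlation_eq66` and `localize_covariance` are invented here.

```python
import numpy as np

def correlation_eq66(r, alpha):
    """Correlation function of Eq. (66): (1 + a r + a^2 r^2 / 3) exp(-a r)."""
    ar = alpha * np.asarray(r, dtype=float)
    return (1.0 + ar + ar**2 / 3.0) * np.exp(-ar)

def localize_covariance(B, coords, alpha):
    """Schur (element-wise) product of B with a distance-based correlation matrix."""
    # Pairwise distances between grid points on an assumed 1-D grid.
    dist = np.abs(coords[:, None] - coords[None, :])
    return correlation_eq66(dist, alpha) * B

rng = np.random.default_rng(1)
n, N = 40, 10                                 # state dimension, small ensemble size
ens = rng.normal(size=(N, n))                 # toy ensemble with no true correlations
B = np.cov(ens, rowvar=False)                 # noisy sample covariance
coords = np.arange(n, dtype=float)
B_loc = localize_covariance(B, coords, alpha=0.5)
```

Because ρ(0) = 1, the variances on the diagonal are untouched, while the spurious covariances between distant grid points are damped towards zero.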

## 5. General form of ensemble-based filters for Gaussian models

In the preceding sections, we introduced Kalman-based filters. The original Kalman filter applies to a linear model and a linear measurement function. EKF and EnKF were later developed to address nonlinear models. However, the measurement functions are still assumed to be linear. Eqs. (46) and (47) can directly evaluate nonlinear measurement functions, but they were proposed intuitively and have not yet been proven. In this section, we present a general form for nonlinear measurement functions and use it to prove Eqs. (46) and (47) mathematically.

For generality, we assume the nonlinear model as Eqs. (34) and (35):

$$x_{t+1}=f(x_t)+\eta_t, \tag{67}$$

$$y_t=h(x_t)+\zeta_t, \tag{68}$$

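where f and h are the nonlinear model and measurement operators, x is the model state and y is the observation. $\eta_t$ and $\zeta_t$ are the model errors and observation errors, with covariances Q and R, respectively. Assume that the estimate of the model state $x^a$ at a time step is a linear combination of the model forecast $x^b$ and the observation $y^o$, i.e. the filter itself is linear, so

$$x^a=x^b+K[y^o-h(x^b)] \tag{69}$$

Denoting $\hat{x}^a=x^{tr}-x^a$, $\hat{x}^b=x^{tr}-x^b$ and $\hat{y}=y^o-h(x^b)$, where $x^{tr}$ is the true state, we have

$$\hat{x}^a=\hat{x}^b-K\hat{y} \tag{70}$$

$$\begin{aligned}P^a&=E[\hat{x}^a(\hat{x}^a)^T]=E[(\hat{x}^b-K\hat{y})(\hat{x}^b-K\hat{y})^T]\\&=E[\hat{x}^b(\hat{x}^b)^T-\hat{x}^b\hat{y}^TK^T-K\hat{y}(\hat{x}^b)^T+K\hat{y}\hat{y}^TK^T]\\&=P^b-P_{\hat{x}\hat{y}}K^T-KP_{\hat{y}\hat{x}}+KP_{\hat{y}\hat{y}}K^T\end{aligned} \tag{71}$$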

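The optimal estimate requires the trace of $P^a$ to be a minimum, namely,

$$\frac{\partial[\mathrm{trace}(P^a)]}{\partial K}=-P_{\hat{x}\hat{y}}-P_{\hat{y}\hat{x}}^{T}+2KP_{\hat{y}\hat{y}}=-2P_{\hat{x}\hat{y}}+2KP_{\hat{y}\hat{y}}=0, \tag{72}$$

where we used $P_{\hat{y}\hat{x}}^{T}=P_{\hat{x}\hat{y}}$, the symmetry of $P_{\hat{y}\hat{y}}$ and the following properties:

$$\frac{\partial\,\mathrm{trace}(xAx^T)}{\partial x}=x(A+A^T)=2xA \quad (A\ \text{symmetric}), \tag{73}$$

$$\frac{\partial\,\mathrm{trace}(xA^T)}{\partial x}=\frac{\partial\,\mathrm{trace}(Ax^T)}{\partial x}=A. \tag{74}$$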

Thus, we have the optimal estimate filter:

$$x_t^a=x_t^b+K[y^o-h(x_t^b)] \tag{75}$$

$$K=P_{\hat{x}\hat{y}}P_{\hat{y}\hat{y}}^{-1} \tag{76}$$

$$P^a=P^b-KP_{\hat{y}\hat{x}},\qquad P_{\hat{y}\hat{x}}=P_{\hat{x}\hat{y}}^{T}. \tag{77}$$

Eqs. (75)–(77) give a general algorithm for a Gaussian nonlinear model and a nonlinear measurement function. The first factor in Eq. (76) is the cross-covariance $P_{\hat{x}\hat{y}}$ between the state errors and the observation errors, and the second factor is the inverse of the error covariance $P_{\hat{y}\hat{y}}$ of the difference between the observation and its model prediction. Here, $\hat{y}$ is defined as the error between the noisy observation $y^o$ and its prediction $h(x^b)$.
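The general gain can be estimated directly from an ensemble without linearizing the measurement function. The sketch below is one possible reading, under assumptions that are not spelled out in the text: members are stored one per row, the observation error is uncorrelated with the forecast error so that R is simply added to the sample covariance of the predicted observations, and the function name `ensemble_gain` is hypothetical.

```python
import numpy as np

def ensemble_gain(ens_b, h, R):
    """Estimate K = P_xy P_yy^{-1} and the analysis covariance from an ensemble.

    ens_b: (N, n) array of forecast members (assumed row layout).
    h: measurement function mapping an (n,) state to an (m,) observation.
    R: (m, m) observation error covariance.
    """
    N = ens_b.shape[0]
    hx = np.array([h(x) for x in ens_b])      # predicted observations, (N, m)
    dx = ens_b - ens_b.mean(axis=0)           # state deviations
    dy = hx - hx.mean(axis=0)                 # predicted-observation deviations
    P_b = dx.T @ dx / (N - 1)
    P_xy = dx.T @ dy / (N - 1)
    P_yy = dy.T @ dy / (N - 1) + R            # assumes obs error independent of forecast error
    K = np.linalg.solve(P_yy, P_xy.T).T       # equals P_xy @ inv(P_yy); P_yy is symmetric
    P_a = P_b - K @ P_xy.T
    return K, P_b, P_a

rng = np.random.default_rng(0)
ens = rng.normal(size=(50, 2))
R = 0.1 * np.eye(2)

# Nonlinear measurement function (illustrative choice).
K_nl, P_b, P_a = ensemble_gain(ens, lambda x: np.array([x[0] ** 2, x[1]]), R)

# Linear measurement function: the gain reduces to the familiar Kalman gain.
H = np.array([[1.0, 0.0], [0.5, 1.0]])
K_lin, P_b_lin, _ = ensemble_gain(ens, lambda x: H @ x, R)
K_kf = P_b_lin @ H.T @ np.linalg.inv(H @ P_b_lin @ H.T + R)
```

For the linear case, `K_lin` and `K_kf` agree to rounding error, which is the ensemble counterpart of the special case derived in the text.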

If the model is linear, obviously,

$$x_{t+1}^b=Mx_t^a+\eta_t, \tag{78}$$

$$B_{t+1}=MP_t^aM^T+Q. \tag{79}$$

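If the measurement function is linear, i.e. $h(x)=Hx$ and $y^o=Hx^{tr}+\zeta$, then

$$\hat{y}=y^o-h(x^b)=Hx^{tr}+\zeta-Hx^b=H\hat{x}^b+\zeta \tag{80}$$

$$P_{\hat{x}\hat{y}}=E[\hat{x}^b(H\hat{x}^b+\zeta)^T]=P^bH^T \tag{81}$$

$$P_{\hat{y}\hat{y}}=E[(H\hat{x}^b+\zeta)(H\hat{x}^b+\zeta)^T]=HP^bH^T+R \tag{82}$$

where the observation error ζ is assumed to be uncorrelated with the forecast error $\hat{x}^b$. So, the Kalman gain is

$$K=P^bH^T(HP^bH^T+R)^{-1} \tag{83}$$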

Eq. (83) is identical to Eq. (28). Therefore, Eq. (28), as used in KF, EKF and EnKF, is a special case of Eq. (76) under the assumption of a linear measurement function.

In the standard KF, the state error covariance is updated at each analysis cycle during the measurement update process. Updating the error covariance matrix is important because it represents the change in forecast error covariance when a measurement is performed. The EnKF implementation does not require the covariance update equation because it can directly calculate the updated error covariance matrix from a set of ensemble members. Evensen [17] has derived the analysis of covariance equation that is consistent with the standard KF error covariance to update Eq. (28). But the true representation of the updated error covariance requires a large ensemble size, which is often computationally infeasible.

The general form of the Kalman gain makes use of the reformulated error covariance. In a broad sense, the above algorithm implicitly uses the prior covariance update equation (or the analysis error covariance matrix) to calculate the forecast error covariance. Thus, the above algorithm is fully consistent with the time update and measurement update formulation of the Kalman filter algorithm. On this basis, one can develop a new type of Kalman filter that chooses the ensemble members deterministically in such a way that they capture the statistical moments of the nonlinear model accurately. In the next section, we discuss this new type of Kalman filter, called the sigma-point Kalman filter, based on the above algorithm.

## 6. Sigma-point Kalman filters (SPKF)

### 6.1. Basics of SPKF

EnKF was developed in order to overcome the linearization of nonlinear models. As introduced earlier, the idea behind EnKF is to ‘integrate’ the Fokker-Planck equation using an ensemble technique to estimate the forecast error covariance. Theoretically, if the ensemble size is infinite, the estimate approaches the true value. In reality, however, we can only use a finite ensemble size, and even a very small one for many problems, which leads to truncation errors. Several concerns therefore arise: how to generate a finite sample wisely for the optimal estimate of the prediction error covariance, what the minimum ensemble size is for an efficient estimate of the error covariance, and how much of the true error covariance can be captured by the EnKF for a given ensemble size. In this section, we introduce a new ensemble technique for EnKF, which is called the sigma-point Kalman filter (SPKF).

The so-called sigma-point approach is based on deterministic sampling of the state distribution to calculate the approximate covariance matrices for the standard Kalman filter equations. The family of SPKF algorithms includes the unscented Kalman filter (UKF [26]), the central difference Kalman filter (CDKF [27]) and their square root versions [28]. Another interpretation of the sigma-point approach is that it implicitly performs a statistical linearization of the nonlinear model through a weighted statistical linear regression (WSLR) to calculate the covariance matrices [29]. In SPKF, the model linearization is done through a linear regression between a number of points (called sigma points) drawn from the prior distribution of a random variable rather than through a truncated Taylor series expansion at a single point. It has been found that this linearization is much more accurate than a truncated Taylor series linearization [28]. Eqs. (80)–(82) form the core of SPKF. A central issue here is how to generate the optimal ensemble members for applying these equations. The two basic approaches mentioned above are UKF and CDKF. For an L-dimensional dynamical system represented by the discretized state-space equations of (67), it has been proven that 2L+1 ensemble members, constructed by UKF or CDKF, can precisely estimate the mean and covariance. We omit the theoretical proof and only outline the UKF scheme below.

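Denote the 2L+1 sigma points at time k for producing ensemble members by $\chi_k=[\chi_{k,0},\chi_{k,1}^{+},\ldots,\chi_{k,L}^{+},\chi_{k,1}^{-},\ldots,\chi_{k,L}^{-}]$, which are defined by the following expressions:

$$\chi_{k,0}=\bar{X}_k^a \tag{84}$$

$$\chi_{k,i}^{+}=\bar{X}_k^a+\left[c\sqrt{P_{X,k}^a}\right]_i \tag{85}$$

$$\chi_{k,i}^{-}=\bar{X}_k^a-\left[c\sqrt{P_{X,k}^a}\right]_i \tag{86}$$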

where $L=N_x+N_\eta+N_\zeta$ is the sum of the dimensions of the model state, model noise and measurement noise. The augmented state vector $X=[x;\eta;\zeta]$ is an L-dimensional vector. $P_{X,k}^a$ is the covariance of the augmented state vector (analysis) at the previous step. $[c\sqrt{P_{X,k}^a}]_i$ is the ith row (column) of the weighted matrix square root of the covariance matrix (an L × L matrix). c is a scale parameter that will be specified later. The key point here is to produce 2L+1 ensemble members by integrating the model from the 2L+1 initial conditions of Eqs. (84)–(86); the filter Eqs. (80)–(82) are then applied to these ensemble members.

The procedure is summarized as below:

1. Initially, add a small perturbation, denoted by $\tilde{x}_0$, to the initial condition $x_0$ using the Evensen method [17], and randomly generate perturbations q and r for the model noise and measurement noise, drawn from the normal distributions N(0,Q) and N(0,R). Thus, we can construct the augmented state vector and the corresponding covariance (k=0):

$$\bar{X}_0^a=[x_0;0;0]; \tag{87}$$

$$P_0^x=\tilde{x}_0\tilde{x}_0^{T}; \tag{88}$$

$$P_{X,0}=\begin{pmatrix}P_0^x&0&0\\0&Q&0\\0&0&R\end{pmatrix}. \tag{89}$$

2. From the above formula, we can calculate sigma points using Eqs. (84)–(86). Note that each set of sigma points, denoted by χk, has dimension L, e.g. the ith sigma point can be expressed by χk,i=[xk,i;ηk,i;ζk,i].

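3. Use the 2L+1 sigma points to integrate the state-space model. For the ith sigma point, we have $x_{k+1,i}^f=f(x_{k,i},\eta_{k,i})$. As i varies from 0 to 2L, we produce 2L+1 ensemble members, from which the analysis mean and covariance are obtained; these are in turn used to produce the sigma points for the next step (k+1), forming a recursive algorithm. Given the 2L+1 ensemble members, the forecast statistics, the gain and the analysis mean and covariance are calculated as follows:

$$\bar{x}_{k+1}^f=\sum_{i=0}^{2L}w_i^{(m)}x_{k+1,i}^f \tag{90}$$

$$(P_{xx}^f)_{k+1}=\sum_{i=0}^{2L}w_i^{(c)}[x_{k+1,i}^f-\bar{x}_{k+1}^f][x_{k+1,i}^f-\bar{x}_{k+1}^f]^T \tag{91}$$

$$y_{k+1,i}^f=h(x_{k+1,i}^f,\zeta_{k+1,i}) \tag{92}$$

$$\bar{y}_{k+1}^f=\sum_{i=0}^{2L}w_i^{(m)}y_{k+1,i}^f \tag{93}$$

$$(P_{yy})_{k+1}=\sum_{i=0}^{2L}w_i^{(c)}[y_{k+1,i}^f-\bar{y}_{k+1}^f][y_{k+1,i}^f-\bar{y}_{k+1}^f]^T \tag{94}$$

$$(P_{xy})_{k+1}=\sum_{i=0}^{2L}w_i^{(c)}[x_{k+1,i}^f-\bar{x}_{k+1}^f][y_{k+1,i}^f-\bar{y}_{k+1}^f]^T \tag{95}$$

$$K_{k+1}=P_{xy}P_{yy}^{-1}, \tag{96}$$

$$\bar{x}_{k+1}^a=\bar{x}_{k+1}^f+K_{k+1}[y_{k+1}-\bar{y}_{k+1}^f] \tag{97}$$

$$P_{k+1}^a=(P_{xx}^f)_{k+1}-K_{k+1}P_{yy}K_{k+1}^T, \tag{98}$$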

where

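$$c=\sqrt{L+\lambda} \tag{99}$$

$$w_0^{(m)}=\frac{\lambda}{L+\lambda} \tag{100}$$

$$w_0^{(c)}=\frac{\lambda}{L+\lambda}+1-\alpha^{2}+\beta \tag{101}$$

$$w_i^{(m)}=w_i^{(c)}=\frac{1}{2(L+\lambda)},\quad i=1,2,\ldots,2L \tag{102}$$

$$\lambda=\alpha^{2}(L+\kappa)-L, \tag{103}$$

α, β and κ are tuning parameters, with $0<\alpha<1$ and $\kappa\ge 0$. Often κ=0 is chosen as the default value, and β=2.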

4. From $P_{k+1}^a$, together with random perturbations for the model noise η and observation noise ζ drawn from the Gaussian distributions N(0,Q) and N(0,R), we calculate new sigma points using Eqs. (84)–(86), and repeat Steps 2 and 3 until the assimilation is completed for the entire period.
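The sigma-point construction and the weighted statistics of the procedure above can be sketched as follows. This is a simplified illustration rather than the full augmented-state scheme: it treats only the model state (no noise components in the sigma points), uses a Cholesky factor as the matrix square root, and takes c as the square root of L+λ, the usual UKF scaling. The function names and the default α=0.5 are assumptions for illustration.

```python
import numpy as np

def ukf_weights(L, alpha=0.5, beta=2.0, kappa=0.0):
    """Scale factor c and the mean/covariance weights of Eqs. (99)-(103)."""
    lam = alpha**2 * (L + kappa) - L
    c = np.sqrt(L + lam)                       # usual UKF scaling (assumed here)
    wm = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
    wc = wm.copy()
    wm[0] = lam / (L + lam)
    wc[0] = lam / (L + lam) + 1.0 - alpha**2 + beta
    return c, wm, wc

def sigma_points(mean, cov, c):
    """Return the 2L+1 sigma points as rows: centre, plus set, minus set."""
    S = np.linalg.cholesky(cov)                # cov = S @ S.T
    plus = mean + c * S.T                      # row i is mean + c * (column i of S)
    minus = mean - c * S.T
    return np.vstack([mean, plus, minus])

def unscented_transform(points, wm, wc, func):
    """Weighted mean and covariance of func applied to every sigma point."""
    fx = np.array([func(p) for p in points])
    mean = wm @ fx
    dev = fx - mean
    cov = (wc[:, None] * dev).T @ dev
    return mean, cov

L = 3
mean = np.array([1.0, -2.0, 0.5])
cov = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 0.5]])
c, wm, wc = ukf_weights(L)
pts = sigma_points(mean, cov, c)

# Sanity check: propagating through the identity recovers the input moments.
m_id, P_id = unscented_transform(pts, wm, wc, lambda x: x)
```

In a filter, `func` would be the model f for the forecast statistics of Eqs. (90) and (91), and the measurement function h for Eqs. (93) to (95).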

### 6.2. Remarks of SPKF

SPKF was recently introduced into the earth sciences [15, 30]. The main differences between SPKF and EnKF include

1. SPKF chooses the ensemble members deterministically while EnKF uses random perturbation to generate ensemble members;

2. the number of sigma points is a fixed value as 2L+1, while the ensemble size in EnKF is pre-specified;

3. SPKF uses Eq. (98) to update the error covariance matrix, while EnKF does not update explicitly the error covariance matrix; and

4. sigma points are recalculated using Eqs. (84)–(86) every time an observation is available, while the ensemble members in EnKF are only perturbed at the initial time.

A recent application of SPKF to a realistic oceanic model indicates that SPKF performs better than EnKF at a similar level of computational cost [31].

In SPKF, the number of sigma points is 2L+1, where L is the dimension of the augmented state vector $X=[x;\eta;\zeta]$, i.e. $L=N_x+N_\eta+N_\zeta$ is the sum of the dimensions of the model state, model noise and observation noise. Usually, L is of the order of $10^3$–$10^4$, so the computational expense is a huge challenge for SPKF in realistic problems. One solution is to use the truncated singular-value decomposition (TSVD) to reduce the number of sigma points. As seen from Eqs. (84)–(86), $P_{X,k}^a$ is an L × L matrix, so its dimension determines the ensemble size. Suppose that $P_{X,k}^a$ can be expressed as

$$P_{X,k}^a=E_{X,k}^a\Sigma_k(E_{X,k}^a)^T \tag{104}$$

where $\Sigma_k=\mathrm{diag}(\sigma_{k1},\sigma_{k2},\ldots,\sigma_{kL})$ is a diagonal matrix of eigenvalues sorted in descending order, i.e. $\sigma_{k1}\ge\sigma_{k2}\ge\ldots\ge\sigma_{kL}$, and $E_{X,k}^a=[e_{X,k,1}^a,e_{X,k,2}^a,\ldots,e_{X,k,L}^a]$ contains the corresponding eigenvectors. Retaining only the first m modes, we can write the sigma points (84)–(86) as:

$$\chi_{k,0}=\bar{X}_k^a \tag{105}$$

$$\chi_{k,i}^{+}=\bar{X}_k^a+c\sqrt{\sigma_{ki}}\,e_{X,k,i}^a \tag{106}$$

$$\chi_{k,i}^{-}=\bar{X}_k^a-c\sqrt{\sigma_{ki}}\,e_{X,k,i}^a \tag{107}$$

for $i=1,2,\ldots,m$. Thus, the ensemble size becomes 2m+1, where $m\ll L$. Some fast SVD algorithms can be used here, such as Lanczos and block Lanczos [32]. Applications of the truncated SVD can also be found in [33, 34].
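The truncation can be illustrated with a dense eigendecomposition. This is a small-scale sketch under assumptions: the function name `truncated_sigma_points` is hypothetical, `numpy.linalg.eigh` stands in for the Lanczos-type solvers that a large problem would need, and each eigenvector is scaled by the square root of its eigenvalue so that the retained points reproduce the retained part of the covariance.

```python
import numpy as np

def truncated_sigma_points(mean, cov, m, c):
    """Build 2m+1 sigma points from the m leading eigenmodes of cov."""
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:m]         # indices of the m largest eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    # Column i of offsets is c * sqrt(sigma_i) * e_i; clip guards tiny negative round-off.
    offsets = c * np.sqrt(np.clip(vals, 0.0, None)) * vecs
    return np.vstack([mean, mean + offsets.T, mean - offsets.T])

mean = np.array([1.0, -2.0, 0.5])
cov = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 0.5]])
c = np.sqrt(3.0)                               # illustrative scale factor

pts_trunc = truncated_sigma_points(mean, cov, m=2, c=c)   # 2*2+1 = 5 points
pts_full = truncated_sigma_points(mean, cov, m=3, c=c)    # no truncation

# With all modes retained, the "plus" deviations rebuild the covariance exactly.
dev = pts_full[1:4] - mean
cov_rebuilt = dev.T @ dev / c**2
```

With m < L the rebuilt covariance is the best rank-m approximation of the original, which is what makes the reduced ensemble informative.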

$P_{X,k}^a$ can be further simplified based on its definition (or a Cholesky decomposition), i.e. $P_{X,k}^a=A_{X,k}^a(A_{X,k}^a)^T$, where $A_{X,k}^a$ is the data matrix from which the ensemble mean has been subtracted. Thus, Eqs. (84)–(86) can be written as follows:

$$\chi_{k,0}=\bar{X}_k^a \tag{108}$$

$$\chi_{k,i}^{+}=\bar{X}_k^a+[cA_{X,k}^a]_i \tag{109}$$

$$\chi_{k,i}^{-}=\bar{X}_k^a-[cA_{X,k}^a]_i \tag{110}$$

where $[cA_{X,k}^a]_i=[x_k^a;\eta_k;\zeta_k]_i$, $i=1,2,\ldots,L$, and $(x_k^a)_i=(x_k^f)_i+K_k[y_k-y_k^f]$. Eqs. (109) and (110) replace the covariance matrix $P_{X,k}^a$ with the data matrix $A_{X,k}^a$ in constructing the sigma points. The main advantage is that this avoids forming $P_{X,k}^a$ explicitly, which could be too large to fit in the memory of current computers. However, Eqs. (109) and (110) cannot reduce the ensemble size (2L+1). One solution is to apply a decomposition, such as principal component analysis, as used in [14]. Further discussion of the optimal construction of sigma points is needed for realistic applications of SPKF.

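Again, we look at the sigma-point generation, i.e. Eqs. (106) and (107) or (109) and (110). As defined above, an augmented state vector $[x;\eta;\zeta]$ is used. Without loss of generality, we rewrite them as:

$$\begin{bmatrix}x_{k,0}\\\eta_{k,0}\\\zeta_{k,0}\end{bmatrix}=\begin{bmatrix}\bar{x}_{k,0}\\0\\0\end{bmatrix} \tag{111}$$

$$\begin{bmatrix}x_{k,i}\\\eta_{k,i}\\\zeta_{k,i}\end{bmatrix}=\begin{bmatrix}\bar{x}_{k,0}\\0\\0\end{bmatrix}+c\begin{bmatrix}x_{k,i}^a\\\eta_{k,i}\\\zeta_{k,i}\end{bmatrix} \tag{112}$$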

Similarly, we can write Eq. (107) or (110) using individual variables. From Eqs. (111) and (112), we can draw

• The noise and the model state analyses used in constructing sigma points at step k are independent. Note that $x_k^a$ comes from Eq. (97) and the noise is drawn from a Gaussian distribution. If we assume that the noise is drawn randomly each time, $x_k^a$ depends only on the noise drawn at time step k and is independent of the model noise and observation noise drawn for the analysis at time step k+1. Thus, $P_{X,k}$ is a block-diagonal matrix, i.e.

$$P_{X,k}=\begin{pmatrix}P_k^x&0&0\\0&Q&0\\0&0&R\end{pmatrix} \tag{113}$$

• There are no update equations for the noise, so it is drawn randomly from a Gaussian distribution, i.e. the index i in $\eta_i$ and $\zeta_i$ has no actual meaning. It is therefore reasonable to assume that the $\eta_i$ and $\zeta_i$ used for constructing sigma points at time step k+1 are not related to $P_{X,k}$ (time step k), as argued above. Thus, Eq. (113) always holds unless the noise is designed with temporal coherence, such as red noise in time.

• Based on the above, the actual ensemble size is $2N_x+1$, not 2L+1. This is because neither the model noise nor the observation noise can produce ensemble members alone. The model errors $\eta_i$ and $x_{k,i}^f$ must be combined to produce ensemble members, whose number is set by $N_x$. Let us look at this in detail: at the initial time, the initial perturbation of the model state and the noise drawn for the model errors and measurement errors have the following mean and covariance:

$$\bar{X}_0^a=[x_0;0;0],\qquad P_{X,0}=\begin{pmatrix}P_0^x&0&0\\0&Q&0\\0&0&R\end{pmatrix} \tag{114}$$

Theoretically, there are $2(N_x+N_\eta+N_\zeta)+1$ ensemble members, given by the ith column of $P_{X,0}$ ($i=1,\ldots,N_x$; $N_x+1,\ldots,N_x+N_\eta$; $N_x+N_\eta+1,\ldots,N_x+N_\eta+N_\zeta$) and formulas (84)–(86). However, in the ith column, among the elements corresponding to the model inputs (x, η, ζ), only $N_x$ have non-zero values. Obviously, sigma points with zero values make the update equation $\chi_{k+1,i}=f(\chi_{k,i})$ invalid; thus, the actual ensemble size is $2N_x+1$.

When a truncation technique is applied to reduce the ensemble size, the ensemble spread might shrink because of the relatively small ensemble size. As with EnKF, an inflation approach might be helpful for SPKF, and it would be of interest to develop such a scheme. We can also localize SPKF, as in the localized EnKF, to address memory and computation issues.

All of the remarks on SPKF reflect the authors’ own thinking and understanding. It would be interesting to further test and validate these ideas and properties using simple models.

## 7. Beyond Kalman filters: particle filter and its derivatives

### 7.1. Standard particle filter

We have introduced the Kalman filter (KF), extended Kalman filter (EKF), ensemble Kalman filter (EnKF) and sigma-point Kalman filter (SPKF) in previous sections. All of those filters belong to the sequential data assimilation method, i.e. observation data is assimilated into the model system as soon as it is available. The Bayesian estimation theory provides a general framework of the sequential data assimilation methods. If we assume the state-space model is given by Eqs. (34) and (35), the analysis step of a Bayesian-based assimilation method is deduced by Bayes’ theorem:

$$p(x_t|y_t)=\frac{p(y_t|x_t)\,p(x_t)}{p(y_t)}, \tag{115}$$

where $p(y_t)$ acts as a normalization factor.

Recalling Section 2.3, Eq. (12) actually assumes that the prior probability density function p(xt) and the likelihood function p(yt|xt) are Gaussian distribution functions, and thus the posterior probability density function p(xt|yt) is also a Gaussian. Based on the Gaussian assumption, the cost function of 3D-Var (i.e. Eq. (6)) can be derived, and it is equivalent to the Kalman filter Eqs. (27)–(29). All the Kalman-based filters (e.g. EKF, EnKF, EnSRF, SPKF, etc.) contain the inherent Gaussian assumption, and they are derived and validated for Gaussian systems in theory. However, this Gaussian assumption is often not applicable for nonlinear systems. Even for an initial Gaussian error, it often becomes non-Gaussian while propagating forward with nonlinear models.

The particle filter (PF) is a sequential data assimilation method that is able to deal with nonlinear and non-Gaussian state estimation problems. Like EnKF, PF also uses an ensemble, but the ensemble is used to approximate the full probability density function rather than only the error covariance B. An ensemble member is also referred to as a particle in the PF literature. Suppose the prior probability density is a normalized sum of Dirac delta functions

$$p(x_t)=\frac{1}{N}\sum_{i=1}^{N}\delta(x_t-x_t^i) \tag{116}$$

where $\{x_t^i,i=1,2,\ldots,N\}$ are particles drawn from $p(x_t)$. The posterior probability density is derived by applying Bayes’ theorem directly, that is

$$p(x_t|y_t)\propto p(y_t|x_t)\,p(x_t)=\sum_{i=1}^{N}w_{t,i}\,\delta(x_t-x_t^i) \tag{117}$$

in which $w_{t,i}\propto p(y_t|x_t^i)$, and a normalization step is required to make $\{w_{t,i},i=1,2,\ldots,N\}$ sum to 1. If we assume the likelihood function is Gaussian, $w_{t,i}$ can be computed from

$$p(y_t|x_t^i)=\frac{1}{\sqrt{(2\pi)^{m}|R|}}\exp\left\{-\frac{1}{2}[y_t-h(x_t^i)]^{T}R^{-1}[y_t-h(x_t^i)]\right\}, \tag{118}$$

where m is the dimension of the observation vector $y_t$.

Or else we can use any specified probability density function of p(yt|xt) to compute the likelihood.

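With the posterior probability density function $p(x_t|y_t)$, the analysis value and covariance can be computed by

$$\bar{x}_t=\int x\,p(x|y_t)\,dx=\sum_{i=1}^{N}w_{t,i}\,x_t^i \tag{119}$$

$$\mathrm{var}(x_t)=\int x^{2}\,p(x|y_t)\,dx-\bar{x}_t^{2}=\sum_{i=1}^{N}w_{t,i}(x_t^i)^{2}-\bar{x}_t^{2} \tag{120}$$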

and higher-order moments of the posterior state can also be estimated.

Before stepping forward to the next stage, a resampling step is required so that all particles have uniform weights. A typical resampling strategy is sequential importance resampling (SIR), which removes particles with very small weights and duplicates those with large weights. A detailed SIR algorithm can be found in [35]. The resampling algorithm gives the indices and the number of copies of the particles that should be duplicated, i.e. it computes $s_1,s_2,\ldots,s_N$ according to the weights, where each $s_i\in\{1,2,\ldots,N\}$. The set $\{x_t^{s_i},i=1,2,\ldots,N\}$ is then regarded as the new particles.

In summary, the algorithm of standard particle filter is given below:

1. generate the initial ensemble {x0i,i=1,2,...,N} as EnKF does;

2. integrate the model until the observation is available;

3. use Eq. (118) to compute the weight for each particle, and normalize them;

4. use Eq. (119) to obtain the analysis and Eq. (120) to obtain the covariance if necessary;

5. apply the resampling algorithm to derive the resampling indices, and derive the new ensemble {xtsi,i=1,2,...,N}; and

6. repeat Steps 2–5 until the end of assimilation period.

The standard particle filter [36] is also known as the bootstrap particle filter or SIR particle filter.
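Steps 3 to 5 of the algorithm above can be sketched as a single analysis function. This is an illustrative sketch, not the algorithm of [35]: it uses systematic resampling as one common way to obtain the resampling indices, computes the weights in log space to avoid underflow, and all function names are hypothetical.

```python
import numpy as np

def gaussian_weights(particles, y, h, R):
    """Normalized weights from a Gaussian likelihood, as in Eq. (118)."""
    R_inv = np.linalg.inv(R)
    innov = np.array([y - h(x) for x in particles])          # (N, m)
    log_w = -0.5 * np.einsum("ij,jk,ik->i", innov, R_inv, innov)
    log_w -= log_w.max()          # constant factor cancels in the normalization
    w = np.exp(log_w)
    return w / w.sum()

def systematic_resample(weights, rng):
    """Return N indices; heavy particles are duplicated, light ones dropped."""
    N = len(weights)
    positions = (rng.random() + np.arange(N)) / N            # one stratified draw
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                                     # guard against round-off
    return np.searchsorted(cumulative, positions)

def pf_analysis(particles, y, h, R, rng):
    """Weights, weighted mean/variance (Eqs. 119-120) and resampled particles."""
    w = gaussian_weights(particles, y, h, R)
    mean = w @ particles
    var = w @ particles**2 - mean**2
    idx = systematic_resample(w, rng)
    return w, mean, var, particles[idx]

rng = np.random.default_rng(42)
prior = rng.normal(size=(500, 1))            # particles drawn from N(0, 1)
y_obs = np.array([1.0])
R = np.array([[0.25]])
w, post_mean, post_var, resampled = pf_analysis(prior, y_obs, lambda x: x, R, rng)
```

For this linear Gaussian toy case the exact posterior mean is 0.8, so the weighted mean gives a quick check that the weights behave sensibly.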

### 7.2. Variants of PF

The particle filter is a highly promising technique because it does not invoke any Gaussian assumptions. It has been widely used and studied in many other fields. The PF estimates the full probability density function of the forecasted state based on an ensemble of states with different weights. However, the PF suffers from the problem of filter degeneracy, i.e. the procedure collapses to a very small number of highly weighted particles among a horde of almost useless particles carrying a tiny proportion of the probability mass. Even if resampling techniques are used, the degeneracy cannot be completely avoided with limited ensemble size. The number of particles must grow substantially with the dimension of the system to avoid degeneracy [37, 38], a requirement that is apparently too costly for large models such as GCMs. Various efforts have been made to resolve this issue, as documented in an excellent overview [39].

Several strategies are often employed to address the problem of filter degeneracy in applications of the particle filter. For example, Papadakis et al. proposed a weighted ensemble Kalman filter (WEnKF) [40] that uses an ensemble-based Kalman filter as the proposal density from which the particles are drawn. Van Leeuwen et al. developed a fully nonlinear particle filter by exploiting the freedom of the proposal transition density, which ensures not only that all particles ultimately occupy high-probability regions of state-space but also that most of the particles have similar weights [41]. The implicit particle filter uses gradient descent minimization combined with random maps to find the region of high probability, avoiding the calculation of Hessians [42]. Luo et al. have proposed an efficient particle filter that uses residual nudging to prevent the residual norm of the state estimates from exceeding a pre-specified threshold [43]. These particle filters were proposed very recently and have attracted broad attention in the atmospheric and oceanic data assimilation community. Below, we briefly introduce the equivalent weights particle filter (EWPF) by Van Leeuwen [39, 41].

The equivalent weights particle filter is a fully nonlinear data assimilation method that works in a two-stage process. It uses the proposal density to ensure that the particles have almost equivalent weights, by which the filter degeneracy can be avoided.

In the standard PF, the particles at time step t are propagated by the original model, i.e. xt+1i=f(xti)+ηt, which implies that the particles at time step t+1 are drawn from the transition density p(xt+1|xt). In that case, the weight of each xt+1i varies greatly and filter degeneracy is very likely to happen.

In EWPF, another transition density, called the proposal density, is introduced. The proposal density depends on the future observation $y_{t+1}$ and all previous particles $\{x_t^i,i=1,2,\ldots,N\}$. With the help of the proposal density, the particle $x_t^i$ is propagated using a different model

$$x_{t+1}^i=g(x_t^i,y_{t+1})+\eta_t. \tag{121}$$

The model g can be anything; for instance, one can add a relaxation term and change the random forcing:

$$x_{k+1}^i=f(x_k^i)+\eta_k^i+A(y_{t+1}-H(x_k^i)),\quad k=1,\ldots,p(k) \tag{122}$$

where p(k) is a function of the time between observations, and each k denotes a model step without an observation. A is a relaxation term that will ‘drag’ the particle towards the future observation. In [44], it is given by

$$A=p(k)QH^TR^{-1}, \tag{123}$$

where the matrices Q and R correspond to the model error covariance and observation error covariance, respectively.

The second stage of EWPF involves updating each particle at the observation time t+1 via the formula

$$x_{t+1}^i=f(x_t^i)+\alpha_iQH^T(HQH^T+R)^{-1}(y_{t+1}-H(f(x_t^i)))+\eta_t^i \tag{124}$$

where $\alpha_i$ are scalars computed so as to make the weights of the particles equal. Using the expression for the weights and setting all weights equal to a target weight (e.g. 1/N),

$$w_i=p(y_{t+1}|x_{t+1}^i(\alpha_i))=w_{target} \tag{125}$$

$\alpha_i$ can be solved for by numerical methods.

Eqs. (122)–(125) show an example of how to construct the proposal model g in Eq. (121). It can also be constructed by running 4D-Var on each particle (implicit particle filter) or by using the EnKF as the proposal density. For those methods, see Morzfeld et al. [42] and Papadakis et al. [40].
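The first stage of EWPF, the relaxation step of Eqs. (122) and (123), can be sketched as below. Only this nudging step is illustrated; the second stage, which solves for the equalizing scalars, is not attempted. Several choices are assumptions made for illustration: the observation operator is a matrix H, the relaxation strength p(k) is passed in as a plain scalar `p_k`, the random forcing is drawn from N(0, Q), and the function name is hypothetical.

```python
import numpy as np

def relaxation_step(x, y_next, f, H, Q, R, p_k, rng):
    """One model step nudged towards the future observation (Eqs. 122-123).

    x: (n,) particle state; y_next: (m,) future observation.
    f: model operator; H: (m, n) linear observation operator (assumed linear).
    p_k: scalar relaxation strength standing in for p(k).
    """
    A = p_k * Q @ H.T @ np.linalg.inv(R)                 # relaxation matrix, Eq. (123)
    eta = rng.multivariate_normal(np.zeros(len(x)), Q)   # random model forcing
    return f(x) + eta + A @ (y_next - H @ x)

n = 2
f = lambda x: 0.9 * x                     # toy linear model
H = np.eye(n)
Q = 0.1 * np.eye(n)
R = 0.5 * np.eye(n)
x0 = np.zeros(n)
y_next = np.array([1.0, -1.0])

# Same seed for both calls, so the random forcing is identical and the
# difference between the two results is exactly the relaxation term.
x_free = relaxation_step(x0, y_next, f, H, Q, R, 0.0, np.random.default_rng(3))
x_nudged = relaxation_step(x0, y_next, f, H, Q, R, 0.5, np.random.default_rng(3))
nudge = x_nudged - x_free
```

Here the nudge points from the particle towards the observation, which is the ‘drag’ effect that the relaxation term is meant to provide.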

### 7.3. Remarks of PF

#### 7.3.1. Combined method of EnKF and PF

The ensemble Kalman particle filter (EnKPF) is a combination of the EnKF and the SIR particle filter. It was recently introduced to address non-Gaussian features in data assimilation for highly nonlinear systems, by providing a continuous interpolation between the EnKF and SIR-PF analysis schemes [45].

As stated above, both the EnKF and PF methods are based on Bayesian estimation theory, but they approximate the probability density function of the state in different ways. The EnKF only approximates the mean and covariance of the state through a set of equally weighted ensemble members, whereas the particle filter weights the ensemble members according to their likelihoods. The EnKF relies on the Gaussian assumption but, in contrast with the PF, requires only a relatively small ensemble size to prevent filter degeneracy.

The EnKPF takes advantage of both methods by combining the analysis schemes of the EnKF and the SIR-PF using a controllable index (i.e. tuning parameter). In contrast with both the EnKF and the SIR-PF, the analysis scheme of the EnKPF not only updates the ensemble members but also considers the weights.

Assume that the forecast ensemble $\{x_i^f,i=1,2,\ldots,N\}$ and the observation data y are available, and that the forecast covariance $P^f$ can be calculated from the ensemble. The analysis scheme of EnKPF is then as follows.

1. Choose $\gamma\in[0,1]$ and apply the EnKF based on the inflated observation error covariance R/γ as follows:

$$K_1(\gamma)=P^fH^T(HP^fH^T+R/\gamma)^{-1}=\gamma P^fH^T(\gamma HP^fH^T+R)^{-1} \tag{126}$$

$$v_i=x_i^f+K_1(\gamma)(y-Hx_i^f) \tag{127}$$

$$Q=\frac{1}{\gamma}K_1(\gamma)RK_1(\gamma)^T \tag{128}$$

2. Compute the weight $w_i$ for each updated member $v_i$ as follows:

$$w_i=\phi\left(y;Hv_i,\frac{R}{1-\gamma}+HQH^T\right) \tag{129}$$

and normalize the weights by $\hat{w}_i=w_i/\sum_{j=1}^{N}w_j$, in which φ is the probability density function of a Gaussian.

3. Calculate the resampling index s(i) for each member $v_i$ according to $\hat{w}_i$ using the SIR algorithm, then set

$$x_i^u=v_{s(i)}+K_1(\gamma)\frac{\epsilon_{1,i}}{\sqrt{\gamma}} \tag{130}$$

where $\epsilon_{1,i}$ is a random observation error drawn from the Gaussian N(0,R).

4. Compute $K_2(1-\gamma)=(1-\gamma)QH^T[(1-\gamma)HQH^T+R]^{-1}$, generate $\epsilon_{2,i}$ from N(0,R) and apply the EnKF with the inflated observation error again as follows:

$$x_i^a=x_i^u+K_2(1-\gamma)\left[y+\frac{\epsilon_{2,i}}{\sqrt{1-\gamma}}-Hx_i^u\right] \tag{131}$$

γ can be determined recursively to match the optimal performance of EnKPF. More details of EnKPF can be found in [45, 46].
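The four steps of the EnKPF analysis can be put together in a compact sketch. It is a hedged illustration rather than the scheme of [45, 46]: it assumes a linear observation operator H, requires 0 < γ < 1 strictly so that both divisions are defined, uses plain multinomial resampling for the index s(i), and scales the two random observation errors by the square roots of γ and 1−γ. The function name `enkpf_analysis` is hypothetical.

```python
import numpy as np

def enkpf_analysis(ens_f, y, H, R, gamma, rng):
    """One EnKPF analysis step for 0 < gamma < 1 (sketch of Eqs. 126-131)."""
    N = ens_f.shape[0]
    m = len(y)
    Pf = np.atleast_2d(np.cov(ens_f, rowvar=False))

    # Step 1: EnKF update with inflated observation error R / gamma.
    K1 = gamma * Pf @ H.T @ np.linalg.inv(gamma * H @ Pf @ H.T + R)
    v = ens_f + (y - ens_f @ H.T) @ K1.T
    Q = K1 @ R @ K1.T / gamma

    # Step 2: Gaussian weights of the updated members, normalized in log space.
    S_inv = np.linalg.inv(R / (1.0 - gamma) + H @ Q @ H.T)
    d = y - v @ H.T
    log_w = -0.5 * np.einsum("ij,jk,ik->i", d, S_inv, d)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    # Step 3: resample (multinomial here) and add scaled observation noise.
    idx = rng.choice(N, size=N, p=w)
    eps1 = rng.multivariate_normal(np.zeros(m), R, size=N)
    x_u = v[idx] + eps1 @ K1.T / np.sqrt(gamma)

    # Step 4: second EnKF update with perturbed observations.
    K2 = (1.0 - gamma) * Q @ H.T @ np.linalg.inv((1.0 - gamma) * H @ Q @ H.T + R)
    eps2 = rng.multivariate_normal(np.zeros(m), R, size=N)
    return x_u + (y + eps2 / np.sqrt(1.0 - gamma) - x_u @ H.T) @ K2.T

rng = np.random.default_rng(7)
ens_f = rng.normal(size=(300, 2))         # forecast ensemble, roughly N(0, I)
H = np.eye(2)
R = 0.1 * np.eye(2)
y = np.array([2.0, 2.0])
ens_a = enkpf_analysis(ens_f, y, H, R, gamma=0.5, rng=rng)
analysis_mean = ens_a.mean(axis=0)
```

In this linear Gaussian toy setting the analysis mean lands close to the Kalman filter value of about 1.8 in each component, whatever γ is chosen; the benefit of the interpolation is expected to show up in non-Gaussian cases.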

#### 7.3.2. Localization in PF

Previous sections have introduced the localization technique in EnKF, which greatly improves the performance of EnKF in high-dimensional models. The advantages of localization motivate the search for a localization procedure in particle filtering.

Van Leeuwen had a deep discussion on this topic [39]. He argued that one can calculate the weights locally, but it is not easy for resampling. In the resampling step low-weight particles are abandoned and high-weight particles are duplicated. However, with local weights, different particles are selected in different parts of the domain. The problem is that we have to have continuous (in space) model fields to propagate forward in time with the model. Just constructing a new particle that consists of one particle in one part of the model domain and another particle in another domain will lead to problems at the boundary between these two.

The problem of spatial discontinuity makes localization in the particle filter infeasible at present. Most of the advanced particle filters (e.g. EWPF and the implicit particle filter) use a global weight, i.e. the weight for each member is a scalar.

However, there have been some attempts at localization in the particle filter. For example, Poterjoy developed the localized particle filter (LPF), which updates particles locally using ideas borrowed from EnKF [47]. The paper demonstrated some advantages of the new filter over EnKF, especially when the observation networks consist of densely spaced measurements that relate nonlinearly to the model state. This is very interesting work on the particle filter, and it has the potential to work with large atmospheric and oceanic data assimilation systems.

## 8. Remarks and conclusions

Data assimilation is the process by which observations of the actual system are incorporated into a numerical model to optimally estimate the system states. In this chapter, we introduced several ensemble-based data assimilation methods that are widely used in the earth sciences. The chapter can be read as an introduction to ensemble-based data assimilation methods, and also as a brief review of their application in the earth sciences. We have tried to write this ‘review’ chapter in introductory language to make it more readable. Many of the discussions, derivations and analyses aim not only to introduce these methods but also to deepen the understanding of them. This is reflected in the analysis of the rationale behind each method, including (i) the principle for deriving the algorithm; (ii) the basic assumptions of each method; (iii) the connections and relations between different methods (e.g. EKF and EnKF, EnKF and SPKF); and (iv) the advantages and deficiencies of each method. In particular, we placed considerable weight on the potential concerns, challenges and possible solutions that arise when these methods are applied to high-dimensional systems in the earth sciences. This chapter can serve as a “textbook” for beginners learning these data assimilation algorithms, and as a reference for researchers seeking to better understand and apply these methods.

## Acknowledgements

This work was supported by the NSERC (Natural Sciences and Engineering Research Council of Canada) Discovery Grant, the National Science Foundation of China (41276029, 41321004, 41530961, 91528304), the National Programme on Global Change and Air-Sea Interaction (GASI-IPOVAI-06) and the National Basic Research Program (2013CB430302).

## References

1 - Talagrand O. Assimilation of observations, an introduction. Journal of Meteorological Society of Japan Series 2. 1997;75:81–99.
2 - Kalnay E. Atmospheric modeling, data assimilation, and predictability. Cambridge University Press. New York. 2003.
3 - Le Dimet F. X., Talagrand O. Variational algorithms for analysis and assimilation of meteorological observations: Theoretical Aspects. Tellus A. 1986;38(2):97–110.
4 - Courtier P., Andersson E., Heckley W., Vasiljevic D., Hamrud M., Hollingsworth A. The ECMWF implementation of three-dimensional variational assimilation (3D-Var). I:Formulation. Quarterly Journal of the Royal Meteorological Society. 1998;124(550):1783–1807.
5 - Kalman R. E. A new approach to linear filtering and prediction problems. Journal of Basic Engineering. 1960;82(1):35–45.
6 - Rabier F., Jarvinen H., Klinker E., Mahfouf J. F., Simmons A. The ECMWF operational implementation of four-dimensional variational assimilation. I: Experimental results with simplified physics. Quarterly Journal of the Royal Meteorological Society. 2000;126(564):1143–1170.
7 - Mahfouf J. F., Rabier F. The ECMWF operational implementation of four-dimensional variational assimilation. II: Experimental results with improved physics. Quarterly Journal of the Royal Meteorological Society. 2000;126(564):1171–1190.
8 - Klinker E., Rabier F., Kelly G., Mahfouf J. F. The ECMWF operational implementation of four-dimensional variational assimilation. III: Experimental results and diagnostics with operational configuration. Quarterly Journal of the Royal Meteorological Society. 2000;126(564):1191–1215.
9 - Houtekamer P. L., Mitchell H. L., Pellerin G., Buehner M., Charron M., Spacek L., et al. Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Monthly Weather Review. 2005;133(3):604–620.
10 - Evensen G. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans (1978–2012). 1994;99(C5):10143–10162.
11 - Houtekamer P. L., Mitchell H. L. Data assimilation using an ensemble Kalman filter technique. Monthly Weather Review. 1998;126(3):796–811.
12 - Evensen G. Advanced data assimilation for strongly nonlinear dynamics. Monthly Weather Review. 1997;125(6):1342–1354.
13 - Keppenne C. L. Data assimilation into a primitive-equation model with a parallel ensemble Kalman filter. Monthly Weather Review. 2000;128:1971–1981.
14 - Houtekamer P. L., Mitchell H. L. A sequential ensemble Kalman filter for atmospheric data assimilation. Monthly Weather Review. 2001;129(1):123–137.
15 - Ambadan J. T., Tang Y. Sigma-point Kalman filter data assimilation methods for strongly nonlinear systems. Journal of the Atmospheric Sciences. 2009;66(2):261–285.
16 - Tang Y., Ambandan J., Chen D. Nonlinear measurement function in the ensemble Kalman filter. Advances in Atmospheric Sciences. 2014;31(3):551–558.
17 - Evensen G. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dynamics. 2003;53(4):343–367.
18 - Burgers G., van Leeuwen P. J., Evensen G. Analysis scheme in the ensemble Kalman filter. Monthly Weather Review. 1998;126(6):1719–1724.
19 - Whitaker J. S., Hamill T. M. Ensemble data assimilation without perturbed observations. Monthly Weather Review. 2002;130(7):1913–1924.
20 - Tippett M. K., Anderson J. L., Bishop C. H., Hamill T. M., Whitaker J. S. Ensemble square root filters. Monthly Weather Review. 2003;131(7):1485–1490.
21 - Bishop C. H., Etherton B. J., Majumdar S. J. Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly Weather Review. 2001;129(3):420–436.
22 - Anderson J. L. An ensemble adjustment Kalman filter for data assimilation. Monthly Weather Review. 2001;129(12):2884–2903.
23 - Hunt B. R., Kostelich E. J., Szunyogh I. Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D: Nonlinear Phenomena. 2007;230(1):112–126.
24 - Gaspari G., Cohn S. E. Construction of correlation functions in two and three dimensions. Quarterly Journal of the Royal Meteorological Society. 1999;125(554):723–757.
25 - Anderson J. L. Ensemble Kalman filters for large geophysical applications. IEEE Control Systems. 2009;29(3):66–82.
26 - Julier S. J., Uhlmann J. K., Durrant-Whyte H. F. A new approach for filtering nonlinear systems. Proceedings of American Control Conference. 1995;3:1628–1632.
27 - Ito K., Xiong K. Gaussian filters for nonlinear filtering problems. IEEE Transactions on Automatic Control. 2000;45(5):910–927.
28 - van der Merwe R., Wan E. Efficient derivative-free Kalman filters for online learning. In: Proceedings of ESANN. 2001.
29 - Gelb A. Applied optimal estimation. MIT Press. Cambridge. 1974.
30 - Luo X., Moroz I. M. Ensemble Kalman filter with the unscented transform. Physica D: Nonlinear Phenomena. 2009;238(5):549–562.
31 - Tang Y., Deng Z., Manoj K. K., Chen D. A practical scheme of the sigma-point Kalman filter for high-dimensional systems. Journal of Advances in Modeling Earth Systems. 2014;6:21–37.
32 - Golub G. H., van Loan C. F. Matrix computations. JHU Press. Baltimore. 2012.
33 - Hansen B. Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Monthly Weather Review. 2005;133(3):604–620.
34 - Ehrendorfer M., Tribbia J. J. Optimal prediction of forecast error covariances through singular vectors. Journal of the Atmospheric Sciences. 1997;54(2):286–313.
35 - Arulampalam M. S., Maskell S., Gordon N., Clapp T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing. 2002;50(2):174–188.
36 - Gordon N. J., Salmond D. J., Smith A. F. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F (Radar and Signal Processing). 1993;140(2):107–113.
37 - Bengtsson T., Snyder C., Nychka D. Toward a nonlinear ensemble filter for high-dimensional systems. Journal of Geophysical Research. 2003;108(D24):8775.
38 - Snyder C., Bengtsson T., Bickel P., Anderson J. Obstacles to high-dimensional particle filtering. Monthly Weather Review. 2008;136(12):4629–4640.
39 - van Leeuwen P. J. Particle filtering in geophysical systems. Monthly Weather Review. 2009;137(12):4089–4114.
40 - Papadakis N., Mémin E., Cuzol A., Gengembre N. Data assimilation with the weighted ensemble Kalman filter. Tellus A. 2010;62(5):673–697.
41 - van Leeuwen P. J. Efficient nonlinear data-assimilation in geophysical fluid dynamics. Computers and Fluids. 2011;46(1):52–58.
42 - Morzfeld M., Chorin A. J. Implicit particle filtering for models with partial noise, and an application to geomagnetic data assimilation. Nonlinear Processes in Geophysics. 2012;19(3):365–382.
43 - Luo X., Hoteit I. Efficient particle filtering through residual nudging. Quarterly Journal of the Royal Meteorological Society. 2014;140(679):557–572.
44 - Browne P. A., van Leeuwen P. J. Twin experiments with the equivalent weights particle filter and HadCM3. Quarterly Journal of the Royal Meteorological Society. 2015;141(693):3399–3414.
45 - Frei M., Künsch H. R. Bridging the ensemble Kalman and particle filters. Biometrika. 2013;100(4):781–800.
46 - Shen Z., Tang Y. A modified ensemble Kalman particle filter for non-Gaussian systems with nonlinear measurement functions. Journal of Advances in Modeling Earth Systems. 2015;7(1):50–66.
47 - Poterjoy J. A localized particle filter for high-dimensional nonlinear systems. Monthly Weather Review. 2016;144(1):59–76.