The Kalman filter.
In this chapter, ensemble-based data assimilation methods are introduced, including their development, applications and outstanding concerns. These methods include both traditional methods, such as the Kalman filter and its derivatives, and more advanced algorithms, such as sigma-point Kalman filters and particle filters. Emphasis is placed on the challenges of applying these methods to high-dimensional systems in the earth sciences.
- data assimilation
- Kalman filter
- particle filter
In this chapter, we discuss modelling and simulation using both observed data and numerical models; that is, observations are incorporated into numerical models for optimal modelling and simulation. In statistics, this is called state-space estimation; in the earth sciences, it is called data assimilation. For example, a strict definition of data assimilation in the atmospheric and oceanic sciences is the process of estimating the state of a dynamic system, such as atmospheric and oceanic flow, by combining observational and model forecast data.
In general, assimilation methods can be classified into two categories: variational and sequential. This chapter is a tutorial on sequential data assimilation methods such as the ensemble Kalman filter (EnKF) and its variants. A brief introduction to the particle filter (PF) is also provided.
This tutorial places emphasis on the rationale behind each method, including: (i) the principle for deriving the algorithm; (ii) the basic assumptions of each method; (iii) the connections and relations between different methods (e.g. between the extended Kalman filter (EKF) and the EnKF, and between the EnKF and sigma-point Kalman filters (SPKF)); and (iv) the advantages and deficiencies of each method.
This chapter has been written and organized through teaching undergraduate and graduate students in earth science courses. It can also serve as a good reference for researchers in the field of modelling and data assimilation.
2. The general framework of several assimilation approaches
Intuitively, one might think that an optimal simulation scheme is to directly replace model variables by observations during numerical integration. Such a direct replacement is usually not correct, since observations are not perfect and contain errors. A simple replacement would introduce observation errors into the model and ignore their possible impact on model behaviour, easily resulting in an imbalance of model dynamics and physics. Thus, the assimilation of observations into numerical models must account for both model and observation errors, which play a critical role in the assimilation process.
We start by illustrating the assimilation concept with a simple example; a more detailed introduction can be found in the references.
For an unknown true state value, denoted by T, there are two samples, denoted by T1 (e.g. model simulation) and T2 (observation), which have the errors ε1 = T1 − T and ε2 = T2 − T.
We assume the measurements or observations are unbiased and the variances of the errors are known, i.e. E(ε1) = E(ε2) = 0, E(ε1²) = σ1² and E(ε2²) = σ2². The problem is to find the best estimate (analysis) Ta of the truth from the two samples.
There are several methods for this solution, as demonstrated below.
2.1. Least-squares method
Assume the analysis Ta is a linear combination of both T1 and T2, that is, Ta = a1T1 + a2T2. Due to the assumption that both T1 and T2 are unbiased, Ta should be unbiased, i.e. E(Ta − T) = 0, so a1 + a2 = 1, and then Ta = a1T1 + (1 − a1)T2. The best (optimal) estimate should minimize the variance of Ta, namely σa² = E[(Ta − T)²] = a1²σ1² + (1 − a1)²σ2²,
where we assumed that the errors of T1 and T2 are uncorrelated, i.e. E(ε1ε2) = 0.
Setting dσa²/da1 = 0 yields a1 = σ2²/(σ1² + σ2²) (Eq. (5)), with which the variance of Ta is minimized.
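As a concrete check of the least-squares combination above, the sketch below merges two unbiased estimates with known error variances; the function name and the numerical values are illustrative assumptions, not from the chapter:

```python
def optimal_combination(t1, var1, t2, var2):
    """Combine two unbiased estimates of the same truth.

    The weight a1 = var2 / (var1 + var2) minimizes the analysis
    error variance, as in Eq. (5)."""
    a1 = var2 / (var1 + var2)
    ta = a1 * t1 + (1.0 - a1) * t2
    var_a = (var1 * var2) / (var1 + var2)  # analysis error variance
    return ta, var_a

# model estimate 10.0 (variance 4.0), observation 12.0 (variance 1.0)
ta, var_a = optimal_combination(10.0, 4.0, 12.0, 1.0)
```

Note that the analysis variance satisfies 1/σa² = 1/σ1² + 1/σ2², so it is always smaller than either input variance.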
2.2. Variational approach
As noted above, assimilation methods fall into two categories: variational and sequential. Variational methods such as the three-dimensional variational (3D-Var) and four-dimensional variational (4D-Var) methods [3, 4] are batch methods, whereas sequential methods such as the Kalman filter (KF) belong to estimation theory.
Both have had great success. The European Centre for Medium-Range Weather Forecasts (ECMWF) introduced the first 4D-Var method into its operational global analysis system in November 1997 [6–8]. The ensemble Kalman filter (EnKF) was first introduced into an operational ensemble prediction system by the Canadian Meteorological Centre (CMC) in January 2005.
This chapter is a tutorial on ensemble-based sequential data assimilation methods, such as the EnKF and its variants. However, we first briefly demonstrate the idea of variational assimilation using the above example.
First, a cost function must be defined for the variational assimilation approach. For this simple example, we define the cost function as
J(Ta) = (Ta − T1)²/σ1² + (Ta − T2)²/σ2².
The solution is to seek an analysis Ta, determined by T1 and T2, that minimizes the cost function, i.e. dJ/dTa = 0. Applying this condition to (6) gives Ta = (σ2²T1 + σ1²T2)/(σ1² + σ2²), identical to the least-squares solution.
The above is a simple example of the 3D variational assimilation approach, where we only consider the analysis error (cost function) at a single time. However, in many cases we need to consider the error growth over a period, i.e. the sum of the errors during the period, in the cost function Eq. (6). For example, the cost function of 4D-Var is defined as below:
Meanwhile, the state follows a dynamical model, say Tn = Mn(T0), where F is the nonlinear dynamical model, Mn is its integral operator and T0 is the state at the initial time. Thus, the cost function value of (10) is determined solely by the initial condition. Namely, the objective here is to seek the optimal initial condition that minimizes (10), i.e. minimizing (10) subject to the dynamical model F. This is a standard constrained-extremum problem that can be solved by the Lagrange multiplier approach. However, the complexity of the dynamical model excludes the possibility of obtaining an analytical solution. We have to solve the minimization problem with the aid of numerical methods, e.g. the Newton conjugate gradient method. All such numerical methods require the gradient of the cost function.
Again, it is almost impossible to obtain an analytical expression for the gradient due to the complexity of F. Usually, researchers obtain the gradient numerically using tangent linear and adjoint models; details on these models can be found in the relevant references cited above. It should be noted that it is very difficult, sometimes even intractable, to construct tangent linear and adjoint models. Thus, more and more researchers have started to apply sequential assimilation methods instead of 4D-Var in recent years. Next, we introduce the concept of sequential assimilation using the above example.
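The 4D-Var idea can be illustrated with a toy scalar model. In this sketch a finite-difference gradient stands in for the tangent-linear/adjoint machinery discussed above, and the model coefficient, observation times and values are all invented for illustration:

```python
# Minimize a 4D-Var-like cost J(x0) for the scalar linear model
# x_{n+1} = m * x_n, by simple gradient descent on the initial condition.
m, sigma_o = 0.9, 0.5
obs = {1: 1.7, 3: 1.3}               # assumed observation times and values

def cost(x0):
    x, j = x0, 0.0
    for n in range(1, 4):
        x = m * x                    # integrate the model forward
        if n in obs:
            j += (x - obs[n]) ** 2 / sigma_o ** 2
    return j

def grad(x0, eps=1e-6):              # finite-difference gradient of J
    return (cost(x0 + eps) - cost(x0 - eps)) / (2 * eps)

x0 = 0.0
for _ in range(200):                 # descend toward the optimal x0
    x0 -= 0.05 * grad(x0)
```

For this quadratic cost, the converged x0 agrees with the analytical minimizer (1.7m + 1.3m³)/(m² + m⁶), which a real 4D-Var system obtains via the adjoint gradient instead of finite differences.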
2.3. Bayesian approach
Assume T1 and σ1 are the mean value and standard deviation of the model prediction, which implies a prior probability distribution of the truth T.
Obviously, this is a Gaussian distribution function, which can be denoted by N(T1, σ1²). Given the observation T2 and its standard deviation σ2, the posterior distribution of the truth can be expressed by Bayes' theorem:
The marginal distribution of the observation was ignored in (12), since it is independent of T and acts only as a normalization factor. The likelihood function describes the probability of the observation taking the value T2 given an estimate of T; it is commonly assumed to be Gaussian. The objective here is to estimate the truth by maximizing the posterior probability (namely, we require the estimated truth to be as probable as possible). Maximizing the posterior is equivalent to maximizing the logarithm of the right-hand side of (12), i.e.
Obviously, the maximum of the log-posterior occurs at the minimum of the second term on the right-hand side of (13), i.e. the minimum of the cost function J of (6). Thus, under the assumption of Gaussian distributions, maximizing the posterior probability (Bayesian approach) is equivalent to minimizing the cost function (variational approach). Further, if the model F is linear and the probability distributions are Gaussian, it can be proved that the Kalman filter is equivalent to the 4D-Var adjoint assimilation method.
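The stated equivalence can be checked numerically: with a Gaussian prior N(t1, s1²) and a Gaussian likelihood centred on the observation t2, the grid-search posterior maximum coincides with the variational minimizer. All values below are illustrative:

```python
import numpy as np

t1, s1 = 10.0, 2.0   # prior (model) mean and standard deviation
t2, s2 = 12.0, 1.0   # observation and its standard deviation

# log-posterior on a fine grid (normalization constant omitted)
T = np.linspace(5.0, 15.0, 200001)
log_post = -(T - t1) ** 2 / (2 * s1 ** 2) - (T - t2) ** 2 / (2 * s2 ** 2)
t_map = T[np.argmax(log_post)]       # maximum a posteriori estimate

# minimizer of the variational cost J from Eq. (6)
t_var = (s2 ** 2 * t1 + s1 ** 2 * t2) / (s1 ** 2 + s2 ** 2)
```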
3. Optimal interpolation (OI) and Kalman filter (KF)
3.1. Optimal interpolation
The simplest special case in data assimilation is that in which the forecast model is linear and the errors are Gaussian. Among sequential methods, the solution to this case is provided by the Kalman filter. Typically, the Kalman filter applies to the following state-space model:
where M and H are the linear operators of the model and measurement, respectively, x is the model state, y is the observation, and the subscript denotes the time step. η and ε are the model errors and observational errors, respectively, which have covariances:
Assuming the estimate of the model state at a time step is a linear combination of the model forecast and the observation, i.e. the filter itself is linear, we have
Eq. (16) is the standard expression of the Kalman filter. K is called the Kalman gain, which determines the optimal estimate, and the difference between the observation and the projected forecast is called the innovation. An analysis step essentially determines the increment to the forecast by combining the Kalman gain and the innovation. Before deriving K, we denote the covariance matrix of the analysis error by
Clearly, we are seeking the K that minimizes the analysis error covariance. Subtracting the true state from both sides of Eq. (16) leads to the equation below:
Here, we used the assumption that the forecast and observation errors are uncorrelated. The optimal estimate requires the trace of the analysis error covariance to be minimum, namely, ∂tr(Pa)/∂K = 0. It can be computed that
Substituting this into the expression for the analysis error covariance gives
Here, we invoked the below properties:
Thus, we have the optimal estimate filter:
In the estimates (27)–(29), if the background error covariance B is prescribed, the estimate is called optimal interpolation (OI). The OI does not involve the state equation (14), and B is unchanged during the entire assimilation process.
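A minimal sketch of the analysis step of Eqs. (27)–(29); the small matrices standing in for B, H and R are illustrative assumptions:

```python
import numpy as np

B = np.array([[1.0, 0.5],
              [0.5, 1.0]])       # prescribed background error covariance
H = np.array([[1.0, 0.0]])       # observe only the first state variable
R = np.array([[0.25]])           # observation error covariance

xf = np.array([1.0, 2.0])        # forecast (background) state
y = np.array([1.8])              # observation

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # Kalman gain
xa = xf + K @ (y - H @ xf)                      # analysis state
Pa = (np.eye(2) - K @ H) @ B                    # analysis error covariance
```

Note that the unobserved second variable is also corrected, through the off-diagonal element of B.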
3.2. Kalman filter
Now we consider the case in which B in (28) changes with the assimilation cycle. This is more realistic, since the model prediction errors should be expected to decrease as the assimilation proceeds.
From Eq. (14), we have
Eq. (30) indicates that even if the true value is input at a time step, the model cannot produce the true value at the next step because of the model error. Eq. (31) shows the standard procedure for the model prediction of the next step, starting from the analysis of the previous step,
where Pa = ⟨(xa − x)(xa − x)T⟩ is the analysis error covariance, x denoting the true state.
One Kalman filter cycle consists of two parts, namely, one analysis step (Eqs. (27)–(29)) and one prediction step (Eqs. (31) and (33)). The analysis state and covariance are treated as initial conditions for the prediction step, until the next observation is available. Sometimes, B is denoted by Pf in the Kalman filter literature.
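The full cycle above, alternating analysis and prediction, can be sketched for a scalar linear model with H = 1; the model coefficient, noise variances and synthetic observations are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
M, Q, R = 0.95, 0.02, 0.1        # model operator, model/obs error variances

x, B = 0.0, 1.0                  # initial state estimate and its variance
truth = 1.0
analyses = []
for _ in range(50):
    truth = M * truth + rng.normal(0.0, np.sqrt(Q))   # synthetic truth
    y = truth + rng.normal(0.0, np.sqrt(R))           # synthetic observation
    K = B / (B + R)              # Kalman gain (H = 1)
    x = x + K * (y - x)          # analysis step, Eq. (27)
    B = (1 - K) * B              # analysis error variance, Eq. (28)
    analyses.append(x)
    x = M * x                    # prediction step, Eq. (31)
    B = M * B * M + Q            # forecast error variance, Eq. (33)
```

The forecast error variance B converges to a steady-state (Riccati) value regardless of the data, illustrating how the KF propagates and shrinks its own uncertainty.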
3.3. Extended Kalman filter (EKF)
In deriving the Kalman filter, we assumed that the state model M and the measurement model H are both linear, and that the errors have Gaussian distributions. Therefore, the classic KF works only for linear models and Gaussian errors. If the dynamical model or the measurement model is nonlinear, we cannot apply the KF directly; instead, linearization must be performed first. The linearized version of the KF is called the extended KF (EKF), which solves the following state-space estimation problem:
where f and h are nonlinear models, and η and ε are additive noises.
The filter is still assumed to be ‘linear’, i.e.
Actually, it is not a linear combination of the forecast and observation if h is not linear. However, we simply extend the formulation of Eq. (16) and apply it intuitively to nonlinear cases. Ignoring high-order terms, the following holds approximately:
where H is the linearization (Jacobian) of h evaluated at the forecast state. So,
which is the same as Eq. (18). Following the same derivation as for Eq. (18), we can obtain equations similar to (27)–(29). Therefore, if the measurement model h is nonlinear, the KF can still be applied with a linearization of h.
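A toy EKF analysis step with a nonlinear measurement h(x) = x², linearized through its Jacobian as in the derivation above; the specific h and all values are illustrative assumptions:

```python
def h(x):
    return x ** 2          # nonlinear measurement model (assumed)

def H_jac(x):
    return 2 * x           # Jacobian of h

xf, B, R = 2.0, 0.5, 0.1   # forecast, its error variance, obs error variance
y = 4.5                    # observed value of h(truth)

H = H_jac(xf)                       # linearize h at the forecast
K = B * H / (H * B * H + R)         # linearized Kalman gain
xa = xf + K * (y - h(xf))           # nonlinear h used in the innovation
```

As in the text, the gain uses the linearized H, while the innovation evaluates the full nonlinear h at the forecast.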
The procedure for performing the EKF is similar to that for the KF listed above. The disparities and similarities between the EKF and KF include:
the Kalman gain K has the same form for both; in particular, the linear or linearized measurement model is used;
the update equation of the model error covariance has the same form, with the linear or linearized state model used;
the filtering algorithm differs: the linear measurement model H is used in the KF, whereas the nonlinear model h appears in the innovation of the EKF.
It should be noted that the EKF is only an approximation of the KF for nonlinear state models.
4. Ensemble Kalman filter (EnKF)
4.1. Basics of EnKF
A challenge in the EKF is updating the background (prediction) error covariance, which requires linearization of the nonlinear model. Such linearization is often technically difficult, and even intractable in some cases, e.g. when discontinuous functions exist in the model. Another drawback of the EKF is that it neglects the contributions of higher-order statistical moments in calculating the error covariance.
To avoid the linearization of the nonlinear model, the ensemble Kalman filter (EnKF) was introduced by Evensen [10, 11], in which the prediction error covariance B of Eq. (33) is estimated approximately using an ensemble of model forecasts. The main concept behind the formulation of the EnKF is that if the dynamical model is expressed as a stochastic differential equation, the prediction error statistics, which are described by the Fokker-Planck equation, can be estimated using ensemble integrations [10, 12]; thus, the error covariance matrix B can be calculated by integrating an ensemble of model states. The EnKF overcomes the EKF drawback of neglecting the contributions of higher-order statistical moments in calculating the error covariance. The major strengths of the EnKF include the following:
there is no need to calculate the tangent linear model or Jacobian of nonlinear models, which is extremely difficult for ocean (or atmosphere) general circulation models (GCMs);
the covariance matrix is propagated in time via fully nonlinear model equations (no linear approximation as in the EKF); and
it is well suited to modern parallel computers (cluster computing) .
The EnKF has attracted broad attention and has been widely used in atmospheric and oceanic data assimilation.
Simply put, the EnKF avoids the explicit computation and evolution of the error covariance B in Eq. (33), and instead computes B using the formula below whenever it is required:
where the i-th term represents the i-th member of the forecast ensemble of the system state vector at step t, and N is the ensemble size. The use of Eq. (44) avoids processing M, the linearized operator of the nonlinear model. However, the measurement function H must still be linear or linearized when computing the Kalman gain K, which causes concern. To avoid the linearization of a nonlinear measurement function, Houtekamer and Mitchell wrote the Kalman gain as
where the nonlinear h is applied directly to each ensemble member. Eqs. (46) and (47) allow direct evaluation of the nonlinear measurement function h in calculating the Kalman gain. However, Eqs. (46) and (47) were not proven mathematically; they were given intuitively. Tang and Ambadan argued that Eqs. (46) and (47) hold approximately if and only if the ensemble perturbations are sufficiently small, in which case Eqs. (46) and (47) actually linearize the nonlinear measurement function h to H. Therefore, direct application of the nonlinear measurement function in Eqs. (46) and (47) in fact imposes an implicit linearization through the ensemble members. In the next section, we will see that Eqs. (46) and (47) can be modified under a rigorous framework.
Thus, the procedures of EnKF are summarized as below (Table 3):
Imposing perturbations on the initial condition and integrating the model, i.e. running the model from each perturbed initial state i = 1, …, N (the ensemble size).
Calculating the analysis for each ensemble member using
after K is obtained. It should be noted that each ensemble member produces an analysis; the average of all N analyses gives the analysis mean.
Using the analyses as new ensemble members (initial conditions) for the next round of analysis.
Repeating Steps 2–4 until the end of the assimilation period.
It should be noted that the observation should be treated as a random variable with mean equal to yo and covariance equal to R. This is why a perturbed observation appears in Eq. (48); the perturbation is often drawn from the normal distribution N(0, R).
From the above procedure, we find that Eq. (44) is not applied directly here. Instead, we use Eqs. (46) and (47) to calculate K, because they avoid the linearization of the nonlinear model and also avoid the explicit construction of the matrix B, which in many realistic problems is so large that it cannot be stored with current computing resources. The measurement function h, projecting the model space onto the observation space, greatly reduces the dimensionality.
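The EnKF analysis step above can be sketched compactly: the cross-covariances, in the spirit of Eqs. (46) and (47), are computed directly from the forecast ensemble, so neither B nor a linearized H is formed explicitly. The measurement function, sizes and noise levels below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 100, 3                    # ensemble size, state dimension

def h(x):
    return x[:1] ** 2            # nonlinear measurement, 1 obs (assumed)

R = np.array([[0.1]])
xt = np.array([1.0, 0.5, -0.2])  # synthetic truth
yo = h(xt)                       # synthetic observation

X = xt[:, None] + 0.3 * rng.standard_normal((L, N))   # forecast ensemble
Y = np.stack([h(X[:, i]) for i in range(N)], axis=1)  # h of each member

Xp = X - X.mean(axis=1, keepdims=True)   # ensemble perturbations
Yp = Y - Y.mean(axis=1, keepdims=True)
Pxy = Xp @ Yp.T / (N - 1)        # cov(x, h(x)) as in Eq. (46)
Pyy = Yp @ Yp.T / (N - 1)        # cov(h(x), h(x)) as in Eq. (47)

K = Pxy @ np.linalg.inv(Pyy + R)
eps = rng.multivariate_normal(np.zeros(1), R, size=N).T  # perturbed obs
Xa = X + K @ (yo[:, None] + eps - Y)     # update each member, Eq. (48)
```

The nonlinear h is only ever evaluated member by member; no Jacobian is needed.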
4.2. Some remarks on EnKF with large dimensional problems
4.2.1. Initial perturbation
The success of the EnKF depends highly on the quality of the ensemble members produced by the initial perturbations. Since it is impractical to represent all possible types of errors within the ensemble because of the computational cost, the method of generating the initial perturbations must be chosen judiciously.
The first issue is the amplitude of the initial perturbations. Usually, two factors are considered when selecting this amplitude: the amplitude of the observation errors and the amplitude of the model errors induced by model parameters and uncertainty in the model physics. If the model is perfect, the amplitude of the perturbations should equal the amplitude of the observation errors; otherwise, the combined observation and model error is used to determine the amplitude of the perturbations.
Once the perturbation amplitude is determined, the practical initial perturbation field for each ensemble member can be constructed as a normalized pseudorandom field multiplied by the prescribed amplitude. To account for spatial coherence, the pseudorandom field is red noise, as proposed by Evensen, summarized as below:
Calculate the statistical characteristics of the pseudorandom field so that it has variance 1 and mean 0 by solving the following nonlinear equation:
where the two grid counts are the numbers of grid points along the x-axis (lon.) and the y-axis (lat.), l and p are wavenumbers changing from 1 to these maximum values, the grid intervals between two adjacent points are often set to 1, and the remaining parameter is the decorrelation length. The purpose of Eq. (49) is to derive the coefficient satisfying the other required property:
After the coefficients are obtained, we can construct a two-dimensional pseudorandom field:
As the indices cover the whole domain, Eq. (51) produces a two-dimensional random field with spatially coherent structure, variance 1 and mean 0. If the realistic uncertainty (error) has an amplitude β, the perturbation should be βW. Similarly, Eq. (51) is often used for the error perturbation in the fourth step of the EnKF procedure.
Sometimes we need to consider the vertical coherence of the pseudorandom fields between adjacent levels in oceanic models. A simple method for this purpose is to construct the pseudorandom field at the kth level by the following equation:
where the second field is the pseudorandom field at the kth level constructed by the above method without considering vertical coherence. Initially, for the surface perturbation, vertical coherence is not considered, since no level above exists. Eq. (52) indicates that the practical pseudorandom field at the kth level is composed of the practical field at the (k − 1)th level and an independent field at the kth level. As such, the practical pseudorandom fields of two adjacent levels are coherent with each other; their correlation, or coherence structure, is determined by the coefficient α. Eq. (52) generates a sequence that is white in the vertical direction if α = 0, but perfectly correlated in the vertical if α = 1. Eq. (52) is also often used to construct random fields that are temporally coherent, for example, continuous random noise with coherence in time, as used for the model error in the forecast model. The random noise in the EnKF procedure can also be replaced by random noise imposed on the model forcing; for example, random noise can be continuously added to the wind forcing of oceanic models. For some atmospheric models with transition processes, there are inherent random noises that make explicit perturbation unnecessary. An important criterion for choosing α and the amplitude β is to examine the ensemble spread in sensitivity experiments.
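The Eq. (52)-style coherence can be sketched as a first-order autoregressive blend of levels. The sqrt(1 − α²) scaling below, which preserves unit variance, is an assumption about the exact normalization used in the chapter:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, levels, npts = 0.8, 20, 5000   # coherence coefficient, illustrative sizes

g = rng.standard_normal((levels, npts))   # independent N(0, 1) fields per level
w = np.empty_like(g)
w[0] = g[0]                               # surface: no vertical coherence
for k in range(1, levels):
    # blend the previous level with a fresh field; unit variance is preserved
    w[k] = alpha * w[k - 1] + np.sqrt(1 - alpha ** 2) * g[k]

corr = np.corrcoef(w[10], w[11])[0, 1]    # adjacent-level correlation ~ alpha
```

With α = 0 the levels are mutually white; with α = 1 they are perfectly correlated, matching the two limits noted above.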
4.2.2. The computational cost of Kalman gain
The Kalman gain K has dimension L × m, where L is the number of model variables and m is the number of observational variables. In many realistic problems, L and m are very large (much larger than the ensemble size N), making the matrix inversion very expensive.
A simple procedure is to rewrite the Kalman gain K, as below:
where the prime indicates that the ensemble mean has been removed from the model ensemble predictions; the ensemble estimate of B, Eq. (44), was invoked here. If we assume the ensemble prediction errors are not correlated with the observation errors, the following is valid:
where the projected perturbation matrix has dimension m × N. Usually, the ensemble size N is much less than m. Using the singular-value decomposition (SVD) technique, we have
Eq. (54) then becomes
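The SVD shortcut can be sketched as follows: with S = HX′ of size m × N and R = rI, the m × m inverse needed for K follows from the economy SVD of the small matrix S, since S Sᵀ + cI acts as cI on the orthogonal complement of the range of S. The shapes and the random stand-in for S are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
m, N, r = 200, 20, 0.5               # obs dim, ensemble size, obs error variance

Xp = rng.standard_normal((300, N))   # state perturbation matrix X' (L x N)
S = rng.standard_normal((m, N))      # stand-in for S = H X'
c = (N - 1) * r                      # scaling from the 1/(N-1) covariance factor

U, s, _ = np.linalg.svd(S, full_matrices=False)   # economy SVD, U is m x N
# (S S^T + c I)^(-1) = U diag(1/(s^2 + c)) U^T + (I - U U^T)/c
inv = U @ np.diag(1.0 / (s ** 2 + c)) @ U.T + (np.eye(m) - U @ U.T) / c
K = Xp @ S.T @ inv                   # gain without ever forming B (L x L)
```

Only an N-sized SVD is computed, which is what makes the approach affordable when N is much less than m.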
4.2.3. Stochastic EnKF and deterministic EnKF
In the EnKF introduced in the previous section, the observations assimilated into the dynamical model are treated as stochastic variables, as expressed by the perturbed observation in Eq. (48). This is a must if the classic EnKF algorithm is used: it has been proven that if the EnKF assimilates deterministic observations (i.e. observations not perturbed for each ensemble member), the analysis error covariance is systematically underestimated, typically leading to filter divergence, as indicated below [11, 18]:
However, the perturbed-observation approach introduces an additional source of sampling error that reduces the accuracy of the analysis error covariance and increases the probability of underestimating it [19, 20]. Thus, approaches that use only a single observation realization while avoiding the systematic underestimation of the analysis error covariance were pursued. There are several ways to implement this goal, as summarized by Tippett et al. Below, we introduce the approach developed by Whitaker and Hamill, called the ensemble square root filter (EnSRF).
Denoting the deviation of the analysis from the analysis mean by the analysis anomaly, it is easy to write
where the forecast anomaly is defined analogously. If a single observation realization is assimilated in all ensemble members, then
We seek a definition of the anomaly gain that results in an ensemble whose analysis error covariance equals Pa, i.e.
The solution of Eq. (62) is
Therefore, EnSRF is summarized as below (Table 4):
It should be noted that two Kalman gains are used in the EnSRF: the original K for updating the ensemble mean and a new, reduced gain for updating the anomalies. With a single observation realization, the EnSRF produces the same ensemble analysis mean as the classic EnKF with stochastic observations.
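The Whitaker-Hamill EnSRF for a single scalar observation can be sketched as follows: the mean is updated with the usual gain K and the perturbations with a reduced gain, so no perturbed observations are required. Shapes and values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N, L = 50, 4
H = np.zeros(L); H[0] = 1.0      # observe the first state variable
R = 0.2                          # scalar observation error variance
yo = 0.7                         # single observation realization

X = rng.standard_normal((L, N))          # forecast ensemble
xm = X.mean(axis=1)
Xp = X - xm[:, None]                     # forecast anomalies

PHt = Xp @ (H @ Xp) / (N - 1)            # B H^T as an L-vector
HPHt = (H @ Xp) @ (H @ Xp) / (N - 1)     # scalar H B H^T
K = PHt / (HPHt + R)                     # usual gain, for the mean
Kt = K / (1.0 + np.sqrt(R / (HPHt + R))) # reduced gain, for the anomalies

xa_mean = xm + K * (yo - H @ xm)         # mean update
Xa = xa_mean[:, None] + Xp - np.outer(Kt, H @ Xp)   # anomaly update
```

For a scalar observation the reduced gain satisfies (1 − K̃H)² HBHᵀ = (1 − KH) HBHᵀ, so the analysis anomalies carry exactly the variance prescribed by the KF, with no observation perturbations.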
Initially, the term EnKF referred specifically to the stochastic ensemble Kalman filter, which requires perturbing the observations. Subsequently, several deterministic EnKFs that avoid perturbed observations were developed, e.g. the ETKF, the EAKF and the EnSRF. These filter designs are labelled variants of the EnKF because they are also based on the Kalman filtering formula and ensemble representations.
4.2.4. Inflation approach
The forecast error covariance is defined by (44)
Eq. (64) is an approximation to B using the forecast ensemble. Owing to limited computational resources, the ensemble size N is often restricted to a small value in realistic applications. A small ensemble size may produce a very small ensemble spread, so that B as approximated by Eq. (64) is seriously underestimated. To alleviate this problem, B is multiplied by an inflation factor λ slightly greater than 1. λ is determined empirically, e.g. by sensitivity experiments, with a typical value of 1.01. λB then replaces B in the EnKF formulas. This approach is equivalent to the approach below:
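Equivalently, the inflation can be applied to the ensemble perturbations themselves: scaling them by sqrt(λ) inflates the sample covariance of Eq. (64) by exactly λ while leaving the ensemble mean unchanged. A minimal sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
lam = 1.01                                  # inflation factor, typical value
X = rng.standard_normal((4, 30))            # forecast ensemble (L x N)

xm = X.mean(axis=1, keepdims=True)
X_infl = xm + np.sqrt(lam) * (X - xm)       # inflate perturbations only
```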
4.2.5. Localization of EnKF
When the EnKF is applied to high-dimensional atmospheric and oceanic models, the limited ensemble size causes the estimated correlations to be noisy. An insufficient ensemble size produces spurious correlations between distant locations in the background covariance matrix B. Unless they are suppressed, these spurious correlations allow observations at one location to affect the analysis at locations an arbitrarily large distance away, in an essentially random manner. This is remedied by localization.
Another reason for using localization is that it artificially reduces the spatial domain of influence of observations during the update, which dramatically reduces the necessary ensemble size; this is very important for operational systems. The two most common distance-based localization methods used in practice are local analysis and covariance localization.
In local analysis, only measurements located within a certain distance of a grid point impact the analysis at that grid point. This allows an algorithm in which the analysis is computed grid point by grid point. It was found that severe localization could lead to imbalance, but with a large enough radius of influence (decorrelation length) for the measurements, this was not a problem. Hunt et al. used the local analysis method in their ETKF algorithm and developed the local ensemble transform Kalman filter (LETKF).
To eliminate the small background error covariances associated with remote observations, Houtekamer and Mitchell used a Schur (element-wise) product of a correlation function with local support and the covariance of the background error calculated from the ensemble. That is, the matrix B in Eq. (48) is replaced by the Schur product of B with a localizing correlation matrix ρ.
Here, α is a scalar parameter. To the best of the authors' knowledge, this was the first case in which covariance localization was used in the EnKF.
Nowadays, a typical covariance localization approach represents the prior covariances by an element-wise product of the ensemble covariance and a correlation function with compact support. Anderson applied this approach in the Data Assimilation Research Testbed system, which has been used for realistic cases.
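Covariance localization via a Schur product can be sketched as below; a simple compactly supported triangular taper stands in here for the Gaspari-Cohn correlation function commonly used in practice, and the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
L, N, cutoff = 30, 10, 8.0            # state dim, small ensemble, taper radius

X = rng.standard_normal((L, N))
B_ens = np.cov(X)                     # noisy, rank-deficient estimate (N < L)

# distance between grid points i and j on a 1-D grid
d = np.abs(np.subtract.outer(np.arange(L), np.arange(L)))
rho = np.maximum(0.0, 1.0 - d / cutoff)   # compactly supported taper
B_loc = rho * B_ens                       # Schur (element-wise) product
```

Entries beyond the cutoff distance are zeroed, suppressing spurious long-range correlations, while the diagonal variances are preserved.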
5. General form of ensemble-based filters for Gaussian models
In the preceding sections, we introduced Kalman-based filters. The original Kalman filter applies to a linear model and a linear measurement function. The EKF and EnKF were developed to address nonlinear models, but the measurement functions are still assumed to be linear or linearized. Eqs. (46) and (47) can directly evaluate nonlinear measurement functions, but they were proposed intuitively and have not been proven. In this section, we present a general form for nonlinear measurement functions and then prove Eqs. (46) and (47) mathematically using this general form.
where f and h are the nonlinear operators of the model and measurement, x is the model state and y is the observation. η and ε are the model and observation errors, respectively, with covariances
Denoting the forecast, analysis and observation errors as in the KF derivation, we have
The optimal estimate requires the trace of the analysis error covariance to be minimum, namely,
where we invoked the below properties:
Thus, we have the optimal estimate filter:
Eqs. (75)–(77) give a general algorithm for a Gaussian nonlinear model and a nonlinear measurement function. The first term of Eq. (74) can be interpreted as the cross-covariance between the state and observation errors, and the remaining expression as the error covariance of the difference between the model-predicted observation and the observation itself. Here, the relevant error is defined as the difference between the noisy observation and its prediction.
If the model is linear, obviously,
If the measurement function is linear, i.e.
So, the Kalman gain is
In the standard KF, the state error covariance is updated at each analysis cycle during the measurement update. Updating the error covariance matrix is important because it represents the change in the forecast error covariance when a measurement is assimilated. The EnKF implementation does not require the covariance update equation because it can calculate the updated error covariance directly from the set of ensemble members. Evensen derived an analysis covariance equation consistent with the standard KF error covariance update, Eq. (28). However, a true representation of the updated error covariance requires a large ensemble size, which is often computationally infeasible.
The general form of the Kalman gain makes use of the reformulated error covariance. In a broad sense, the above algorithm implicitly uses the prior covariance update equation (or the analysis error covariance matrix) to calculate the forecast error covariance. Thus, the above algorithm is fully consistent with the time update and measurement update formulation of the Kalman filter algorithm. On this basis, one can develop a new type of Kalman filter that chooses the ensemble members deterministically in such a way that they can capture the statistical moments of the nonlinear model accurately. In the next subsection, we will discuss the new type of Kalman filter, called sigma-point Kalman filter, based on the above algorithm.
6. Sigma-point Kalman filters (SPKF)
6.1. Basics of SPKF
The EnKF was developed to avoid the linearization of nonlinear models. As introduced earlier, the idea behind the EnKF is to 'integrate' the Fokker-Planck equation using an ensemble technique to estimate the forecast error covariance. Theoretically, if the ensemble size were infinite, the estimate would approach the true value. In reality, however, we can only use a finite, often very small, ensemble size, leading to truncation errors. Thus, some concerns arise: how to generate finite samples wisely for an optimal estimate of the prediction error covariance, what the minimum ensemble size is for an efficient estimate of the error covariance, and how much of the true error covariance can be captured by the EnKF for a given ensemble size. In this section, we introduce a new ensemble technique, called the sigma-point Kalman filter (SPKF).
The so-called sigma-point approach is based on deterministic sampling of the state distribution to calculate the approximate covariance matrices for the standard Kalman filter equations. The family of SPKF algorithms includes the unscented Kalman filter (UKF), the central difference Kalman filter (CDKF) and their square-root versions. Another interpretation of the sigma-point approach is that it implicitly performs a statistical linearization of the nonlinear model through weighted statistical linear regression (WSLR) to calculate the covariance matrices. In the SPKF, the model linearization is done through a linear regression between a number of points (called sigma points) drawn from a prior distribution of a random variable, rather than through a truncated Taylor series expansion at a single point. This linearization has been found to be much more accurate than a truncated Taylor series linearization. Eqs. (80)–(82) form the core of the SPKF. A central issue is how to generate the optimal ensemble members for applying these equations. There are two basic approaches, the aforementioned UKF and CDKF. For an L-dimensional dynamical system represented by a set of discretized state-space equations such as (67), it has been proven that 2L + 1 ensemble members, constructed by the UKF or CDKF, can precisely estimate the mean and covariance. We omit the theoretical proof and only outline the UKF scheme below.
Denote the sigma points at time k for producing ensemble members by the set defined according to the following expressions:
where L is the sum of the dimensions of the model state, model noise and measurement noise, and the augmented state vector is an L-dimensional vector. The covariance of the augmented state vector (analysis) at the previous step enters through the ith row (column) of its weighted matrix square root. c is a scale parameter that will be specified later. The key point is to produce ensemble members by integrating the model from the initial conditions of Eqs. (84)–(86); with these ensemble members, the filter Eqs. (80)–(82) are performed.
The procedure is summarized as below:
Initially, perturb the initial condition by a small amount using the Evensen method, and randomly generate perturbations for the model and observation noises, drawn from normal distributions with the prescribed error covariances. Thus, we can construct the augmented state vector and the corresponding covariance.
E87 E88 E89
Use the sigma points as initial conditions to integrate the state-space model: for the ith sigma point, we have $\chi_{i,k|k-1} = f(\chi_{i,k-1})$. When i varies from 0 to 2L, we produce 2L + 1 ensemble members, from which the analysis mean and covariance are obtained; these are in turn used to produce the sigma points for the next step, forming a recursive algorithm. With these 2L + 1 ensemble members, the analysis mean and covariance are calculated as follows:

$\chi_{i,k|k-1} = f(\chi_{i,k-1}), \quad i = 0, \dots, 2L$, (90)
$\bar{x}^f_k = \sum_{i=0}^{2L} W_i^{(m)} \chi_{i,k|k-1}$, (91)
$P^f_k = \sum_{i=0}^{2L} W_i^{(c)} \left(\chi_{i,k|k-1} - \bar{x}^f_k\right)\left(\chi_{i,k|k-1} - \bar{x}^f_k\right)^T$, (92)
$\mathcal{Y}_{i,k} = h(\chi_{i,k|k-1})$, (93)
$\bar{y}_k = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{Y}_{i,k}$, (94)
$P_{yy} = \sum_{i=0}^{2L} W_i^{(c)} \left(\mathcal{Y}_{i,k} - \bar{y}_k\right)\left(\mathcal{Y}_{i,k} - \bar{y}_k\right)^T$, (95)
$P_{xy} = \sum_{i=0}^{2L} W_i^{(c)} \left(\chi_{i,k|k-1} - \bar{x}^f_k\right)\left(\mathcal{Y}_{i,k} - \bar{y}_k\right)^T$, (96)
$K_k = P_{xy} P_{yy}^{-1}, \qquad \bar{x}^a_k = \bar{x}^f_k + K_k \left(y_k - \bar{y}_k\right)$, (97)
$P^a_k = P^f_k - K_k P_{yy} K_k^T$. (98)
$\lambda = \alpha^2 (L + \kappa) - L$, (99)
$c = L + \lambda$, (100)
$W_0^{(m)} = \lambda / (L + \lambda)$, (101)
$W_0^{(c)} = \lambda / (L + \lambda) + (1 - \alpha^2 + \beta)$, (102)
$W_i^{(m)} = W_i^{(c)} = 1 / [2(L + \lambda)], \quad i = 1, \dots, 2L$. (103)

α and κ are tuning parameters: α determines the spread of the sigma points around the mean and is usually set to a small positive value ($10^{-4} \le \alpha \le 1$), and β is used to incorporate prior knowledge of the distribution of the state ($\beta = 2$ is optimal for Gaussian distributions). Often κ is chosen as 0 by default.
From the analysis mean and covariance at step k, as well as randomly chosen perturbations for the model noise η and the observation noise, drawn from the Gaussian distributions $N(0, Q)$ and $N(0, R)$, we calculate the sigma points using Eqs. (84)–(86), and repeat Step 2 and Step 3 and so on until the assimilation is completed for the entire period.
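As a concrete sketch of the sigma-point construction of Eqs. (84)–(86) and the associated weights, the snippet below generates the 2L + 1 points for a small two-dimensional example. The weight formulas follow the standard UKF convention assumed here, and α = 1 is used in the check purely so that the round-trip verification is well conditioned (in practice a small α is typical):

```python
import numpy as np

def sigma_points(x_mean, P, alpha=1e-3, kappa=0.0, beta=2.0):
    """Generate the 2L+1 sigma points and their weights for an
    L-dimensional (augmented) state with mean x_mean and covariance P."""
    L = x_mean.size
    lam = alpha**2 * (L + kappa) - L           # standard UKF scaling
    c = L + lam                                # the scale parameter c
    S = np.linalg.cholesky(c * P)              # matrix square root of c*P
    X = np.empty((2 * L + 1, L))
    X[0] = x_mean                              # central point
    for i in range(L):
        X[i + 1] = x_mean + S[:, i]            # positive branch
        X[L + i + 1] = x_mean - S[:, i]        # negative branch
    Wm = np.full(2 * L + 1, 1.0 / (2.0 * c))   # mean weights
    Wc = Wm.copy()                             # covariance weights
    Wm[0] = lam / c
    Wc[0] = lam / c + (1.0 - alpha**2 + beta)
    return X, Wm, Wc

# the weighted sigma points reproduce the mean and covariance exactly
x = np.array([1.0, -2.0])
P = np.array([[2.0, 0.3], [0.3, 1.0]])
X, Wm, Wc = sigma_points(x, P, alpha=1.0)
mean = Wm @ X
cov = (Wc[:, None] * (X - mean)).T @ (X - mean)
```

With α = 1 and κ = 0 the central mean weight vanishes and the identities $\sum_i W_i^{(m)} \chi_i = \bar{x}$ and $\sum_i W_i^{(c)} (\chi_i - \bar{x})(\chi_i - \bar{x})^T = P$ can be checked to machine precision; with small α the same identities hold analytically but are numerically more delicate.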
6.2. Remarks on SPKF
SPKF chooses the ensemble members deterministically, while EnKF uses random perturbations to generate the ensemble members;
the number of sigma points is fixed at 2L + 1, while the ensemble size in EnKF is pre-specified;
SPKF uses Eq. (98) to update the error covariance matrix, while EnKF does not update the error covariance matrix explicitly; and
the sigma points are recalculated using Eqs. (84)–(86) every time an observation is available, while the ensemble members in EnKF are only perturbed at the initial time.

A recent application of SPKF to a realistic oceanic model indicates that SPKF outperforms EnKF at a similar level of computational cost .
In SPKF, the number of sigma points is 2L + 1, where L is the dimension of the augmented state vector, i.e. L is the sum of the dimensions of the model state, model noise and observation noise. Usually, L is of the order of 10^3–10^4, so the computational expense is a huge challenge for SPKF in realistic problems. A solution is to use the truncated singular-value decomposition (TSVD) to reduce the number of sigma points. As seen from Eqs. (84)–(86), the matrix square root of the covariance is an L × L matrix, and its dimension determines the ensemble size. Suppose that $P^a$ can be expressed as

$P^a = E E^T$, (109)
$\sqrt{c P^a} = \sqrt{c}\, E$, (110)

where E is an L × N data (ensemble perturbation) matrix whose columns are the scaled ensemble anomalies. Eqs. (109) and (110) transfer the covariance matrix to a data matrix in constructing the sigma points. The largest advantage is that this avoids forming $P^a$ explicitly, which could be a very large matrix beyond the memory of current computers. However, Eqs. (109) and (110) alone cannot reduce the ensemble size. A solution is to decompose E, for example by principal component analysis, as used in . Further discussion on the optimal construction of sigma points is needed for realistic applications of SPKF.
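As an illustration of how a truncated decomposition can shrink the square-root factor entering Eqs. (84)–(86), the snippet below builds a rank-r square root of a covariance matrix via SVD. The sizes, the chosen rank and the synthetic low-rank covariance are illustrative assumptions, not the authors' configuration:

```python
import numpy as np

def truncated_sqrt(P, r):
    """Rank-r square root of covariance P via truncated SVD, so that
    sigma points can be built from r columns instead of L."""
    U, s, _ = np.linalg.svd(P)          # for symmetric PSD P, s = eigenvalues
    return U[:, :r] * np.sqrt(s[:r])    # L x r factor with E @ E.T ~= P

# synthetic covariance whose energy sits in three modes
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
P = A @ A.T + 1e-6 * np.eye(50)         # effectively rank 3
E = truncated_sqrt(P, 3)
err = np.linalg.norm(E @ E.T - P) / np.linalg.norm(P)
```

For a covariance dominated by a few modes, 2r + 1 sigma points built from the columns of E capture nearly all of the spread at a fraction of the 2L + 1 cost.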
The noise and the model-state analysis used in constructing the sigma points at step k are independent. It should be noted that the model-state analysis is obtained from Eq. (97), while the noise terms are drawn from Gaussian distributions. If we assume that the noise is drawn randomly each time, the analysis at step k is relevant only to the noise drawn at step k, and is independent of the model noise and observation noise drawn for the analysis at step k + 1; thus, the covariance of the augmented state is a diagonal block matrix, i.e.

$P^a_k = \mathrm{diag}(P_k, Q, R)$. (108)

There are no update equations for the noise, so the noise terms are simply drawn from Gaussian distributions; that is, the index i in $q_i$ and $r_i$ actually carries no meaning. Thus, it is a reasonable assumption that the q and r used for constructing the sigma points at time step k + 1 are unrelated to those at time step k, as argued above. Hence, Eq. (108) always holds unless the noise is designed with temporal coherence, such as noise that is red in time.
Based on the above, the actual effective ensemble size is smaller than 2L + 1. This is because neither the model noise nor the observation noise can produce ensemble spread on its own; the perturbations of the model states and of the noise terms must act jointly to produce distinct ensemble members. To see this in detail, consider the initial time: the sigma points of Eqs. (84)–(86) perturb the augmented state one component at a time, so in the columns that perturb only the noise entries, the elements corresponding to the model states remain unperturbed. Sigma points whose model-state perturbation is zero duplicate the unperturbed trajectory and make the update equation degenerate; hence, the actual ensemble size is smaller than 2L + 1.
When the truncation technique is applied to reduce the ensemble size, the ensemble spread might shrink owing to the relatively small ensemble size. As in EnKF, an inflation approach might be helpful for SPKF, and it would be of interest to develop such a scheme. Likewise, SPKF could be localized, as in localized EnKF, to alleviate memory and computation issues.
All of these remarks on SPKF reflect the authors' own thinking and understanding. It would be interesting to further test and validate these ideas and properties using simple models.
7. Beyond Kalman filters: particle filter and its derivatives
7.1. Standard particle filter
We have introduced the Kalman filter (KF), extended Kalman filter (EKF), ensemble Kalman filter (EnKF) and sigma-point Kalman filter (SPKF) in the previous sections. All of these filters belong to the sequential data assimilation methods, i.e. observation data are assimilated into the model system as soon as they become available. Bayesian estimation theory provides a general framework for the sequential data assimilation methods. If we assume the state-space model is given by Eqs. (34) and (35), the analysis step of a Bayesian-based assimilation method is deduced from Bayes' theorem:

$p(x_k \mid y_{1:k}) = \dfrac{p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})}{p(y_k \mid y_{1:k-1})}$, (115)

where $p(y_k \mid y_{1:k-1})$ plays the role of a normalization factor.
Recalling Section 2.3, Eq. (12) actually assumes that the prior probability density function and the likelihood function are Gaussian distribution functions, and thus the posterior probability density function is also a Gaussian. Based on the Gaussian assumption, the cost function of 3D-Var (i.e. Eq. (6)) can be derived, and it is equivalent to the Kalman filter Eqs. (27)–(29). All the Kalman-based filters (e.g. EKF, EnKF, EnSRF, SPKF, etc.) contain the inherent Gaussian assumption, and they are derived and validated for Gaussian systems in theory. However, this Gaussian assumption is often not applicable for nonlinear systems. Even for an initial Gaussian error, it often becomes non-Gaussian while propagating forward with nonlinear models.
The particle filter (PF) is a sequential data assimilation method that is able to deal with nonlinear and non-Gaussian state estimation problems. Like EnKF, PF also uses an ensemble, but the ensemble is used to approximate the full probability density function rather than only the error covariance B. An ensemble member is also referred to as a particle in the PF literature. Suppose the prior probability density is a sum of Dirac delta functions

$p(x_k \mid y_{1:k-1}) = \dfrac{1}{N} \sum_{i=1}^{N} \delta(x_k - x_k^i)$, (116)

where the $x_k^i$ are particles drawn from $p(x_k \mid y_{1:k-1})$. The posterior probability density is derived by applying Bayes' theorem directly, that is,

$p(x_k \mid y_{1:k}) = \sum_{i=1}^{N} w_i\, \delta(x_k - x_k^i)$, (117)

in which $w_i \propto p(y_k \mid x_k^i)$, and a normalization step is required to make the $w_i$ sum up to 1. If we assume the likelihood function is Gaussian, $w_i$ can be computed by

$w_i \propto \exp\left[-\tfrac{1}{2}\,\left(y_k - H x_k^i\right)^T R^{-1} \left(y_k - H x_k^i\right)\right]$. (118)
Otherwise, we can use any specified probability density function $p(y_k \mid x_k)$ to compute the likelihood.
With the posterior probability density function $p(x_k \mid y_{1:k})$, the analysis value and covariance can be computed by

$\bar{x}_k^a = \sum_{i=1}^{N} w_i\, x_k^i$, (119)
$P_k^a = \sum_{i=1}^{N} w_i\, \left(x_k^i - \bar{x}_k^a\right)\left(x_k^i - \bar{x}_k^a\right)^T$, (120)
and higher-order moments of the posterior state can also be estimated.
Before stepping forward to the next stage, a resampling step is required so that each particle carries a uniform weight. A typical resampling strategy is sequential importance resampling (SIR), which removes particles with very small weights and duplicates those with large weights. A detailed algorithm of SIR can be found in . The resampling algorithm gives the indices and the number of copies of those particles that should be duplicated, i.e. it computes resampling indices $r(i)$ according to the weights, where each $r(i) \in \{1, \dots, N\}$; then the $x_k^{r(i)}$, $i = 1, \dots, N$, are regarded as the new particles.
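A minimal sketch of such a resampling step is given below, using the common systematic variant (an assumption; the SIR algorithm cited above may differ in detail):

```python
import numpy as np

def sir_resample(weights, rng):
    """Resampling step: returns the indices of the particles to keep.
    High-weight particles are duplicated, low-weight ones are dropped."""
    n = weights.size
    positions = (rng.random() + np.arange(n)) / n   # one stratified draw
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0                                   # guard against round-off
    return np.searchsorted(cdf, positions)

rng = np.random.default_rng(42)
w = np.array([0.01, 0.02, 0.90, 0.02, 0.05])        # normalized weights
idx = sir_resample(w, rng)   # particle 2 is duplicated several times
```

Because the stratified positions sweep the unit interval once, a particle carrying 90% of the probability mass necessarily receives most of the copies, which is exactly the pruning/duplication behaviour described above.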
In summary, the algorithm of the standard particle filter is given below:
generate the initial ensemble as EnKF does;
integrate the model until the observation is available;
use Eq. (118) to compute the weight for each particle, and normalize them;
apply the resampling algorithm to derive the resampling indices, and hence the new ensemble; and
repeat Steps 2–5 until the end of assimilation period.
The standard particle filter  is also known as the bootstrap particle filter or SIR particle filter.
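The five steps above can be sketched for a toy scalar model; the model, noise levels and ensemble size here are illustrative assumptions, not a recommended configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

def model(x):
    return x + 0.1 * np.sin(x)      # hypothetical nonlinear forecast model

n, q, r = 500, 0.05, 0.2            # particles, model / observation error std
truth = 1.0
particles = rng.normal(truth, 1.0, n)                     # Step 1: initial ensemble

for _ in range(20):
    truth = model(truth) + rng.normal(0.0, q)
    particles = model(particles) + rng.normal(0.0, q, n)  # Step 2: forecast
    y = truth + rng.normal(0.0, r)                        # synthetic observation
    w = np.exp(-0.5 * ((y - particles) / r) ** 2)         # Step 3: Gaussian weights
    w /= w.sum()                                          # ... and normalize
    particles = particles[rng.choice(n, size=n, p=w)]     # Step 4: resample

analysis = particles.mean()         # posterior (analysis) estimate
```

With 500 particles and a scalar state, degeneracy is not an issue here; the point of the sketch is the forecast-weight-resample cycle, not performance.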
7.2. Variants of PF
The particle filter is a highly promising technique because it does not invoke any Gaussian assumptions. It has been widely used and studied in many other fields. The PF estimates the full probability density function of the forecasted state based on an ensemble of states with different weights. However, the PF suffers from the problem of filter degeneracy, i.e. the procedure collapses to a very small number of highly weighted particles among a horde of almost useless particles carrying a tiny proportion of the probability mass. Even if resampling techniques are used, the degeneracy cannot be completely avoided with limited ensemble size. The number of particles must grow substantially with the dimension of the system to avoid degeneracy [37, 38], a requirement that is apparently too costly for large models such as GCMs. Various efforts have been made to resolve this issue, as documented in an excellent overview .
Several strategies are often employed to address the problem of filter degeneracy in applications of the particle filter. For example, Papadakis et al. proposed a weighted ensemble Kalman filter (WEnKF)  that uses an ensemble-based Kalman filter as the proposal density from which the particles are drawn. Van Leeuwen et al. developed a fully nonlinear particle filter by exploiting the freedom of the proposal transition density, which ensures not only that all particles ultimately occupy high-probability regions of state-space but also that most of the particles have similar weights . The implicit particle filter uses gradient descent minimization combined with random maps to find the region of high probability, avoiding the calculation of Hessians . Luo et al. have proposed an efficient particle filter that uses residual nudging to prevent the residual norm of the state estimates from exceeding a pre-specified threshold . These particle filters were very recently proposed and have attracted broad attention in the community of atmos./ocean. data assimilation. Below, we will briefly introduce the equivalent weights particle filter (EWPF) by Van Leeuwen [39, 41].
The equivalent weights particle filter is a fully nonlinear data assimilation method that works in a two-stage process. It uses the proposal density to ensure that the particles have almost equivalent weights, by which filter degeneracy is avoided.
In the standard PF, the particles at time step n − 1 are propagated by the original model, i.e. $x_i^n = f(x_i^{n-1}) + \beta_i^n$, which implies that the particles at time step n are drawn from the transition density $p(x^n \mid x_i^{n-1})$. In that case, the weight of each particle varies greatly and filter degeneracy is very likely to happen.
In EWPF, another transition density, called the proposal density, is introduced. The proposal density depends on the future observation $y^n$ and on all previous particles. With the help of the proposal density, each particle is propagated using a different model

$x_i^n = g(x_i^{n-1}, y^n)$. (121)
The model g can be anything; for instance, one can add a relaxation term and change the random forcing:

$x_i^{k} = f(x_i^{k-1}) + B(\tau)\, A(x_i^{k-1}) + \hat{\beta}_i^{k}$, (122)

where $B(\tau)$ is a function of the time between observations, and each k denotes a model step without an observation. A is a relaxation term that will 'drag' the particle towards the future observation. In , it is given by

$A(x) = Q H^T \left(H Q H^T + R\right)^{-1} \left(y^n - H x\right)$, (123)

where the matrices Q and R correspond to the model error covariance and the observation error covariance, respectively.
The second stage of EWPF involves updating each particle at the observation time via the formula

$x_i^n = f(x_i^{n-1}) + \alpha_i K \left(y^n - H f(x_i^{n-1})\right), \quad K = Q H^T \left(H Q H^T + R\right)^{-1}$, (124)

where the $\alpha_i$ are scalars computed so as to make the weights of the particles equal. Using the expression for the weights and setting all weights equal to a target weight (e.g. a weight that a pre-specified fraction of the particles can attain),

$w_i(\alpha_i) = w_{\mathrm{target}}$, (125)

the $\alpha_i$ can be solved for by numerical methods.
Eqs. (122)–(125) show one example of how to construct the proposal model g in (121); it can also be done by running 4D-Var on each particle (the implicit particle filter), or by using the EnKF as the proposal density. For these methods, refer to Morzfeld et al.  and Papadakis et al. .
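As an illustration of the relaxation-type proposal of Eqs. (122)–(125), the snippet below nudges one particle towards a future observation between observation times. H, Q, R, the model f, the ramp for B(τ) and all numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

H = np.array([[1.0, 0.0]])          # observe the first component only
Q = 0.05 * np.eye(2)                # model error covariance
R = np.array([[0.1]])               # observation error covariance
y = np.array([2.0])                 # the future observation

def f(x):
    return 0.95 * x                 # hypothetical deterministic model

def relaxation(x):
    # A(x) = Q H^T (H Q H^T + R)^(-1) (y - H x): drags x towards y
    S = H @ Q @ H.T + R
    return Q @ H.T @ np.linalg.solve(S, y - H @ x)

def proposal_step(x, b):
    # one nudged model step: f(x) + B(tau) * A(x) + model noise
    return f(x) + b * relaxation(x) + rng.multivariate_normal(np.zeros(2), Q)

x = np.zeros(2)
for k in range(1, 11):              # ten model steps between observations
    x = proposal_step(x, k / 10.0)  # relaxation strength grows with time
```

Because the relaxation term is proportional to the innovation $y - Hx$, the nudging force vanishes once the particle reaches the observed value, so the proposal steers particles into the high-probability region without overshooting on average.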
7.3. Remarks on PF
7.3.1. Combined method of EnKF and PF
The ensemble Kalman particle filter (EnKPF) is a combination of the EnKF and the SIR particle filter. It was recently introduced to address non-Gaussian features in data assimilation for highly nonlinear systems, by providing a continuous interpolation between the EnKF and SIR-PF analysis schemes .
As stated above, both the EnKF and PF methods are based on Bayesian estimation theory, but they approximate the probability density function of the state in different ways. The EnKF only approximates the mean and covariance of the state through a series of equally weighted ensemble members, whereas the particle filter assigns weights to the ensemble members according to their likelihoods. The EnKF carries the Gaussian assumption but requires only a relatively small ensemble size and avoids filter degeneracy, in contrast with the PF.
The EnKPF takes advantage of both methods by combining the analysis schemes of the EnKF and the SIR-PF using a controllable index (i.e. tuning parameter). In contrast with both the EnKF and the SIR-PF, the analysis scheme of the EnKPF not only updates the ensemble members but also considers the weights.
Assume that the forecast ensemble $x_1^f, \dots, x_N^f$ and the observation data y are available, and that the forecast covariance $P^f$ can be calculated using the ensemble. The analysis scheme of EnKPF is given below.

Choose $\gamma \in [0, 1]$ and apply the EnKF based on the inflated observation error covariance $R/\gamma$ as follows:

$K(\gamma) = P^f H^T \left(H P^f H^T + R/\gamma\right)^{-1}$, (126)
$\nu_i = x_i^f + K(\gamma)\left(y - H x_i^f\right)$, (127)
$Q(\gamma) = \gamma^{-1} K(\gamma)\, R\, K(\gamma)^T$. (128)

Compute the weights for each updated member as follows:

$w_i \propto \phi\left(y;\, H \nu_i,\, H Q(\gamma) H^T + R/(1-\gamma)\right)$, (129)

and normalize the weights by $w_i \leftarrow w_i / \sum_j w_j$, in which ϕ is the probability density function of a Gaussian.

Calculate the resampling index I(i) for each member according to the weights using the SIR algorithm, then set

$x_i^{(1)} = \nu_{I(i)} + \gamma^{-1/2} K(\gamma)\, \epsilon_{1,i}$, (130)

where $\epsilon_{1,i}$ is a random observation error drawn from the Gaussian $N(0, R)$.

Compute $K' = Q(\gamma) H^T \left(H Q(\gamma) H^T + R/(1-\gamma)\right)^{-1}$, and generate the analysis members from $x_i^{(1)}$ and the EnKF with the inflated observation error again as follows:

$x_i^a = x_i^{(1)} + K'\left(y + (1-\gamma)^{-1/2} \epsilon_{2,i} - H x_i^{(1)}\right)$, (131)

where $\epsilon_{2,i}$ is a second random observation error drawn from $N(0, R)$.
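The first two stages of the scheme (the EnKF update with the inflated observation error and the particle weighting of the updated members) might be sketched as follows; the sizes, γ and all matrices are illustrative, and the formulas are a plausible reconstruction of the scheme rather than a verbatim implementation:

```python
import numpy as np

rng = np.random.default_rng(7)

L, m, N, gamma = 3, 2, 40, 0.5      # state dim, obs dim, ensemble size, gamma
H = np.eye(m, L)                    # observe the first two state components
R = 0.1 * np.eye(m)                 # observation error covariance
xf = rng.standard_normal((L, N))    # forecast ensemble (columns = members)
y = np.array([0.5, -0.2])           # synthetic observation

# Stage 1: EnKF update with the inflated observation error R / gamma
Pf = np.cov(xf)
K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R / gamma)
nu = xf + K @ (y[:, None] - H @ xf)            # updated members nu_i
Qg = K @ R @ K.T / gamma                       # residual covariance after stage 1

# Stage 2: particle weights of the updated members (up to normalization,
# the Gaussian density with covariance H Qg H^T + R / (1 - gamma))
S = H @ Qg @ H.T + R / (1.0 - gamma)
innov = y[:, None] - H @ nu
quad = np.einsum('ji,ji->i', innov, np.linalg.solve(S, innov))
w = np.exp(-0.5 * quad)
w /= w.sum()
```

The resampling and second EnKF update would then reuse the SIR step of Section 7.1 on the weights w; the tuning parameter γ interpolates between a pure EnKF analysis (γ → 1) and a pure SIR-PF analysis (γ → 0).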
7.3.2. Localization in PF
Previous sections have introduced the localization technique in EnKF, which greatly improves the performance of EnKF in high-dimensional models. The advantages of localization motivate the search for a localization procedure in particle filtering.
Van Leeuwen discussed this topic in depth . He argued that one can calculate the weights locally, but resampling is not easy. In the resampling step, low-weight particles are abandoned and high-weight particles are duplicated. However, with local weights, different particles are selected in different parts of the domain. The problem is that we must have model fields that are continuous in space to propagate forward in time with the model. Simply constructing a new particle that consists of one particle in one part of the model domain and another particle in another part will lead to problems at the boundary between the two.
The problem of spatial discontinuity currently makes localization in the particle filter infeasible. Most of the advanced particle filters (e.g. EWPF and the implicit particle filter) use the idea of a global weight, i.e. the weight of each member is a scalar.
However, there are still some attempts at localization in the particle filter. For example, Poterjoy developed the localized particle filter (LPF), which updates particles locally using ideas borrowed from EnKF . The paper demonstrates some advantages of the new filter over EnKF, especially when the observation networks consist of densely spaced measurements that are nonlinearly related to the model state. This is very interesting work on the particle filter, and it also has the potential to work with large atmos./ocean. data assimilation systems.
8. Remarks and conclusions
Data assimilation is the process by which observations of the actual system are incorporated into a numerical model to optimally estimate the system states. In this chapter, we introduced several ensemble-based data assimilation methods that are widely used in the earth sciences. One can read this chapter as an introduction to ensemble-based data assimilation methods, but also as a brief review of the application of these methods in the earth sciences. It is the authors' effort to write such a 'review' chapter in introductory language, making it more readable. As seen throughout the chapter, many of the discussions, derivations and analyses not only introduce these methods but also deepen the understanding of them. This is emphasized by the analysis of the rationale behind each method, including: (i) the principle for deriving the algorithm; (ii) the basic assumptions of each method; (iii) the connections and relations between different methods (e.g. EKF and EnKF, EnKF and SPKF, etc.); and (iv) the advantages and deficiencies of each method. In particular, we placed considerable weight on discussing the potential concerns, challenges and possible solutions when these methods are applied to high-dimensional systems in the earth sciences. This chapter can serve as a 'textbook' for beginners learning these data assimilation algorithms, and also as a good reference for researchers seeking to better understand and apply these methods.
This work was supported by the NSERC (Natural Sciences and Engineering Research Council of Canada) Discovery Grant, the National Science Foundation of China (41276029, 41321004, 41530961, 91528304), the National Programme on Global Change and Air-Sea Interaction (GASI-IPOVAI-06) and the National Basic Research Program (2013CB430302).