Average of L-2 norm of errors for 100 runs of each configuration for the compared filter implementations.
This chapter discusses efficient and practical matrix-free implementations of the ensemble Kalman filter (EnKF) in order to account for localization during the assimilation of observations. In the EnKF context, an ensemble of model realizations is utilized in order to estimate the moments of its underlying error distribution. Since ensemble members come at high computational costs (owing to current operational model resolutions) ensemble sizes are constrained by the hundreds while, typically, their error distributions range in the order of millions. This induces spurious correlations in estimates of prior error correlations when these are approximated via the ensemble covariance matrix. Localization methods are commonly utilized in order to counteract this effect. EnKF implementations in this context are based on a modified Cholesky decomposition. Different flavours of Cholesky-based filters are discussed in this chapter. Furthermore, the computational effort in all formulations is linear with regard to model resolutions. Experimental tests are performed making use of the Lorenz 96 model. The results reveal that, in terms of root-mean-square-errors, all formulations perform equivalently.
- ensemble Kalman filter
- modified Cholesky decomposition
- sampling methods
The ensemble Kalman filter (EnKF) is a sequential Monte Carlo method for parameter and state estimation in highly nonlinear models. The popularity of the EnKF owes to its simple formulation and relatively easy implementation. In the EnKF, an ensemble of model realizations is utilized in order to, among other things, estimate the moments of the prior error distribution (forecast distribution). During this estimation and the assimilation of observations, many challenges are present in practice. Some of them are addressed in this chapter: the first one is related to the proper estimation of background error correlations, which has a high impact on the quality of the analysis corrections while the last one is related to the proper estimation of the posterior ensemble when the observation operator is nonlinear. In all cases, background error correlations are estimated based on the Bickel and Levina estimator .
We want to estimate the state of a dynamical system , which (approximately) evolves according to some (imperfect) numerical model operators , for instance, a model which mimics the behavior of the atmosphere and/or the ocean where is the model dimension (typically associated with the model resolution). The estimation process is based on two sources of information: a prior estimate of , with , and a noisy observation with , where is the number of observed model components, is the estimated data error covariance matrix, is the (nonlinear) observation operator that maps the information from the model domain to the observation space, is the nth dimensional vector whose components are all zeros. In practice, or . The assimilation process can be performed by using sequential data assimilation methods, and one of the most widely used methods is the ensemble Kalman filter, which is detailed in the following paragraphs.
2.1. The ensemble Kalman filter
In the ensemble Kalman filter (EnKF) , an ensemble of model realizations
is utilized in order to estimate the moments of the background error distribution,
via the empirical moments of the ensemble (1) and therefore,
where is the ensemble size, is the ith ensemble member, for , is well known as the background state, stands for the background error covariance matrix, is the ensemble mean, and is the ensemble covariance matrix. Likewise, the matrix of member deviations reads,
A variety of EnKF implementations have been proposed in the current literature, most of them rely on basic assumptions over the moments of two probability distributions: the background (prior) and the analysis (posterior) distributions. In most of EnKF formulations, normal assumptions are done over both error distributions.
In general, depending on how the assimilation process is performed, the EnKF implementation will fall in one of two categories: stochastic filter  or deterministic filter . In the context of stochastic filters, for instance, the assimilation of observations can be performed by using any of the next formulations,
where the analysis covariance matrix is given by the matrix of innovations (on observations) is denoted by and the matrix of perturbed observations reads , where the columns of are formed by samples from a normal distribution with moments . In practice, one of the most utilized formulations is given by (8).
2.2. Cholesky and modified Cholesky decomposition
The forms of the Cholesky and the modified Cholesky decomposition were discussed by Golub and Van Loan in . If is symmetric positive definite, then there exist a unique lower triangular with positive diagonal entries such that . If all the leading principal submatrices of are non-singular, then there exist unique lower triangular matrices and and a unique diagonal matrix such that If is symmetric, then and . Golub and Van Loan provide proofs of those decompositions, as well as various ways to compute them. We can use the Cholesky factor of a covariance matrix to quickly solve linear systems that involve a covariance matrix. Note that if we compute the Cholesky factorization and solve the triangular system and , then
For a symmetric matrix , the modified Cholesky decomposition involves finding a non-negative diagonal matrix such that is positive definite. In particular, whenever is positive definite, otherwise should be sufficiently small. This allows applying the usual Cholesky factorization to , i.e. finding matrices such that .
Recall . Thus, for . Let , the vector holding the ith row across all columns of for . Bickel and Levina in  discussed a modified Cholesky decomposition for the estimation of precision covariance matrix, and this allows in our context to estimate of by fitting regressions of the form:
then, , for .
3. Filters based on modified Cholesky decomposition
3.1. Ensemble Kalman filter based on modified Cholesky
In the ensemble Kalman filter based on a modified Cholesky decomposition (EnKF-MC) , the analysis ensemble is estimated by using the following equations,
In this context, error correlations are estimated via a modified Cholesky decomposition. This provides an estimate of the inverse background error covariance matrix of the form,
where is a lower-triangular matrix whose diagonal elements are all ones, and is a diagonal matrix. Even more, when only local effects are considered during the estimation of , sparse estimators of the precision background can be obtained. Furthermore, the matrix can be sparse with only a few non-zero elements per row. Typically, the number of non-zero elements depends on the radius of influence during the estimation of background error correlations. For instance, in the one-dimensional case, the radius of influence denotes the maximum number of non-zero elements, per row, in . The EnKF-MC is then obtained by plugging in the estimator (3) in (6). Given the structure of the Cholesky factors, the EnKF-MC can be seen as a matrix-free implementation of the EnKF.
Recall that the precision analysis covariance matrix reads,
moreover, since can be written as a sum of rank-one matrices, the factors (3) can be updated in order to obtain an estimate of the inverse analysis covariance matrix. The next section discussed such approximation in detail.
3.1.1. Posterior ensemble Kalman filter stochastic (PEnKF-S)
based on the background ensemble , where is the analysis state and is the analysis covariance matrix. Consider the estimate of the inverse background error covariance matrix Eq. (3), the precision analysis covariance matrix Eq. (4) can be approximated as follows:
where . The matrix (17) can be written as follows:
where denotes the ith column of matrix , for . Consider the sequence of factor updates,
where , for , and
We can use of Dolittle’s method in order to compute the factors and in (9), and it is enough to note that,
After some math simplifications, the next equations are obtained,
Algorithm 1. Rank-one update for the factors and
1. function UPD_CHOLESKY_FACTORS
2. Compute from
3. for do
4. Compute via Eq. (10).
5. Set .
6. for do
7. Compute according to (11).
8. end for
9. end for
10. Set and .
11. return ,
12. end function
Algorithm 1 can be used in order to update the factors of for all column vectors in , and this process is detailed in Algorithm 2.
Algorithm 2. Computing the factors and of
13. function COMPUTE_ANALYSIS_FACTOR
14. Set .
15. for do
17. end for
18. return ,
19. end function
Once the updating process has been performed, the resulting factors form an estimate of the inverse analysis covariance matrix,
From this covariance matrix, the posterior mode of the distribution can be approximated as follows:
with . Note that the linear system (23) involves lower and upper triangular matrices, and therefore, can be estimated without the need of matrix inversion. Once the posterior mode is computed, the analysis ensemble is built about it. Note that reads , and therefore, a square root of can be approximated as follows:
which can be utilized in order to build the analysis ensemble,
where is given by the solution of the linear system,
In addition, the columns of are formed by samples from a multivariate standard normal distribution. Again, since is lower triangular, the solution of (26) can be obtained readily.
3.1.2. Posterior ensemble Kalman filter deterministic (PEnKF-D)
The deterministic posterior ensemble Kalman filter (PEnKF-D) is a square root formulation of the PEnKF-S. The main difference between them lies on the use of synthetic data. In both methods, the matrices and are computed similarly (i.e., by using Dolittle’s method and the algorithms described in the previous section).
In the PEnKF-D, the computation of is performed (13) based on the empirical moments of the ensemble. The analysis update is performed by using the perturbations
which are consistent with the dynamics of the numerical model, for instance, samples from (27) are driven by the physics and the dynamics of the numerical model. Finally, the analysis equation of the PEnKF-D
Since the moments of the posterior distribution are unchanged, we expect both methods PEnKF-S and PEnKF-D to perform equivalently in terms of errors during the estimation process.
3.2. Markov Chain Monte Carlo-based filters
Definitely, the EnKF represents a breakthrough in the data assimilation context, perhaps its biggest appeal is that we can obtain a closed form expression for the analysis members. Nevertheless, as can be noted, during the derivation of the analysis equations, some constrains are imposed, for instance, the observation operator is assumed linear, and therefore, the likelihood obeys a Gaussian distribution. Of course, in practice, this assumption can be easily broken, and therefore, bias can be introduced on the posterior prediction; this is notoriously significant if one considers the fields in which data assimilation lives are submerged. From a statistical point of view, more specifically, under the Bayesian framework, this situation can be summarized as follows:
The prediction is obtained from the posterior distribution: .
The information from numerical model is known like the prior of background:
The information from the observations is incorporated through the likelihood: .
The posterior distribution is calculated: .
If and are Gaussian, is Gaussian too, and the analysis is equal to the mean of .
A significant number of approaches have been proposed in the current literature in order to deal with these constraints; for instance, the particle filters (PFs) and the maximum likelihood ensemble filter (MLEF) are ensemble-based methods, which deal with nonlinear observation operators. Unfortunately, in the context of PF, its use is questionable under realistic weather forecast scenarios since the number of particles (ensemble members) increases exponentially regarding the number of components in the model state. Anyways, an extended analysis of these methods exceeds the scope of this document, but its exploration is highly recommended.
Taking samples directly from the posterior distribution is a strategy that can help to remove the bias induced by wrong assumptions on the posterior distribution (i.e., normal assumptions). We do not put the sights on finding the mode of the posterior distribution; instead of this, we want a set of state vectors that allow us to create a satisfactory representation of the posterior error distribution, then, based on these samples, it is possible to estimate moments of the posterior distribution from which the analysis ensemble can be obtained .
Hereafter, we will construct a modification of the sequential scheme of data assimilation; first, we will describe how to compute the analysis members by using variations in Markov Chain Monte Carlo (MCMC)-based methods, and then, we will include the modified Cholesky decomposition in order to estimate a precision background covariance.
An overview of the proposed method is as follows:
The forecast step is unchanged; the forecast ensemble is obtained by applying the model to the ensemble of states of the previous assimilation cycle.
The analysis step is modified, so that the analysis is not obtained anymore by, for instance (28), but we perform k iterations of an algorithm from the Markov Chain Monte Carlo (MCMC) family in order to obtain samples from the posterior distribution.
In order to be more specific in the explanation, let us define Metropolis-Hastings (MH) as the selected algorithm from the MCMC family. Now let us focus on MCMC in the most intuitive way possible. Let us define equal to the posterior pdf obtained from the expression . MH explores the state space in order to include model states in a so-called Markov Chain. The selection of candidates is based on the condition , that is, if the value of for a candidate is greater than that for a previous vector . Candidates are generated from a pre-defined proposal distribution ; generally, a Gaussian multivariate distribution with mean is chosen, and a covariance matrix is defined in order to handle the step sizes. Concisely, MCMC methods proceed as follows: at first iteration, the chain is started with vector. At end, the sample is obtained by extracting the last vectors on the chain, and the other vectors are dismissed or “burned”. In (32), we can see the expression to calculate the acceptance probability that relates the target pdf, , with the proposal
The terms that involve can be canceled if a symmetric distribution is chosen as the proposal one, for instance, in the Gaussian distribution case. The procedure for the calculation of the analysis applying MH is described below:
Initialize the Markov Chain, , assigning the background value to .
Generate a candidate vector state , from .
Obtain from a uniform (0, 1) distribution.
If Then: Else:
Repeat Steps 2 through 4, for until
Remove the first vectors of the chain, the burned ones.
The analysis is computed over the sample:
MCMC methods are straightforward to implement; when an enough number of iterations is performed, the behavior of the target function is captured on the chain . This is true for even, complex density functions such as those with multiple modes. Briefly, let us focus on the fact that, generally, simulation-based methods such as MCMC explore a discretized grid, and as the mesh is refined, a huge number of iterations are needed before a high probability zone of the posterior distribution is reached . Concretely, Hu et al.  proposed a family of modified MCMC dimension-independent algorithms under the name of preconditioned Crank-Nicolson (pCN) MCMC. These methods are robust regarding the curse of dimensionality in the statistical context. Initially, the Crank-Nicolson discretization is applied to a stochastic partial differential equation (SPDE) in order to obtain a new expression for the proposal distribution:
where and . Of course, we use like a computationally efficient estimation of . The acceptance probability is defined in (35):
The procedure for the calculation of the analysis applying pCN Metropolis-Hastings is described as follows:
Initialize the Markov Chain, , assigning the background value to .
Generate a candidate vector state using (3): .
Obtain from a uniform (0, 1) distribution.
If , then . Else:
Repeat Steps 2 through 4, for until .
Remove the first vectors of the chain, the burned ones.
The analysis is calculated over the sample: .
Finally, in the data assimilation context, by using the modified Cholesky decomposition described in the earlier sections, the pCN-MH filter reads:
The forecast step remains unchanged; the forecast ensemble is obtained by applying the model to the ensemble of states of the previous iteration, but the estimation of background covariance matrix is calculated by modified Cholesky.
The update step is modified, so that the analysis is obtained by run k iterations of pCN-MH, to obtain a sample of the posterior error distribution.
Now we are ready to numerically test the methods discussed in this chapter.
4. Experimental results
4.1. Stochastic filter PEnKF-S
We assess the accuracy of the PEnKF-S and compare it against that of the LETFK implementation proposed by Hunt . The numerical model is the Lorenz 96 model , which mimics the behavior of the atmosphere:
where is the number of components in the model and is an external force; with the value of the parameter, , the model presents a great entropy.
The experimental settings are described below:
An initial random solution is propagated for a while in time by using Lorenz 96 model and a fourth-order Runge Kutta method in order to obtain a vector state whose physics are consistent with the dynamics of such numerical model. This vector state serves as our reference solution.
The reference solution is perturbed by using samples from a normal distribution with parameters . Three different values for are considered during the numerical experiments . This perturbed state is propagated in time in order to make it consistent with the physics and dynamics of the numerical model. From here, an initial background state is obtained.
A similar procedure is performed in order to build a perturbed ensemble about . The ensemble members are propagated in time from where an ensemble of model realizations of model is obtained.
The assimilation windows consist of 15 equidistant observations. The frequency of observations is 0.5 time units, which represents 3.5 days in the atmosphere.
The dimension of the vector state is . The external force of the numerical model is set to .
The number of observed components is 50% of the dimension of the vector state.
Three ensemble sizes are tried during the experiments .
As a measure of quality, the norm of the analysis state and the reference solution are computed across assimilation steps.
A total of 100 runs are performed for each pair . For each run, a different initial random vector is utilized in order to build the initial perturbed reference solution (before the model is applied ). This yields to different initial ensembles as well as synthetic data for the different runs of each configuration (pair).
The average of the error norms of each pair for the LETKF and the PEnKF implementations is shown in Table 1 . As can be seen, on average across 100 runs, the performance of the proposed EnKF implementation outperforms that of the LETKF in terms of norm of the error. Even more, the PEnKF-S seems to be invariant to the initial background error since, in all cases, when the ensemble size is increased a better estimation of the reference state at different observation times is obtained. This can also obey to the estimation of background error correlations via the modified Cholesky decomposition since it is drastically improved whenever the ensemble size is increased as pointed out by Bickel and Levina in . In such case, the error decreases by . This is crucial in the PEnKF-S formulation since estimates of the precision analysis covariance matrix are obtained by rank-one updates on the inverse background error covariance matrix. On the other hand, in the LETKF context, increasing the ensemble size can improve the accuracy of the method but that is not better than the one shown by the PEnKF.
Some plots of the norm of error for the PEnKF and the LETKF across different configurations and runs are shown in Figure 1 . Note that the error of the PEnKF decreases aggressively since the earlier iterations. In the LETKF context, the accuracy is similar to that of the PEnKF only at the end of the assimilation window.
4.2. Deterministic filter PEnKF-D
Holding the settings from the previous section, experiments are performed by using the PEnKF-D. The results have similar behavior to that of the LETFK and the PEnKF-S implementation proposed by Niño et al. in . This can be appreciated in Figures 1 – 3 . As can be seen, the error distributions reveal similar behavior across all compared filters.
4.3. MCMC filter based on modified Cholesky decomposition
We will describe experiments of our proposed filter based on a modified Cholesky decomposition and the preconditioned Crank-Nicolson Metropolis-Hastings. Again, the numerical model is the Lorenz 96 model. In this section, we are mainly interested on describing the behavior of the method, especially when the observational operator is nonlinear.
The experimental settings are as follows:
The observational operator is quadratic: , where denotes the ith component of the vector state .
The vector of states has components.
The ensemble size is .
The length of the Markov Chain is 1000.
The proposal is pCN with .
Figure 4 describes the distribution of six of the 40 components of the state vector after 1000 iterations of the algorithm, attempting to visualize if the actual value is within or near the region of the highest probability density described by the sample contained in the Markov Chain.
Note that, not in all cases, the actual values of the model components were located at the peaks of the histogram, but most of them were within or near to zones of high probability described by the sample. This result is important if we take into account that probably the posterior distribution is not normal as is the case of quadratic observation operators.
In this chapter, efficient EnKF implementations were discussed. All of them are based on a modified Cholesky decomposition wherein a precision background covariance is obtained in terms of Cholesky factors. In the first filter, the PEnKF-S, synthetic data are utilized in order to compute the posterior members, as done in stochastic formulations of the filter. Even more, a sequence of rank-one updates can be applied over the factors of the prior precision matrix in order to estimate those of the posterior precision. In the second filter, the PEnKF-D, synthetic data are avoided by using perturbations obtained from the physics and the dynamics of the numerical model. Finally, a MCMC-based filter is obtained in order to reduce the impact of bias when Gaussian assumptions are broken during the assimilation of observations, for instance, when the observation operator is nonlinear. Numerical experiments with the Lorenz 96 model reveal that the proposed filters are comparable to filters from the specialized literature.
This work was supported by the Applied Math and Computational Science Lab (AML-CS) at Universidad del Norte in Barranquilla, Colombia.