Efficient Matrix-Free Ensemble Kalman Filter Implementations: Accounting for Localization

This chapter discusses efficient and practical matrix-free implementations of the ensemble Kalman filter (EnKF) in order to account for localization during the assimilation of observations. In the EnKF context, an ensemble of model realizations is utilized in order to estimate the moments of its underlying error distribution. Since ensemble members come at high computational costs (owing to current operational model resolutions) ensemble sizes are constrained by the hundreds while, typically, their error distributions range in the order of millions. This induces spurious correlations in estimates of prior error correlations when these are approximated via the ensemble covariance matrix. Localization methods are commonly utilized in order to counteract this effect. EnKF implementations in this context are based on a modified Cholesky decomposition. Different flavours of Choleskybased filters are discussed in this chapter. Furthermore, the computational effort in all formulations is linear with regard to model resolutions. Experimental tests are performed making use of the Lorenz 96 model. The results reveal that, in terms of root-mean-squareerrors, all formulations perform equivalently.


Introduction
The ensemble Kalman filter (EnKF) is a sequential Monte Carlo method for parameter and state estimation in highly nonlinear models. The popularity of the EnKF owes to its simple formulation and relatively easy implementation. In the EnKF, an ensemble of model realizations is utilized in order to, among other things, estimate the moments of the prior error distribution (forecast distribution). During this estimation and the assimilation of observations, many challenges are present in practice. Some of them are addressed in this chapter: the first one is related to the proper estimation of background error correlations, which has a high impact on the quality of the analysis corrections while the last one is related to the proper estimation of the posterior ensemble when the observation operator is nonlinear. In all cases, background error correlations are estimated based on the Bickel and Levina estimator [1].

Preliminaries
We want to estimate the state of a dynamical system x * ∈ R nÂ1 , which (approximately) evolves according to some (imperfect) numerical model operators , for instance, a model which mimics the behavior of the atmosphere and/or the ocean where n is the model dimension (typically associated with the model resolution). The estimation process is based on two sources of information: a prior estimate of x * , x b ¼ x * þ ξ with ξ $ N 0 n ; B ðÞ , and a noisy observation y ¼ H x * ðÞ þ e with e $ N 0 m ; R ðÞ , where m is the number of observed model components, R mÂm is the estimated data error covariance matrix, H : R nÂ1 ! R mÂ1 is the (nonlinear) observation operator that maps the information from the model domain to the observation space, 0 n is the nth dimensional vector whose components are all zeros. In practice, m ≪ n or m < n. The assimilation process can be performed by using sequential data assimilation methods, and one of the most widely used methods is the ensemble Kalman filter, which is detailed in the following paragraphs.

The ensemble Kalman filter
In the ensemble Kalman filter (EnKF) [2], an ensemble of model realizations is utilized in order to estimate the moments of the background error distribution, via the empirical moments of the ensemble (1) and therefore, and where N is the ensemble size, x bi ½ ∈ R nÂ1 is the ith ensemble member, for 1 ≤ i ≤ N, x b ∈ R nÂ1 is well known as the background state, B ∈ R nÂn stands for the background error covariance matrix, x b is the ensemble mean, and P b is the ensemble covariance matrix. Likewise, the matrix of member deviations ΔX b ∈ R nÂN reads, A variety of EnKF implementations have been proposed in the current literature, most of them rely on basic assumptions over the moments of two probability distributions: the background (prior) and the analysis (posterior) distributions. In most of EnKF formulations, normal assumptions are done over both error distributions.
In general, depending on how the assimilation process is performed, the EnKF implementation will fall in one of two categories: stochastic filter [3] or deterministic filter [4]. In the context of stochastic filters, for instance, the assimilation of observations can be performed by using any of the next formulations, and where the analysis covariance matrix A ∈ R nÂn is given by where the columns of E ∈ R mÂN are formed by samples from a normal distribution with moments N 0; R ðÞ . In practice, one of the most utilized formulations is given by (8).

Cholesky and modified Cholesky decomposition
The forms of the Cholesky and the modified Cholesky decomposition were discussed by Golub and Van Loan in [5]. If A ∈ R nÂn is symmetric positive definite, then there exist a unique lower triangular L ∈ R nÂn with positive diagonal entries such that A ¼ L Á L T . If all the leading principal submatrices of A ∈ R nÂn are non-singular, then there exist unique lower triangular matrices L and M and a unique diagonal matrix Golub and Van Loan provide proofs of those decompositions, as well as various ways to compute them. We can use the Cholesky factor of a covariance matrix to quickly solve linear systems that involve a covariance matrix. Note that if we compute the Cholesky factorization and solve the triangular system L Á y ¼ b and L T Á x ¼ y, then For a symmetric matrix A ∈ R nÂn , the modified Cholesky decomposition involves finding a non-negative diagonal matrix E such that A þ E is positive definite. In particular, E ¼ 0 whenever A is positive definite, otherwise E kk should be sufficiently small. This allows applying the usual Cholesky factorization to A þ E, i.e. finding matrices L, D ∈ R nÂn such that vector holding the ith row across all columns of ΔX b for 1 ≤ i ≤ n. Bickel and Levina in [1] discussed a modified Cholesky decomposition for the estimation of precision covariance matrix, and this allows in our context to estimate of B À1 by fitting regressions of the form: then, L fg ij ¼Àβ i, j , for 1 ≤ i < j ≤ n.

Ensemble Kalman filter based on modified Cholesky
In the ensemble Kalman filter based on a modified Cholesky decomposition (EnKF-MC) [6], the analysis ensemble is estimated by using the following equations, and In this context, error correlations are estimated via a modified Cholesky decomposition. This provides an estimate of the inverse background error covariance matrix of the form, or equivalently where L ∈ R nÂn is a lower-triangular matrix whose diagonal elements are all ones, and D ∈ R nÂn is a diagonal matrix. Even more, when only local effects are considered during the , sparse estimators of the precision background can be obtained. Furthermore, the matrix L can be sparse with only a few non-zero elements per row. Typically, the number of non-zero elements depends on the radius of influence during the estimation of background error correlations. For instance, in the one-dimensional case, the radius of influence denotes the maximum number of non-zero elements, per row, in L. The EnKF-MC is then obtained by plugging in the estimator (3) in (6). Given the structure of the Cholesky factors, the EnKF-MC can be seen as a matrix-free implementation of the EnKF.
Recall that the precision analysis covariance matrix reads, moreover, since H T Á R À1 Á H ∈ R nÂn can be written as a sum of m rank-one matrices, the factors (3) can be updated in order to obtain an estimate of the inverse analysis covariance matrix. The next section discussed such approximation in detail.

Posterior ensemble Kalman filter stochastic (PEnKF-S)
In the stochastic posterior ensemble Kalman filter (PEnKF-S) [7,8], we want to estimate the moments of the analysis distribution, based on the background ensemble X b ÀÁ , where x a is the analysis state and A ∈ R nÂn is the analysis covariance matrix. Consider the estimate of the inverse background error covariance matrix Eq. (3), the precision analysis covariance matrix Eq. (4) can be approximated as follows: (17) can be written as follows: where x i denotes the ith column of matrix X, for 1 ≤ i ≤ m. Consider the sequence of factor updates, We can use of Dolittle's method in order to compute the factorsD i ðÞ andL i ðÞ in (9), and it is enough to note that, |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl ffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl ffl} |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl ffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl ffl} After some math simplifications, the next equations are obtained, for 1 ≤ k < n and 1 ≤ j ≤ k À 1. The set of Eqs. (19) and (20) can be used in order to derive an algorithm for rank-one update of Cholesky factors, and the updating process is shown in Algorithm 1.
Algorithm 1. Rank-one update for the factors L iÀ1 ðÞ and D iÀ1 Computed k via Eq. (10).

6.
for Computel kk according to (11).  Once the updating process has been performed, the resulting factors form an estimate of the inverse analysis covariance matrix, From this covariance matrix, the posterior mode of the distribution can be approximated as follows: where Note that the linear system (23) involves lower and upper triangular matrices, and therefore, x a can be estimated without the need of matrix inversion. Once the posterior mode is computed, the analysis ensemble is built about it. Note hi ÀT , and therefore, a square root of b A can be approximated as follows: which can be utilized in order to build the analysis ensemble, where ΔX a ∈ R nÂN is given by the solution of the linear system, In addition, the columns of W ∈ R nÂN are formed by samples from a multivariate standard normal distribution. Again, since L m ðÞ is lower triangular, the solution of (26) can be obtained readily.

Posterior ensemble Kalman filter deterministic (PEnKF-D)
The deterministic posterior ensemble Kalman filter (PEnKF-D) is a square root formulation of the PEnKF-S. The main difference between them lies on the use of synthetic data. In both methods, the matrices A À1 and B À1 are computed similarly (i.e., by using Dolittle's method and the algorithms described in the previous section).
In the PEnKF-D, the computation of b B À1 is performed (13) based on the empirical moments of the ensemble. The analysis update is performed by using the perturbations which are consistent with the dynamics of the numerical model, for instance, samples from (27) are driven by the physics and the dynamics of the numerical model. Finally, the analysis equation of the PEnKF-D where and Since the moments of the posterior distribution are unchanged, we expect both methods PEnKF-S and PEnKF-D to perform equivalently in terms of errors during the estimation process.

Markov Chain Monte Carlo-based filters
Definitely, the EnKF represents a breakthrough in the data assimilation context, perhaps its biggest appeal is that we can obtain a closed form expression for the analysis members. Nevertheless, as can be noted, during the derivation of the analysis equations, some constrains are imposed, for instance, the observation operator is assumed linear, and therefore, the likelihood P yjx ðÞ obeys a Gaussian distribution. Of course, in practice, this assumption can be easily broken, and therefore, bias can be introduced on the posterior prediction; this is notoriously significant if one considers the fields in which data assimilation lives are submerged. From a statistical point of view, more specifically, under the Bayesian framework, this situation can be summarized as follows: • The prediction is obtained from the posterior distribution: P xjy ðÞ .
• The information from numerical model is known like the prior of background: P b x ðÞ : • The information from the observations is incorporated through the likelihood: P yjx ðÞ .
• If P b x ðÞand P yjx ðÞ are Gaussian, P yjx ðÞ is Gaussian too, and the analysis is equal to the mean of P yjx ðÞ .
A significant number of approaches have been proposed in the current literature in order to deal with these constraints; for instance, the particle filters (PFs) and the maximum likelihood ensemble filter (MLEF) are ensemble-based methods, which deal with nonlinear observation operators. Unfortunately, in the context of PF, its use is questionable under realistic weather forecast scenarios since the number of particles (ensemble members) increases exponentially regarding the number of components in the model state. Anyways, an extended analysis of these methods exceeds the scope of this document, but its exploration is highly recommended.
Taking samples directly from the posterior distribution is a strategy that can help to remove the bias induced by wrong assumptions on the posterior distribution (i.e., normal assumptions). We do not put the sights on finding the mode of the posterior distribution; instead of this, we want a set of state vectors that allow us to create a satisfactory representation of the posterior error distribution, then, based on these samples, it is possible to estimate moments of the posterior distribution from which the analysis ensemble can be obtained [9].
Hereafter, we will construct a modification of the sequential scheme of data assimilation; first, we will describe how to compute the analysis members by using variations in Markov Chain Monte Carlo (MCMC)-based methods, and then, we will include the modified Cholesky decomposition in order to estimate a precision background covariance.
An overview of the proposed method is as follows: 1. The forecast step is unchanged; the forecast ensemble is obtained by applying the model to the ensemble of states of the previous assimilation cycle.
2. The analysis step is modified, so that the analysis is not obtained anymore by, for instance (28), but we perform k iterations of an algorithm from the Markov Chain Monte Carlo (MCMC) family in order to obtain samples from the posterior distribution.
In order to be more specific in the explanation, let us define Metropolis-Hastings (MH) as the selected algorithm from the MCMC family. Now let us focus on MCMC in the most intuitive way possible. Let us define Jx ; y ðÞ equal to the posterior pdf obtained from the expression P xjy ðÞ ∝ P b x ðÞÁP yjx ðÞ . MH explores the state space in order to include model states in a socalled Markov Chain. The selection of candidates is based on the condition x c ðÞ ; y ÀÁ > Jx tÀ1 ðÞ ; y ÀÁ , that is, if the value of J Á ðÞfor a candidate x c ðÞ is greater than that for a previous vector x tÀ1 ðÞ . Candidates are generated from a pre-defined proposal distribution Q Á ðÞ; generally, a Gaussian multivariate distribution with mean x tÀ1 ðÞ is chosen, and a covariance matrix is defined in order to handle the step sizes. Concisely, MCMC methods proceed as follows: at first iteration, the chain is started with x b vector. At end, the sample is obtained by extracting the last t vectors on the chain, and the other vectors are dismissed or "burned".I n (32), we can see the expression to calculate the acceptance probability that relates the target pdf, J Á The terms that involve Q Á ðÞ can be canceled if a symmetric distribution is chosen as the proposal one, for instance, in the Gaussian distribution case. The procedure for the calculation of the analysis applying MH is described below: 1. Initialize the Markov Chain, C, assigning the background value x b to C 0 ðÞ .
5. Repeat Steps 2 through 4, for t ¼ 1 until t ¼ k À 1 6. Remove the first p vectors of the chain, the burned ones.
The analysis is computed over the sample: MCMC methods are straightforward to implement; when an enough number of iterations is performed, the behavior of the target function is captured on the chain [10]. This is true for even, complex density functions such as those with multiple modes. Briefly, let us focus on the fact that, generally, simulation-based methods such as MCMC explore a discretized grid, and as the mesh is refined, a huge number of iterations are needed before a high probability zone of the posterior distribution is reached [11]. Concretely, Hu et al. [12] proposed a family of modified MCMC dimension-independent algorithms under the name of preconditioned Crank-Nicolson (pCN) MCMC. These methods are robust regarding the curse of dimensionality in the statistical context. Initially, the Crank-Nicolson discretization is applied to a stochastic partial differential equation (SPDE) in order to obtain a new expression for the proposal distribution: where L is the precision matrix of B, K is a preconditioning matrix, e 0 is white noise, if K ¼ B and δ ∈ 0; 2 ½ in (33), we get the pCN proposal described in (34): where ωeN 0; B ðÞ and β ¼ 8δ= 2 þ δ ðÞ 2 . Of course, we use P b like a computationally efficient estimation of B. The acceptance probability is defined in (35): The procedure for the calculation of the analysis applying pCN Metropolis-Hastings is described as follows: 1. Initialize the Markov Chain, C, assigning the background value x b to C 0 ðÞ .

6.
Remove the first p vectors of the chain, the burned ones.
7. The analysis is calculated over the sample: Finally, in the data assimilation context, by using the modified Cholesky decomposition described in the earlier sections, the pCN-MH filter reads: 1. The forecast step remains unchanged; the forecast ensemble is obtained by applying the model to the ensemble of states of the previous iteration, but the estimation of background covariance matrix is calculated by modified Cholesky.
2. The update step is modified, so that the analysis is obtained by run k iterations of pCN-MH, to obtain a sample of the posterior error distribution. Now we are ready to numerically test the methods discussed in this chapter.

Stochastic filter PEnKF-S
We assess the accuracy of the PEnKF-S and compare it against that of the LETFK implementation proposed by Hunt [13]. The numerical model is the Lorenz 96 model [11], which mimics the behavior of the atmosphere: where n is the number of components in the model and F is an external force; with the value of the parameter, F ¼ 8, the model presents a great entropy.
The experimental settings are described below: 1. An initial random solution x þ À3 is propagated for a while in time by using Lorenz 96 model and a fourth-order Runge Kutta method in order to obtain a vector state x þ À2 whose physics are consistent with the dynamics of such numerical model. This vector state serves as our reference solution.

2.
The reference solution is perturbed by using samples from a normal distribution with parameters N 0 n ; σ B Á I ðÞ . Three different values for σ B are considered during the numerical experiments σ B ∈ 0:05; 0:10; 0:15 fg . This perturbed state is propagated in time in order to make it consistent with the physics and dynamics of the numerical model. From here, an initial background state x b À1 is obtained.

3.
A similar procedure is performed in order to build a perturbed ensemble about x b À1 . The ensemble members are propagated in time from where an ensemble of model realizations of model is obtained.

4.
The assimilation windows consist of 15 equidistant observations. The frequency of observations is 0.5 time units, which represents 3.5 days in the atmosphere.

5.
The dimension of the vector state is n ¼ 40. The external force of the numerical model is set to F ¼ 80.
6. The number of observed components is 50% of the dimension of the vector state.
8. As a measure of quality, the L 2 norm of the analysis state and the reference solution are computed across assimilation steps.

9.
A total of 100 runs are performed for each pair N; σ B ðÞ . For each run, a different initial random vector is utilized in order to build the initial perturbed reference solution x þ À3 (before the model is applied x * À2 ). This yields to different initial ensembles as well as synthetic data for the different runs of each configuration (pair).
The average of the error norms of each pair N; σ B ðÞ for the LETKF and the PEnKF implementations is shown in Table 1 performance of the proposed EnKF implementation outperforms that of the LETKF in terms of L 2 norm of the error. Even more, the PEnKF-S seems to be invariant to the initial background error σ B since, in all cases, when the ensemble size is increased a better estimation of the reference state x * at different observation times is obtained. This can also obey to the estimation of background error correlations via the modified Cholesky decomposition since it is drastically improved whenever the ensemble size is increased as pointed out by Bickel and Levina in [1]. In such case, the error decreases by O log n ðÞ =N ðÞ . This is crucial in the PEnKF-S formulation since estimates of the precision analysis covariance matrix are obtained by rankone updates on the inverse background error covariance matrix. On the other hand, in the LETKF context, increasing the ensemble size can improve the accuracy of the method but that is not better than the one shown by the PEnKF.
Some plots of the L 2 norm of error for the PEnKF and the LETKF across different configurations and runs are shown in Figure 1. Note that the error of the PEnKF decreases aggressively since the earlier iterations. In the LETKF context, the accuracy is similar to that of the PEnKF only at the end of the assimilation window.

Deterministic filter PEnKF-D
Holding the settings from the previous section, experiments are performed by using the PEnKF-D. The results have similar behavior to that of the LETFK and the PEnKF-S implementation proposed by Niño et al. in [8]. This can be appreciated in Figures 1-3. As can be seen, the error distributions reveal similar behavior across all compared filters.

MCMC filter based on modified Cholesky decomposition
We will describe experiments of our proposed filter based on a modified Cholesky decomposition and the preconditioned Crank-Nicolson Metropolis-Hastings. Again, the numerical model is the Lorenz 96 model. In this section, we are mainly interested on describing the behavior of the method, especially when the observational operator is nonlinear.
The experimental settings are as follows: 1. The observational operator is quadratic: H x ðÞ fg i ¼ x 2 i , where x i denotes the ith component of the vector state x.
2. The vector of states has n ¼ 40 components.
3. The ensemble size is N ¼ 20.
4. The length of the Markov Chain is 1000.

5.
The proposal is pCN with β ¼ 0:09.  Note that, not in all cases, the actual values of the model components were located at the peaks of the histogram, but most of them were within or near to zones of high probability described by the sample. This result is important if we take into account that probably the posterior distribution is not normal as is the case of quadratic observation operators.

Conclusions
In this chapter, efficient EnKF implementations were discussed. All of them are based on a modified Cholesky decomposition wherein a precision background covariance is obtained in terms of Cholesky factors. In the first filter, the PEnKF-S, synthetic data are utilized in order to compute the posterior members, as done in stochastic formulations of the filter. Even more, a sequence of rank-one updates can be applied over the factors of the prior precision matrix in order to estimate those of the posterior precision. In the second filter, the PEnKF-D, synthetic data are avoided by using perturbations obtained from the physics and the dynamics of the numerical model. Finally, a MCMC-based filter is obtained in order to reduce the impact of bias when Gaussian assumptions are broken during the assimilation of observations, for instance, when the observation operator is nonlinear. Numerical experiments with the Lorenz 96 model reveal that the proposed filters are comparable to filters from the specialized literature.