Open access peer-reviewed chapter

Efficient Matrix-Free Ensemble Kalman Filter Implementations: Accounting for Localization

By Elias David Niño Ruiz, Rolando Beltrán Arrieta and Alfonso Manuel Mancilla Herrera

Submitted: April 28th, 2017 · Reviewed: November 14th, 2017 · Published: February 21st, 2018

DOI: 10.5772/intechopen.72465



This chapter discusses efficient and practical matrix-free implementations of the ensemble Kalman filter (EnKF) that account for localization during the assimilation of observations. In the EnKF context, an ensemble of model realizations is utilized to estimate the moments of its underlying error distribution. Since ensemble members come at a high computational cost (owing to current operational model resolutions), ensemble sizes are constrained to the hundreds, while model dimensions typically range in the order of millions. This induces spurious correlations in estimates of prior error correlations when these are approximated via the ensemble covariance matrix. Localization methods are commonly utilized to counteract this effect. The EnKF implementations discussed in this chapter are based on a modified Cholesky decomposition, and different flavours of Cholesky-based filters are presented. Furthermore, the computational effort of all formulations is linear with regard to model resolution. Experimental tests are performed making use of the Lorenz 96 model. The results reveal that, in terms of root-mean-square error, all formulations perform equivalently.


  • ensemble Kalman filter
  • modified Cholesky decomposition
  • sampling methods

1. Introduction

The ensemble Kalman filter (EnKF) is a sequential Monte Carlo method for parameter and state estimation in highly nonlinear models. The popularity of the EnKF owes to its simple formulation and relatively easy implementation. In the EnKF, an ensemble of model realizations is utilized to, among other things, estimate the moments of the prior (forecast) error distribution. During this estimation and the assimilation of observations, many challenges arise in practice. Two of them are addressed in this chapter: the first is the proper estimation of background error correlations, which has a high impact on the quality of the analysis corrections, while the second is the proper estimation of the posterior ensemble when the observation operator is nonlinear. In all cases, background error correlations are estimated based on the Bickel and Levina estimator [1].


2. Preliminaries

We want to estimate the state of a dynamical system $\mathbf{x} \in \mathbb{R}^{n \times 1}$, which (approximately) evolves according to some (imperfect) numerical model operator $\mathbf{x}_k = \mathcal{M}_{t_{k-1} \to t_k}\left(\mathbf{x}_{k-1}\right)$, for instance, a model which mimics the behavior of the atmosphere and/or the ocean, where $n$ is the model dimension (typically associated with the model resolution). The estimation process is based on two sources of information: a prior estimate of $\mathbf{x}$, $\mathbf{x}^b = \mathbf{x} + \boldsymbol{\xi}$ with $\boldsymbol{\xi} \sim \mathcal{N}\left(\mathbf{0}_n, \mathbf{B}\right)$, and a noisy observation $\mathbf{y} = \mathcal{H}(\mathbf{x}) + \boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \sim \mathcal{N}\left(\mathbf{0}_m, \mathbf{R}\right)$, where $m$ is the number of observed model components, $\mathbf{R} \in \mathbb{R}^{m \times m}$ is the estimated data error covariance matrix, $\mathcal{H}: \mathbb{R}^{n \times 1} \to \mathbb{R}^{m \times 1}$ is the (nonlinear) observation operator that maps information from the model domain to the observation space, and $\mathbf{0}_n$ is the $n$-dimensional vector whose components are all zeros. In practice, $m \ll n$ or $m < n$. The assimilation process can be performed by using sequential data assimilation methods, and one of the most widely used is the ensemble Kalman filter, which is detailed in the following paragraphs.
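As a concrete illustration of the two sources of information above, the sketch below builds a synthetic prior estimate and a noisy observation; a linear observation operator and identity covariances are assumed purely for illustration, and all variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 40, 20                       # model and observation dimensions (m < n)

x_true = rng.standard_normal(n)     # stand-in for the true state x
H = np.eye(n)[::2]                  # linear observation operator: every other component
B = np.eye(n)                       # background error covariance (illustrative)
R = 0.01 * np.eye(m)                # data error covariance

# Prior estimate: x^b = x + xi, with xi ~ N(0_n, B)
x_b = x_true + rng.multivariate_normal(np.zeros(n), B)

# Noisy observation: y = H x + eps, with eps ~ N(0_m, R)
y = H @ x_true + rng.multivariate_normal(np.zeros(m), R)

print(x_b.shape, y.shape)           # (40,) (20,)
```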

2.1. The ensemble Kalman filter

In the ensemble Kalman filter (EnKF) [2], an ensemble of model realizations

$$\mathbf{X}^b = \left[\mathbf{x}^{b[1]},\, \mathbf{x}^{b[2]},\, \ldots,\, \mathbf{x}^{b[N]}\right] \in \mathbb{R}^{n \times N}, \qquad (1)$$
is utilized in order to estimate the moments of the background error distribution,

$$\mathbf{x} \sim \mathcal{N}\left(\mathbf{x}^b, \mathbf{B}\right),$$
via the empirical moments of the ensemble (1) and therefore,

$$\overline{\mathbf{x}}^b = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}^{b[i]} \approx \mathbf{x}^b, \qquad \mathbf{P}^b = \frac{1}{N-1}\, \Delta\mathbf{X}^b \left(\Delta\mathbf{X}^b\right)^{T} \approx \mathbf{B},$$
where $N$ is the ensemble size, $\mathbf{x}^{b[i]} \in \mathbb{R}^{n \times 1}$ is the $i$th ensemble member, for $1 \le i \le N$, $\mathbf{x}^b \in \mathbb{R}^{n \times 1}$ is well known as the background state, $\mathbf{B} \in \mathbb{R}^{n \times n}$ stands for the background error covariance matrix, $\overline{\mathbf{x}}^b$ is the ensemble mean, and $\mathbf{P}^b$ is the ensemble covariance matrix. Likewise, the matrix of member deviations $\Delta\mathbf{X}^b \in \mathbb{R}^{n \times N}$ reads,

$$\Delta\mathbf{X}^b = \mathbf{X}^b - \overline{\mathbf{x}}^b \mathbf{1}_N^{T}.$$
A variety of EnKF implementations have been proposed in the literature, most of which rely on basic assumptions about the moments of two probability distributions: the background (prior) and the analysis (posterior) distributions. In most EnKF formulations, normality assumptions are made on both error distributions.

In general, depending on how the assimilation process is performed, an EnKF implementation falls into one of two categories: stochastic filters [3] or deterministic filters [4]. In the context of stochastic filters, for instance, the assimilation of observations can be performed by using either of the following (equivalent) formulations,

$$\mathbf{X}^a = \mathbf{X}^b + \mathbf{A}\,\mathbf{H}^{T}\mathbf{R}^{-1}\,\Delta\mathbf{Y},$$

$$\mathbf{X}^a = \mathbf{X}^b + \mathbf{P}^b \mathbf{H}^{T}\left(\mathbf{H}\mathbf{P}^b\mathbf{H}^{T} + \mathbf{R}\right)^{-1} \Delta\mathbf{Y}, \qquad (8)$$
where the analysis covariance matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ is given by $\mathbf{A} = \left[\left(\mathbf{P}^b\right)^{-1} + \mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H}\right]^{-1}$, the matrix of innovations (on the observations) is denoted by $\Delta\mathbf{Y} = \mathbf{Y}^s - \mathbf{H}\mathbf{X}^b \in \mathbb{R}^{m \times N}$, and the matrix of perturbed observations reads $\mathbf{Y}^s = \mathbf{y}\mathbf{1}_N^{T} + \mathbf{E} \in \mathbb{R}^{m \times N}$, where the columns of $\mathbf{E} \in \mathbb{R}^{m \times N}$ are formed by samples from a normal distribution $\mathcal{N}\left(\mathbf{0}_m, \mathbf{R}\right)$. In practice, one of the most utilized formulations is given by (8).
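One stochastic EnKF analysis step of the kind just described can be sketched as follows; a linear observation operator given as a matrix is assumed, and the function and variable names are our own:

```python
import numpy as np

def enkf_stochastic_analysis(Xb, H, R, y, rng):
    """One stochastic EnKF analysis step with perturbed observations.

    Xb : (n, N) background ensemble, H : (m, n) linear observation operator,
    R : (m, m) data error covariance, y : (m,) observation vector.
    """
    n, N = Xb.shape
    m = y.size
    dXb = Xb - Xb.mean(axis=1, keepdims=True)      # member deviations
    Pb = dXb @ dXb.T / (N - 1)                     # ensemble covariance P^b

    # Perturbed observations Y^s = y 1^T + E, columns of E ~ N(0, R)
    E = rng.multivariate_normal(np.zeros(m), R, size=N).T
    dY = (y[:, None] + E) - H @ Xb                 # innovations on observations

    # X^a = X^b + P^b H^T (H P^b H^T + R)^{-1} dY
    K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)
    return Xb + K @ dY

# Tiny usage example with a fully observed state
rng = np.random.default_rng(1)
n, N = 10, 100
x_true = np.zeros(n)
Xb = x_true[:, None] + rng.standard_normal((n, N))  # spread-1 background
H, R = np.eye(n), 0.01 * np.eye(n)
y = x_true.copy()                                   # noiseless observation, for illustration
Xa = enkf_stochastic_analysis(Xb, H, R, y, rng)
```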

2.2. Cholesky and modified Cholesky decomposition

The forms of the Cholesky and the modified Cholesky decomposition were discussed by Golub and Van Loan in [5]. If $\mathbf{A} \in \mathbb{R}^{n \times n}$ is symmetric positive definite, then there exists a unique lower triangular $\mathbf{L} \in \mathbb{R}^{n \times n}$ with positive diagonal entries such that $\mathbf{A} = \mathbf{L}\mathbf{L}^{T}$. If all the leading principal submatrices of $\mathbf{A} \in \mathbb{R}^{n \times n}$ are non-singular, then there exist unique unit lower triangular matrices $\mathbf{L}$ and $\mathbf{M}$ and a unique diagonal matrix $\mathbf{D} = \operatorname{diag}\left(d_1, d_2, \ldots, d_n\right)$ such that $\mathbf{A} = \mathbf{L}\mathbf{D}\mathbf{M}^{T}$. If $\mathbf{A}$ is symmetric, then $\mathbf{L} = \mathbf{M}$ and $\mathbf{A} = \mathbf{L}\mathbf{D}\mathbf{L}^{T}$. Golub and Van Loan provide proofs of these decompositions, as well as various ways to compute them. We can use the Cholesky factor of a covariance matrix to quickly solve linear systems that involve it. Note that if we compute the Cholesky factorization and solve the triangular systems $\mathbf{L}\mathbf{y} = \mathbf{b}$ and $\mathbf{L}^{T}\mathbf{x} = \mathbf{y}$, then

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}.$$
For a symmetric matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$, the modified Cholesky decomposition involves finding a non-negative diagonal matrix $\mathbf{E}$ such that $\mathbf{A} + \mathbf{E}$ is positive definite. In particular, $\mathbf{E} = \mathbf{0}$ whenever $\mathbf{A}$ is positive definite; otherwise, $\mathbf{E}$ should be sufficiently small. This allows applying the usual Cholesky factorization to $\mathbf{A} + \mathbf{E}$, i.e., finding matrices $\mathbf{L}, \mathbf{D} \in \mathbb{R}^{n \times n}$ such that $\mathbf{A} + \mathbf{E} = \mathbf{L}\mathbf{D}\mathbf{L}^{T}$.
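The two triangular solves mentioned above are a standard NumPy/SciPy idiom; the following generic sketch (not tied to the chapter's data) solves $\mathbf{A}\mathbf{x} = \mathbf{b}$ for a symmetric positive definite $\mathbf{A}$ without ever forming $\mathbf{A}^{-1}$:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(2)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                   # symmetric positive definite matrix
b = rng.standard_normal(n)

L = cholesky(A, lower=True)                   # A = L L^T
z = solve_triangular(L, b, lower=True)        # forward solve: L z = b
x = solve_triangular(L.T, z, lower=False)     # backward solve: L^T x = z, so x = A^{-1} b
```

Each triangular solve costs $O(n^2)$ once the $O(n^3)$ factorization is available, which is why the factorization is reused across right-hand sides.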

Recall $\Delta\mathbf{X}^b = \mathbf{X}^b - \overline{\mathbf{x}}^b\mathbf{1}_N^{T} \in \mathbb{R}^{n \times N}$. Thus, $\Delta\mathbf{x}^{b[i]} \sim \mathcal{N}\left(\mathbf{0}_n, \mathbf{B}\right)$, for $1 \le i \le N$. Let $\mathbf{x}_{[i]} \in \mathbb{R}^{N \times 1}$ be the vector holding the $i$th row across all columns of $\Delta\mathbf{X}^b$, for $1 \le i \le n$. Bickel and Levina in [1] discussed a modified Cholesky decomposition for the estimation of a precision (inverse covariance) matrix, which in our context allows estimating $\mathbf{B}^{-1}$ by fitting regressions of the form:

$$\mathbf{x}_{[i]} = \sum_{j=1}^{i-1} \beta_{i,j}\, \mathbf{x}_{[j]} + \boldsymbol{\epsilon}_i;$$

then, the non-zero off-diagonal entries of $\mathbf{L}$ read $\mathbf{L}_{i,j} = -\beta_{i,j}$, for $1 \le j < i \le n$, and the diagonal entries of $\mathbf{D}$ hold the empirical variances of the residuals $\boldsymbol{\epsilon}_i$.
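The regression-based estimator can be sketched in a few lines of Python. The function below (our own naming, a sketch rather than the authors' code) fits each row of $\Delta\mathbf{X}^b$ on its local predecessors, limited by a localization radius, and assembles $\widehat{\mathbf{B}}^{-1} = \mathbf{L}^{T}\mathbf{D}^{-1}\mathbf{L}$:

```python
import numpy as np

def modified_cholesky_precision(dXb, radius):
    """Estimate B^{-1} = L^T D^{-1} L from member deviations dXb (n x N),
    via Bickel-Levina-style regressions; `radius` limits each row's
    predecessors (one-dimensional localization)."""
    n, N = dXb.shape
    L = np.eye(n)                                   # unit lower triangular
    d = np.empty(n)                                 # diagonal of D (residual variances)
    d[0] = np.var(dXb[0], ddof=1)
    for i in range(1, n):
        pred = list(range(max(0, i - radius), i))   # local predecessors of row i
        Z = dXb[pred].T                             # (N, |pred|) regressors
        beta, *_ = np.linalg.lstsq(Z, dXb[i], rcond=None)
        resid = dXb[i] - Z @ beta
        d[i] = np.var(resid, ddof=1)
        L[i, pred] = -beta                          # L_{i,j} = -beta_{i,j}
    Binv = L.T @ np.diag(1.0 / d) @ L
    return L, d, Binv
```

For a radius $r$, each row of $\mathbf{L}$ holds at most $r$ non-zero off-diagonal entries, which is the sparsity pattern exploited by the matrix-free filters below.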

3. Filters based on modified Cholesky decomposition

3.1. Ensemble Kalman filter based on modified Cholesky

In the ensemble Kalman filter based on a modified Cholesky decomposition (EnKF-MC) [6], the analysis ensemble is estimated by using the following equations,

$$\mathbf{X}^a = \mathbf{X}^b + \widehat{\mathbf{A}}\,\mathbf{H}^{T}\mathbf{R}^{-1}\,\Delta\mathbf{Y}, \quad \text{with} \quad \widehat{\mathbf{A}} = \left[\widehat{\mathbf{B}}^{-1} + \mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H}\right]^{-1}. \qquad (6)$$
In this context, error correlations are estimated via a modified Cholesky decomposition. This provides an estimate of the inverse background error covariance matrix of the form,

$$\widehat{\mathbf{B}}^{-1} = \mathbf{L}^{T}\mathbf{D}^{-1}\mathbf{L}, \qquad (3)$$
or equivalently,

$$\widehat{\mathbf{B}} = \mathbf{L}^{-1}\mathbf{D}\,\mathbf{L}^{-T},$$
where $\mathbf{L} \in \mathbb{R}^{n \times n}$ is a lower-triangular matrix whose diagonal elements are all ones, and $\mathbf{D} \in \mathbb{R}^{n \times n}$ is a diagonal matrix. Moreover, when only local effects are considered during the estimation of $\widehat{\mathbf{B}}^{-1}$, sparse estimators of the background precision matrix can be obtained: the matrix $\mathbf{L}$ can be sparse, with only a few non-zero elements per row. Typically, the number of non-zero elements depends on the radius of influence used during the estimation of background error correlations. For instance, in the one-dimensional case, the radius of influence bounds the number of non-zero elements per row in $\mathbf{L}$. The EnKF-MC is then obtained by plugging the estimator (3) into (6). Given the structure of the Cholesky factors, the EnKF-MC can be seen as a matrix-free implementation of the EnKF.

Recall that the precision analysis covariance matrix reads,

$$\widehat{\mathbf{A}}^{-1} = \widehat{\mathbf{B}}^{-1} + \mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H}; \qquad (4)$$
moreover, since $\mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H} \in \mathbb{R}^{n \times n}$ can be written as a sum of $m$ rank-one matrices, the factors in (3) can be updated in order to obtain an estimate of the inverse analysis covariance matrix. The next section discusses such an approximation in detail.

3.1.1. Posterior ensemble Kalman filter stochastic (PEnKF-S)

In the stochastic posterior ensemble Kalman filter (PEnKF-S) [7, 8], we want to estimate the moments of the analysis distribution,

$$\mathbf{x} \sim \mathcal{N}\left(\mathbf{x}^a, \mathbf{A}\right),$$
based on the background ensemble $\mathbf{X}^b$, where $\mathbf{x}^a$ is the analysis state and $\mathbf{A} \in \mathbb{R}^{n \times n}$ is the analysis covariance matrix. Considering the estimate of the inverse background error covariance matrix in Eq. (3), the precision analysis covariance matrix in Eq. (4) can be approximated as follows:

$$\widehat{\mathbf{A}}^{-1} = \widehat{\mathbf{B}}^{-1} + \mathbf{X}\mathbf{X}^{T}, \qquad (17)$$

where $\mathbf{X} = \mathbf{H}^{T}\mathbf{R}^{-1/2} \in \mathbb{R}^{n \times m}$. The matrix (17) can be written as follows:

$$\widehat{\mathbf{A}}^{-1} = \widehat{\mathbf{B}}^{-1} + \sum_{i=1}^{m} \mathbf{x}_i \mathbf{x}_i^{T},$$

where $\mathbf{x}_i$ denotes the $i$th column of the matrix $\mathbf{X}$, for $1 \le i \le m$. Consider the sequence of factor updates,

$$\mathbf{L}_i^{T}\mathbf{D}_i\mathbf{L}_i = \mathbf{L}_{i-1}^{T}\mathbf{D}_{i-1}\mathbf{L}_{i-1} + \mathbf{x}_i\mathbf{x}_i^{T}, \qquad (9)$$

where $\mathbf{L}_{i-1}^{T}\mathbf{p}_i = \mathbf{x}_i \in \mathbb{R}^{n \times 1}$, for $1 \le i \le m$, $\widehat{\mathbf{B}}^{-1} = \mathbf{L}_0^{T}\mathbf{D}_0\mathbf{L}_0$, and

$$\mathbf{D}_{i-1} + \mathbf{p}_i\mathbf{p}_i^{T} = \widetilde{\mathbf{L}}_i^{T}\widetilde{\mathbf{D}}_i\widetilde{\mathbf{L}}_i.$$
We can use Doolittle's method in order to compute the factors $\widetilde{\mathbf{D}}_i$ and $\widetilde{\mathbf{L}}_i$ in (9), and it is enough to note that,


After some algebraic simplifications, the following equations are obtained,




for $1 \le k < n$ and $1 \le j \le k-1$. The set of Eqs. (10) and (11) can be used in order to derive an algorithm for the rank-one update of Cholesky factors; the updating process is shown in Algorithm 1.

Algorithm 1. Rank-one update for the factors $\mathbf{L}_{i-1}$ and $\mathbf{D}_{i-1}$

1. function UPD_CHOLESKY_FACTORS($\mathbf{L}_{i-1}$, $\mathbf{D}_{i-1}$, $\mathbf{x}_i$)

2.     Compute $\mathbf{p}_i$ from $\mathbf{L}_{i-1}^{T}\mathbf{p}_i = \mathbf{x}_i$.

3.     for $k = n, \ldots, 1$ do

4.       Compute $\widetilde{d}_k$ via Eq. (10).

5.       Set $\widetilde{l}_{kk} \leftarrow 1$.

6.      for $j = 1, \ldots, k-1$ do

7.         Compute $\widetilde{l}_{kj}$ according to Eq. (11).

8.      end for

9.     end for

10.   Set $\mathbf{L}_i \leftarrow \widetilde{\mathbf{L}}_i \mathbf{L}_{i-1}$ and $\mathbf{D}_i \leftarrow \widetilde{\mathbf{D}}_i$.

11.   return $\mathbf{L}_i$, $\mathbf{D}_i$

12.  end function
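Rank-one updating of unit-triangular/diagonal factors can be illustrated with the classic algorithm of Gill, Golub, Murray and Saunders for the $\mathbf{L}\mathbf{D}\mathbf{L}^{T}$ form. Note this is a generic stand-in, not the chapter's exact Algorithm 1, which works on the transposed factorization $\mathbf{L}^{T}\mathbf{D}\mathbf{L}$; the idea of touching only $O(n^2)$ entries per observation is the same:

```python
import numpy as np

def ldl_rank1_update(L, d, z):
    """Return the factors of L D L^T + z z^T, where L is unit lower
    triangular and d holds the diagonal of D (method C1 of Gill, Golub,
    Murray & Saunders). Cost: O(n^2) instead of a full refactorization."""
    n = d.size
    L, d, w = L.copy(), d.copy(), z.astype(float).copy()
    a = 1.0
    for j in range(n):
        p = w[j]
        dj_new = d[j] + a * p * p          # updated diagonal entry
        b = p * a / dj_new
        a = d[j] * a / dj_new
        d[j] = dj_new
        for r in range(j + 1, n):
            w[r] -= p * L[r, j]            # sweep z through column j
            L[r, j] = L[r, j] + b * w[r]   # updated multiplier
    return L, d
```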

Algorithm 1 can be used to update the factors of $\widehat{\mathbf{B}}^{-1}$ for all column vectors in $\mathbf{X}$; this process is detailed in Algorithm 2.

Algorithm 2. Computing the factors $\mathbf{L}_m$ and $\mathbf{D}_m$ of $\mathbf{L}_m^{T}\mathbf{D}_m\mathbf{L}_m$

14.     Set $\mathbf{X} \leftarrow \mathbf{H}^{T}\mathbf{R}^{-1/2}$.

15.     for $i = 1, \ldots, m$ do

16.        Set $\mathbf{L}_i, \mathbf{D}_i \leftarrow$ UPD_CHOLESKY_FACTORS($\mathbf{L}_{i-1}$, $\mathbf{D}_{i-1}$, $\mathbf{x}_i$)

17.     end for

18.     return $\mathbf{L}_m$, $\mathbf{D}_m$

19. end function

Once the updating process has been performed, the resulting factors form an estimate of the inverse analysis covariance matrix,

$$\widehat{\mathbf{A}}^{-1} = \mathbf{L}_m^{T}\mathbf{D}_m\mathbf{L}_m.$$
From this covariance matrix, the posterior mode of the distribution can be approximated as follows:

$$\overline{\mathbf{x}}^a = \overline{\mathbf{x}}^b + \widehat{\mathbf{A}}\,\mathbf{q},$$

or, equivalently, by solving the linear system

$$\mathbf{L}_m^{T}\mathbf{D}_m\mathbf{L}_m\left(\overline{\mathbf{x}}^a - \overline{\mathbf{x}}^b\right) = \mathbf{q}, \qquad (23)$$

with $\mathbf{q} = \mathbf{H}^{T}\mathbf{R}^{-1}\left(\mathbf{y} - \mathbf{H}\overline{\mathbf{x}}^b\right) \in \mathbb{R}^{n \times 1}$. Note that the linear system (23) involves only lower and upper triangular matrices, and therefore, $\overline{\mathbf{x}}^a$ can be estimated without the need for matrix inversion. Once the posterior mode is computed, the analysis ensemble is built about it. Note that $\widehat{\mathbf{A}}$ reads $\widehat{\mathbf{A}} = \mathbf{L}_m^{-1}\mathbf{D}_m^{-1}\mathbf{L}_m^{-T}$, and therefore, a square root of $\widehat{\mathbf{A}}$ can be approximated as follows:

$$\widehat{\mathbf{A}}^{1/2} = \mathbf{L}_m^{-1}\mathbf{D}_m^{-1/2},$$
which can be utilized in order to build the analysis ensemble,

$$\mathbf{X}^a = \overline{\mathbf{x}}^a \mathbf{1}_N^{T} + \Delta\mathbf{X}^a,$$
where $\Delta\mathbf{X}^a \in \mathbb{R}^{n \times N}$ is given by the solution of the linear system,

$$\mathbf{L}_m\,\Delta\mathbf{X}^a = \mathbf{D}_m^{-1/2}\,\mathbf{W}. \qquad (26)$$
In addition, the columns of $\mathbf{W} \in \mathbb{R}^{n \times N}$ are formed by samples from a multivariate standard normal distribution. Again, since $\mathbf{L}_m$ is lower triangular, the solution of (26) can be obtained readily.
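Given the factors $\mathbf{L}_m$ and the diagonal of $\mathbf{D}_m$, drawing the analysis deviations via Eq. (26) amounts to one triangular solve; the sketch below uses our own naming and assumes the factors are available as dense arrays:

```python
import numpy as np
from scipy.linalg import solve_triangular

def sample_analysis_deviations(L_m, d_m, N, rng):
    """Draw dXa from L_m dXa = D_m^{-1/2} W, with W standard normal, so
    that the columns of dXa have covariance A = L_m^{-1} D_m^{-1} L_m^{-T}.
    Here d_m holds the diagonal of D_m (a precision-side diagonal)."""
    n = d_m.size
    W = rng.standard_normal((n, N))
    return solve_triangular(L_m, W / np.sqrt(d_m)[:, None], lower=True)
```

Because $\mathbf{L}_m$ is (sparse) lower triangular, the solve is cheap, and no explicit inverse or dense square root of $\widehat{\mathbf{A}}$ is ever formed.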

3.1.2. Posterior ensemble Kalman filter deterministic (PEnKF-D)

The deterministic posterior ensemble Kalman filter (PEnKF-D) is a square root formulation of the PEnKF-S. The main difference between them lies in the use of synthetic data. In both methods, the matrices $\mathbf{A}^{-1}$ and $\mathbf{B}^{-1}$ are computed similarly (i.e., by using Doolittle's method and the algorithms described in the previous section).

In the PEnKF-D, the computation of $\widehat{\mathbf{B}}^{-1}$ is performed as in (13), based on the empirical moments of the ensemble. The analysis update is performed by using the perturbations

$$\Delta\mathbf{X}^b = \mathbf{X}^b - \overline{\mathbf{x}}^b\mathbf{1}_N^{T}, \qquad (27)$$
which are consistent with the dynamics of the numerical model; that is, samples from (27) are driven by the physics and the dynamics of the numerical model. Finally, the analysis equation of the PEnKF-D reads,






Since the moments of the posterior distribution are unchanged, we expect both methods, the PEnKF-S and the PEnKF-D, to perform equivalently in terms of errors during the estimation process.

3.2. Markov Chain Monte Carlo-based filters

The EnKF undoubtedly represents a breakthrough in the data assimilation context; perhaps its biggest appeal is that a closed-form expression for the analysis members can be obtained. Nevertheless, during the derivation of the analysis equations, some constraints are imposed: for instance, the observation operator is assumed to be linear, and therefore, the likelihood $P(\mathbf{y}\,|\,\mathbf{x})$ obeys a Gaussian distribution. Of course, in practice, this assumption can easily be broken, and bias can then be introduced in the posterior prediction; this is notoriously significant given the highly nonlinear settings in which data assimilation is applied. From a statistical point of view, more specifically under the Bayesian framework, this situation can be summarized as follows:

  • The prediction is obtained from the posterior distribution: $P(\mathbf{x}\,|\,\mathbf{y})$.

  • The information from the numerical model is known as the prior, or background: $P_b(\mathbf{x})$.

  • The information from the observations is incorporated through the likelihood: $P(\mathbf{y}\,|\,\mathbf{x})$.

  • The posterior distribution is calculated: $P(\mathbf{x}\,|\,\mathbf{y}) \propto P_b(\mathbf{x})\,P(\mathbf{y}\,|\,\mathbf{x})$.

  • If $P_b(\mathbf{x})$ and $P(\mathbf{y}\,|\,\mathbf{x})$ are Gaussian, $P(\mathbf{x}\,|\,\mathbf{y})$ is Gaussian too, and the analysis is equal to the mean of $P(\mathbf{x}\,|\,\mathbf{y})$.

A significant number of approaches have been proposed in the literature to deal with these constraints; for instance, the particle filter (PF) and the maximum likelihood ensemble filter (MLEF) are ensemble-based methods which handle nonlinear observation operators. Unfortunately, the use of the PF is questionable under realistic weather forecast scenarios, since the required number of particles (ensemble members) increases exponentially with the number of components in the model state. An extended analysis of these methods exceeds the scope of this document, but their exploration is highly recommended.

Taking samples directly from the posterior distribution is a strategy that can help to remove the bias induced by wrong assumptions about the posterior distribution (i.e., normality assumptions). We do not set our sights on finding the mode of the posterior distribution; instead, we want a set of state vectors that allows us to create a satisfactory representation of the posterior error distribution. Based on these samples, it is then possible to estimate moments of the posterior distribution from which the analysis ensemble can be obtained [9].

Hereafter, we construct a modification of the sequential data assimilation scheme: first, we describe how to compute the analysis members by using variants of Markov chain Monte Carlo (MCMC) methods, and then, we include the modified Cholesky decomposition in order to estimate the background precision matrix.

An overview of the proposed method is as follows:

  1. The forecast step is unchanged; the forecast ensemble is obtained by applying the model to the ensemble of states of the previous assimilation cycle.

  2. The analysis step is modified so that the analysis is no longer obtained by, for instance, (28); instead, we perform $k$ iterations of an algorithm from the Markov chain Monte Carlo (MCMC) family in order to obtain samples from the posterior distribution.

To be more specific, let us define Metropolis-Hastings (MH) as the selected algorithm from the MCMC family, and let us focus on MCMC in the most intuitive way possible. Let $J(\mathbf{x}\,|\,\mathbf{y})$ equal the posterior pdf obtained from the expression $P(\mathbf{x}\,|\,\mathbf{y}) \propto P_b(\mathbf{x})\,P(\mathbf{y}\,|\,\mathbf{x})$. MH explores the state space in order to include model states in a so-called Markov chain. The selection of candidates is based on the condition $J(\mathbf{x}_c\,|\,\mathbf{y}) > J(\mathbf{x}_{t-1}\,|\,\mathbf{y})$; that is, a candidate $\mathbf{x}_c$ is always accepted if its value of $J$ is greater than that of the previous vector $\mathbf{x}_{t-1}$ (and accepted with some probability otherwise). Candidates are generated from a pre-defined proposal distribution $Q$; generally, a multivariate Gaussian distribution with mean $\mathbf{x}_{t-1}$ is chosen, and its covariance matrix is defined so as to control the step sizes. Concisely, MCMC methods proceed as follows: at the first iteration, the chain is started with the $\mathbf{x}^b$ vector. At the end, the sample is obtained by extracting the last vectors of the chain, while the first vectors are dismissed or "burned". In (32), we can see the expression for the acceptance probability that relates the target pdf, $J$, with the proposal $Q$:

$$\alpha = \min\left\{1,\; \frac{J(\mathbf{x}_c\,|\,\mathbf{y})\; Q(\mathbf{x}_{t-1}\,|\,\mathbf{x}_c)}{J(\mathbf{x}_{t-1}\,|\,\mathbf{y})\; Q(\mathbf{x}_c\,|\,\mathbf{x}_{t-1})}\right\}. \qquad (32)$$
The terms involving $Q$ cancel if a symmetric distribution, such as a Gaussian, is chosen as the proposal. The procedure for the calculation of the analysis applying MH is described below:

  1. Initialize the Markov chain, $C$, assigning the background value $\mathbf{x}^b$ to $C_0$.

  2. Generate a candidate vector state $\mathbf{x}_c$ from $Q$.

  3. Obtain $U$ from a uniform $(0, 1)$ distribution.

  4. If $U \le \alpha$, then $C_{t+1} = \mathbf{x}_c$; else, $C_{t+1} = C_t$.

  5. Repeat Steps 2 through 4, for $t = 1$ until $t = k-1$.

  6. Remove the first $p$ vectors of the chain, the burned ones.

The analysis is computed over the sample:

$$\mathbf{x}^a = \frac{1}{k-p} \sum_{i=p+1}^{k} C_i.$$
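The procedure above can be sketched as a short random-walk MH sampler; the function below (our own naming) works with log-densities for numerical stability and assumes a symmetric Gaussian proposal, so the $Q$ terms cancel:

```python
import numpy as np

def metropolis_hastings(log_target, x0, step, k, burn, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.
    Returns the chain after burn-in and the mean over the retained states."""
    x = x0.copy()
    lp = log_target(x)
    chain = []
    for _ in range(k):
        xc = x + step * rng.standard_normal(x.size)   # symmetric proposal Q
        lpc = log_target(xc)
        if np.log(rng.uniform()) < lpc - lp:          # accept with probability alpha
            x, lp = xc, lpc
        chain.append(x.copy())
    chain = np.array(chain[burn:])                    # drop the "burned" states
    return chain, chain.mean(axis=0)
```

Working with $\log J$ instead of $J$ avoids underflow when the state dimension grows, while leaving the acceptance test unchanged.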
MCMC methods are straightforward to implement; when a sufficient number of iterations is performed, the behavior of the target function is captured by the chain [10]. This is true even for complex density functions such as those with multiple modes. However, note that, generally, simulation-based methods such as MCMC explore a discretized grid, and as the mesh is refined, a huge number of iterations is needed before a high probability zone of the posterior distribution is reached [11]. Concretely, Hu et al. [12] proposed a family of dimension-independent modified MCMC algorithms under the name of preconditioned Crank-Nicolson (pCN) MCMC. These methods are robust with respect to the curse of dimensionality in the statistical context. Initially, the Crank-Nicolson discretization is applied to a stochastic partial differential equation (SPDE) in order to obtain a new expression for the proposal distribution:

$$\mathbf{x}_c = \mathbf{x}_{t-1} - \frac{\delta}{2}\,\mathbf{K}\mathbf{L}\left(\mathbf{x}_c + \mathbf{x}_{t-1}\right) + \sqrt{2\delta\mathbf{K}}\;\boldsymbol{\varepsilon}_0, \qquad (33)$$
where $\mathbf{L}$ is the precision matrix of $\mathbf{B}$, $\mathbf{K}$ is a preconditioning matrix, and $\boldsymbol{\varepsilon}_0$ is white noise. If $\mathbf{K} = \mathbf{B}$ and $\delta \in (0, 2)$ in (33), we get the pCN proposal described in (34):

$$\mathbf{x}_c = \left(1 - \beta^2\right)^{1/2}\,\mathbf{x}_{t-1} + \beta\,\boldsymbol{\omega}, \qquad (34)$$
where $\boldsymbol{\omega} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{B}\right)$ and $\beta = \sqrt{8\delta}/(2+\delta)$. Of course, we use $\mathbf{P}^b$ as a computationally efficient estimate of $\mathbf{B}$. The acceptance probability is defined in (35):

$$\alpha = \min\left\{1,\; \frac{P(\mathbf{y}\,|\,\mathbf{x}_c)}{P(\mathbf{y}\,|\,\mathbf{x}_{t-1})}\right\}, \qquad (35)$$

since, under the pCN proposal, the prior terms cancel and only the likelihood ratio remains.
The procedure for the calculation of the analysis applying pCN Metropolis-Hastings is described as follows:

  1. Initialize the Markov chain, $C$, assigning the background value $\mathbf{x}^b$ to $C_0$.

  2. Generate a candidate vector state using (34): $\mathbf{x}_c = \left(1-\beta^2\right)^{1/2}\mathbf{x}_{t-1} + \beta\,\boldsymbol{\omega}$.

  3. Obtain $U$ from a uniform $(0, 1)$ distribution.

  4. If $U \le \alpha$, then $C_{t+1} = \mathbf{x}_c$; else, $C_{t+1} = C_t$.

  5. Repeat Steps 2 through 4, for $t = 1$ until $t = k-1$.

  6. Remove the first $p$ vectors of the chain, the burned ones.

  7. The analysis is calculated over the sample: $\mathbf{x}^a = \frac{1}{k-p}\sum_{i=p+1}^{k} C_i$.
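The pCN-MH steps above can be sketched as follows. This is a sketch under two stated assumptions: the prior is centred at $\mathbf{x}^b$ (so the proposal is shifted accordingly, which still preserves the prior), and a square root of $\mathbf{B}$ is available for drawing $\boldsymbol{\omega}$; `neg_log_like` is the negative log-likelihood $\Phi(\mathbf{x}) = -\log P(\mathbf{y}\,|\,\mathbf{x})$ up to a constant, and all names are our own:

```python
import numpy as np

def pcn_mh(neg_log_like, B_sqrt, x_b, beta, k, burn, rng):
    """pCN Metropolis-Hastings: proposals preserve the prior N(x_b, B),
    so the acceptance ratio involves only the likelihood (Eq. (35))."""
    x = x_b.copy()
    phi = neg_log_like(x)
    chain = []
    coef = np.sqrt(1.0 - beta**2)
    for _ in range(k):
        omega = B_sqrt @ rng.standard_normal(x.size)   # omega ~ N(0, B)
        xc = x_b + coef * (x - x_b) + beta * omega     # pCN proposal (34), centred at x_b
        phic = neg_log_like(xc)
        if np.log(rng.uniform()) < phi - phic:         # likelihood-ratio acceptance
            x, phi = xc, phic
        chain.append(x.copy())
    chain = np.array(chain[burn:])
    return chain.mean(axis=0)                          # analysis over the retained sample
```

Because the proposal already "contains" the prior, the acceptance rate does not collapse as the state dimension $n$ grows, which is the dimension-robustness property mentioned above.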

Finally, in the data assimilation context, by using the modified Cholesky decomposition described in the earlier sections, the pCN-MH filter reads:

  1. The forecast step remains unchanged; the forecast ensemble is obtained by applying the model to the ensemble of states of the previous iteration, but the background error covariance matrix is estimated via the modified Cholesky decomposition.

  2. The update step is modified so that the analysis is obtained by running $k$ iterations of pCN-MH, to obtain a sample of the posterior error distribution.

Now we are ready to numerically test the methods discussed in this chapter.

4. Experimental results

4.1. Stochastic filter PEnKF-S

We assess the accuracy of the PEnKF-S and compare it against that of the LETKF implementation proposed by Hunt [13]. The numerical model is the Lorenz 96 model [11], which mimics the behavior of the atmosphere:

$$\frac{dx_j}{dt} = \left(x_{j+1} - x_{j-2}\right) x_{j-1} - x_j + F, \quad 1 \le j \le n,$$

with periodic boundary conditions ($x_{-1} = x_{n-1}$, $x_0 = x_n$, $x_{n+1} = x_1$), where $n$ is the number of components in the model and $F$ is an external force; with the parameter value $F = 8$, the model exhibits chaotic behavior.
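The Lorenz 96 tendency and the fourth-order Runge-Kutta integrator used throughout the experiments can be sketched as follows (our own naming; the periodic indices are handled with `np.roll`):

```python
import numpy as np

def lorenz96(x, F=8.0):
    """Lorenz 96 tendency dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F,
    with periodic boundary conditions."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(f, x, dt):
    """One fourth-order Runge-Kutta step of size dt."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Usage: perturb the unstable fixed point x_j = F and integrate
x = np.full(40, 8.0)
x[0] += 0.01
for _ in range(200):
    x = rk4_step(lorenz96, x, 0.05)
```

Note that $x_j = F$ for all $j$ is a fixed point of the system; the small perturbation above is what triggers the chaotic transient used to spin up reference solutions.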

The experimental settings are described below:

  1. An initial random solution $\mathbf{x}^+_{-3}$ is propagated for a while in time by using the Lorenz 96 model and a fourth-order Runge-Kutta method, in order to obtain a vector state $\mathbf{x}^+_{-2}$ whose physics are consistent with the dynamics of the numerical model. This vector state serves as our reference solution.

  2. The reference solution is perturbed by using samples from a normal distribution $\mathcal{N}\left(\mathbf{0}_n, \sigma_B \mathbf{I}\right)$. Three different values of $\sigma_B$ are considered during the numerical experiments: $\sigma_B \in \{0.05, 0.10, 0.15\}$. This perturbed state is propagated in time in order to make it consistent with the physics and dynamics of the numerical model; from here, an initial background state $\mathbf{x}^b_{-1}$ is obtained.

  3. A similar procedure is performed in order to build a perturbed ensemble about $\mathbf{x}^b_{-1}$. The ensemble members are propagated in time, from which an initial ensemble of model realizations $\left\{\mathbf{x}^{b[i]}_0\right\}_{i=1}^{N}$ is obtained.

  4. The assimilation window consists of 15 equidistant observations. The frequency of observations is 0.5 time units, which represents about 2.5 days in the atmosphere (one time unit corresponds to 5 days).

  5. The dimension of the vector state is $n = 40$. The external force of the numerical model is set to $F = 8$.

  6. The number of observed components is 50% of the dimension of the vector state.

  7. Three ensemble sizes are tried during the experiments: $N \in \{20, 40, 60\}$.

  8. As a measure of quality, the $L_2$ norm of the difference between the analysis state and the reference solution is computed across assimilation steps.

  9. A total of 100 runs are performed for each pair $(N, \sigma_B)$. For each run, a different initial random vector is utilized in order to build the initial perturbed reference solution $\mathbf{x}^+_{-3}$ (before the model is applied, $\mathbf{x}^+_{-2}$). This yields different initial ensembles, as well as different synthetic data, for the runs of each configuration (pair).

The average of the error norms of each pair $(N, \sigma_B)$ for the LETKF and the PEnKF implementations is shown in Table 1. As can be seen, on average across 100 runs, the proposed EnKF implementation outperforms the LETKF in terms of the $L_2$ norm of the error. Moreover, the PEnKF-S seems to be insensitive to the initial background error $\sigma_B$ since, in all cases, when the ensemble size is increased, a better estimate of the reference state $\mathbf{x}$ at the different observation times is obtained. This may also be attributed to the estimation of background error correlations via the modified Cholesky decomposition, which improves drastically whenever the ensemble size is increased, as pointed out by Bickel and Levina in [1]; in such a case, the error decreases as $\mathcal{O}\left(\log n / N\right)$. This is crucial in the PEnKF-S formulation since estimates of the precision analysis covariance matrix are obtained by rank-one updates on the inverse background error covariance matrix. On the other hand, in the LETKF context, increasing the ensemble size can improve the accuracy of the method, but not beyond that shown by the PEnKF.


Table 1.

Average of the $L_2$ norm of errors for 100 runs of each configuration $(\sigma_B, N)$ for the compared filter implementations.

Some plots of the $L_2$ norm of the error for the PEnKF and the LETKF, across different configurations and runs, are shown in Figure 1. Note that the error of the PEnKF decreases sharply from the earliest iterations. In the LETKF context, the accuracy is similar to that of the PEnKF only at the end of the assimilation window.

Figure 1.

Local ensemble transform Kalman filter (LETKF).

4.2. Deterministic filter PEnKF-D

Holding the settings from the previous section, experiments are performed by using the PEnKF-D. The results exhibit behavior similar to that of the LETKF and the PEnKF-S implementations proposed by Niño et al. in [8]. This can be appreciated in Figures 1-3. As can be seen, the error distributions reveal similar behavior across all compared filters.

Figure 3.

Posterior ensemble Kalman filter deterministic (PEnKF-D).

Figure 2.

Posterior ensemble Kalman filter stochastic (PEnKF-S).

4.3. MCMC filter based on modified Cholesky decomposition

We now describe experiments with our proposed filter based on the modified Cholesky decomposition and the preconditioned Crank-Nicolson Metropolis-Hastings algorithm. Again, the numerical model is the Lorenz 96 model. In this section, we are mainly interested in describing the behavior of the method, especially when the observation operator is nonlinear.

The experimental settings are as follows:

  1. The observation operator is quadratic: $\mathcal{H}(\mathbf{x})_i = x_i^2$, where $x_i$ denotes the $i$th component of the vector state $\mathbf{x}$.

  2. The vector of states has $n = 40$ components.

  3. The ensemble size is N=20.

  4. The length of the Markov Chain is 1000.

  5. The proposal is pCN with β=0.09.

Figure 4 shows the distribution of six of the 40 components of the state vector after 1000 iterations of the algorithm, in an attempt to visualize whether the actual value is within or near the region of highest probability density described by the sample contained in the Markov chain.

Figure 4.

Histogram of six components of the state vector after 1000 iterations of the algorithm; the dashed lines indicate the position of the true value for the respective component.

Note that not in all cases were the actual values of the model components located at the peaks of the histogram, but most of them were within or near zones of high probability described by the sample. This result is important if we take into account that the posterior distribution is probably not normal, as is the case with quadratic observation operators.

5. Conclusions

In this chapter, efficient EnKF implementations were discussed. All of them are based on a modified Cholesky decomposition wherein an estimate of the background precision matrix is obtained in terms of Cholesky factors. In the first filter, the PEnKF-S, synthetic data are utilized in order to compute the posterior members, as is done in stochastic formulations of the filter. Moreover, a sequence of rank-one updates can be applied to the factors of the prior precision matrix in order to estimate those of the posterior precision matrix. In the second filter, the PEnKF-D, synthetic data are avoided by using perturbations obtained from the physics and the dynamics of the numerical model. Finally, an MCMC-based filter is obtained in order to reduce the impact of bias when Gaussian assumptions are broken during the assimilation of observations, for instance, when the observation operator is nonlinear. Numerical experiments with the Lorenz 96 model reveal that the proposed filters are comparable to filters from the specialized literature.



This work was supported by the Applied Math and Computational Science Lab (AML-CS) at Universidad del Norte in Barranquilla, Colombia.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Elias David Niño Ruiz, Rolando Beltrán Arrieta and Alfonso Manuel Mancilla Herrera (February 21st, 2018). Efficient Matrix-Free Ensemble Kalman Filter Implementations: Accounting for Localization. In: Ginalber Luiz de Oliveira Serra (Ed.), Kalman Filters - Theory for Advanced Applications, IntechOpen. DOI: 10.5772/intechopen.72465.
