Open access peer-reviewed chapter

A Review of the EnKF for Parameter Estimation

Written By

Neil K. Chada

Submitted: 25 July 2022 Reviewed: 21 September 2022 Published: 01 November 2022

DOI: 10.5772/intechopen.108218

From the Edited Volume

Inverse Problems - Recent Advances and Applications

Edited by Ivan I. Kyrchei


Abstract

The ensemble Kalman filter is a well-known and celebrated data assimilation algorithm. It is of particular relevance because it is used for high-dimensional problems, updating an ensemble of particles through sample means and covariance matrices. In this chapter we present a relatively recent topic, the application of the EnKF to inverse problems, known as ensemble Kalman inversion (EKI). EKI is used for parameter estimation and can be viewed as a black-box optimizer for PDE-constrained inverse problems. We review this methodology, highlight emerging and new areas of research, and provide numerical experiments on models arising in the geosciences and numerical weather prediction.

Keywords

  • ensemble Kalman filter
  • Kalman filter
  • inverse problems
  • parameter estimation
  • data assimilation
  • optimization

1. Introduction

Inverse problems [1, 2, 3] are a class of mathematical problems which have recently gained significant attention. Simply put, inverse problems are concerned with the recovery of some parameter of interest from noisy, unstructured data. Mathematically we can express an inverse problem as the recovery of u ∈ X from noisy measurements of data y ∈ Y, expressed as

\[
y = G(u) + \eta, \tag{1}
\]

where G : X → Y is the forward operator and η ∼ N(0, Γ) is additive Gaussian noise; here N(0, Γ) denotes a normal distribution with mean 0 and covariance Γ. Commonly the covariance is taken to be a multiple of the identity, i.e. Γ = γ²I, where γ ∈ ℝ is a constant and I is the identity. Inverse problems are of high interest due to the wide variety of applications in which they arise, most notably the geophysical sciences, medical imaging and numerical weather prediction [4, 5, 6]. The classical approach to solving inverse problems, which is the theme of this chapter, is to construct a least-squares functional and to represent the solution as a minimizer of the form

\[
u^{*} \in \operatorname*{arg\,min}_{u \in X}\Bigl\{\tfrac{1}{2}\|y - G(u)\|_{\Gamma}^{2} + \lambda R(u)\Bigr\}, \tag{2}
\]

where λ > 0 is a regularization parameter and R(u) is a regularization term, usually required to prevent overfitting of the data. A common example is Tikhonov regularization, i.e. R(u) = ½‖u‖². Traditional methods for minimizing (2) include optimization schemes such as the Gauss–Newton or Levenberg–Marquardt methods, which require derivative information of G; this can prove costly and cumbersome. A motivation for this chapter is therefore to present derivative-free optimizers which reduce this computational burden while attaining a good level of accuracy. The methodology we discuss, which alleviates these issues, is ensemble Kalman inversion (EKI). EKI can be viewed as the application of the ensemble Kalman filter (EnKF) to inverse problems, a natural step given the connections between data assimilation and inverse problems. The EnKF is a Monte Carlo version of the celebrated Kalman filter which is more favorable in high dimensions; it operates by updating an ensemble of particles through sample means and covariances. In particular we take the viewpoint of EKI as a derivative-free optimizer for PDE-constrained problems, i.e. a black-box solver requiring no derivative information. Since this method was proposed for inverse problems it has seen wide use in various engineering applications, as well as developments related to both theory and methodology. In this chapter we discuss some of these key concepts and insights, while briefly mentioning particular directions within EKI.

The general outline of this chapter is as follows. In Section 2 we provide the necessary background material, which covers the basics of EKI with some intuition and motivation. We discuss the algorithm in both the usual discrete-time setting and the continuous-time setting. This leads on to Section 3, where we discuss a recent direction, namely regularization theory and its application to EKI. In Section 4 we discuss how EKI can be extended to the notion of sampling in statistics. Other, less-developed, directions are described in Section 5. Numerical experiments on a number of differential equation models are provided in Section 6, before we give some future remarks and a conclusion in Section 7.


2. EKI: background material

In this section we provide the background material related to the understanding and intuition of EKI. This begins with a discussion of the ensemble Kalman filter and how it connects with EKI. We then present EKI in its vanilla form, a discrete-time optimizer, before discussing its connections with various existing methods. Finally we extend the original formulation to the continuous-time setting, where we aim to exhibit a gradient flow structure of the resulting equations.

2.1 Kalman filtering

The ensemble Kalman filter (EnKF) is a popular methodology based on the celebrated Kalman filter (KF), which was originally developed by Rudolph Kalman in the 1960s [7, 8]. The Kalman filter's initial aim was to solve recursive estimation problems arising from dynamical processes and systems. Specifically the KF aims to merge data with model, or signal, dynamics of the form

\[
u_{n+1} = \Psi(u_n) + \xi_n, \qquad \xi_n \sim N(0, \Sigma), \tag{3}
\]
\[
y_{n+1} = H u_{n+1} + \eta_{n+1}, \qquad \eta_{n+1} \sim N(0, \Gamma). \tag{4}
\]

Here u_n, n ∈ ℤ⁺, is our signal, which is updated through a forward operator Ψ: ℝ^m → ℝ^m; combined with noise this provides the update u_{n+1}. Our data y_{n+1} is produced by sending the updated signal through the operator H: ℝ^m → ℝ^{m̄}, known as the observation operator. The initial condition for the system is given as u_0 ∼ N(m_0, C_0). This area of recursive estimation, in this setup, came to be known as data assimilation [9, 10].

In the linear and Gaussian setting, where the dynamics and noise are Gaussian, the KF updates the state using the first two moments, namely the mean and covariance. If the state-space dimension is d, then the KF has complexity of order O(d²). For high-dimensional problems this can be an issue; an algorithm developed to alleviate this is the EnKF, a Monte Carlo version proposed by Evensen [11, 12].

The EnKF operates by replacing the true mean and covariance by sample quantities, and updates an ensemble of particles u_n^j, 1 ≤ j ≤ J, using these moments combined with information from the data. The EnKF can be split into a two-step procedure: the prediction step

\[
\hat{u}_{n+1}^{j} = \Psi(u_n^{j}) + \xi_n^{j}, \qquad
\hat{m}_{n+1} = \frac{1}{J}\sum_{j=1}^{J}\hat{u}_{n+1}^{j}, \qquad
\hat{C}_{n+1} = \frac{1}{J-1}\sum_{j=1}^{J}\bigl(\hat{u}_{n+1}^{j} - \hat{m}_{n+1}\bigr)\bigl(\hat{u}_{n+1}^{j} - \hat{m}_{n+1}\bigr)^{T}, \tag{5}
\]

and update step

\[
K_{n+1} = \hat{C}_{n+1}H^{T}\bigl(H\hat{C}_{n+1}H^{T} + \Gamma\bigr)^{-1}, \qquad
u_{n+1}^{j} = (I - K_{n+1}H)\,\hat{u}_{n+1}^{j} + K_{n+1}y_{n+1}^{j}, \qquad
y_{n+1}^{j} = y_{n+1} + \eta_{n+1}^{j}, \tag{6}
\]

where K_{n+1} is the Kalman gain matrix and ξ_n^j and η_{n+1}^j are i.i.d. Gaussian noise. In the prediction step the EnKF forms a sample mean and covariance from the propagated signal; in the update (analysis) step the sample covariance defines the Kalman gain, which produces the updated particles u_{n+1}^j. The update aims to reduce the discrepancy between the perturbed data y_{n+1}^j and the observed state Hu. To better understand this discrepancy, an alternative way of looking at the EnKF is through a variational approach, where we consider the following cost function

\[
I_n(u) := \tfrac{1}{2}\bigl\|y_{n+1}^{j} - Hu\bigr\|_{\Gamma}^{2} + \tfrac{1}{2}\bigl\|u - \hat{u}_{n+1}^{j}\bigr\|_{\hat{C}_{n+1}}^{2}, \tag{7}
\]

which we aim to minimize; for each particle j its minimizer defines the update

\[
u_{n+1}^{j} = \operatorname*{arg\,min}_{u}\, I_n(u). \tag{8}
\]

This minimization relies on the predicted covariance Ĉ_{n+1}, which depends entirely on the predicted particles û_{n+1}^j. As with the filtering distributions, the prediction and update steps can each be viewed as a mapping between ensembles, i.e.

\[
\{u_n^{j}\}_{j=1}^{J} \;\mapsto\; \{\hat{u}_{n+1}^{j}\}_{j=1}^{J}, \qquad
\{\hat{u}_{n+1}^{j}\}_{j=1}^{J} \;\mapsto\; \{u_{n+1}^{j}\}_{j=1}^{J}. \tag{9}
\]

Compared to the KF, the computational complexity associated with the EnKF is O(Jd), where one usually assumes J < d, implying the reduction in cost.
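To make the two-step procedure concrete, the following is a minimal numpy sketch of a single EnKF cycle following (5)-(6). It is illustrative only: the model Ψ, observation operator H, noise covariances and the toy rotation example are made-up placeholders, not taken from the chapter.

```python
# Minimal EnKF prediction/update cycle, following Eqs. (5)-(6).
import numpy as np

rng = np.random.default_rng(0)

def enkf_step(ensemble, y, Psi, H, Sigma, Gamma):
    """One EnKF cycle for an ensemble of shape (J, m)."""
    J, m = ensemble.shape
    # Prediction step (5): propagate each particle and add model noise.
    pred = np.array([Psi(u) for u in ensemble])
    pred += rng.multivariate_normal(np.zeros(m), Sigma, size=J)
    C_hat = np.cov(pred, rowvar=False)               # sample covariance, (J-1)-normalised
    # Update step (6): Kalman gain built from the sample covariance.
    K = C_hat @ H.T @ np.linalg.inv(H @ C_hat @ H.T + Gamma)
    # Perturb the data independently for each particle.
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=J)
    return pred + (y_pert - pred @ H.T) @ K.T        # row form of u = u_hat + K(y^j - H u_hat)

# Toy usage: a linear rotation model observed in its first coordinate (placeholders).
theta = 0.1
Psi = lambda u: np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]]) @ u
H = np.array([[1.0, 0.0]])
ensemble = rng.normal(size=(50, 2))
ensemble = enkf_step(ensemble, y=np.array([0.5]), Psi=Psi, H=H,
                     Sigma=0.01 * np.eye(2), Gamma=0.1 * np.eye(1))
```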

2.2 EnKF applied to inverse problems

Since the formulation of the EnKF there has been huge interest from practitioners in various disciplines, most notably numerical weather prediction, the geophysical sciences and signal processing related to state estimation. In this chapter our focus is on the application of the EnKF to inverse problems, namely to solve (1). This application is known as ensemble Kalman inversion (EKI); it was introduced by Iglesias et al. [13], motivated by the iterative ensemble methods of Li and Reynolds [14], as a derivative-free optimizer for PDE-constrained inverse problems.

As with the EnKF, we are concerned with updating an ensemble of particles, where n now denotes the iteration count. Given an initial ensemble u_0^j, our aim is to learn a true underlying unknown u†. To do so, as with the EnKF, we first define the sample means and covariance matrices

\[
\bar{u}_n = \frac{1}{J}\sum_{j=1}^{J} u_n^{j}, \qquad
\bar{G}_n = \frac{1}{J}\sum_{j=1}^{J} G(u_n^{j}),
\]
\[
C_n^{up} = \frac{1}{J-1}\sum_{j=1}^{J}\bigl(u_n^{j} - \bar{u}_n\bigr)\bigl(G(u_n^{j}) - \bar{G}_n\bigr)^{T}, \qquad
C_n^{pp} = \frac{1}{J-1}\sum_{j=1}^{J}\bigl(G(u_n^{j}) - \bar{G}_n\bigr)\bigl(G(u_n^{j}) - \bar{G}_n\bigr)^{T}, \tag{10}
\]

which enter the update equation

\[
u_{n+1}^{j} = u_n^{j} + h\,C_n^{up}\bigl(h\,C_n^{pp} + \Gamma\bigr)^{-1}\bigl(y_n^{j} - G(u_n^{j})\bigr), \tag{11}
\]
\[
y_n^{j} = y + \eta_n^{j}, \tag{12}
\]

where y represents our true data and h>0 denotes a step size related to the level of discretization. Figure 1 provides a pictorial description of the EnKF, which has been described above.

Figure 1.

Dynamics of the ensemble Kalman filter, split into the prediction and update steps.
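To make the discrete-time iteration (10)-(12) concrete, the following is a minimal numpy sketch of an EKI loop applied to a small linear test problem. The forward map, noise level, ensemble size and number of iterations are illustrative placeholders.

```python
# Minimal implementation of the basic EKI update (10)-(12).
import numpy as np

rng = np.random.default_rng(1)

def eki(ensemble, y, G, Gamma, h=1.0, n_iter=20):
    """Run the EKI iteration (11)-(12) on an ensemble of shape (J, d)."""
    J = ensemble.shape[0]
    for _ in range(n_iter):
        Gu = np.array([G(u) for u in ensemble])          # forward evaluations, shape (J, K)
        u_bar, G_bar = ensemble.mean(axis=0), Gu.mean(axis=0)
        du, dG = ensemble - u_bar, Gu - G_bar
        C_up = du.T @ dG / (J - 1)                       # cross-covariance C^{up}
        C_pp = dG.T @ dG / (J - 1)                       # covariance C^{pp}
        S = h * C_pp + Gamma
        # Perturbed observations (12), one copy per particle.
        y_pert = y + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=J)
        # Row-wise form of u^j + h C^{up} (h C^{pp} + Gamma)^{-1} (y^j - G(u^j)).
        ensemble = ensemble + h * (y_pert - Gu) @ np.linalg.solve(S, C_up.T)
    return ensemble

# Toy usage: recover u from y = A u + eta with a random 10x5 matrix A (placeholders).
A = rng.normal(size=(10, 5))
u_true = rng.normal(size=5)
Gamma = 0.01 * np.eye(10)
y = A @ u_true + rng.multivariate_normal(np.zeros(10), Gamma)
ens = eki(rng.normal(size=(50, 5)), y, lambda u: A @ u, Gamma)
print("estimate:", ens.mean(axis=0), "truth:", u_true)
```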

The update equation of EKI (11) is of interest as it coincides with the update formula for Tikhonov regularization for linear statistical inverse problems. Namely, if we consider R(u) = ½‖u‖²_{C₀}, then in the linear and Gaussian setting, where G(u) = Gu for a linear operator G, the update formula is given as

\[
u_{\mathrm{TP}} = \bar{u} + C_0 G^{T}\bigl(G C_0 G^{T} + \Gamma\bigr)^{-1}\bigl(y - G\bar{u}\bigr), \tag{13}
\]

where G is the linear forward operator, which in this setting coincides with the derivative of the forward map. This connection is of relevance and was discussed in [14], where it was shown that, in the limit J → ∞, the EKI estimate converges to u_TP. This is of interest because the minimizer (13) of the Tikhonov-regularized functional (2) is equivalently characterized by the following maximization procedure in statistics,

\[
u_{\mathrm{MAP}} \in \operatorname*{arg\,max}_{u \in X}\,\pi(u \mid y), \tag{14}
\]

known as the maximum a posteriori (MAP) formulation, where π(u | y) ∝ π(y | u)π(u) denotes the posterior distribution. This connection is discussed in [15], and it provides some insight into EKI and its relation to other existing methodologies in inverse problems. An important property that EKI inherits is the subspace property, given by the following lemma.

Lemma 1.1 Let A denote the linear span of the initial ensemble {u_0^j}_{j=1}^J. Then u_n^j ∈ A for all j = 1, …, J and all n ≥ 0.

In essence, the subspace property states that the updated ensemble of particles remains in the span of the initial ensemble. This is important because the quality of the reconstruction depends on whether the initial ensemble is a good choice or not; the property can therefore act as an advantage or a disadvantage.
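To make the linear Gaussian connection above concrete, the following small sketch checks numerically that the Kalman-gain form (13) (with prior mean zero) agrees with the Tikhonov/MAP minimizer obtained from the normal equations. All matrices below are random placeholders.

```python
# Two equivalent expressions for the Tikhonov/MAP estimate in the linear Gaussian case.
import numpy as np

rng = np.random.default_rng(2)
G = rng.normal(size=(8, 4))          # linear forward operator (placeholder)
C0 = np.eye(4)                       # prior covariance
Gamma = 0.05 * np.eye(8)             # noise covariance
y = rng.normal(size=8)               # data (placeholder)

# Normal-equation form: minimiser of 1/2|y - Gu|^2_Gamma + 1/2|u|^2_C0.
u_map = np.linalg.solve(G.T @ np.linalg.inv(Gamma) @ G + np.linalg.inv(C0),
                        G.T @ np.linalg.inv(Gamma) @ y)

# Kalman-gain form (13) with prior mean zero (u_bar = 0).
u_tp = C0 @ G.T @ np.linalg.solve(G @ C0 @ G.T + Gamma, y)

print(np.allclose(u_map, u_tp))      # True: the two expressions coincide
```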

2.3 Continuous-time formulation

The original representation of EKI, as shown in (11), is a discrete-time iterative scheme similar to other optimization methods. However it is of interest to understand EKI in a continuous-time setting, which was considered by Schillings et al. [16, 17]. This is primarily for two reasons: (i) one can understand more easily how the dynamics of (11) and (12) behave, and (ii) it suggests new numerical schemes for EKI specific to the continuous-time setting. To derive such equations we take the step size to zero, i.e. h → 0, which yields the following set of stochastic differential equations

\[
\frac{du^{j}}{dt} = C^{uw}(u)\,\Gamma^{-1}\bigl(y - G(u^{j})\bigr) + C^{uw}(u)\,\Gamma^{-1}\sqrt{\Gamma}\,\frac{dW^{j}}{dt}, \tag{15}
\]

with Wj denoting independent cylindrical Brownian motions. By substituting the form of the covariance operator, we see

\[
\frac{du^{j}}{dt} = \frac{1}{J}\sum_{k=1}^{J}\Bigl\langle G(u^{k}) - \bar{G},\; y - G(u^{j}) + \sqrt{\Gamma}\,\frac{dW^{j}}{dt}\Bigr\rangle_{\Gamma}\,\bigl(u^{k} - \bar{u}\bigr), \tag{16}
\]

where ⟨·,·⟩_Γ := ⟨Γ^{-1/2}·, Γ^{-1/2}·⟩. We now take the forward operator G = A to be bounded and linear. Substituting the linear operator A into (16), and neglecting the noise term (i.e. considering the deterministic dynamics), we obtain the limit

\[
\frac{du^{j}}{dt} = \frac{1}{J}\sum_{k=1}^{J}\bigl\langle A(u^{k} - \bar{u}),\; y - Au^{j}\bigr\rangle_{\Gamma}\,\bigl(u^{k} - \bar{u}\bigr). \tag{17}
\]

By defining the empirical covariance operator

\[
C(u) = \frac{1}{J-1}\sum_{k=1}^{J}\bigl(u^{k} - \bar{u}\bigr)\otimes\bigl(u^{k} - \bar{u}\bigr), \tag{18}
\]

we can express (17) as

\[
\frac{du^{j}}{dt} = -C(u)\,D_u\Phi\bigl(u^{j}; y\bigr), \qquad
\Phi(u; y) = \tfrac{1}{2}\bigl\|\Gamma^{-1/2}(y - Au)\bigr\|^{2}. \tag{19}
\]

Thus each particle performs a preconditioned gradient descent for Φ(·; y), where the gradients are preconditioned by the covariance C(u). Since the covariance operator C(u) is positive semi-definite we have that

\[
\frac{d}{dt}\,\Phi\bigl(u_t^{j}; y\bigr) = \frac{d}{dt}\,\tfrac{1}{2}\bigl\|\Gamma^{-1/2}\bigl(y - Au_t^{j}\bigr)\bigr\|^{2} \le 0. \tag{20}
\]

In the context of EKI this is of interest as a first result giving some indication of the dynamics, which was not achievable through the discrete-time update formula (11). Given the gradient flow structure, we see that EKI behaves like a standard optimization scheme, with the dynamics following the direction of the negative gradient, in other words moving towards a minimizer of Φ. Since the continuous-time formulation was derived, further analysis has followed, most notably recent progress on the nonlinear setting and other well-known results; see [18].
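The gradient flow (19) can be integrated directly with an explicit Euler scheme, which gives a simple numerical illustration of the monotone decay (20). The sketch below uses a random linear map A, a made-up data vector and a made-up step size; it is not the scheme used for the experiments later in the chapter.

```python
# Explicit Euler integration of the deterministic continuous-time EKI flow (19).
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 3))                  # linear forward map (placeholder)
Gamma = 0.1 * np.eye(6)
Gamma_inv = np.linalg.inv(Gamma)
y = A @ rng.normal(size=3) + 0.05 * rng.normal(size=6)

def Phi(u):                                  # Phi(u; y) = 1/2 |Gamma^{-1/2}(y - A u)|^2
    r = y - A @ u
    return 0.5 * r @ Gamma_inv @ r

ens = rng.normal(size=(20, 3))
dt = 0.05
for _ in range(200):
    u_bar = ens.mean(axis=0)
    C = (ens - u_bar).T @ (ens - u_bar) / (ens.shape[0] - 1)   # empirical covariance (18)
    grads = np.array([-A.T @ Gamma_inv @ (y - A @ u) for u in ens])  # D_u Phi at each particle
    ens = ens - dt * grads @ C.T            # du/dt = -C(u) D_u Phi(u; y), row-wise
    # By (20), Phi is non-increasing along each particle trajectory
    # (up to the explicit-Euler discretisation error).

print(Phi(ens.mean(axis=0)))
```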


3. Regularization

In this section we discuss the role of regularization in EKI. We begin with an introduction to the iterative regularization schemes that have been used, before discussing Tikhonov regularization, Lp regularization and particular adaptive choices.

As briefly discussed, regularization is an important tool in optimization and inverse problems, aimed at preventing over-fitting of, or excessive influence from, the data. We refer the reader to the literature for a concise overview [19, 20]. Over-fitting of the data can cause issues in inverse problems, such as divergence of the error, and therefore careful consideration is needed to prevent it. A cartoon representation of this is given in Figure 2.

Figure 2.

The figure presents two simulations of EKI as the iterations increase. The black curve represents the behavior we aim to achieve; however, in certain situations the data is overfitted, which can cause a divergence in the relative error, as shown by the dashed red curve.

There are two main forms of regularization one can apply for inverse problems. The first is iterative regularization, where the regularization is included within the iterative scheme. This can be incorporated directly, as in

\[
u_{n+1}^{j} = u_n^{j} + h\,C_n^{up}\bigl(h\,C_n^{pp} + \alpha_n\Gamma\bigr)^{-1}\bigl(y_n^{j} - G(u_n^{j})\bigr), \tag{21}
\]

or in the presence of a discrepancy principle of the form

\[
\bigl\|\Gamma^{-1/2}\bigl(y - G(\bar{u}_n)\bigr)\bigr\|^{2} \le \vartheta\,\|\eta\|^{2}, \qquad \vartheta \in (0,1], \tag{22}
\]

which monitors the data misfit of the ensemble mean. The discrepancy principle acts as a stopping rule once the misfit reaches the noise level, while the modified update formula (21) contains a sequence of parameters (α_n)_n aimed at preventing over-fitting of the data; this sequence is chosen in a way related to the discrepancy principle. For EKI this has been considered in numerous works by Iglesias et al. [21, 22].
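A minimal sketch of how such a stopping rule can be wrapped around an EKI loop is given below. It assumes a single-iteration function `eki_step` (e.g. one pass of the loop sketched in Section 2) is available, and the threshold convention and noise level are placeholders; the precise scaling used in [21, 22] may differ.

```python
# Sketch of a discrepancy-principle stopping rule around an EKI iteration.
import numpy as np

def run_eki_with_discrepancy(ensemble, y, G, Gamma, eki_step,
                             noise_level, vartheta=1.0, max_iter=100):
    """Iterate EKI until the weighted data misfit of the mean reaches the noise level."""
    Gamma_inv_sqrt = np.linalg.inv(np.linalg.cholesky(Gamma))
    for n in range(max_iter):
        u_bar = ensemble.mean(axis=0)
        misfit = np.linalg.norm(Gamma_inv_sqrt @ (y - G(u_bar)))
        # Stop once the misfit is at (or below) the scaled noise level,
        # before the ensemble starts fitting the noise.
        if misfit <= vartheta * noise_level:
            break
        ensemble = eki_step(ensemble, y, G, Gamma)
    return ensemble
```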

More recent work has considered regularization directly through the least-squares functional (LSF) (2). For EKI the first form to be considered was Tikhonov regularization, with penalty R(u) = ½‖u‖²_{C₀}. This form of regularization is a natural choice: it is well known and well understood, and it can be viewed as a Gaussian form of regularization, which smooths the problem. In the context of EKI this makes sense, as one commonly assumes Gaussian dynamics. The work of Chada et al. [23] first developed this extension, which was done by modifying (1) to the following system,

\[
y = G(u) + \eta_1, \qquad 0 = u + \eta_2, \tag{23}
\]

where η₁ ∼ N(0, Γ) and η₂ ∼ N(0, λ⁻¹C₀).

Now we introduce z, η and the mapping F: X → Y × X as follows:

\[
z = \begin{pmatrix} y \\ 0 \end{pmatrix}, \qquad
F(u) = \begin{pmatrix} G(u) \\ u \end{pmatrix}, \qquad
\eta = \begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix}, \tag{24}
\]

and

\[
\eta \sim N(0, \Sigma), \qquad
\Sigma = \begin{pmatrix} \Gamma & 0 \\ 0 & \lambda^{-1}C_0 \end{pmatrix}. \tag{25}
\]

Our inverse problem is therefore reformulated as

\[
z = F(u) + \eta. \tag{26}
\]

From this we can modify EKI to incorporate the above setup; we refer to the resulting scheme as Tikhonov ensemble Kalman inversion (TEKI), which takes the form

\[
u_{n+1}^{j} = u_n^{j} + h\,B_n^{up}\bigl(h\,B_n^{pp} + \Sigma\bigr)^{-1}\bigl(z_n^{j} - F(u_n^{j})\bigr), \tag{27}
\]

where B^{up} and B^{pp} are now the covariance matrices of the augmented system. With this inclusion, the authors of [23] were able to show analytically that the subspace property still holds, along with other results concerning observability, controllability and ensemble collapse. More importantly, the numerical simulations showed that one can prevent the over-fitting phenomenon.
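In practice TEKI can be implemented by feeding the augmented system (24)-(26) into the plain EKI iteration. The sketch below assumes the `eki` function from the Section 2 sketch is in scope; the names and the regularization weight are placeholders.

```python
# TEKI via augmentation: stack the data with a zero block and the forward map
# with the identity, then reuse the plain EKI iteration (assumed in scope as `eki`).
import numpy as np

def teki(ensemble, y, G, Gamma, C0, lam, **kwargs):
    d = ensemble.shape[1]
    z = np.concatenate([y, np.zeros(d)])                     # z = (y, 0), Eq. (24)
    F = lambda u: np.concatenate([G(u), u])                  # F(u) = (G(u), u)
    Sigma = np.block([[Gamma, np.zeros((len(y), d))],
                      [np.zeros((d, len(y))), C0 / lam]])    # Sigma = diag(Gamma, C0/lambda), Eq. (25)
    return eki(ensemble, z, F, Sigma, **kwargs)
```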

Since this work a number of useful extensions have been considered, such as the continuous-time understanding and new variants in the discrete-time setting [24]. Two recent developments have been, firstly, the extension to Lp regularization [25, 26], motivated by reconstructing edges or discontinuities, where the LSF is modified to

\[
\Phi(u; y) := \tfrac{1}{2}\|y - G(u)\|_{\Gamma}^{2} + \lambda\|u\|_{p}, \qquad 0 < p \le 1. \tag{28}
\]

Finally, another direction is the development of adaptive strategies for TEKI. Adaptive regularization schemes are important, as the choice of the regularization parameter λ > 0 can have a big impact on the reconstruction. Thinking adaptively allows one to evolve the parameter over the iterations, now denoted λ_n. The work of Weissmann et al. [27] provides these developments in an adaptive fashion.


4. Ensemble Kalman sampling

Although EKI has been introduced through the application of the EnKF, a sequential sampling method, to inverse problems, the prevailing viewpoint of EKI is that of optimization. So far we have seen its motivation from the gradient flow structure of the continuous-time formulation in Section 2.3 and its representation as an SDE. To apply EKI as a consistent sampling method one would, instead of taking the limit t → ∞, consider the scheme up to time t = 1. For linear forward models EKI is then consistent with the posterior distribution; however, it is known not to be consistent with the Bayesian perspective in the nonlinear setting [28].

Building on this fact, the motivation behind the ensemble Kalman sampler [29] is to modify the time-dynamical system of EKI such that the limiting distribution as t → ∞ corresponds to the posterior distribution. We start the discussion with an introductory example.

Example 1.1 Let π be a pdf of the form π(u) ∝ exp(−Φ(u)) with Φ(u) = ½‖y − G(u)‖²_Γ + ½‖u‖²_C, i.e. π corresponds to the posterior pdf under the Gaussian prior assumption π₀ = N(0, C). We consider the Langevin diffusion given by

\[
du_t = \nabla_u \log\pi(u_t)\,dt + \sqrt{2}\,dW_t, \qquad u_0 \sim \pi_0, \tag{29}
\]

where (W_t)_{t≥0} denotes a Brownian motion in X = ℝ^{n_u}. The evolution of the distribution ρ_t of the state u_t can then be described through the Fokker–Planck equation

\[
\partial_t \rho_t = -\nabla\cdot\bigl(\rho_t \nabla\log\pi\bigr) + \Delta\rho_t, \qquad \rho_0 = \pi_0, \tag{30}
\]

where under certain assumptions on Φ the underlying Markov process (u_t)_{t≥0} is ergodic and its unique invariant distribution is given by π [30]. Taking the Fokker–Planck eq. (30) into account, the convergence to equilibrium can be quantified through the Kullback–Leibler (KL) divergence KL(q₁ ‖ q₂) = ∫_X q₁(x) log(q₁(x)/q₂(x)) dx [31]. Assuming a log-Sobolev inequality (satisfied e.g. for log-concave π), it follows that

\[
\mathrm{KL}(\rho_t \,\|\, \pi) \le \exp(-\lambda t)\,\mathrm{KL}(\rho_0 \,\|\, \pi) \tag{31}
\]

for some λ>0 [32].
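A simple Euler–Maruyama discretization of the Langevin diffusion (29) illustrates this behavior for a linear forward map, where the posterior is Gaussian. The forward map, prior, step size and number of steps below are illustrative placeholders, and the factor ½ on the prior term follows the example above.

```python
# Euler-Maruyama discretisation of the Langevin diffusion (29) for a Gaussian target.
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[1.0, 0.5], [0.0, 1.0]])       # linear forward map (placeholder)
C = np.eye(2)                                # prior covariance
Gamma = 0.1 * np.eye(2)
y = np.array([1.0, -0.5])                    # data (placeholder)

def grad_log_pi(u):
    # log pi(u) = -Phi(u), Phi(u) = 1/2|y - Au|^2_Gamma + 1/2|u|^2_C
    return A.T @ np.linalg.solve(Gamma, y - A @ u) - np.linalg.solve(C, u)

dt, n_steps = 1e-3, 20000
u = rng.normal(size=2)                       # u_0 ~ pi_0 = N(0, C)
samples = []
for k in range(n_steps):
    u = u + dt * grad_log_pi(u) + np.sqrt(2 * dt) * rng.normal(size=2)
    if k > n_steps // 2:                     # discard a burn-in phase
        samples.append(u.copy())
print(np.mean(samples, axis=0))              # approximates the posterior mean
```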

4.1 Interacting Langevin sampler

The interacting Langevin sampler has been introduced, motivated by preconditioned gradient descent, as an interacting particle system represented by the coupled system of SDEs

\[
du_t^{j} = C(u_t)\,\nabla_u \log\pi(u_t^{j})\,dt + \sqrt{2\,C(u_t)}\,dW_t^{j}, \qquad j = 1,\dots,J, \tag{32}
\]

initialized with an i.i.d. sample u_0^j ∼ π_0. The idea of preconditioning with C(u_t) instead of a fixed preconditioning matrix C ∈ ℝ^{n_u×n_u} is motivated through the corresponding mean-field limit. In the large-particle limit, the corresponding SDE is given as

\[
du_t = C(\rho_t)\,\nabla_u \log\pi(u_t)\,dt + \sqrt{2\,C(\rho_t)}\,dW_t, \qquad u_0 \sim \pi_0, \tag{33}
\]

where the macroscopic mean and covariance operator are defined as

\[
m(\rho) = \int_X x\,\rho(dx), \qquad
C(\rho) = \int_X \bigl(x - m(\rho)\bigr)\otimes\bigl(x - m(\rho)\bigr)\,\rho(dx). \tag{34}
\]

This connects the interacting Langevin system to its origin, the Langevin diffusion (29). Hence, in the long-time limit the preconditioning matrix will formally be given by the covariance operator of the stationary distribution (assuming it exists).

The resulting modified Fokker–Planck equation is given by

\[
\partial_t \rho_t = -\nabla\cdot\bigl(\rho_t\,C(\rho_t)\nabla\log\pi\bigr) + \operatorname{Tr}\bigl(C(\rho_t)\,D^{2}\rho_t\bigr), \qquad \rho_0 = \pi_0. \tag{35}
\]

Assuming that C(ρ_t) ⪰ αI and that the target distribution, of the form π(u) ∝ exp(−Φ(u)) with Φ(u) = ½‖y − G(u)‖²_Γ + λ‖u‖²_{C₀}, is log-concave, the solution ρ_t of (35) converges exponentially fast to equilibrium,

\[
\mathrm{KL}(\rho_t \,\|\, \pi) \le \exp(-\lambda t)\,\mathrm{KL}(\rho_0 \,\|\, \pi), \tag{36}
\]

for some λ > 0 ([29], Proposition 3.1). Furthermore, through the preconditioning with the sample covariance, the resulting scheme remains invariant under affine transformations [33].

4.2 Ensemble Kalman sampler

One of the attractive features of the EnKF, and of EKI, is its derivative-free implementation. The idea of the ensemble Kalman sampler (EKS) is to build a modified interacting Langevin sampler that avoids computing derivatives. Let π(u) ∝ exp(−½‖y − G(u)‖²_Γ − ½‖u‖²_{C₀}); then the interacting Langevin system is given by

\[
du_t^{j} = \Bigl(-C(u_t)\,DG(u_t^{j})^{T}\Gamma^{-1}\bigl(G(u_t^{j}) - y\bigr) - C(u_t)\,C_0^{-1}u_t^{j}\Bigr)dt + \sqrt{2\,C(u_t)}\,dW_t^{j}, \qquad j = 1,\dots,J. \tag{37}
\]

Motivated by the approximation C^{uw}(u) ≈ C(u) DG(u^j)^T, the EKS is then formulated as the solution of the system of coupled SDEs

\[
du_t^{j} = \Bigl(-C^{uw}(u_t)\,\Gamma^{-1}\bigl(G(u_t^{j}) - y\bigr) - C(u_t)\,C_0^{-1}u_t^{j}\Bigr)dt + \sqrt{2\,C(u_t)}\,dW_t^{j}, \qquad j = 1,\dots,J. \tag{38}
\]

We note that the approximation C^{uw}(u) ≈ C(u) DG(u^j)^T is exact for linear forward models, and hence the EKS coincides with the interacting Langevin sampler in the linear setting. For nonlinear forward models, however, the approximation of derivatives is only accurate if the particles are close to each other. Since in EKS the particles aim to represent a distribution, they are not expected to be close together. This suggests formulating a localized version of the preconditioning sample covariance matrix, placing more weight on particles close to each other and reducing the weight between particles far apart. We therefore define the distance-dependent weights between particles u_t^j and u_t^i as

\[
w_t^{ji} = \frac{\exp\bigl(-\tfrac{1}{2\gamma}\|u_t^{j} - u_t^{i}\|_{D}^{2}\bigr)}{\sum_{l=1}^{J}\exp\bigl(-\tfrac{1}{2\gamma}\|u_t^{j} - u_t^{l}\|_{D}^{2}\bigr)}, \tag{39}
\]

for a scaling parameter γ > 0 and a symmetric positive-definite matrix D ∈ ℝ^{n_u×n_u}. The localized (mixed) sample covariance matrices around particle u_t^j are then defined as

\[
C(u_t^{j}) = \sum_{i=1}^{J} w_t^{ji}\bigl(u_t^{i} - \bar{u}_t^{j}\bigr)\otimes\bigl(u_t^{i} - \bar{u}_t^{j}\bigr), \qquad
C^{uw}(u_t^{j}) = \sum_{i=1}^{J} w_t^{ji}\bigl(u_t^{i} - \bar{u}_t^{j}\bigr)\otimes\bigl(G(u_t^{i}) - \bar{G}_t^{j}\bigr), \tag{40}
\]

with localized means

\[
\bar{u}_t^{j} = \sum_{i=1}^{J} w_t^{ji}\,u_t^{i}, \qquad
\bar{G}_t^{j} = \sum_{i=1}^{J} w_t^{ji}\,G(u_t^{i}). \tag{41}
\]

The localized EKS then reads as

\[
du_t^{j} = \Bigl(-C^{uw}(u_t^{j})\,\Gamma^{-1}\bigl(G(u_t^{j}) - y\bigr) - C(u_t^{j})\,C_0^{-1}u_t^{j}\Bigr)dt + \sqrt{2\,C(u_t^{j})}\,dW_t^{j}, \qquad j = 1,\dots,J. \tag{42}
\]

While the original EKS shows promising results for nearly Gaussian target distributions, the localized variant helps to extend its scope to multimodal target distributions [34]. Other related work has aimed to provide further understanding of the EKS, including the derivation of mean-field limits as well as various generalizations [33, 35].
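A minimal Euler–Maruyama sketch of one step of the non-localized EKS dynamics (38) is given below; only forward evaluations G(u^j) enter, so it is derivative free. The normalization of the sample covariances, the jitter and the step size are placeholder choices, and the localized weights (39)-(42) are omitted.

```python
# Euler-Maruyama sketch of one step of the (non-localised) EKS dynamics (38).
import numpy as np

rng = np.random.default_rng(5)

def eks_step(ens, y, G, Gamma, C0, dt=0.01):
    J, d = ens.shape
    Gu = np.array([G(u) for u in ens])
    u_bar, G_bar = ens.mean(axis=0), Gu.mean(axis=0)
    C_uu = (ens - u_bar).T @ (ens - u_bar) / (J - 1)     # sample covariance C(u_t)
    C_uw = (ens - u_bar).T @ (Gu - G_bar) / (J - 1)      # cross-covariance C^{uw}(u_t)
    drift = -(Gu - y) @ np.linalg.solve(Gamma, C_uw.T)   # -C^{uw} Gamma^{-1}(G(u^j) - y), row-wise
    drift -= ens @ np.linalg.solve(C0, C_uu.T)           # -C(u) C0^{-1} u^j, row-wise
    # Noise with covariance 2*dt*C(u_t); a small jitter keeps the Cholesky factor well defined.
    L = np.linalg.cholesky(2 * dt * C_uu + 1e-12 * np.eye(d))
    return ens + dt * drift + rng.normal(size=(J, d)) @ L.T
```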


5. Other directions

Having discussed some of the more developed recent directions of EKI, we now turn to other, smaller, extensions. In this section we discuss each in turn: machine learning, understanding EKI in the context of nonlinear inverse problems, and applications in engineering such as the geophysical sciences.

5.1 Applications in machine learning

The development of machine learning methodologies has increased significantly in the last decade, with methods produced to solve problems in healthcare, imaging and decision processes. Much of this development is due to advances in optimization theory. As a result, ensemble Kalman methods can be viewed as a natural class of algorithms to apply directly, as they are derivative-free optimizers.

The first work aimed at characterizing this connection was [36]. The authors motivated EKI as a replacement for SGD, initially applying it to supervised learning problems. Given a dataset (x_j, y_j), j = 1,…,N, assumed to consist of i.i.d. samples from a particular distribution, the Monte Carlo approximation leads to the minimization problem

\[
\operatorname*{arg\,min}_{u}\,\Phi_s(u; x, y), \qquad
\Phi_s(u; x, y) = \frac{1}{N}\sum_{j=1}^{N} \mathcal{L}\bigl(G(u; x_j),\, y_j\bigr) + \frac{\lambda}{2}\|u\|_{C_0}^{2}, \tag{43}
\]

where 𝓛: Y × Y → ℝ⁺ is a positive loss function. In other words, one is trying to learn the map from the inputs x_j to the labels y_j. Supervised learning underpins common ML applications such as image classification and natural language processing. A related task is semi-supervised learning, where labels y_j are available only for part of the data; this modifies the least-squares functional given in (43).
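As a toy illustration of (43) with a squared-error loss, the parameters of a small parametric model can be treated as the unknown u and the stacked model outputs as G(u), after which EKI can be used as a derivative-free trainer. The polynomial model, the synthetic data and the noise level below are made-up placeholders, and the sketch reuses the `eki` function from Section 2.

```python
# EKI as a derivative-free trainer for a small parametric model (squared-error loss).
import numpy as np

rng = np.random.default_rng(6)
xs = np.linspace(-1, 1, 30)
ys = np.sin(np.pi * xs) + 0.05 * rng.normal(size=xs.size)   # labelled data (placeholder)

def G(u):
    # Forward map: model parameters -> predictions at all training inputs.
    return np.polyval(u, xs)                                 # cubic polynomial model

Gamma = 0.05 ** 2 * np.eye(xs.size)
ens = rng.normal(size=(100, 4))          # ensemble over the 4 polynomial coefficients
ens = eki(ens, ys, G, Gamma, n_iter=30)  # `eki` as sketched in Section 2
print("fitted coefficients:", ens.mean(axis=0))
```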

Another interesting direction has been the use of EKI for training and learning neural networks [37]. This builds on the work discussed above, with a number of modifications. In particular, the authors show that they are able to prove convergence of EKI to the minimizer of a strongly convex function. They apply their modified methodology to a nonlinear regression problem whose target function includes a small sinusoidal perturbation term ε sin(·), where θ is the parameter of interest and F(θ) is the objective functional of interest. This was also extended to image classification problems, specifically the well-known MNIST handwritten-digit data set.

A final and more recent direction combining EKI and ML is the work of Guth et al. [38], which provides a way of solving the forward problem within EKI.

5.2 Extensions to nonlinear convergence analysis

A major challenge with EKI, and the EnKF in general, is establishing convergence results in the nonlinear setting. As is well known, in the linear and Gaussian setting the EnKF coincides with the Kalman–Bucy filter as the number of particles tends to infinity. In the nonlinear setting, however, it has been challenging to derive such results rigorously. Ongoing and recent work has aimed to bridge the connection between EKI and nonlinear dynamics. The first paper providing some form of analysis in this direction was the work of Chada et al. [24], which considered a specific form of EKI in the discrete-time setting.

Namely the update formula is modified to

\[
m_{n+1} = m_n + C_n^{up}\bigl(C_n^{pp} + h_n^{-1}\Gamma\bigr)^{-1}\bigl(z - Hm_n\bigr), \qquad
C_{n+1} = C_n^{uu} - C_n^{up}\bigl(C_n^{pp} + h_n^{-1}\Gamma\bigr)^{-1}C_n^{pu} + \alpha_n^{2}\Sigma, \tag{45}
\]

where we adopt an ensemble square-root filter formulation, which is known to perform better. We also include covariance inflation (with inflation factor α_n) and an adaptive step size h_n, motivated by stochastic optimization, to accelerate convergence. The other key contribution, as alluded to, is that for this update form one can prove convergence towards both local and global minimizers. For the latter, we have the following result,

\[
\lambda_c\,\|m_N - u^{*}\|^{2} \le \ell(m_N) - \ell(u^{*}) \le D\,N^{-\alpha}, \tag{46}
\]

which establishes polynomial convergence. In the above, λ_c is a convexity constant, ℓ is the associated loss function, D is a constant, u* is the global minimizer and α > 0 is an exponent; we refer to [24] for further details.

As one may notice, this convergence analysis was carried out in the discrete-time setting, so a natural extension is to the continuous-time framework. The work of Blömker et al. [18] provides a first convergence analysis in this direction. Even given both of these works, a full understanding of the nonlinear setting has not been achieved and considerable work is still required; these papers provide a first step, in both settings.

5.3 Engineering applications

As a final direction to discuss in detail, and one closely related to the theme of this book, we consider particular engineering applications. The advantage of ensemble Kalman methods is that they can be viewed as black-box solvers, making them highly applicable. One particular application area is the geophysical sciences, related to recovering quantities of interest below the surface, i.e. in the subsurface. An example is the inverse problem of electrical resistivity tomography (ERT), shown below (Figure 3).

Figure 3.

Image depicting electrical resistivity tomography, where the electric currents are recorded at electrodes placed over the subsurface material.

ERT is concerned with recovering, or characterizing, subsurface materials in terms of their electrical properties, which are recorded through electrodes. It operates very similarly to electrical impedance tomography (EIT), except that it targets the subsurface. EKI has also been used to learn the permeability of subsurface flow in a range of different settings, which can be found in the following papers [39, 40].

Another interesting direction concerns buildings, specifically quantifying uncertainty in the thermophysical properties of walls. This work was conducted by Iglesias et al. [41, 42]; the application is the inverse problem of recovering thermodynamic properties, such as temperature. Similar methodology has been used in resin transfer modeling [43], based on problems with moving boundaries, which are difficult to model; this work provides a first step in doing so. Aside from these, other applications include mineral exploration, scattering problems and numerical climate models [44, 45, 46]. It is worth mentioning that, as of now, there is no official software package for EKI in general. This is currently being developed, but we emphasize to the reader that the methodology presented here, with the examples later, is not tied to the well-known software packages available in Matlab or Python.

As a side remark, there are further directions beyond those discussed above. Some others, without going into detail, include hierarchical approaches, constrained optimization, and connections with data assimilation strategies [47, 48, 49, 50, 51].


6. Numerical experiments

In this section we provide numerical experiments highlighting the performance of ensemble Kalman methods for inverse problems. Specifically we consider EKI as discussed in Section 2 and compare it with its regularized version, TEKI. Both methodologies are tested on two motivating inverse problems arising in the geophysical and atmospheric sciences: a Darcy flow partial differential equation and the Navier–Stokes equation.

To make the comparison we present three different figures: (i) a reconstruction at the end of the iterative scheme; (ii) the relative error between the approximate solution and the ground truth; and (iii) the data misfit. The quantities associated with each are given below; a short sketch computing them follows the list.

Reconstruction through EKI: the ensemble mean \(\bar{u}_n = \frac{1}{J}\sum_{j=1}^{J} u_n^{j}\).

Relative error: \(\|\bar{u}_n - u^{\dagger}\|_{L^2} / \|u^{\dagger}\|_{L^2}\), where \(u^{\dagger}\) is the true unknown.

Data misfit: \(\|\Gamma^{-1/2}\bigl(y - G(\bar{u}_n)\bigr)\|^{2}\).
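The following small helper computes the three quantities from an ensemble. It is a sketch: the continuous L² norms are approximated by discrete Euclidean norms on the computational grid, and the argument names are placeholders.

```python
# Diagnostics used to compare EKI and TEKI: reconstruction, relative error, data misfit.
import numpy as np

def eki_metrics(ensemble, u_true, y, G, Gamma):
    u_mean = ensemble.mean(axis=0)                         # reconstruction (ensemble mean)
    rel_err = np.linalg.norm(u_mean - u_true) / np.linalg.norm(u_true)
    Gamma_inv_sqrt = np.linalg.inv(np.linalg.cholesky(Gamma))
    misfit = np.linalg.norm(Gamma_inv_sqrt @ (y - G(u_mean))) ** 2
    return u_mean, rel_err, misfit
```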

6.1 Darcy flow

Our first model problem is an elliptic partial differential equation (PDE) which has numerous applications; one of them is subsurface flow in a porous medium. The forward problem is concerned with solving for the pressure p ∈ H₀¹(Ω), given the permeability κ ∈ L^∞(Ω) and source term f ∈ L^∞(Ω), where the PDE is given as

\[
-\nabla\cdot(\kappa\nabla p) = f, \quad \text{in } \Omega, \tag{47}
\]
\[
p = 0, \quad \text{on } \partial\Omega, \tag{48}
\]

such that we have prescribed Dirichlet boundary conditions, and Ω = (0,1)² ⊂ ℝ^d, with d = 2, is a Lipschitz domain. The inverse problem associated with (47) is the recovery of the permeability κ ∈ L^∞(Ω) from noisy measurements of p, i.e.

\[
y = G(\kappa) + \eta, \qquad \eta \sim N(0, \Gamma), \tag{49}
\]

where G(κ) denotes the observed pressure p = p(κ). We consider 64 equidistant observations within the domain and on the boundary. To solve (47) numerically we employ a centered finite-difference method with mesh width h = 1/100. For the noisy observations we take Γ = γI with γ = 0.01. We compare EKI and TEKI with an ensemble size of J = 50 for both methods, running both iterative schemes for n = 24 iterations. The initial ensemble {u_0^j}_{j=1}^J is modeled as draws of a Gaussian random field, i.e. u ∼ N(0, C), which can be generated via the Karhunen–Loève (KL) expansion

\[
u = \sum_{k \in \mathbb{Z}^{+}} \sqrt{\lambda_k}\,\phi_k\,\xi_k, \qquad \xi_k \sim N(0,1), \tag{50}
\]

where (λ_k, φ_k) are the eigenvalues and eigenfunctions of the covariance operator C. There are numerous choices of covariance function one can take; a popular choice is the Matérn covariance function, which provides much flexibility for modeling. For full details on various covariance functions, or operators, we refer the reader to [52]. The true unknown is also taken to be a Gaussian random field, but one that is smoother than the draws forming the initial ensemble.
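The sketch below draws an initial ensemble via a truncated KL expansion (50) in one spatial dimension. The eigenpairs used here, a Fourier sine basis with polynomially decaying eigenvalues, are a common illustrative choice and are not the specific Matérn covariance used for the experiments; the truncation level and decay rate are placeholders.

```python
# Drawing Gaussian random field samples via a truncated KL expansion (50).
import numpy as np

rng = np.random.default_rng(7)

def kl_sample(x, n_modes=64, decay=2.0):
    """One draw u(x) = sum_k sqrt(lambda_k) phi_k(x) xi_k with xi_k ~ N(0,1)."""
    u = np.zeros_like(x)
    for k in range(1, n_modes + 1):
        lam_k = float(k) ** (-2.0 * decay)                 # eigenvalue decay (placeholder)
        phi_k = np.sqrt(2.0) * np.sin(np.pi * k * x)       # orthonormal eigenfunction on [0,1]
        u += np.sqrt(lam_k) * phi_k * rng.normal()
    return u

x = np.linspace(0.0, 1.0, 101)
ensemble = np.array([kl_sample(x) for _ in range(50)])     # J = 50 initial particles
```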

Our first set of experiments is shown in Figure 4, which displays the truth, the reconstruction using EKI, and that using TEKI. It is clear that both methodologies learn the true unknown function well; however, TEKI induces a smoother reconstruction, which arises from the regularization. Interestingly, in Figure 5 we notice that the relative error of EKI tends to diverge towards the end, which is due to overfitting of the data. A motivation behind TEKI is to alleviate this: its relative error continues to decrease, and its data misfit remains within the noise level, which is given as

Figure 4.

Reconstruction plots for the Darcy flow PDE example. Left: Truth. Middle: EKI reconstruction. Right: TEKI reconstruction.

Figure 5.

Relative errors and data misfits for the Darcy flow PDE example. We compare EKI with TEKI.

\[
\text{noise level} = \|y - G(u^{\dagger})\| = \|\eta\|. \tag{51}
\]

6.2 Navier–Stokes equation

Our final test problem is a well-known PDE model arising in numerical weather prediction, the Navier–Stokes equation (NSE). We consider the 2D NSE on the torus 𝕋² = [0,1)² with periodic boundary conditions. The aim is to estimate the initial velocity field u from noisy observations of the flow, where the velocity v: [0,∞) × 𝕋² → ℝ² is a vector field and p: [0,∞) × 𝕋² → ℝ is the scalar pressure field. The NSE is given as

\[
\partial_t v + (v\cdot\nabla)v - \nu\Delta v + \nabla p = f, \quad \text{on } (0,\infty)\times\mathbb{T}^{2}, \tag{52}
\]
\[
\nabla\cdot v = 0, \quad \text{on } (0,\infty)\times\mathbb{T}^{2}, \tag{53}
\]
\[
v(0,\cdot) = u, \quad \text{on } \mathbb{T}^{2}, \tag{54}
\]

with initial condition (54) and the incompressibility (divergence-free) condition (53). In (52), f corresponds to a volume forcing and ν is the viscosity of the fluid. We use a spectral Fourier solver for (52). This PDE is more challenging to invert than the previous example, so we take 100 point-wise observations. The setup is otherwise largely the same as before: the initial ensemble is based on a Gaussian random field generated through the KL expansion (50), and we aim to recover the true underlying function u using both EKI and TEKI. The results of the experiments are presented in Figures 6 and 7. A similar phenomenon appears: the reconstructions work well, but TEKI induces additional smoothness through the regularization. Likewise, the relative errors and data misfit show overfitting of the data towards the end for EKI. We note that this can be avoided depending on the prior, its hyperparameters, the observations, and the noise; we chose these particular settings to demonstrate that overfitting can occur.

Figure 6.

Reconstruction plots for the NSE PDE example. Left: Truth. Middle: EKI reconstruction. Right: TEKI reconstruction.

Figure 7.

Relative errors and data misfits for the NSE PDE example. We compare EKI with TEKI.


7. Conclusion

The ensemble Kalman filter (EnKF) is a simple, easy-to-implement and powerful algorithm. This has been demonstrated in numerous data assimilation applications for state estimation, including numerical weather prediction, the geosciences and, more recently, machine learning. A major advantage of the method is that, unlike other filters such as the particle filter, it scales better to high dimensions and can be significantly cheaper. In this chapter we considered the EnKF and its application to parameter estimation; one can exploit such techniques for inverse problems arising in applications similar to those stated above. We provided a review and overview of some of the major contributions in this direction, where the resulting methodology is known as ensemble Kalman inversion (EKI), based largely on the work of Iglesias et al. [13]. We presented various avenues the field of EKI has taken, such as regularization, extensions to sampling and other areas, and we demonstrated how EKI performs on two numerical PDE examples.

The EKI methodology builds very naturally on many different fields, which acts as a strong motivation. For example, being an optimizer, one can naturally apply optimization procedures, but also techniques from data assimilation and uncertainty quantification. As a result, this methodology brings together researchers from different fields working on parameter estimation and inverse problems. This synergy of areas will hopefully lead to new emerging directions within EKI, from methodological, theoretical and application perspectives.


Acknowledgments

This work was funded by KAUST baseline funding. The author thanks Simon Weissmann for helpful discussions, for the use of some of the earlier figures, and for information on EKS.


Abbreviations

EnKF: Ensemble Kalman filter
EKI: Ensemble Kalman inversion
EKS: Ensemble Kalman sampler
EIT: Electrical impedance tomography
ERT: Electrical resistivity tomography
MNIST: Modified National Institute of Standards and Technology
KL: Kullback–Leibler
KF: Kalman filter
L96: Lorenz 96 model
LSF: Least-squares functional
PDE: Partial differential equation
SDE: Stochastic differential equation

References

  1. Kaipio J, Somersalo E. Statistical and Computational Inverse Problems. New York: Springer Verlag; 2004
  2. Stuart AM. Inverse problems: A Bayesian perspective. Acta Numerica. 2010;19:451-559
  3. Tarantola A. Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia: SIAM; 1987
  4. Lorenc AC. Analysis methods for numerical weather prediction. Quarterly Journal of the Royal Meteorological Society. 1986;112(474):1177-1194
  5. Majda A, Wang X. Non-linear Dynamics and Statistical Theories for Basic Geophysical Flows. Cambridge University Press; 2006
  6. Oliver D, Reynolds AC, Liu N. Inverse Theory for Petroleum Reservoir Characterization and History Matching. 1st ed. Cambridge University Press; 2008
  7. Bucy RS. Nonlinear filtering theory. IEEE Transactions on Automatic Control. 1965;10(198):198
  8. Kalman RE. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering. 1960;82:35-45
  9. Bain A, Crisan D. Fundamentals of Stochastic Filtering. New York: Springer; 2009
  10. Law KJH, Stuart AM, Zygalakis K. Data Assimilation: A Mathematical Introduction. Texts in Applied Mathematics. Springer; 2015
  11. Evensen G. Data Assimilation: The Ensemble Kalman Filter. Springer; 2009
  12. Evensen G. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dynamics. 2003;53(4):343-367
  13. Iglesias MA, Law KJH, Stuart AM. Ensemble Kalman methods for inverse problems. Inverse Problems. 2013;29
  14. Li G, Reynolds AC. Iterative ensemble Kalman filters for data assimilation. SPE Journal. 2009;14:496-505
  15. Lehtinen MS, Paivarinta L, Somersalo E. Linear inverse problems for generalised random variables. Inverse Problems. 1989;5(4):599-612
  16. Schillings C, Stuart AM. Analysis of the ensemble Kalman filter for inverse problems. SIAM Journal on Numerical Analysis. 2017;55(3):1264-1290
  17. Blomker D, Schillings C, Wacker P, Weissmann S. Well posedness and convergence analysis of the ensemble Kalman inversion. Inverse Problems. 2019;35(8):085007
  18. Blomker D, Schillings C, Wacker P, Weissmann S. Continuous time limit of the stochastic ensemble Kalman inversion: Strong convergence analysis. arXiv preprint arXiv:2107.14508. 2021
  19. Benning M, Burger M. Modern regularization methods for inverse problems. Acta Numerica. 2018;27:1-111
  20. Engl HW, Hanke M, Neubauer A. Regularization of Inverse Problems. Mathematics and its Applications. Vol. 375. Dordrecht: Kluwer Academic Publishers Group; 1996
  21. Iglesias MA. A regularising iterative ensemble Kalman method for PDE-constrained inverse problems. Inverse Problems. 2016;32
  22. Iglesias MA, Yang Y. Adaptive regularisation for ensemble Kalman inversion with applications to non-destructive testing and imaging. arXiv preprint arXiv:2006.14980. 2020
  23. Chada NK, Tong XT, Stuart AM. Tikhonov regularization for ensemble Kalman inversion. SIAM Journal on Numerical Analysis. 2020;58(2):1263-1294
  24. Chada NK, Tong XT. Convergence acceleration of ensemble Kalman inversion in nonlinear settings. Mathematics of Computation. 2022;91(335):1247-1280
  25. Lee Y. lp regularization for ensemble Kalman inversion. SIAM Journal on Scientific Computing. 2021;43(5):3417-3437
  26. Schneider T, Stuart AM, Wu J-L. Imposing sparsity within ensemble Kalman inversion. arXiv preprint arXiv:2007.06175. 2020
  27. Weissmann S, Chada NK, Schillings C, Tong XT. Adaptive Tikhonov strategies for stochastic ensemble Kalman inversion. Inverse Problems. 2022;38(4):045009
  28. Ernst OG, Sprungk B, Starkloff H-J. Analysis of the ensemble and polynomial chaos Kalman filters in Bayesian inverse problems. SIAM/ASA Journal on Uncertainty Quantification. 2015;3:823-851
  29. Garbuno-Inigo A, Hoffmann F, Li W, Stuart AM. Interacting Langevin diffusions: Gradient structure and ensemble Kalman sampler. SIAM Journal on Applied Dynamical Systems. 2020;19(1):412-441
  30. Pavliotis G. Stochastic Processes and Applications: Diffusion Processes, the Fokker–Planck and Langevin Equations. Texts in Applied Mathematics. New York: Springer; 2014
  31. Kullback S, Leibler RA. On information and sufficiency. Annals of Mathematical Statistics. 1951;22:79-86
  32. Markowich PA, Villani C. On the trend to equilibrium for the Fokker–Planck equation: An interplay between physics and functional analysis. Matemática Contemporânea (SBM). 1999;19
  33. Garbuno-Inigo A, Nüsken N, Reich S. Affine invariant interacting Langevin dynamics for Bayesian inference. SIAM Journal on Applied Dynamical Systems. 2020;19(3):1633-1658
  34. Reich S, Weissmann S. Fokker–Planck particle systems for Bayesian inference: Computational approaches. SIAM/ASA Journal on Uncertainty Quantification. 2021;9(2):446-482
  35. Ding Z, Li Q. Ensemble Kalman sampler: Mean-field limit and convergence analysis. SIAM Journal on Mathematical Analysis. 2021;53(2):1546-1578
  36. Kovachki NB, Stuart AM. Ensemble Kalman inversion: A derivative-free technique for machine learning tasks. Inverse Problems. 2019;35(9):095005
  37. Haber E, Lucka F, Ruthotto L. Never look back - A modified EnKF method and its application to the training of neural networks without back propagation. arXiv preprint arXiv:1805.08034. 2018
  38. Guth PA, Schillings C, Weissmann S. Ensemble Kalman filter for neural network based one-shot inversion. arXiv preprint. 2020
  39. Tso CM, Iglesias M, Wilkinson P, Kuras O, Chambers J, Binley A. Efficient multiscale imaging of subsurface resistivity with uncertainty quantification using ensemble Kalman inversion. Geophysical Journal International. 2021;25(2)
  40. Muir JB, Tsai VC. Geometric and level set tomography using ensemble Kalman inversion. Geophysical Journal International. 2020;220:967-980
  41. Iglesias MA, Sawlan Z, Scavino M, Tempone R, Wood C. Bayesian inferences of the thermal properties of a wall using temperature and heat flux measurements. International Journal of Heat and Mass Transfer. 2018;116:417-431
  42. De Simon L, Iglesias MA, Jones B, Wood C. Quantifying uncertainty in thermal properties of walls by means of Bayesian inversion. Energy and Buildings. 2018;177(2):177
  43. Iglesias M, Park M, Tretyakov MV. Bayesian inversion in resin transfer modelling. Inverse Problems. 2018;34(10)
  44. Sungkono S, Apriliani E, Saifuddin N, Fajriani F, Srigutomo W. Ensemble Kalman inversion for determining model parameter of self-potential data in the mineral exploration. In: Biswas A, editor. Self-Potential Method: Theoretical Modeling and Applications in Geosciences. Springer Geophysics. Cham: Springer; 2021
  45. Dunbar ORA, Garbuno-Inigo A, Schneider T, Stuart AM. Calibration and uncertainty quantification of convective parameters in an idealized GCM. Journal of Advances in Modeling Earth Systems. 2021
  46. Huang J, Li Z, Wang B. A Bayesian level set method for the shape reconstruction of inverse scattering problems in elasticity. Computers & Mathematics with Applications. 2021;97(1):18-27
  47. Albers DJ, Blancquart PA, Levine ME, Seylabi EE, Stuart AM. Ensemble Kalman methods with constraints. Inverse Problems. 2019;35(9):095007
  48. Chada NK, Chen Y, Sanz-Alonso D. Iterative ensemble Kalman methods: A unified perspective with some new variants. Foundations of Data Science. 2021;3(3):331-369
  49. Chada NK, Iglesias MA, Roininen L, Stuart AM. Parameterizations for ensemble Kalman inversion. Inverse Problems. 2018;34
  50. Chada NK, Schillings C, Weissmann S. On the incorporation of box-constraints for ensemble Kalman inversion. Foundations of Data Science. 2019;1(4):433-456
  51. Tong XT, Morzfeld M. Localized ensemble Kalman inversion. arXiv preprint arXiv:2201.10821. 2022
  52. Lord G, Powell CE, Shardlow T. An Introduction to Computational Stochastic PDEs. Cambridge Texts in Applied Mathematics; 2014
