Open access peer-reviewed chapter

Bayesian Inference for Inverse Problems

Written By

Ali Mohammad-Djafari

Reviewed: 11 March 2022 Published: 17 September 2022

DOI: 10.5772/intechopen.104467

From the Edited Volume

Bayesian Inference - Recent Advantages

Edited by Niansheng Tang


Abstract

Inverse problems arise everywhere we have indirect measurements. Regularization and Bayesian inference are the two main approaches for handling inverse problems. The Bayesian inference approach is more general and offers many more tools for developing efficient methods for difficult problems. In this chapter, an overview of Bayesian parameter estimation is first presented, followed by its extension to inverse problems. The first main difficulty is the high dimensionality of the unknown quantity and the appropriate choice of the prior law. The second main difficulty lies in the computational aspects. Different approximate Bayesian computation methods, and in particular the variational Bayesian approximation (VBA) methods, are explained in detail.

Keywords

  • inverse problems
  • hidden variable
  • hierarchical models
  • approximate Bayesian computation
  • variational Bayesian approximation

1. Introduction

Inverse problems arise in many scientific and engineering applications. In fact, we almost always want to infer quantities, variables, distributions and functions that are not directly observable. Inferring a hidden variable f from the observation of another variable g is the main objective in many scientific areas [1, 2, 3].

Classically, Bayesian methods have been developed for the direct observation of a quantity, its parametric modeling and the estimation of the parameters of the model. The description and development of Bayesian inference for the case of inverse problems is the main objective of this chapter. The chapter is organized as follows: first, a few inverse problems are mentioned, mainly in two categories: those described by Ordinary Differential Equations (ODEs) and Partial Differential Equations (PDEs), and those described by integral equations. Then, we will see that two main problems arise: parameter estimation and inversion. The Bayesian parameter estimation is described first, and then the Bayesian inversion.


2. Inverse problems

To illustrate these two categories of inverse problems, we first give a very simple example. Consider the electrical circuit of Figure 1.

Figure 1.

A simple electrical circuit example to show two different expressions of inverse problems modeling: Ordinary differential equation (ODE) or Integral equation (IE).

Using the notations of the figure, we easily obtain the following ODE:

$$ g(t) + \theta \, \frac{dg(t)}{dt} = f(t) \tag{1} $$

Then, using the Fourier transform (FT), we easily obtain the following integral equation:

$$ g(t) = \int f(\tau)\, h(t-\tau)\, d\tau \tag{2} $$

These two simple equations describe the same linear inverse problem, where we can distinguish the following mathematical problems:

  • Forward problem: Given the parameter θ of the system and the input f(t), predict the output g(t).

  • Parameter estimation: Given the input f (t) and the output g(t), estimate the parameter θ.

  • System identification: Given the input f (t) and the output g(t), estimate the impulse response (IR) of the system h(t).

  • Inverse problems:

    • Simple: Given the characteristics of the system (either the parameter θ or equivalently the impulse response h(t)) and the output g(t) estimate the input f (t);

    • Blind: Given the output g(t) estimate both the system, parameter θ or the impulse response h(t), and input f (t).

For general vocabulary and examples, see [2, 4, 5].
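To make the forward problem concrete, here is a minimal numerical sketch (all values, names and step sizes are illustrative assumptions, not part of the original chapter) that simulates Eq. (1) by explicit discretization and checks it against the convolution form of Eq. (2):

```python
import numpy as np

# Minimal sketch (illustrative values): simulate the forward problem of Eq. (1),
#   g(t) + theta * dg(t)/dt = f(t),
# with an explicit Euler scheme, and compare with the convolution form of Eq. (2).
theta = 0.5                                  # assumed system parameter (e.g. RC)
dt = 0.01
t = np.arange(0.0, 5.0, dt)
f = ((t > 1.0) & (t < 2.0)).astype(float)    # a rectangular input pulse

g = np.zeros_like(f)
for k in range(len(t) - 1):
    # dg/dt = (f - g) / theta  ->  explicit Euler update
    g[k + 1] = g[k] + dt * (f[k] - g[k]) / theta

# Eq. (2): the same output as a convolution with the impulse response h(t)
h = (dt / theta) * np.exp(-t / theta)        # discretized impulse response
g_conv = np.convolve(f, h)[:len(t)]          # should be close to g
```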

2.1 Examples of linear inverse problems

Here, a few examples of classical inverse problems are listed.

2.1.1 Deconvolution

When the forward problem is a convolution operation:

$$ g(t) = \int f(\tau)\, h(t-\tau)\, d\tau, \tag{3} $$

the inverse problem is called Deconvolution (Figure 2). Figure 3 shows an example of a deconvolution problem which arises in radio astronomy.

Figure 2.

Signal deconvolution problem.

Figure 3.

Image deconvolution or restoration inverse problem in satellite imaging.

2.1.2 Image restoration

In many imaging systems, such as visible-light cameras, microscopes, telescopes or infrared cameras, due to limitations such as a limited aperture or limited resolution, the forward problem can be approximated by a 2D convolution equation:

$$ g(x,y) = \iint f(x',y')\, h(x-x',\, y-y')\, dx'\, dy'. \tag{4} $$

The corresponding inverse problem is called image deconvolution or, more often, image restoration. The example given in Figure 3 is the case of satellite imaging [6, 7].

Figure 4.

Image reconstruction in CT. On the left, the projections g(r, ϕ) and on the right the object f(x, y).

2.1.3 Image reconstruction in X ray computed tomography (CT)

In X-ray CT, assuming a parallel geometry, where a ray is characterized by its angle ϕ and its distance r from the center of the object f(x, y), the relation between the data g(r, ϕ), called the projections at angle ϕ, and the function f(x, y), called the object, is given by the Radon transform:

$$ g(r,\phi) = \iint f(x,y)\, \delta(r - x\cos\phi - y\sin\phi)\, dx\, dy. \tag{5} $$

The inverse problem here is called Image reconstruction. A simulated example is shown in Figure 4.

2.1.4 Fourier synthesis

In many imaging systems, when using FT, it is possible to model the inverse problem with the following forward FT relation:

$$ g(u,v) = \iint f(x,y)\, \exp\left[-j(ux+vy)\right]\, dx\, dy, \tag{6} $$

where the data, after an appropriate FT, can partially fill the Fourier domain g(u, v) of the unknown function of interest f(x, y) [8, 9]. Figure 5 shows the case of X-ray CT.

Figure 5.

Fourier synthesis (FS) inverse problems arising in many imaging systems. Here is illustrated the FS problem in X-ray CT.

2.1.5 General linear inverse problems

All the examples of linear inverse problems listed above can be summarized in the following general form:

$$ g(s) = \int f(r)\, h(s, r)\, dr \tag{7} $$

where s can be t, (x, y), (r, ϕ) or (u, v) and r, respectively, τ, (x′, y′), (x, y) and (x, y).
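After discretization, all these problems take the algebraic form g = Hf + ϵ used throughout the chapter. As a hedged illustration (the kernel, sizes and noise level below are hypothetical), here is how the forward matrix H can be built explicitly for the 1D deconvolution case:

```python
import numpy as np

# Minimal sketch: discretize the 1D convolution of Eq. (3) into g = H f + eps.
# Kernel, sizes and noise level are illustrative assumptions.
n = 128
u = np.linspace(-3.0, 3.0, 15)
h = np.exp(-0.5 * u**2)
h /= h.sum()                          # hypothetical normalized blur kernel

H = np.zeros((n, n))                  # banded Toeplitz convolution matrix
for i in range(n):
    for j in range(n):
        k = i - j + len(h) // 2
        if 0 <= k < len(h):
            H[i, j] = h[k]

rng = np.random.default_rng(0)
f_true = np.zeros(n)
f_true[30], f_true[80] = 1.0, -0.7    # a simple "spiky" unknown signal
g = H @ f_true + 0.01 * rng.standard_normal(n)   # simulated noisy data
```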


3. Bayesian parameter estimation

To introduce the Bayes rule for parameter estimation in a very simple way, we consider the case where we have a set of data g = {g₁, …, g_n} to which we assign a probability law p(gᵢ|θ) with a set of unknown parameters θ. The question now is how to infer θ from those data. We can immediately use the Bayes rule:

$$ p(\theta|g) = \frac{p(g|\theta)\, p(\theta)}{p(g)} \propto l(\theta)\, p(\theta) \tag{8} $$

where:

  • l(θ) ≜ p(g|θ) = ∏ᵢ p(gᵢ|θ) is called the likelihood, representing the uncertainty in the data knowing the parameters;

  • p(θ) is called the prior or a priori law, a probability law assigned to the parameters to represent the knowledge we may have on them prior to observing the data;

  • the denominator p(g):

$$ p(g) = \int p(g|\theta)\, p(\theta)\, d\theta \tag{9} $$

is called the evidence.

So, the process of using the Bayes rule for parameter estimation can be summarized as follows:

  • Write the expression of the likelihood p(g|θ);

  • Assign the prior p(θ) to translate all we know about θ before observing the data g;

  • Apply the Bayes rule to obtain the expression of the posterior law p(θ|g);

  • Use the posterior p(θ|g) to do any inference on θ. For example:

    • Compute its expected value, called Expected A Posteriori (EAP) or Posterior Mean (PM):

$$ \hat{\theta}_{PM} = \int \theta\, p(\theta|g)\, d\theta \tag{10} $$

    • Compute the value of θ for which p(θ|g) is maximum; this is the Maximum A Posteriori (MAP):

$$ \hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta|g) \tag{11} $$

    • Sampling and exploring [Monte Carlo methods]

θ ∼ p(θ|g)

which gives the possibility to obtain any statistical information we want about θ. For example, if we generate N samples θ⁽¹⁾, …, θ⁽ᴺ⁾, then for large enough N we have:

$$ E[\theta] \approx \frac{1}{N}\sum_{n=1}^{N} \theta^{(n)}. \tag{12} $$

3.1 One parameter case

When θ is a scalar quantity, we can also do the following computations:

  • Compute the value θ_Med such that:

$$ P(\theta > \theta_{Med}) = P(\theta < \theta_{Med}) \tag{13} $$

which is called the median value. Its computation needs integration:

$$ \int_{-\infty}^{\theta_{Med}} p(\theta|g)\, d\theta = \int_{\theta_{Med}}^{+\infty} p(\theta|g)\, d\theta \tag{14} $$

  • Compute the value θ_α, called the α quantile, for which

$$ P(\theta > \theta_{\alpha}) = 1 - P(\theta < \theta_{\alpha}) = \int_{\theta_{\alpha}}^{+\infty} p(\theta|g)\, d\theta = 1 - \alpha \tag{15} $$

  • Region of high probabilities [needs integration methods]:

$$ [\hat{\theta}_1, \hat{\theta}_2] : \int_{\hat{\theta}_1}^{\hat{\theta}_2} p(\theta|g)\, d\theta = 1 - \alpha $$
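All these summaries are easy to approximate once samples from the posterior are available. A minimal sketch, assuming posterior samples are already at hand (here drawn from a known Gaussian only to keep the example self-contained):

```python
import numpy as np

# Minimal sketch: posterior summaries from samples theta_n ~ p(theta | g).
# The samples are drawn from a known Gaussian purely for illustration.
rng = np.random.default_rng(1)
samples = rng.normal(loc=2.0, scale=0.3, size=10_000)

theta_pm = samples.mean()                        # posterior mean, Eqs. (10)/(12)
theta_med = np.median(samples)                   # median, Eqs. (13)-(14)
alpha = 0.05
theta_alpha = np.quantile(samples, 1.0 - alpha)  # alpha quantile, Eq. (15)
credible = np.quantile(samples, [alpha / 2, 1 - alpha / 2])  # central (1-alpha) interval
```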

The Bayes rule and Bayesian estimation can thus be summarized by the scheme: prior p(θ) and likelihood p(g|θ) → posterior p(θ|g) → estimates (PM, MAP, samples).

Two main points are of great importance:

  • How to assign the prior p(θ) in the second step; and

  • How to do the computations in the last step.

This last problem becomes more serious in the multi-parameter case.

3.2 Multi-parameter case

If we have more than one parameter, then θ = (θ₁, …, θ_n). The Bayes rule still holds:

$$ p(\theta|g) = \frac{p(g|\theta)\, p(\theta)}{p(g)} \tag{16} $$

Now, again, we can compute:

  • The Expected A Posteriori (EAP):

$$ \hat{\theta}_{PM} = \int \theta\, p(\theta|g)\, d\theta, \tag{17} $$

but this needs efficient integration methods.

  • The Maximum A Posteriori (MAP):

$$ \hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta|g) \tag{18} $$

but this needs efficient optimization methods.

  • Sampling and exploring [Monte Carlo methods]

θ ∼ p(θ|g)

but this needs efficient sampling methods.

  • We may also try to localize the region of the highest probability:

$$ P(\theta \in \Theta) = \int_{\Theta} p(\theta|g)\, d\theta = 1 - \alpha \tag{19} $$

for a given small α, but this problem may not have a unique solution.


4. Bayesian inference for inverse problems

As described before, in inverse problems the unknown f is a function (of time, space, wavelength, …) and the observable quantity g is another function, related to f via an operator: g = H(f) + ϵ. When discretized, they can be represented by high-dimensional vectors f and g with g = H(f) + ϵ, where ϵ represents all the errors (measurement, model and discretization). When the operator is linear, we have:

$$ g = H f + \epsilon, \tag{20} $$

where f is a vector of length n, H is the known forward model matrix of size m×n, and g and ϵ are two vectors of size m.

The Bayes rule for this case is written as:

$$ p(f|g, \mathcal{M}) = \frac{p(g|f, \mathcal{M})\, p(f|\mathcal{M})}{p(g|\mathcal{M})} \tag{21} $$

where we introduce M to represent the model; p(g|f, M), commonly called the likelihood, is obtained using the forward model (20) and the probability law assigned to the noise, p(ϵ); p(f|M) is the assigned prior model; and p(f|g, M) is the posterior probability law. Figure 6 shows in a schematic way the main ingredients of the Bayesian inference for inverse problems.

Figure 6.

Illustration of the Bayesian inference for inverse problems.

Even this very simple linear model has been used in many areas: linear inverse problems, compressed sensing, curve fitting and linear regression, machine learning, etc.

In inverse problems such as deconvolution and image restoration, f represents the input or original image, g represents the blurred and noisy image, and H is the convolution operator matrix. In image reconstruction in Computed Tomography (CT), f represents the distribution of some internal property of an object, for example the density of the material, g represents the radiographic data, and H is the radiographic projection operator (discretized Radon transform operator).

In Compressed Sensing, g is the compressed data, f is the uncompressed image and H is the compressing matrix. In machine learning, g is the data, H is a dictionary and f represents the sparse coefficients of the projection of the data on that dictionary.


5. Hyperparameter estimation

When applying the Bayes rule, the main terms, which are the likelihood and the prior, depend on parameters which cannot be fixed in practical situations. We may thus want to estimate them from the data. In the Bayesian approach, this can be done easily:

$$ p(f, \theta_1, \theta_2 | g) = \frac{p(g|f, \theta_1)\, p(f|\theta_2)\, p(\theta_1)\, p(\theta_2)}{p(g)} \tag{22} $$

where p(θ₁) and p(θ₂) are the prior probability laws assigned to θ₁ and θ₂, and often p(θ) = p(θ₁) p(θ₂). We can then write more succinctly:

$$ p(f, \theta | g) = \frac{p(g|f, \theta_1)\, p(f|\theta_2)\, p(\theta)}{p(g)} \tag{23} $$

The scheme of this situation is illustrated in Figure 7.

Figure 7.

Illustration of the Bayesian approach for inverse problems with unknown hyperparameters.

From here, we have different directions for doing estimation:

5.1 Joint maximum a posteriori (JMAP)

Rewriting the expression of the joint posterior law:

$$ p(f, \theta | g) = \frac{p(g|f, \theta_1)\, p(f|\theta_2)\, p(\theta)}{p(g)} \propto p(g|f, \theta_1)\, p(f|\theta_2)\, p(\theta) \tag{24} $$

where ∝ means equal up to a constant factor, which is 1/p(g). In this case, we can try to optimize it with respect to its two arguments:

$$ (\hat{f}, \hat{\theta}) = \arg\max_{(f, \theta)} p(f, \theta | g) \tag{25} $$

This can be done, for example, by alternate optimization:

$$ \begin{cases} \hat{f}^{(k+1)} = \arg\max_{f} p(f, \hat{\theta}^{(k)} | g) \\ \hat{\theta}^{(k+1)} = \arg\max_{\theta} p(\hat{f}^{(k+1)}, \theta | g) \end{cases} \tag{26} $$

When the optimization algorithm is successful, we obtain the optimal values of f̂ and θ̂. This method can be summarized as follows:
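As a rough generic sketch of this alternation (the two inner optimizers are problem-specific placeholders, i.e. our assumptions, not part of the original chapter), the structure of Eq. (26) is simply:

```python
# Minimal generic sketch of the JMAP alternation of Eq. (26).
# `argmax_f` and `argmax_theta` are placeholders for problem-specific
# optimizers of the joint posterior p(f, theta | g).
def jmap(g, theta0, argmax_f, argmax_theta, n_iter=50):
    theta = theta0
    f = None
    for _ in range(n_iter):
        f = argmax_f(g, theta)        # f^(k+1)     = argmax_f     p(f, theta^(k) | g)
        theta = argmax_theta(g, f)    # theta^(k+1) = argmax_theta p(f^(k+1), theta | g)
    return f, theta
```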

5.2 Marginalization over θ

The main idea here is to consider θ as a nuisance parameter. Thus, integrating it out, we get

$$ p(f|g) = \int p(f, \theta | g)\, d\theta \tag{27} $$

which can be used to infer f. Also, if we still want to get estimates of θ, we can first obtain an estimate f̂ of f and then, if needed, use it to estimate θ via p(θ|f̂, g).

5.3 Marginalization over f

The main idea here is to first find a good estimate of the parameters θ and then use it for the inference on f. So, we first obtain:

$$ p(\theta|g) = \int p(f, \theta | g)\, df \tag{28} $$

which can be used to first estimate θ and then use it. For example, in the method related to the Maximum Likelihood of the second type, θ̂ is first estimated by

$$ \hat{\theta} = \arg\max_{\theta} p(\theta|g) \tag{29} $$

and then used with p(f|θ̂, g) to infer f̂. For a flat prior model, p(θ|g) ∝ p(g|θ), which is called the likelihood, and the estimator

$$ \hat{\theta} = \arg\max_{\theta} p(\theta|g) = \arg\max_{\theta} p(g|\theta) \tag{30} $$

is called the Maximum Likelihood (ML) estimate and the whole approach is called ML of the second type. This method can be summarized by the scheme: p(θ|g) → θ̂ → p(f|θ̂, g) → f̂.

The main difficulty in this approach is that we rarely have an analytical expression for the first marginalization. To overcome this difficulty, many algorithms have been proposed. One of them is the Expectation-Maximization (EM) algorithm and its generalization (GEM). The main ideas of these algorithms are summarized in the following subsection:

5.3.1 EM and GEM algorithms

To summarize these methods, we use the vocabulary of the main authors of the EM method, where f is considered as the hidden variable, g as the incomplete data, (g, f) as the complete data, ln p(g|θ) as the incomplete-data log-likelihood and ln p(g, f|θ) as the complete-data log-likelihood. Then, the following iterative schemes describe the EM and GEM algorithms:

  • EM Iterative algorithm:

$$ \begin{cases} \text{E-step:} \quad Q(\theta, \hat{\theta}^{(k)}) = E_{p(f|g, \hat{\theta}^{(k)})}\left[\ln p(g, f|\theta)\right] \\ \text{M-step:} \quad \hat{\theta}^{(k+1)} = \arg\max_{\theta} Q(\theta, \hat{\theta}^{(k)}) \end{cases} \tag{31} $$

  • GEM (Bayesian) algorithm:

$$ \begin{cases} \text{E-step:} \quad Q(\theta, \hat{\theta}^{(k)}) = E_{p(f|g, \hat{\theta}^{(k)})}\left[\ln p(g, f|\theta) + \ln p(\theta)\right] \\ \text{M-step:} \quad \hat{\theta}^{(k+1)} = \arg\max_{\theta} Q(\theta, \hat{\theta}^{(k)}) \end{cases} \tag{32} $$

These methods can be summarized in the following scheme:

p(f, θ | g) → [EM, GEM] → θ̂ → p(f | θ̂, g) → f̂

5.4 Variational Bayesian approximation

VBA is a powerful approach to approximate Bayesian computation. It starts by first obtaining the expression of the joint posterior p(f, θ|g) and then approximating it by a simpler probability law q(f, θ|g) which can be handled much more easily in the computations. VBA can be summarized in the following steps:

  • Approximate p(f, θ|g) by q(f, θ|g) = q₁(f|g) q₂(θ|g) and then continue the computations with this approximation.

  • To do this approximation, we need a criterion to qualify it. The standard criterion to measure the proximity of two probability laws q and p is the Kullback–Leibler (KL) divergence KL(q(f, θ|g) : p(f, θ|g)).

  • It is easy to show that:

$$ KL(q:p) = \int q \ln\frac{q}{p} = \int q_1 q_2 \ln\frac{q_1 q_2}{p} = \int q_1 \ln q_1 + \int q_2 \ln q_2 - \int q \ln p = -H(q_1) - H(q_2) - \langle \ln p \rangle_q \tag{33} $$

  • Alternate optimization of KL(q₁q₂ : p) with respect to q₁ and q₂ results in:

$$ \begin{cases} q_1(f) \propto \exp\left[\langle \ln p(g, f, \theta|\mathcal{M}) \rangle_{q_2(\theta)}\right] \\ q_2(\theta) \propto \exp\left[\langle \ln p(g, f, \theta|\mathcal{M}) \rangle_{q_1(f)}\right] \end{cases} \tag{34} $$

As KL(q₁q₂ : p) is convex as a function of q₁ and of q₂ separately, the algorithm converges (locally) to an optimum solution. At the end, we have the expressions of q₁(f) and q₂(θ), which can then be used to infer f and θ. VBA is summarized in the following scheme:

p(f, θ | g) → [Variational Bayesian Approximation] → q₁(f) → f̂ ; q₂(θ) → θ̂

In real applications, we choose parametric probability laws for q₁(f) and q₂(θ), and so the iterations are done on their parameters. What is interesting is that, by choosing appropriate parametric models for q₁(f) and q₂(θ), we obtain JMAP and GEM as special cases.

  • Case 1: Deterministic (degenerate) expressions → Joint MAP

$$ \begin{cases} \hat{q}_1(f) = \delta(f - \tilde{f}) \\ \hat{q}_2(\theta) = \delta(\theta - \tilde{\theta}) \end{cases} \longrightarrow \begin{cases} \tilde{f} = \arg\max_{f} p(f, \tilde{\theta}|g, \mathcal{M}) \\ \tilde{\theta} = \arg\max_{\theta} p(\tilde{f}, \theta|g, \mathcal{M}) \end{cases} \tag{35} $$

  • Case 2: Degenerate expression for θ and marginal expression for f → EM

$$ \begin{cases} \hat{q}_1(f) \propto p(f|\tilde{\theta}, g) \\ \hat{q}_2(\theta) = \delta(\theta - \tilde{\theta}) \end{cases} \longrightarrow \begin{cases} Q(\theta, \tilde{\theta}) = \langle \ln p(f, \theta|g, \mathcal{M}) \rangle_{q_1(f|\tilde{\theta})} \\ \tilde{\theta} = \arg\max_{\theta} Q(\theta, \tilde{\theta}) \end{cases} \tag{36} $$

  • Case 3: q₁ and q₂ are chosen proportional to the conditionals p(f|θ, g, M) and p(θ|f, g, M). This is a very appropriate choice for inverse problems, in particular when we use exponential families and conjugate priors.

$$ \begin{cases} \hat{q}_1(f) \propto p(f|\tilde{\theta}, g, \mathcal{M}) \\ \hat{q}_2(\theta) \propto p(\theta|\tilde{f}, g, \mathcal{M}) \end{cases} \longrightarrow \text{accounts for the uncertainty of } \theta \text{ when estimating } \hat{f}, \text{ and vice versa.} \tag{37} $$

In the following schemes these three cases are illustrated for comparison.

  • JMAP Alternate optimization Algorithm:

θ̂⁽⁰⁾ → [ f̂ = arg max_f p(f, θ̂ | g) ] → f̂ → [ θ̂ = arg max_θ p(f̂, θ | g) ] → θ̂ → ⋯ → (f̂, θ̂)

  • EM:

θ̂⁽⁰⁾ → [ q₁(f) = p(f | θ̂, g) ] → q₁(f) → [ Q(θ, θ̂) = ⟨ln p(f, θ | g)⟩_{q₁(f)}, θ̂ = arg max_θ Q(θ, θ̂) ] → θ̂ → ⋯ → (f̂, θ̂)

  • VBA:

q₂⁽⁰⁾(θ) → [ q₁(f) ∝ exp⟨ln p(f, θ | g)⟩_{q₂(θ)} ] → q₁(f) → [ q₂(θ) ∝ exp⟨ln p(f, θ | g)⟩_{q₁(f)} ] → q₂(θ) → ⋯ → (f̂, θ̂)
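As a concrete, self-contained illustration of these update equations (a toy model chosen by us, not taken from the chapter), consider the simplest inverse problem gᵢ = f + ϵᵢ with an unknown noise precision τ; the mean-field iteration of Eq. (34) then has closed-form updates:

```python
import numpy as np

# Minimal VBA sketch on a toy model: g_i = f + eps_i with
#   eps_i ~ N(0, 1/tau),  f ~ N(0, v_f),  tau ~ Gamma(a0, b0).
# p(f, tau | g) is approximated by q1(f) q2(tau), iterated as in Eq. (34).
# Model choice and hyperparameter values are illustrative assumptions.
rng = np.random.default_rng(2)
f_true, tau_true = 1.5, 4.0
g = f_true + rng.normal(0.0, 1.0 / np.sqrt(tau_true), size=100)

v_f, a0, b0 = 10.0, 1e-3, 1e-3
N = len(g)
E_tau = 1.0                                    # initial <tau>

for _ in range(50):
    # q1(f) = N(m_f, s_f): from exp <ln p(g, f, tau)>_{q2(tau)}
    s_f = 1.0 / (E_tau * N + 1.0 / v_f)
    m_f = s_f * E_tau * g.sum()
    # q2(tau) = Gamma(aN, bN): from exp <ln p(g, f, tau)>_{q1(f)}
    aN = a0 + N / 2.0
    bN = b0 + 0.5 * (np.sum((g - m_f) ** 2) + N * s_f)
    E_tau = aN / bN

f_hat, tau_hat = m_f, E_tau                    # VBA point estimates
```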

5.5 Hierarchical priors

One last extension is the case where f itself depends on another hidden variable z, so that we have:

$$ p(f, z, \theta|g) \propto p(g|f, \theta_1)\, p(f|z, \theta_2)\, p(z|\theta_3)\, p(\theta), \tag{38} $$

where θ = (θ₁, θ₂, θ₃). This situation is shown in Figure 8. Again, here, we may be interested only in f, in (f, z), or in all three sets of variables (f, z, θ). Here too, we can use JMAP, marginalization or VBA to infer these unknowns.

Figure 8.

Illustration of advanced Bayesian approach with hierarchical prior modeling with hidden variables.


6. Linear forward models and Gaussian case

Linear models are of great importance, and Gaussian prior laws are the most common and the easiest to handle. Moreover, many non-linear problems can be approximated by equivalent linear ones. Linear models with Gaussian prior laws are thus easy and powerful tools for a great number of scientific problems. In this section, an overview and some main properties are given.

6.1 Simple supervised case

Let us consider the linear forward model of the previous section,

$$ g = H f + \epsilon, \tag{39} $$

and assign Gaussian laws to ϵ and f, which leads to:

$$ \begin{cases} p(g|f) = \mathcal{N}(g|Hf, v_\epsilon I) \propto \exp\left[-\frac{1}{2 v_\epsilon}\|g - Hf\|^2\right] \\ p(f) = \mathcal{N}(f|0, v_f I) \propto \exp\left[-\frac{1}{2 v_f}\|f\|^2\right] \end{cases} \tag{40} $$

Using these expressions, we get:

$$ p(f|g) \propto \exp\left[-\frac{1}{2 v_\epsilon}\|g - Hf\|^2 - \frac{1}{2 v_f}\|f\|^2\right] \propto \exp\left[-\frac{1}{2 v_\epsilon} J(f)\right] \quad \text{with } J(f) = \|g - Hf\|^2 + \lambda \|f\|^2, \quad \lambda = \frac{v_\epsilon}{v_f} \tag{41} $$

which can be summarized as:

$$ p(f|g) = \mathcal{N}(f|\hat{f}, \hat{\Sigma}) \quad \text{with} \quad \hat{f} = (H^t H + \lambda I)^{-1} H^t g \quad \text{and} \quad \hat{\Sigma} = v_\epsilon (H^t H + \lambda I)^{-1}, \tag{42} $$

where λ = v_ϵ/v_f. This case is summarized in Figure 9.

Figure 9.

Supervised linear Gaussian case.

This is the simplest case, where we know exactly the expression of the posterior law and all the computations can be done explicitly. However, for high-dimensional problems, where the vectors f and g are very large, we may not even be able to keep the matrix H in memory, and surely not be able to compute the inverse of the matrix (HᵗH + λI). In Section 9 on Bayesian computation, we will see how to do these computations.
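For problems of moderate size, Eq. (42) can nevertheless be evaluated directly. A minimal sketch (the function name and default variances are ours; H and g are assumed given, for instance the toy deconvolution matrix built in Section 2.1.5):

```python
import numpy as np

# Minimal sketch of the supervised Gaussian case, Eqs. (41)-(42):
#   f_hat = (H'H + lambda I)^{-1} H' g,  Sigma_hat = v_eps (H'H + lambda I)^{-1}.
# Variance values are illustrative; H and g are assumed given.
def gaussian_posterior(H, g, v_eps=0.01, v_f=1.0):
    n = H.shape[1]
    lam = v_eps / v_f
    A = H.T @ H + lam * np.eye(n)
    f_hat = np.linalg.solve(A, H.T @ g)      # posterior mean (= MAP here)
    Sigma_hat = v_eps * np.linalg.inv(A)     # posterior covariance
    return f_hat, Sigma_hat
```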

6.2 Unsupervised case or hyperparameter estimation

In the previous section, we considered linear models with Gaussian priors with known parameters v_ϵ and v_f. In many practical situations these parameters are not known, and we want to estimate them too. For this, we can assign prior laws to them as well. As the variances are positive quantities, and using the concept of conjugate priors, we can assign them Inverse Gamma priors:

$$ \begin{cases} p(v_\epsilon) = \mathcal{IG}(v_\epsilon|\alpha_{\epsilon 0}, \beta_{\epsilon 0}) \\ p(v_f) = \mathcal{IG}(v_f|\alpha_{f 0}, \beta_{f 0}) \end{cases} \tag{43} $$

and using the likelihood p(g|f, v_ϵ) = N(g|Hf, v_ϵ I) and the prior p(f|v_f) = N(f|0, v_f I), we can easily obtain the expressions of the following conditional posterior laws:

$$ p(f|g, \hat{v}_\epsilon, \hat{v}_f) = \mathcal{N}(f|\hat{f}, \hat{\Sigma}) \quad \text{with:} \quad \hat{f} = (H^t H + \hat{\lambda} I)^{-1} H^t g, \quad \hat{\Sigma} = \hat{v}_\epsilon (H^t H + \hat{\lambda} I)^{-1}, \quad \hat{\lambda} = \hat{v}_\epsilon / \hat{v}_f \tag{44} $$

and

$$ \begin{cases} p(v_\epsilon|g, f) = \mathcal{IG}(v_\epsilon|\alpha_\epsilon, \beta_\epsilon) \\ p(v_f|g, f) = \mathcal{IG}(v_f|\alpha_f, \beta_f) \end{cases} \tag{45} $$

where all the details, and in particular the expressions of α_ϵ, β_ϵ, α_f, β_f, can be found in [10].

As we can see, the expressions of f̂ and Σ̂ are the same as in the previous case, except that the values of v̂_ϵ, v̂_f and λ̂ have to be updated. They are obtained from the conditionals p(v_ϵ|g, f) and p(v_f|g, f), which depend on f. This suggests an iterative algorithm in two steps: determine the expression of p(f|g, v̂_ϵ, v̂_f) and, using the values of v̂_ϵ and v̂_f from the previous iteration, propose an estimate for f; then, using p(v_ϵ|g, f) and p(v_f|g, f), give estimates of v̂_ϵ and v̂_f which can again be used in the first step. It is interesting to note that all three approaches, JMAP, GEM and VBA, follow exactly this same iterative algorithm for this case. The only differences are in the updated values of α_ϵ, β_ϵ, α_f, β_f and the choice of the estimators (MAP or PM) of v̂_ϵ and v̂_f.
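A minimal sketch of this two-step iteration is given below. The Inverse-Gamma update constants follow the standard conjugate computations; the exact constants and the choice of estimator (MAP or PM) depend on whether JMAP, GEM or VBA is used (see [10]), so the values below are one possible, hedged instance:

```python
import numpy as np

# Minimal sketch of the unsupervised iteration of Section 6.2.
# Update constants and point estimates are one plausible choice (see [10]).
def unsupervised_gaussian(H, g, alpha_e0=1.0, beta_e0=1.0,
                          alpha_f0=1.0, beta_f0=1.0, n_iter=30):
    m, n = H.shape
    v_eps, v_f = 1.0, 1.0                       # illustrative initializations
    f = np.zeros(n)
    for _ in range(n_iter):
        lam = v_eps / v_f
        A = H.T @ H + lam * np.eye(n)
        f = np.linalg.solve(A, H.T @ g)         # f_hat of Eq. (44)
        # Inverse-Gamma conditionals of Eq. (45), summarized by point estimates
        alpha_e = alpha_e0 + m / 2.0
        beta_e = beta_e0 + 0.5 * np.sum((g - H @ f) ** 2)
        v_eps = beta_e / (alpha_e + 1.0)        # e.g. the MAP of IG(alpha_e, beta_e)
        alpha_f = alpha_f0 + n / 2.0
        beta_f = beta_f0 + 0.5 * np.sum(f ** 2)
        v_f = beta_f / (alpha_f + 1.0)
    return f, v_eps, v_f
```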

This case is also summarized in Figure 10.

Figure 10.

Bayesian inference scheme in linear systems and Gaussian priors. The posterior is also Gaussian, and all the computations can be done analytically.


7. Non-Gaussian priors

Very often, assuming that the noise is Gaussian is valid, but a Gaussian prior may not be adequate. Thus, the case of non-Gaussian priors is of great importance. A very well-known example is the Generalized Gaussian:

$$ p(f) \propto \exp\left[-\gamma \sum_j |f_j|^\beta\right]. \tag{46} $$

The case β = 2 is the Gaussian case, β > 2 gives the Super-Gaussian and β < 1 is called Sub-Gaussian. The particular case β = 1 results in what is called the Double Exponential (DE) prior law:

$$ p(f) \propto \exp\left[-\gamma \sum_j |f_j|\right] = \exp\left[-\gamma \|f\|_1\right] \tag{47} $$

which, when used with a Gaussian likelihood, results in:

$$ p(f|g) \propto \exp\left[-\frac{1}{2 v_\epsilon} J(f)\right] \quad \text{with } J(f) = \frac{1}{2}\|g - Hf\|^2 + \lambda \|f\|_1, \quad \lambda = \gamma v_\epsilon. \tag{48} $$

From this, we can see that the computation of the MAP solution needs an appropriate optimization algorithm, and the computation of the Posterior Mean (PM), the Posterior Covariance (PCov) or any other expectation becomes more difficult. However, as we will see later, VBA can be used to do approximate computations.
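For instance, the MAP estimate under the DE prior, i.e. the minimizer of the criterion J(f) of Eq. (48), can be computed with a proximal-gradient (ISTA-type) iteration; this is one standard optimizer among many, sketched here with illustrative settings:

```python
import numpy as np

# Minimal ISTA-type sketch for the MAP criterion of Eq. (48):
#   J(f) = 1/2 ||g - H f||^2 + lambda ||f||_1.
# Step size (1/L) and lambda are illustrative.
def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def l1_map(H, g, lam=0.05, n_iter=500):
    L = np.linalg.norm(H, 2) ** 2              # Lipschitz constant of the gradient
    f = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = H.T @ (H @ f - g)               # gradient of the quadratic term
        f = soft_threshold(f - grad / L, lam / L)
    return f
```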

Another example is the Total Variation (TV) regularization method [11, 12, 13, 14] which can be interpreted as choosing the prior

$$ p(f) \propto \exp\left[-\gamma \sum_j |f_j - f_{j-1}|\right] = \exp\left[-\gamma \|D f\|_1\right] \tag{49} $$

where D is the first order difference matrix.

This prior, with a Gaussian model for the noise, results in:

$$ p(f|g) \propto \exp\left[-\frac{1}{2 v_\epsilon} J(f)\right] \quad \text{with } J(f) = \frac{1}{2}\|g - Hf\|^2 + \lambda \|D f\|_1, \quad \lambda = \gamma v_\epsilon. \tag{50} $$

One last example is using the Cauchy, or more generally the Student-t, distribution as the prior:

$$ p(f) \propto \exp\left[-\gamma \sum_j \ln\left(1 + f_j^2/\nu\right)\right] \tag{51} $$

which results in:

$$ p(f|g) \propto \exp\left[-\frac{1}{2 v_\epsilon} J(f)\right] \quad \text{with } J(f) = \frac{1}{2}\|g - Hf\|^2 + \lambda \sum_j \ln\left(1 + f_j^2/\nu\right), \quad \lambda = \gamma v_\epsilon. \tag{52} $$

These three examples are of great importance. They have been used in the framework of MAP estimation, and thus of the optimization of the criterion J(f), for many linear inverse problems. However, the computation of other Bayesian estimators and uncertainty quantification (UQ) again need specific approximate solutions.


8. Hierarchical prior models

Even if the simple Gaussian and non-Gaussian priors used in the previous sections are of great importance and used in many applications, they still have limitations in many cases; for example, when we know that the signals have impulsive shapes, are discontinuous, or are piecewise continuous. The same limitations hold when we know, for example, that the images are composed of homogeneous regions with well-defined contours, or that the object under test is composed of a limited number of homogeneous materials. Hierarchical models push these limitations of simple prior models farther. In the following, we consider three families of such hierarchical models: sparsity-aware models, scaled mixture models and Gauss-Markov-Potts models [10, 15, 16, 17].

8.1 Sparsity-aware hierarchical models

An easy way to build sparsity-aware hierarchical priors is to introduce a hidden variable z and consider the following forward and prior models:

$$ \begin{cases} g = H f + \epsilon, \\ f = D z + \zeta, \\ z \ \text{sparse, modeled by a Double Exponential (DE) law} \end{cases} \tag{53} $$

with

$$ \begin{cases} p(g|f) = \mathcal{N}(g|Hf, v_\epsilon I) \\ p(f|z) = \mathcal{N}(f|Dz, v_\xi I) \\ p(z) = \mathcal{DE}(z|\gamma) \propto \exp\left[-\gamma \|z\|_1\right] \end{cases} \tag{54} $$

Then, we have to find the expression of the joint posterior law p(f, z|g):

$$ p(f, z|g) \propto \exp\left[-J(f, z)\right] \quad \text{with } J(f, z) = \frac{1}{2 v_\epsilon}\|g - Hf\|_2^2 + \frac{1}{2 v_\xi}\|f - Dz\|_2^2 + \gamma \|z\|_1 \tag{55} $$

from which we can infer on f and z [10, 16, 18, 19, 20, 21, 22].

For the unsupervised case, we can add appropriate priors on the hyperparameters:

$$ \begin{cases} p(\gamma) = \mathcal{IG}(\gamma|\alpha_{\gamma 0}, \beta_{\gamma 0}) \\ p(v_\epsilon) = \mathcal{IG}(v_\epsilon|\alpha_{\epsilon 0}, \beta_{\epsilon 0}) \\ p(v_\xi) = \mathcal{IG}(v_\xi|\alpha_{\xi 0}, \beta_{\xi 0}) \end{cases} \tag{56} $$

and thus obtain:

$$ p(f, z, \gamma, v_\epsilon, v_\xi|g) \propto \exp\left[-J(f, z, \gamma, v_\epsilon, v_\xi)\right] \quad \text{with} $$
$$ J(f, z, \gamma, v_\epsilon, v_\xi) = \frac{1}{2 v_\epsilon}\|g - Hf\|_2^2 + \frac{1}{2 v_\xi}\|f - Dz\|_2^2 + \gamma \|z\|_1 + (\alpha_{\gamma 0} + n/2)\ln\gamma + \beta_{\gamma 0}/\gamma + (\alpha_{\epsilon 0} + m/2)\ln v_\epsilon + \beta_{\epsilon 0}/v_\epsilon + (\alpha_{\xi 0} + n/2)\ln v_\xi + \beta_{\xi 0}/v_\xi \tag{57} $$

It is interesting to note that the alternate optimization of this criterion gives ADMM-like algorithms [23, 24, 25], with the main advantage that here we have direct update rules for the hyperparameters.

8.2 Scaled mixture models

Scaled Gaussian Mixture (SGM) models have been used in many applications to model rare events thanks to their heavier tails compared to the Gaussian. They are also used in sparse signal modeling. A general SGM is defined as follows:

$$ S(f) = \int \mathcal{N}(f|0, v)\, p_m(v|\theta)\, dv \tag{58} $$

where the variance v of the Gaussian model N(f|0, v) is assumed to follow the mixing probability law p_m(v|θ). Among the many possibilities for this mixing pdf, the Inverse-Gamma results in the Student-t distribution:

$$ S(f|\nu) = \int \mathcal{N}(f|0, v)\, \mathcal{IG}(v|\nu/2, \nu/2)\, dv \tag{59} $$

which has been extended to the more general case:

$$ S(f|\alpha, \beta) = \int \mathcal{N}(f|0, v)\, \mathcal{IG}(v|\alpha, \beta)\, dv \tag{60} $$

These pdf models have been used with success in many developments of the Bayesian approach for inverse problems via:

$$ p(f|\alpha, \beta) = \prod_j \int \mathcal{N}(f_j|0, v_j)\, \mathcal{IG}(v_j|\alpha, \beta)\, dv_j \tag{61} $$

or

$$ p(f|\alpha, \beta) = \int \mathcal{N}(f|0, vI)\, \mathcal{IG}(v|\alpha, \beta)\, dv \tag{62} $$

Scaled Gaussian mixture models have been used extensively for modeling sparse signals. However, it happens very often that the signals or images are not sparse directly, but their gradients are, or more generally they are sparse in a transformed domain such as the Fourier or wavelet domains. We have used these models extensively in a hierarchical way:

$$ \begin{cases} g = H f + \epsilon, \\ f = D z + \zeta, \\ z \ \text{sparse, Student-t:} \quad p(z_j|v_{z_j}) = \mathcal{N}(z_j|0, v_{z_j}), \quad p(v_{z_j}) = \mathcal{IG}(v_{z_j}|\alpha_{z0}, \beta_{z0}) \end{cases} \tag{63} $$

where D represents any linear transformation, so that D⁻¹ applied to f transforms it into a sparse vector z.

All the relations of the likelihood and priors are summarized below:

$$ \begin{cases} p(g|f) = \mathcal{N}(g|Hf, v_\epsilon I) \\ p(f|z) = \mathcal{N}(f|Dz, v_\xi I) \\ p(z|v_z) = \mathcal{N}(z|0, V_z), \quad V_z = \mathrm{diag}(v_z) \\ p(v_z) = \prod_j \mathcal{IG}(v_{z_j}|\alpha_{z0}, \beta_{z0}) \\ p(v_\epsilon) = \mathcal{IG}(v_\epsilon|\alpha_{\epsilon 0}, \beta_{\epsilon 0}) \\ p(v_\xi) = \mathcal{IG}(v_\xi|\alpha_{\xi 0}, \beta_{\xi 0}) \end{cases} \tag{64} $$

and the corresponding joint posterior of all the unknowns writes:

$$ p(f, z, v_z, v_\epsilon, v_\xi|g) \propto \exp\left[-J(f, z, v_z, v_\epsilon, v_\xi)\right] \quad \text{with} $$
$$ J(f, z, v_z, v_\epsilon, v_\xi) = \frac{1}{2 v_\epsilon}\|g - Hf\|_2^2 + \frac{1}{2 v_\xi}\|f - Dz\|_2^2 + \frac{1}{2}\|V_z^{-1/2} z\|_2^2 + \sum_j \left[(\alpha_{z0} + 1)\ln v_{z_j} + \beta_{z0}/v_{z_j}\right] + (\alpha_{\epsilon 0} + m/2)\ln v_\epsilon + \beta_{\epsilon 0}/v_\epsilon + (\alpha_{\xi 0} + n/2)\ln v_\xi + \beta_{\xi 0}/v_\xi \tag{65} $$

Looking at this expression, we see that we have:

  • Quadratic optimization with respect to f and z;

  • Direct analytical expressions for the updates of the hyperparameters vϵ and vξ;

  • Possibility to compute posterior mean and quantify uncertainties analytically via VBA.

A final case we consider is that of non-stationary noise together with a sparsity enforcing prior in the same framework:

$$ \begin{cases} g = H f + \epsilon, \quad \epsilon \ \text{non-stationary:} \ p(\epsilon_i|v_{\epsilon_i}) = \mathcal{N}(\epsilon_i|0, v_{\epsilon_i}), \quad p(v_{\epsilon_i}) = \mathcal{IG}(v_{\epsilon_i}|\alpha_{\epsilon 0}, \beta_{\epsilon 0}) \\ f = D z + \zeta, \quad z \ \text{sparse, Student-t:} \ p(z_j|v_{z_j}) = \mathcal{N}(z_j|0, v_{z_j}), \quad p(v_{z_j}) = \mathcal{IG}(v_{z_j}|\alpha_{z0}, \beta_{z0}) \end{cases} \tag{66} $$

Again here, all the expressions of likelihood and priors can be summarized as follows:

$$ \begin{cases} p(g|f) = \mathcal{N}(g|Hf, V_\epsilon) \\ p(f|z) = \mathcal{N}(f|Dz, v_\xi I) \\ p(z|v_z) = \mathcal{N}(z|0, V_z) \\ p(v_z) = \prod_j \mathcal{IG}(v_{z_j}|\alpha_{z0}, \beta_{z0}) \\ p(v_\epsilon) = \prod_i \mathcal{IG}(v_{\epsilon_i}|\alpha_{\epsilon 0}, \beta_{\epsilon 0}) \\ p(v_\xi) = \mathcal{IG}(v_\xi|\alpha_{\xi 0}, \beta_{\xi 0}) \end{cases} \tag{67} $$

and the joint posterior of all the unknowns becomes:

$$ p(f, z, v_z, v_\epsilon, v_\xi|g) \propto \exp\left[-J(f, z, v_z, v_\epsilon, v_\xi)\right] \quad \text{with} $$
$$ J(f, z, v_z, v_\epsilon, v_\xi) = \frac{1}{2}\|V_\epsilon^{-1/2}(g - Hf)\|_2^2 + \frac{1}{2 v_\xi}\|f - Dz\|_2^2 + \frac{1}{2}\|V_z^{-1/2} z\|_2^2 + \sum_j \left[(\alpha_{z0} + 1)\ln v_{z_j} + \beta_{z0}/v_{z_j}\right] + \sum_i \left[(\alpha_{\epsilon 0} + 1)\ln v_{\epsilon_i} + \beta_{\epsilon 0}/v_{\epsilon_i}\right] + (\alpha_{\xi 0} + n/2)\ln v_\xi + \beta_{\xi 0}/v_\xi \tag{68} $$

The following scheme shows this case graphically.

8.3 A four level hierarchical model

To account separately for the measurement and forward modeling errors, a more detailed, four-level hierarchical model has been proposed:

$$ \begin{cases} g = g_0 + \epsilon, & \text{measurement error} \\ g_0 = H f + \xi, & \text{modeling error} \\ f = D z + \zeta, & \text{prior model} \end{cases} \tag{69} $$

which accounts for two error terms (variable splitting) and for sparsity enforcement in a transformed-domain prior: f = Dz + ζ with z sparse, itself modeled by a Normal-Inverse Gamma mixture.

This model is presented graphically here.

In this model, there are three error terms: ϵ the observation error, ξ the forward modeling error and ζ the transform-domain modeling error. These are detailed in the following:

  • g = g₀ + ϵ: ϵ is assumed to be Gaussian:

$$ p(g|g_0, v_\epsilon) = \mathcal{N}(g|g_0, v_\epsilon I), \quad p(v_\epsilon) = \mathcal{IG}(v_\epsilon|\alpha_{\epsilon 0}, \beta_{\epsilon 0}); $$

  • g₀ = Hf + ξ: ξ is assumed to be Student-t:

$$ p(g_0|f, v_\xi) = \mathcal{N}(g_0|Hf, V_\xi), \quad V_\xi = \mathrm{diag}(v_\xi), \quad p(v_\xi) = \prod_{i=1}^{M} p(v_{\xi_i}) = \prod_{i=1}^{M} \mathcal{IG}(v_{\xi_i}|\alpha_{\xi 0}, \beta_{\xi 0}); $$

  • f = Dz + ζ: ζ is assumed to be Gaussian:

$$ p(f|z, v_\zeta) = \mathcal{N}(f|Dz, v_\zeta I), \quad p(v_\zeta) = \mathcal{IG}(v_\zeta|\alpha_{\zeta 0}, \beta_{\zeta 0}); $$

  • z is assumed to be sparse and thus modeled via a Normal-Inverse Gamma mixture:

$$ p(z|v_z) = \mathcal{N}(z|0, V_z), \quad V_z = \mathrm{diag}(v_z), \quad p(v_z) = \prod_{j=1}^{N} p(v_{z_j}) = \prod_{j=1}^{N} \mathcal{IG}(v_{z_j}|\alpha_{z 0}, \beta_{z 0}) $$

which results in:

$$ p(f, g_0, z, v_\epsilon, v_\xi, v_\zeta, v_z|g) \propto \exp\left[-J(f, g_0, z, v_\epsilon, v_\xi, v_\zeta, v_z)\right] \tag{70} $$

with

$$ J(f, g_0, z, v_\epsilon, v_\xi, v_\zeta, v_z) = \frac{1}{2 v_\epsilon}\|g - g_0\|_2^2 + (\alpha_{\epsilon 0} + 1)\ln v_\epsilon + \frac{\beta_{\epsilon 0}}{v_\epsilon} + \frac{1}{2}\|V_\xi^{-1/2}(g_0 - Hf)\|_2^2 + \sum_{i=1}^{M}\left[(\alpha_{\xi 0} + 1)\ln v_{\xi_i} + \frac{\beta_{\xi 0}}{v_{\xi_i}}\right] + \frac{1}{2 v_\zeta}\|f - Dz\|_2^2 + (\alpha_{\zeta 0} + 1)\ln v_\zeta + \frac{\beta_{\zeta 0}}{v_\zeta} + \frac{1}{2}\|V_z^{-1/2} z\|_2^2 + \sum_{j=1}^{N}\left[(\alpha_{z 0} + 1)\ln v_{z_j} + \frac{\beta_{z 0}}{v_{z_j}}\right] \tag{71} $$

Using the JMAP approach with an alternate optimization strategy then requires the following optimization steps:

  • with respect to f: J(f) = ½‖V_ξ^(-1/2)(g₀ − Hf)‖₂² + (1/(2v_ζ))‖f − Dz‖₂²

  • with respect to g₀: J(g₀) = (1/(2v_ϵ))‖g − g₀‖₂² + ½‖V_ξ^(-1/2)(g₀ − Hf)‖₂²

  • with respect to z: J(z) = (1/(2v_ζ))‖f − Dz‖₂² + ½‖V_z^(-1/2) z‖₂²

  • with respect to v_ϵ: J(v_ϵ) = (1/(2v_ϵ))‖g − g₀‖₂² + (α_ϵ0 + 1) ln v_ϵ + β_ϵ0/v_ϵ

  • with respect to v_ξi: J(v_ξi) = ½‖V_ξ^(-1/2)(g₀ − Hf)‖₂² + Σ_{i=1}^{M} [(α_ξ0 + 1) ln v_ξi + β_ξ0/v_ξi]

  • with respect to v_ζ: J(v_ζ) = (1/(2v_ζ))‖f − Dz‖₂² + (α_ζ0 + 1) ln v_ζ + β_ζ0/v_ζ

  • with respect to v_zj: J(v_zj) = ½‖V_z^(-1/2) z‖₂² + Σ_{j=1}^{N} [(α_z0 + 1) ln v_zj + β_z0/v_zj]

This approach has the following main advantages and limitations.

Advantages:

  • All the optimizations are either quadratic or explicit;

  • Quadratic optimizations can be done efficiently

  • For high-dimensional problems, the needed operators H, Hᵗ, D and Dᵗ can be implemented on GPU;

  • For Computed Tomography, efficient GPU implementations of these operators have been developed in our group for 2D and 3D CT.

Limitations:

  • A huge amount of memory is needed for f, g₀, v_ξ and v_z;

  • No easy way to study the global convergence of the algorithm.

  • The number of hyper-hyperparameters (α_ϵ0, β_ϵ0), (α_ξ0, β_ξ0), (α_ζ0, β_ζ0), (α_z0, β_z0) that have to be fixed is large. However, the results are not very sensitive to these parameters.

8.4 Gauss-Markov-Potts models

To introduce the Gauss-Markov-Potts model, let us have a look at the images in Figure 11.

Figure 11.

Different images with different characteristics in different imaging systems.

The question we want to answer is: which prior model is most appropriate for these images? One way to answer this question is to look either at the histogram of the pixels or at the pixels of the gradient images. Then, if we take one typical line of the gradient, or one typical line of the image itself, and draw it as a 1D signal, we obtain the cases in Figure 12.

Figure 12.

Different possible prior modeling in relation to the different images of Figure 11.

From these two figures, we see that for some cases a Gaussian or generalized Gaussian may be a very good model. But for other cases, if we want to account explicitly for the presence of the contours, we can introduce a binary hidden variable to represent them. Finally, for the last example of the image in Figure 11 and its corresponding typical line in Figure 12, we need to introduce a hidden variable z which encodes the following fact:

In NDT applications of CT, the objects are, in general, composed of a finite number of materials, and the voxels corresponding to each material are grouped in compact regions.

How to model this prior information?

To answer this question, first consider such an image f(r) with its segmentation z(r) and contours q(r), as shown in Figure 13.

Figure 13.

An image of an object composed of homogeneous compact regions, its segmentation and the contours of those regions.

As can be seen, we have introduced two hidden variables, z(r) and q(r), the first representing the segmentation and the second the contours of the image. z(r) takes integer values k = 1, …, K, each represented by a different color, and q(r) takes binary values {0, 1}. The second can easily be obtained from the first. So, from now on, we consider only z(r).

As each value of z represents a homogeneous material, we can translate this by:

$$ p(f(r)|z(r) = k, m_k, v_k) = \mathcal{N}(f(r)|m_k, v_k) \tag{72} $$

encoding the fact that all the pixels having z(r) = k belong to a homogeneous material characterized by the two parameters (m_k, v_k). This results in:

$$ p(f(r)) = \sum_k P(z(r) = k)\, \mathcal{N}(f(r)|m_k, v_k) \quad \text{(Mixture of Gaussians)} \tag{73} $$

which shows the mixture of Gaussian model of the pixel values. See also Figure 14.

Figure 14.

A metallic object with a defect area inside it: black pixels represent air, white pixels metal and gray pixels the defect area. On the left image these are coded by colors (z = 1 represents air, z = 2 represents metal and z = 3 represents the defect area).

The next step is to propose a probability distribution for z. As we want compactness of the regions, a Markov model is appropriate:

$$ p(z(r)|z(r'), r' \in \mathcal{V}(r)) \propto \exp\left[\gamma \sum_{r' \in \mathcal{V}(r)} \delta(z(r) - z(r'))\right] \tag{74} $$

A Potts Markov model is even more appropriate:

$$ p(z(r), r \in \Omega) \propto \exp\left[\gamma \sum_{r \in \Omega} \sum_{r' \in \mathcal{V}(r)} \delta(z(r) - z(r'))\right] \tag{75} $$

where Ω represents the set of all pixels of the image.

Thus, to each pixel of the image are associated two variables, f(r) and z(r), with the following possible properties:

  • f|z Gaussian iid, z iid: Mixture of Gaussians

  • f|z Gauss-Markov, z iid: Mixture of Gauss-Markov

  • f|z Gaussian iid, z Potts-Markov: Mixture of Independent Gaussians (MIG with hidden Potts)

  • f|z Markov, z Potts-Markov: Mixture of Gauss-Markov (MGM with hidden Potts)

From these four different cases, we consider two which are illustrated in Figure 15.

Figure 15.

Two proposed Gauss-Markov-Potts models used in many NDT applications.

Using the notations of this figure, and denoting by f all the pixels of the image, by z all the pixels of the segmented image, and by θ all the parameters {v_ϵ, (α_k, m_k, v_k), k = 1, …, K}, we can write:

$$ p(f, z, \theta|g) \propto p(g|f, v_\epsilon)\, p(f|z, m, v)\, p(z|\gamma, \alpha)\, p(\theta) \tag{76} $$

where

$$ m = \{m_k, k = 1, \ldots, K\}, \quad v = \{v_k, k = 1, \ldots, K\}, \quad \alpha = \{\alpha_k, k = 1, \ldots, K\}, \quad \theta = \{v_\epsilon, m, v, \alpha\} $$

The expressions of p(g|f, v_ϵ), p(f|z, m, v) and p(z|γ, α) have been given before. We need to define p(θ), which can be chosen as conjugate priors: Dirichlet for α, Gaussian for m and Inverse-Gamma for all the variances.

Direct computation and use of p(f, z, θ|g, M) is too complex, because we do not have an analytical expression for the normalization term of the joint probability law:

$$ p(f, z, \theta|g) \propto p(g|f, z, \theta)\, p(f|z, \theta)\, p(z)\, p(\theta) \tag{77} $$

As we have three sets of variables f, z and θ, we can use different schemes, for example a Gibbs sampling scheme:

$$ \hat{f} \sim p(f|\hat{z}, \hat{\theta}, g) \;\longrightarrow\; \hat{z} \sim p(z|\hat{f}, \hat{\theta}, g) \;\longrightarrow\; \hat{\theta} \sim p(\theta|\hat{f}, \hat{z}, g) \tag{78} $$

with:

  • Sample f from p(f|ẑ, θ̂, g) ∝ p(g|f, θ̂) p(f|ẑ, θ̂):

this needs the optimization of a quadratic criterion.

  • Sample z from p(z|f̂, θ̂, g) ∝ p(g|f̂, ẑ, θ̂) p(z):

this needs the sampling of a Potts Markov field.

  • Sample θ from p(θ|f̂, ẑ, g) ∝ p(g|f̂, σ_ϵ² I) p(f̂|ẑ, (m_k, v_k)) p(θ)

More details and other schemes such as JMAP and VBA can be found in Refs. [26].

To illustrate an example of application, we consider an NDT application where a metallic object is tested to detect a defect inside it. As the problem was not only to detect the defect but also to characterize its shape and size, an X-ray computed tomography (CT) with only two projections is proposed and used. This problem is illustrated in Figure 16.

Figure 16.

A non destructive testing (NDT) application where f (x, y) has to be reconstructed from its marginals g1(x) and g2(y).

The mathematical part of this very ill-posed inverse problem is the following:

Given the functions g1(x) and g2(y) find the image f (x, y).

This problem also arises in probability theory and statistics, where f(x, y) is a joint distribution and g₁(x) and g₂(y) are its two marginals. We know that this problem has an infinite number of solutions: f(x, y) = g₁(x) g₂(y) Ω(x, y), where Ω(x, y) is called a copula:

$$ \int \Omega(x, y)\, dx = 1 \quad \text{and} \quad \int \Omega(x, y)\, dy = 1 $$

So, any arbitrary copula function defines a solution. The problem is ill-posed, and we need to use any possible prior information to try to obtain a unique or acceptable solution. The probabilistic solution we propose is illustrated in Figure 17.

Figure 17.

Probabilistic Bayesian method for the NDT image reconstruction problem.

Unsupervised Bayesian estimation:

$$ p(f, z, \theta|g) \propto p(g|f, z, \theta)\, p(f|z, \theta)\, p(\theta) $$

A summary of the results is given in Figure 18, where the results of the proposed method are compared with those of classical methods.

Figure 18.

Probabilistic Bayesian method for the NDT image reconstruction problem. (a) shows the original image f, (b) is the result of back-projection, (c) is the result of filtered back-projection, (d) and (e) are the results of a Markov model with hidden line process, and (f), (g) and (h) show the results of the Gauss-Markov-Potts method.


9. Bayesian computation

As we have seen, very often we can find the expression of the posterior law p(f|g) — sometimes exactly, as in the case of the linear models with Gaussian priors in the previous section, but often only up to the normalization constant (the evidence term) p(g) in:

$$ p(f|g) = \frac{1}{p(g)}\, p(g|f)\, p(f). \tag{79} $$

This term is not necessary for the Maximum A Posteriori (MAP) estimate, but it is needed for the Expected A Posteriori (EAP) estimate and for any other expectation computation.

This is the case for almost all non-Gaussian prior models, non-Gaussian noise models or non-linear forward models. In this chapter, a few cases are considered in more detail. Even in the Gaussian and linear case, which is the simplest one and for which we have analytical expressions for almost everything, the computational cost for large-scale problems leads us to search for approximate but fast solutions.

9.1 Large scale linear and Gaussian models

As we saw in the previous section, the linear forward model g = Hf + ϵ with Gaussian noise and Gaussian prior is the simplest case, where we can do all the computations analytically:

$$ \begin{cases} p(g|f) = \mathcal{N}(g|Hf, v_\epsilon I) \\ p(f) = \mathcal{N}(f|f_0, v_f I) \end{cases} \longrightarrow \begin{cases} p(g) = \mathcal{N}(g|Hf_0, v_f H H^t + v_\epsilon I) \\ p(f|g) = \mathcal{N}(f|\hat{f}, \hat{\Sigma}) \ \text{with:} \\ \quad \hat{f} = f_0 + (H^t H + \lambda I)^{-1} H^t (g - H f_0) \\ \quad \hat{\Sigma} = v_\epsilon (H^t H + \lambda I)^{-1}, \quad \lambda = \frac{v_\epsilon}{v_f} \end{cases} \tag{80} $$

The trick here, for example for computing f̂, is to use the fact that

$$ \begin{cases} p(g|f) \propto \exp\left[-\frac{1}{2 v_\epsilon}\|g - Hf\|_2^2\right] \\ p(f) \propto \exp\left[-\frac{1}{2 v_f}\|f\|_2^2\right] \end{cases} \longrightarrow p(f|g) \propto \exp\left[-\frac{1}{2 v_\epsilon} J(f)\right] \quad \text{with } J(f) = \frac{1}{2}\|g - Hf\|^2 + \lambda \|f\|_2^2, \quad \lambda = \frac{v_\epsilon}{v_f} \tag{81} $$

and, as the mean and the mode of a Gaussian probability law are the same, we can use:

$$ \hat{f} = \arg\min_{f} J(f) \quad \text{with } J(f) = \|g - Hf\|^2 + \lambda \|f\|^2 \tag{82} $$

and so the problem can be cast as the optimization of a quadratic criterion, for which there are a great number of algorithms. Let us show here the simplest one, which is gradient-based and so needs the expression of the gradient:

$$ \nabla J(f) = -2 H^t (g - Hf) + 2 \lambda f \tag{83} $$

which can be summarized as follows:

$$ \begin{cases} f^{(0)} = 0 \\ f^{(k+1)} = f^{(k)} - \alpha \nabla J(f^{(k)}) = f^{(k)} + \alpha \left[2 H^t (g - H f^{(k)}) - 2 \lambda f^{(k)}\right] \end{cases} \tag{84} $$

As we can see, at each iteration we need to be able to compute the forward operation Hf and the backward (adjoint) operation Hᵗδg, where δg = g − Hf. This optimization algorithm thus needs two programs:

  • Forward operation H f

  • Adjoint operation Hᵗδg

These two operations can be implemented using high-performance parallel processors such as Graphics Processing Units (GPUs).
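A minimal operator-based sketch of this gradient iteration is given below; the forward and adjoint operators are passed as functions (they could wrap FFT-based convolutions or GPU projectors), and the step size and iteration count are illustrative assumptions:

```python
# Minimal sketch of the gradient iteration of Eq. (84), written so that only
# the forward operator H(.) and its adjoint Ht(.) are needed.
def gradient_map(forward, adjoint, g, f0, lam=0.1, alpha=1e-2, n_iter=200):
    f = f0.copy()
    for _ in range(n_iter):
        residual = g - forward(f)                           # delta_g = g - H f
        grad = -2.0 * adjoint(residual) + 2.0 * lam * f     # grad J(f), Eq. (83)
        f = f - alpha * grad                                # descent step
    return f
```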

The computation of the posterior covariance is much more difficult. There are a few methods. The first category comprises the methods which use the particular structure of the matrices H, HᵗH or HHᵗ; using the matrix inversion lemma, we can see that

$$ \hat{\Sigma} = v_\epsilon (H^t H + \lambda I)^{-1} = \frac{v_\epsilon}{\lambda}\left[I - H^t (H H^t + \lambda I)^{-1} H\right] \tag{85} $$

For example, in a signal deconvolution problem, the matrix H has a Toeplitz structure, and so have the matrices HᵗH and HHᵗ, which can be approximated by circulant matrices and diagonalized using the Fourier Transform.

The second, more general, approach is to approximate Σ̂ by a diagonal matrix, which can also be interpreted as approximating the posterior law p(f|g) by a separable law q(f) = ∏ⱼ q(fⱼ). This brings us naturally to Approximate Bayesian Computation (ABC). But, before going into the details of ABC methods, let us consider the case where the hyperparameters of the problem (the parameters of the prior laws) are also unknown.

To be able to do the computations, we mainly need to compute the determinant of the matrix (v_f HHᵗ + v_ϵ I) for p(g) and the inverses of the matrices (HᵗH + λI) or (HHᵗ + λI).

9.2 Large scale computation of the posterior covariance

Computing the determinant of the matrix (v_f HHᵗ + v_ϵ I) for p(g), and the inverses of the matrices (HᵗH + λI) or (HHᵗ + λI), which are needed for uncertainty quantification, are among the greatest open research subjects for Big Data problems. Here, we consider a few cases.

9.2.1 Structured matrices

One solution is to use the particular structure of these matrices when possible. This is the case for deconvolution or image restoration, where these matrices have Toeplitz or block-Toeplitz structures which can be well approximated by circulant or block-circulant matrices and diagonalized using the Fourier Transform (FT) and the Fast FT (FFT). The main idea here is to use the properties of circulant matrices: if H is a circulant matrix, then

$$ H = F' \Lambda F \tag{86} $$

where F is the DFT (or FFT) matrix, F′ the IDFT (or IFFT) matrix, and Λ is a diagonal matrix whose elements are the FT of the first line of the circulant matrix. As the first line of that circulant matrix contains the samples of the impulse response, the vector of diagonal elements represents the spectrum of the impulse response (the transfer function). Using this property, we have:

$$ (H' H + \lambda I)^{-1} = (F' \Lambda' F F' \Lambda F + \lambda I)^{-1} = (F' |\Lambda|^2 F + \lambda I)^{-1} = F' (|\Lambda|^2 + \lambda I)^{-1} F \tag{87} $$
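For a circular (periodic) convolution, this diagonalization gives a direct, fast computation; here is a minimal 1D sketch with numpy's FFT (periodic boundary conditions and the regularization value are assumptions):

```python
import numpy as np

# Minimal sketch of Eqs. (86)-(87) for 1D circular deconvolution:
# with H = F' Lambda F, (H'H + lambda I)^{-1} H' g becomes a pointwise
# division in the Fourier domain. Periodic boundaries are assumed.
def circulant_deconvolve(g, h_first_column, lam=1e-2):
    Lam = np.fft.fft(h_first_column)      # eigenvalues of the circulant H
    G = np.fft.fft(g)
    F_hat = np.conj(Lam) * G / (np.abs(Lam) ** 2 + lam)
    return np.real(np.fft.ifft(F_hat))
```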

9.2.2 Sampling based methods

A second solution is to generate samples from the posterior law and use them to compute the variances and covariances. The problem, then, is how to generate samples from the posterior law

$$ p(f|g) = \mathcal{N}(f|\hat{f}, \hat{\Sigma}) \quad \text{with:} \quad \hat{f} = f_0 + (H^t H + \lambda I)^{-1} H^t (g - H f_0), \quad \hat{\Sigma} = v_\epsilon (H^t H + \lambda I)^{-1}, \quad \lambda = \frac{v_\epsilon}{v_f} \tag{88} $$

One solution is to compute the Cholesky decomposition of the covariance matrix Σ̂ = AAᵗ, generate a vector u ∼ N(u|0, I), and then generate a sample f = Au + f̂ [27]. We can compute f̂ by optimizing

$$ J(f) = \frac{1}{2}\|g - Hf\|^2 + \lambda \|f - f_0\|_2^2, \quad \lambda = \frac{v_\epsilon}{v_f}, \tag{89} $$

but the main computational cost is the Cholesky factorization.

Another approach, called Perturbation-Optimization [28, 29] is based on the following property:

If we define x = f̃ + (HᵗH + λI)⁻¹Hᵗ(g̃ − Hf̃), where g̃ and f̃ are the perturbed data and prior mean defined below, and look at its expectation and covariance matrix, it can be shown that:

$$ \begin{cases} E[x] = \hat{f} \\ \mathrm{Cov}[x] = \hat{\Sigma} \end{cases} \tag{90} $$

So, to generate a sample from the posterior law, we can do the following:

  • Generate two random vectors ε_f ∼ N(ε_f|0, v_f I) and ε_g ∼ N(ε_g|0, v_ϵ I);

  • Define g̃ = g + ε_g and f̃ = f₀ + ε_f, and optimize

$$ J(f) = \frac{1}{2}\|\tilde{g} - Hf\|^2 + \lambda \|f - \tilde{f}\|_2^2 \tag{91} $$

  • The obtained solution f⁽ⁿ⁾ = arg min_f J(f) is a sample from the desired posterior law.

By repeating this process a great number of times, we can use the resulting samples to obtain good approximations of the posterior mean f̂ and the posterior covariance Σ̂ by computing their empirical mean values. We need, however, fast and accurate optimization algorithms.
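A minimal sketch of one Perturbation-Optimization draw is given below; here the perturbed quadratic problem of Eq. (91) is solved directly with a dense solver, whereas in large-scale problems it would be replaced by an iterative (e.g. conjugate-gradient) solver. The function name and the direct solve are our assumptions:

```python
import numpy as np

# Minimal sketch of one Perturbation-Optimization sample [28, 29]:
# perturb the data and the prior mean, then solve the quadratic problem
# of Eq. (91); the resulting minimizer is one draw from p(f | g).
def po_sample(H, g, f0, v_eps, v_f, rng):
    m, n = H.shape
    lam = v_eps / v_f
    eps_g = rng.normal(0.0, np.sqrt(v_eps), size=m)   # perturb the data
    eps_f = rng.normal(0.0, np.sqrt(v_f), size=n)     # perturb the prior mean
    g_tilde = g + eps_g
    f_tilde = f0 + eps_f
    A = H.T @ H + lam * np.eye(n)
    b = H.T @ g_tilde + lam * f_tilde
    return np.linalg.solve(A, b)
```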


10. References to examples of applications

The above mentioned methods have been used with success in different applications:

  • Medical imaging and Computed tomography (CT) [30, 31, 32, 33, 34, 35, 36].

  • Diffraction tomography and Microwave imaging [37, 38, 39, 40, 41, 42].

  • 3D Computed Tomography [18, 22, 43]

  • Acoustical imaging [44, 45, 46, 47, 48, 49]

  • Hyperspectral imaging [50]

  • Spectrometry [51]

  • Eddy current tomography [52]

  • Non destructive testing applications [53]

  • Emission Tomography [54]

  • SAR imaging [55]

  • Chronobiological time series [56]

11. Conclusions

In this chapter, we first described inverse problems and gave a few classical examples, such as deconvolution, image restoration, X-ray computed tomography image reconstruction, and the Fourier synthesis inversion problem which arises in many imaging systems. Then, we mentioned that there are two classes of methods for inverse problems: deterministic regularization and Bayesian inference methods, and we started by describing Bayesian parameter estimation. The main part of the chapter focused on Bayesian inference for inverse problems. We saw that the first main difficulty is the high dimensionality of the unknown quantities and the appropriate choice of the prior law. For this, we described several simple and hierarchical prior models which are used in real applications. For the second main difficulty, the computational aspects, we described different approximate Bayesian computation (ABC) methods, and in particular the variational Bayesian approximation (VBA) methods, and showed how to use them, for example for hyperparameter estimation or for large-scale inverse problems.

References

  1. 1. Idier J. Approche bayésienne pour les problèmes inverses. Hermès Science Publications; 2001
  2. 2. Mohammad-Djafari A. Inverse Problems in Vision and 3D Tomography. ISTE-WILEY; 2010
  3. 3. Mohammad-Djafari A. Efficient scalable variational bayesian approximation methods for inverse problems. In: SIAM Uncertainty Quantification UQ16. EPFL; April 2016
  4. 4. Idier J. Bayesian Approach to Inverse Problems. John Wiley & Sons; 2008
  5. 5. Mohammad-Djafari A. Problèmes inverses en imagerie et en vision en deux volumes inséparables. In: Traité Signal et Image, IC2. ISTE-WILEY; 2009
  6. 6. Carasso AS. Direct blind deconvolution. SIAM Journal on Applied Mathematics. 2001;61(6):1980-2007
  7. 7. Chan T, Wong C-K. Total variation blind deconvolution. IEEE Transactions on Image Processing. 1998;7(3):370-375
  8. 8. Kak AC, Slaney M. Principles of Computerized Tomographic Imaging. SIAM; 2001
  9. 9. Jackson JI, Meyer CH, Nishimura DG, Macovski A. Selection of a convolution function for Fourier inversion using gridding [computerized tomography application]. IEEE Transactions on Medical Imaging. 1991;10(3):473-478
  10. 10. Chapdelaine C, Mohammad-Djafari A, Gac N, Parra E. A 3D Bayesian computed tomography reconstruction algorithm with Gauss-Markov-Potts prior model and its application to real data. Fundamenta Informaticae. [submitted]
  11. 11. Osher S, Burger M, Goldfarb D, Xu J, Yin W. An iterative regularization method for total variation-based image restoration. Multiscale Modeling & Simulation. 2005;4(2):460-489
  12. 12. Wang Y, Yang J, Yin W, Zhang Y. A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences. 2008;1(3):248-272
  13. 13. Goldstein T, Osher S. The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences. 2009;2(2):323-343
  14. 14. Bertocchi C, Chouzenoux E, Corbineau M-C, Pesquet J-C, Prato M. Deep unfolding of a proximal interior point method for image restoration. Inverse Problems. 2019;36
  15. 15. Mohammad-Djafari A. Gauss-Markov-Potts priors for images in computer tomography resulting to joint optimal reconstruction and segmentation. International Journal of Tomography and Statistics (IJTS). 2008;11:76-92
  16. 16. Ayasso H, Mohammad-Djafari A. Joint NDT image restoration and segmentation using Gauss–Markov–Potts prior models and variational Bayesian computation. IEEE Transactions on Image Processing. 2010;19(9):2265-2277
  17. 17. Feron O, Duchene B, Mohammad-Djafari A. Microwave imaging of inhomogeneous objects made of a finite number of dielectric and conductive materials from experimental data. Inverse Problems. 2005;21(6):S95
  18. 18. Wang L, Gac N, Mohammad-Djafari A. Bayesian 3D X-ray computed tomography image reconstruction with a scaled Gaussian mixture prior model. AIP Conference Proceedings. 2015;1641:556-563
  19. 19. Chapdelaine C, Mohammad-Djafari A, Gac N, Parra E. A joint segmentation and reconstruction algorithm for 3D Bayesian computed tomography using Gauss-Markov-Potts prior model. In: The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017
  20. 20. Chapdelaine C, Gac N, Mohammad-Djafari A, Parra E. New GPU implementation of separable footprint projector and backprojector : First results. In: The 5th International Conference on Image Formation in X-Ray Computed Tomography. 2018
  21. 21. Chapdelaine C. Variational Bayesian approach and Gauss-Markov-Potts prior model. arXiv:1808.09552. 2018
  22. 22. Wang L, Mohammad-Djafari A, Gac N. X-ray computed tomography using a sparsity enforcing prior model based on haar transformation in a Bayesian framework. Fundamenta Informaticae. 2017;155(4):449-480
  23. 23. Bioucas-Dias JM, Figueiredo MAT. An iterative algorithm for linear inverse problems with compound regularizers. In: 15th IEEE International Conference on Image Processing, 2008 (ICIP 2008). IEEE; 2008. pp. 685-688
  24. 24. Florea MI, Vorobyov SA. A robust fista-like algorithm. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2017. pp. 4521-4525
  25. 25. Yu S, Wu Z, Xu X, Wohlberg B, Kamilov US. Scalable plug-and-play ADMM with convergence guarantees. IEEE Transactions on Computational Imaging. 2021;7:849-863
  26. 26. Chapdelaine C. Reconstruction 3D par rayons X pour le Contrôle Non Destructif de pièces aéronautique [Thèse]. Orsay, France: Université de Paris-Sud; 2019
  27. 27. Gilavert C, Moussaoui S, Idier J. Efficient Gaussian sampling for solving large-scale inverse problems using MCMC. IEEE Transactions on Signal Processing. 2015;63(1):70-80
  28. 28. Giovannelli J-F. Estimation of the Ising field parameter thanks to the exact partition function. In: ICIP. 2010. pp. 1441-1444
  29. 29. Orieux F, Feron O, Giovannelli J-F. Sampling high dimensional Gaussian distributions for general linear inverse problems. IEEE Signal Processing Letters. 2012;19(5):251-254
  30. 30. Mohammad-Djafari A, Demoment G. Maximum entropy image reconstruction in X-ray and diffraction tomography. IEEE Transactions on Medical Imaging. 1988;7(4):345-354
  31. 31. Soussen C, Mohammad-Djafari A. Polygonal and polyhedral contour reconstruction in computed tomography. IEEE Transactions on Image Processing. 2004;13(11):1507-1523
  32. 32. Wang L, Mohammad-Djafari A, Gac N. Bayesian method with sparsity enforcing prior of dual-tree complex wavelet transform coefficients for X-ray CT image reconstruction. In: 2017 25th European Signal Processing Conference (EUSIPCO). 2017. pp. 478-482
  33. 33. Mohammad-Djafari A. Hierarchical markov modeling for fusion of X ray radiographic data and anatomical data in computed tomography. In: Proceedings IEEE International Symposium on Biomedical Imaging. July 2002. pp. 401-404
  34. 34. Mohammad-Djafari A, Sauer K, Khagu Y, Cano E. Reconstruction of the shape of a compact object from few projections. In: Proceedings of International Conference on Image Processing. Vol. 1. Oct 1997. pp. 165-168
  35. 35. Wang L, Mohammad-Djafari A, Gac N. X-ray computed tomography simultaneous image reconstruction and contour detection using a hierarchical markovian model. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). March 2017. pp. 6070-6074
  36. 36. Wang L, Mohammad-Djafari A, Gac N, Dumitru M. Computed tomography reconstruction based on a hierarchical model and variational Bayesian method. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; March 2016. pp. 883-887
  37. 37. Carfaatan H, Mohammad-Djafari A, Idier J. A single site update algorithm for nonlinear diffraction tomography. In: 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 4. Apr 1997. pp. 2837-2840
  38. 38. Nguyen MK, Mohammad-Djafari A. Bayesian approach with the maximum entropy principle in image reconstruction from microwave scattered field data. IEEE Transactions on Medical Imaging. 1994;13(2):254-262
  39. 39. Gharsalli L, Duchêne B, Mohammad-Djafari A, Ayasso H. Microwave tomography for breast cancer detection within a variational Bayesian approach. In: 21st European Signal Processing Conference (EUSIPCO 2013). Sept 2013. pp. 1-5
  40. 40. Gharsalli L, Duchêne B, Mohammad-Djafari A, Ayasso H. A gauss markov mixture prior model for a variational bayesian approach to microwave breast imaging. In: 2014 IEEE Conference on Antenna Measurements Applications (CAMA). Nov 2014. pp. 1-4
  41. 41. Gharsalli L, Duchêne B, Mohammad-Djafari A, Ayasso H. A gradient-like variational Bayesian approach: Application to microwave imaging for breast tumor detection. In: 2014 IEEE International Conference on Image Processing (ICIP). Oct 2014. pp. 1708-1712
  42. 42. Ayasso H, Duchêne B, Mohammad-Djafari A. A variational Bayesian approach for frequency diverse non-linear microwave imaging. In: 2012 19th IEEE International Conference on Image Processing. Sept 2012. pp. 2069-2072
  43. 43. Gac N, Vabre A, Mohammad-Djafari A, Rabanal A, Buyens F. GPU implementation of a 3D Bayesian CT algorithm and its application on real foam reconstruction. In: The First International Conference on Image Formation in X-Ray Computed Tomography. 2010. pp. 151-155
  44. 44. Chu N, Mohammad-Djafari A, Picheral J. A Bayesian sparse inference approach in near-field wideband aeroacoustic imaging. In: 2012 19th IEEE International Conference on Image Processing. Sept 2012. pp. 2529-2532
  45. 45. Chu N, Mohammad-Djafari A, Gac N, Picheral J. A variational Bayesian approximation approach via a sparsity enforcing prior in acoustic imaging. In: 2014 13th Workshop on Information Optics (WIO). July 2014. pp. 1-4
  46. 46. Chu N, Picheral J, Mohammad-Djafari A. A robust super-resolution approach with sparsity constraint for near-field wideband acoustic imaging. In: 2011 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). Dec 2011. pp. 310-315
  47. 47. Chu N, Picheral J, Mohammad-Djafari A, Gac N. A robust super-resolution approach with sparsity constraint in acoustic imaging. Applied Acoustics. 2014;76(1):197-208
  48. 48. Chu N, Zhao H, Yu L, Huang Q, Ning Y. Fast and high-resolution acoustic beamforming: A convolution accelerated deconvolution implementation. IEEE Transactions on Instrumentation and Measurement. 2020;99:1
  49. 49. Chu N, Mohammad-Djafari A, Picheral J. Robust Bayesian superresolution approach via sparsity enforcing a priori for near-field aeroacoustic source imaging. Journal of Sound & Vibration. 2013;332(18):4369-4389
  50. 50. Bali N, Mohammad-Djafari A. Bayesian approach with hidden markov modeling and mean field approximation for hyperspectral data analysis. IEEE Transactions on Image Processing. 2008;17(2):217-225
  51. 51. Perenon R, Sage E, Mohammad-Djafari A, Duraffourg L, Hentz S, Brenac A, et al. Bayesian inversion of multi-mode NEMS mass spectrometry signal. In: 21st European Signal Processing Conference (EUSIPCO 2013). Sept 2013. pp. 1-5
  52. 52. Premel D, Mohammad-Djafari A. Eddy current tomography in cylindrical geometry. IEEE Transactions on Magnetics. 1995;31(3):2000-2003
  53. 53. Mohammad-Djafari A, Robillard L. Hierarchical markovian models for 3D computed tomography in non destructive testing applications. In: 2006 14th European Signal Processing Conference. Sept 2006. pp. 1-5
  54. 54. Fall MD, Barat E, Comtat C, Dautremer T, Montagu T, Mohammad-Djafari A. A discrete-continuous bayesian model for emission tomography. In: 2011 18th IEEE International Conference on Image Processing. Sept 2011. pp. 1373-1376
  55. 55. Zhu S, You P, Wang H, Li X, Mohammad-Djafari A. Recognition-oriented Bayesian sar imaging. In: 2011 3rd International Asia-Pacific Conference on Synthetic Aperture Radar (APSAR). Sept 2011. pp. 1-4
  56. 56. Dumitru M, Mohammad-Djafari A. Periodic components estimation in chronobiological time series via a bayesian approach. In: 2015 23rd European Signal Processing Conference (EUSIPCO). Aug 2015. pp. 2246-2250
