Open access peer-reviewed chapter

Bayesian Inference and Compressed Sensing

By Solomon A. Tesfamicael and Faraz Barzideh

Submitted: December 5th 2016. Reviewed: July 4th 2017. Published: November 2nd 2017

DOI: 10.5772/intechopen.70308


Abstract

This chapter presents the use of Bayesian inference in compressive sensing (CS), a signal processing paradigm. Among the recovery methods used in the CS literature, the convex relaxation methods are reformulated here using the Bayesian framework, and this approach is applied to different CS applications such as magnetic resonance imaging (MRI), remote sensing, and wireless communication systems, specifically multiple-input multiple-output (MIMO) systems. The robustness of the Bayesian method in incorporating prior information, such as sparsity and structure among the sparse entries, is shown in this chapter.

Keywords

  • Bayesian inference
  • compressive sensing
  • sparse priors
  • clustered priors
  • convex relaxation

1. Introduction

In order to estimate parameters in a signal, one can apply the wisdom of the two schools of thought in statistics, the classical (also called frequentist) and the Bayesian, which are at times competitive with each other. The basic difference arises from the definition of probability. The frequentist defines P(A) as the long-run relative frequency with which A occurs in identical repeats of an experiment, whereas the Bayesian defines P(A|B) as a real-number measure of the probability of a proposition A, given the truth of the information represented by proposition B. Under Bayesian theory, probability is thus considered an extension of logic [1, 2]. Probabilities represent the investigator's degree of belief and are hence subjective; this is not acceptable under classical theory, which makes the classical approach less flexible. To add to the differences: under classical inference, parameters are fixed rather than random, and prior information is absent; under the Bayesian view, parameters are random variables, and prior information is an integral part of the inference. Since the frequentist is free to invent new estimators, confidence intervals, or hypothesis tests, ad hockery exists and the frequentist approach lacks consistency, whereas Bayesian theory is flexible and consistent [19]. Therefore, Bayesian inference, applied to a particular paradigm in signal processing, is our main focus in this chapter.

After presenting the theoretical frameworks, Bayesian theory, CS, and convex relaxation methods, in Section 2, the use of Bayesian inference in the CS problem is shown in Section 3 by considering two priors modeling sparsity and clusteredness. In Section 4, we present three examples of applications that show the connection of the two theories, Bayesian inference and compressive sensing. In Section 5, a brief conclusion is given.

2. Theoretical framework

2.1. Bayesian framework

For two random variables A and B, the product rule gives

P(A, B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A)   (1)

and the famous Bayes’ theorem provides

P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}.   (2)

Using the same framework, consider a model Mj and a vector of parameters θ. We infer what the model's parameters θ might be, given the data D and prior information I. Using Bayes' theorem, the probability of the parameters θ given the model Mj, data D, and information I is given by

P(\theta \mid D, M_j, I) = \frac{P(D \mid \theta, M_j, I)\, P(\theta \mid M_j, I)}{P(D \mid M_j, I)},   (3)

where P(θ | D, Mj, I) is the posterior probability, P(θ | Mj, I) is the non-data information about θ, called the prior probability distribution, and P(D | θ, Mj, I) is the density of the data conditional on the parameters of the model, called the likelihood. P(D | Mj, I) is called the evidence of model Mj, or the normalizing constant, given by:

P(D \mid M_j, I) = \int_{\theta} P(\theta \mid M_j, I)\, P(D \mid \theta, M_j, I)\, d\theta.   (4)

P(θ | D, Mj, I) is the fundamental quantity of interest for the first level of inference, called model fitting: the task of inferring what the model parameters might be, given the model and the data. Further, we can do inference at a higher level, namely comparing the models Mj: in the light of prior information I and data D, which model from a given set {M1, M2, ⋯, Mn} is most likely to be the correct one? Focusing on the first level of inference, we can ignore the normalizing constant in (3), since it has no relevance at this level of inference about the parameters θ. Hence we get:

P(\theta \mid D, M_j, I) \propto P(D \mid \theta, M_j, I)\, P(\theta \mid M_j, I).   (5)

The posterior probability is proportional to the prior probability times the likelihood. Eq. (5) is called the updating rule [1, 3]: the data allow us to update our prior views about θ, and as a result we get the posterior, which combines both the data and the non-data information about θ. As an example, for a binomial trial, take a beta distribution as the prior; the resulting posterior is again a beta distribution. Figure 1 shows that the posterior density is taller and narrower than the prior density. It therefore strongly favors a smaller range of θ values, reflecting the fact that we now have more information. That is why inference based on the posterior distribution is superior to inference based only on the likelihood.

Figure 1.

Figure showing the updating rule: the posterior synthesizes and compromises by favoring values between the maxima of the prior density and the likelihood. The prior is shifted by the arrival of even a small amount of data.
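As a minimal illustration of the updating rule (5), the following Python sketch carries out the beta-binomial update mentioned above; the prior hyperparameters and the data counts are illustrative choices, not values from the chapter.

```python
# Beta-binomial illustration of the updating rule (5):
# prior Beta(a, b) on theta, binomial likelihood with s successes in n trials,
# conjugate posterior Beta(a + s, b + n - s).
import numpy as np
from scipy.stats import beta

a, b = 2.0, 2.0            # illustrative prior hyperparameters
n, s = 10, 7               # illustrative data: n trials, s successes

theta = np.linspace(0.0, 1.0, 501)
prior = beta.pdf(theta, a, b)
posterior = beta.pdf(theta, a + s, b + n - s)   # taller and narrower than the prior

theta_mle = s / n                                # maximum of the likelihood alone
theta_map = (a + s - 1) / (a + b + n - 2)        # mode of the Beta posterior
print(f"MLE: {theta_mle:.3f}, MAP: {theta_map:.3f}")
```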

Now, we first find the maximum of the posterior distribution, called the maximum a posteriori (MAP) estimate. It defines the most probable value of the parameters, denoted \hat{\theta}_{MP}. MAP is related to Fisher's method of maximum likelihood estimation (MLE), \hat{\theta}_{ML}. If f is the sampling distribution of D, then the likelihood function is \theta \mapsto f(D \mid \theta) and the maximum likelihood estimate of \theta is

\hat{\theta}_{ML}(D) = \arg\max_{\theta} f(D \mid \theta).   (6)

But under Bayesian inference, let g be a prior distribution of θ; then the posterior distribution of θ becomes

\theta \mapsto \frac{f(D \mid \theta)\, g(\theta)}{f(D)}   (7)

and the maximum a posteriori estimate of θ is

\hat{\theta}_{MP} = \arg\max_{\theta} \frac{f(D \mid \theta)\, g(\theta)}{\int_{\vartheta} f(D \mid \vartheta)\, g(\vartheta)\, d\vartheta} = \arg\max_{\theta} f(D \mid \theta)\, g(\theta).   (8)

Inference based on the posterior is not an easy task, since it involves multiple integrals which are at times cumbersome to solve. However, it can be computed in several ways: numerical optimization (e.g., the conjugate gradient method or Newton's method), modifications of the expectation-maximization algorithm, and others. As we can see from (6) and (8), the difference between MLE and MAP is the prior distribution; the latter can be considered a regularization of the former. We can summarize the posterior distribution by the value of the best-fit parameters \hat{\theta}_{MP} and error bars (confidence intervals) on these parameters. Error bars can be found from the curvature of the posterior. To proceed further, we replace the random variables D and θ by the vectors y and x, and we assume prior distributions on x in the next section.
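As a small numerical companion to (6) and (8), the sketch below contrasts the MLE and the MAP estimate of a Gaussian mean under a zero-mean Gaussian prior, where the MAP has a closed form and acts as a shrunken (regularized) MLE; all numerical values are illustrative.

```python
# MAP as a regularized MLE: Gaussian data with known noise std sigma and a
# zero-mean Gaussian prior with std tau on the unknown mean theta.
import numpy as np

rng = np.random.default_rng(0)
true_theta, sigma, tau, n = 1.5, 1.0, 0.5, 20     # illustrative values
D = true_theta + sigma * rng.standard_normal(n)

theta_ml = D.mean()                               # arg max of f(D | theta), cf. Eq. (6)
shrink = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)
theta_map = shrink * D.mean()                     # arg max of f(D | theta) g(theta), cf. Eq. (8)
print(f"MLE: {theta_ml:.3f}, MAP (shrunk toward the prior mean 0): {theta_map:.3f}")
```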

2.2. Compressive sensing

Compressive sensing (CS) is a paradigm for capturing information at a lower rate than the Nyquist-Shannon sampling rate when signals are sparse in some domain [10–13]. CS has recently gained a lot of attention due to its exploitation of signal sparsity. Sparsity, an inherent characteristic of many natural signals, enables a signal to be stored in a few samples and subsequently recovered accurately.

As a signal processing scheme, CS follows the usual chain: encoding, transmission/storage, and decoding. Focusing on the encoding and decoding of such a system with noisy measurements, the block diagram is given in Figure 2. At the encoding side, CS combines the sampling and compression stages of traditional signal processing into one step by measuring a few samples that contain maximum information about the signal. This measurement/sampling is done by linear projections using random sensing transformations, as shown in the landmark papers by the authors mentioned above. Having said this, let us define the CS problem formally as follows:

Figure 2.

Blockdiagram for CS-based reconstruction.

Definition 1. (The standard CS problem)

Find the k-sparse signal vector x ∈ R^N given the measurement vector y ∈ R^M, the measurement matrix A ∈ R^{M×N}, and the under-determined set of linear equations

y = A x,   (9)

where k ≪ M ≪ N.
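The following Python sketch sets up an instance of the standard CS problem in Definition 1: a k-sparse x, a random Gaussian measurement matrix A, and the measurements y = Ax. The dimensions and the i.i.d. Gaussian choice for A are illustrative assumptions.

```python
# Toy instance of Definition 1: k-sparse x in R^N, random A in R^{M x N}, y = A x,
# with k << M << N (all sizes illustrative).
import numpy as np

rng = np.random.default_rng(1)
N, M, k = 256, 64, 8

x = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x[support] = rng.standard_normal(k)              # k-sparse signal

A = rng.standard_normal((M, N)) / np.sqrt(M)     # i.i.d. N(0, 1/M) entries (a common choice)
y = A @ x                                        # M linear measurements
print(y.shape, np.count_nonzero(x))
```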

Two questions arise here in relation to the standard CS problem. First, how should we design the matrix A to ensure that it preserves the information in the signal x? Second, how can we recover the original signal x from the measurements y [14]? To address the first question, the solution of the CS problem presented here depends on the design of A. This matrix can be considered a transformation of the signal from the signal space to the measurement space, Figure 3 [15]. There are different criteria that the matrix A should satisfy for a meaningful reconstruction. One of the main criteria is given in [11], where the authors defined a sufficient condition that A should satisfy for the reconstruction of the signal x. It is called the Restricted Isometry Property (RIP) and is defined below.

Figure 3.

Transformation from the signal-space to the measurement-space.

Definition 2. (Restricted Isometry Property)

For all x ∈ R^N such that ‖x‖_0 ≤ k, if there exists 0 ≤ δ_k < 1 such that

(1 - \delta_k) \|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_k) \|x\|_2^2   (10)

is satisfied, then A fulfills the RIP of order k with radius δ_k.

An equivalent description of the RIP is to say that all subsets of k columns taken from A are nearly orthogonal (the columns of A cannot be exactly orthogonal, since we have more columns than rows) [16]. For example, if a matrix A satisfies the RIP of order 2k, then we can interpret (10) as saying that A approximately preserves the distance between any pair of k-sparse vectors. For a random matrix A, the following theorem is one of the results in relation to the RIP for the noiseless CS problem, provided that the entries of the random matrix A are drawn from suitable distributions.
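Exactly verifying the RIP is combinatorial, but the concentration behind (10) can be illustrated empirically: the sketch below samples random k-sparse vectors and records how far ‖Ax‖²₂/‖x‖²₂ strays from 1. This Monte Carlo check is only an illustration of the property, not a certificate of it; all sizes are illustrative.

```python
# Empirical illustration of (10): for random k-sparse x, ||Ax||_2^2 / ||x||_2^2
# concentrates around 1 for a Gaussian A. This is NOT a proof of RIP.
import numpy as np

rng = np.random.default_rng(2)
N, M, k, trials = 256, 64, 8, 2000
A = rng.standard_normal((M, N)) / np.sqrt(M)

ratios = np.empty(trials)
for t in range(trials):
    x = np.zeros(N)
    idx = rng.choice(N, size=k, replace=False)
    x[idx] = rng.standard_normal(k)
    ratios[t] = np.sum((A @ x) ** 2) / np.sum(x ** 2)

delta_emp = max(1.0 - ratios.min(), ratios.max() - 1.0)   # empirical lower bound on delta_k
print(f"ratios in [{ratios.min():.2f}, {ratios.max():.2f}], empirical delta >= {delta_emp:.2f}")
```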

Theorem 1. (Perfect Recovery Condition, Candes and Tao [13])

If A satisfies the RIP of order 2k with radius δ_{2k}, then for any k-sparse signal x sensed as y = Ax, x is with high probability perfectly recovered by the ideal program

\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad y = Ax   (11)

and it is unique, where ‖x‖_0 = k ≡ #\{i ∈ \{1, 2, ⋯, N\} : x_i ≠ 0\}.

Note also that if A satisfies the RIP of order k with radius δ_k, then for any k′ < k, A satisfies the RIP of order k′ with radius δ_{k′} ≤ δ_k. This theorem is stated for the noiseless CS problem, and it is possible to extend it to the noisy CS system. The proofs of these theorems are deferred to the cited literature [13] for the sake of space.

Under the conventional sensing paradigm, the dimensions of the original signal and the measurement should be at least equal, whereas in CS the measurement vector can be far shorter than the original. At the decoding side, reconstruction is done using nonlinear schemes, so the reconstruction is more cumbersome than the encoding, which was only a projection from a large space to a smaller space. Moreover, finding a unique solution that satisfies the constraint that the signal itself is sparse, or sparse in some domain, is complex in nature. Fortunately, there are many algorithms to solve the CS problem, such as greedy iterative algorithms [17] and iterative thresholding algorithms [18]. This chapter focuses on the convex relaxation methods [12, 13]. The regularizing terms in these methods can be reinterpreted as prior information under Bayesian inference. We consider a noisy measurement and apply convex relaxation algorithms for robust reconstruction.

2.3. Convex relaxation methods for CS

Various methods for estimating x may be used. We have the least squares (LS) estimator, in which no prior information is applied:

\hat{x} = (A^T A)^{-1} A^T y,   (12)

which performs very badly for the CS estimation problem we are considering. In order to incorporate the methods called convex relaxation, let us define an important concept first.

Definition 3. (Unit Ball)

A unit ball in lp-space of dimension N can be defined as

B_p \equiv \{ x \in R^N : \|x\|_p \le 1 \}.   (13)

The unit balls corresponding to p = 0, p = 1/2, p = 1, p = 2, and p = ∞ for N = 2 are shown in Figure 4.

Figure 4.

Different l_p-balls in l_p-spaces for N = 2; only the balls with p ≥ 1 are convex.

The exact solution for the noiseless CS problem is given by

\min_{x} \|x\|_0, \quad \text{such that } y = Ax.   (14)

However, minimizing the l_0-norm is a non-convex optimization problem which is NP-hard [19]. By relaxing the objective function to a convex one, it is possible to get a good approximation. That is, by replacing the l_0-norm with the l_1-norm, one obtains a tractable problem. Note that it is also possible to use other l_p-norms to relax the condition given by l_0. However, keeping our focus on the l_1-norm, consider the following minimization problem instead of (14).

\min_{x} \|x\|_1, \quad \text{such that } y = Ax   (15)

The solution of the relaxed problem (15) coincides with that of (14) under suitable conditions; this equivalence was established by Donoho and Huo in [20].

Theorem 2. (l0l1 Equivalence [13])

If A satisfies the RIP of order 2k with radius \delta_{2k} < \sqrt{2} - 1, then

\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad y = Ax   (16)

is equivalent to (11) and will find the same unique x^.

Justified by this theorem, (15) is an optimization problem which can be solved in polynomial time, and the fact that it gives the exact solution of (14) under some circumstances has been one of the main reasons for the recent developments in CS. There is a simple geometric intuition as to why such an approach gives good approximations. Among the l_p-norms that can be used in the construction of CS-related optimization problems, only the convex ones give rise to a convex optimization problem, which is more tractable than its non-convex counterparts; this means only l_p-norms with p ≥ 1 satisfy such a condition. On the other hand, l_p-norms with p > 1 do not favor sparsity; for example, l_2-norm minimization tends to spread the reconstruction across all coordinates even if the true solution is sparse, whereas the l_1-norm is able to enforce sparsity. The intuition is that the l_1-minimization solution is most likely to occur at corners or edges, not faces [21, 45]. That is why the l_1-norm became famous for CS. Further, in the CS literature, convex relaxation is presented as either l_2-penalized l_1-minimization, called basis pursuit denoising (BPDN) [22], or l_1-penalized l_2-minimization, called the least absolute shrinkage and selection operator (LASSO) [45], which are equivalent and effective in estimating high-dimensional data.
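The convex program (15)/(16) can be cast as a linear program by the standard splitting x = u − v with u, v ≥ 0, which the sketch below solves with scipy's linprog; the splitting trick is standard, but the problem sizes and the use of linprog here are illustrative choices rather than the chapter's own implementation.

```python
# Basis pursuit (15)/(16) as a linear program: minimize sum(u) + sum(v)
# subject to A(u - v) = y, u >= 0, v >= 0, where x = u - v.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
N, M, k = 60, 30, 4
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ x_true

c = np.ones(2 * N)                    # objective: ||u||_1 + ||v||_1 = ||x||_1
A_eq = np.hstack([A, -A])             # equality constraint A u - A v = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_hat = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```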

Usually real world systems are contaminated with noise, w, and in this chapter, the focus is on such problems. The noisy recovery problem becomes a simple extension of (15),

\min_{x} \|x\|_1, \quad \text{such that } \|y - Ax\|_2 \le \epsilon,   (17)

where ϵ is a bound on ‖w‖_2. The real issue for (17) is stability: introducing small changes in the observations should result in only small changes in the recovery. We can visualize this using the balls shown in Figure 5.

Figure 5.

l_p-norm approximations: the constraint for the noise-free CS problem is given by the bold line, while the shaded region corresponds to the noisy one.

Both the l_0- and l_1-norms give exact solutions for the noise-free CS problem and close solutions for the noisy problem. However, the l_2-norm gives the worst approximation in both cases compared to the other l_p-norms with p < 2 (see Figure 5). Moreover, (17) is equivalent to an unconstrained quadratic programming problem,

\min_{x} \; \tfrac{1}{2} \|y - Ax\|_2^2 + \gamma \|x\|_1,   (18)

as it will be shown later as LASSO, where γ is a tuning parameter. The equivalency of (17) and (18) is shown in [23, 24]. In this chapter, the generalized form of the minimization problem in (18) with different lp-norm regularization is considered, that is,

\min_{x} \; \tfrac{1}{2} \|y - Ax\|_2^2 + \gamma \|x\|_p.   (19)

Further, this chapter presents the use of the Bayesian framework in compressive sensing by incorporating two different priors modeling the sparsity and the possible structure among the sparse entries in a signal. It is essentially a summary of the recent works [2, 25–27].

3. Bayesian inference used in CS problem

Under Bayesian inference, consider two random variables x and y with probability density function (pdf) p(x) and p(y), respectively. The product rule gives us p(x,y) = p(x|y)p(y) = p(y|x)p(x) and Bayes’ theorem provides

p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}.   (20)

Further, the maximum a posteriori (MAP), x^MP, is defined as

\hat{x}_{MP} = \arg\max_{x} \frac{p(y \mid x)\, p(x)}{\int_{\tilde{x}} p(y \mid \tilde{x})\, p(\tilde{x})\, d\tilde{x}} = \arg\max_{x} p(y \mid x)\, p(x).   (21)

MAP is related to Fisher’s methods of maximum likelihood estimation (MLE), x^ML:

\hat{x}_{ML} = \arg\max_{x} p(y \mid x).   (22)

As we can see from (21) and (22), the difference between MAP and MLE is the prior distribution; the former can be considered a regularized form of the latter. Since we apply Bayesian inference, we further assume different prior distributions on x.

3.1. Sparse prior

The estimators of x resulting from (19) for the sparse problem considered in this chapter can be presented as maximum a posteriori (MAP) estimators under the Bayesian framework, as in [28]. We show this by defining a prior probability distribution for x of the form

p(x) = \frac{e^{-u f(x)}}{\int_{x \in R^N} e^{-u f(x)}\, dx},   (23)

where the regularizing function f: χ → R is some scalar-valued, non-negative function with χ ⊆ R, which is extended to a vector argument by

f(x) = \sum_{i=1}^{N} f(x_i),   (24)

such that, for sufficiently large u, the integral

\int_{x \in R^N} \exp(-u f(x))\, dx

is finite. Furthermore, let the assumed variance of the noise be given by σ² = λ/u, where λ is a system parameter; that is, λ = σ²u.

Since the pdf of the noise w is Gaussian, the likelihood function of y given x is

p_{y|x}(y \mid x) = \frac{1}{(2\pi\sigma^2)^{N/2}}\, e^{-\frac{1}{2\sigma^2} \|y - Ax\|_2^2}.   (25)

Together with (20) and (23), this now gives

p_{x|y}(x \mid y; A) = \frac{ e^{-\frac{u}{\lambda}\left( \frac{1}{2}\|y - Ax\|_2^2 + \lambda f(x) \right)} }{ \int_{x \in R^N} e^{-\frac{u}{\lambda}\left( \frac{1}{2}\|y - Ax\|_2^2 + \lambda f(x) \right)}\, dx }.

The MAP estimator, (21), becomes

\hat{x}_{MP} = \arg\min_{x \in R^N} \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda f(x).   (26)

Now, as we choose different regularizing functions, we get different estimators, as listed below [28]:

  • Linear estimator: when f(x) = ‖x‖_2^2, (26) reduces to

    \hat{x}_{Linear} = A^T (A A^T + \lambda I)^{-1} y,   (27)

which is the LMMSE estimator. But we ignore this estimator in our analysis since the results are not sparse. However, the following two estimators are more interesting for CS problems since they enforce sparsity into the vector x.

  • LASSO estimator: when f(x) = ‖x‖_1, we get the LASSO estimator, and (26) becomes

    \hat{x}_{LASSO} = \arg\min_{x \in R^N} \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_1.   (28)

  • Zero-norm regularization estimator: when f(x) = ‖x‖_0, we get the zero-norm regularization estimator, and (26) becomes

    \hat{x}_{ZeroNorm} = \arg\min_{x \in R^N} \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_0.   (29)

  • As mentioned earlier, (29) is the best formulation for the estimation of the sparse vector x, but it is NP-complete. The worst approximation for the sparse problem considered is the l_2-regularization solution given by (27), whereas the best tractable approximation is given by Eq. (28) and its equivalent forms. In our simulations we have used some algorithms from the literature that are considered equivalent to this approximation, such as Bayesian compressive sensing (BCS) [29] and l_1-norm regularized least squares (L1-LS) [11–13]; a minimal iterative solver for (28) is sketched after this list.
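The sketch below is a minimal proximal-gradient (ISTA) solver for the LASSO estimator (28), given only as an illustration of how such l_1-regularized problems are typically solved; it is not the BCS or L1-LS implementation used in the chapter's simulations, and the step size, iteration count, and toy data are illustrative.

```python
# Minimal ISTA (iterative soft-thresholding) sketch for the LASSO estimator (28).
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_lasso(A, y, lam, n_iter=500):
    """Approximately solve min_x 0.5*||y - A x||_2^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)             # gradient of 0.5*||y - A x||_2^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

# toy usage with the CS setup of Definition 1 (sizes and lam illustrative)
rng = np.random.default_rng(4)
N, M, k = 256, 96, 10
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ x_true + 0.01 * rng.standard_normal(M)
x_hat = ista_lasso(A, y, lam=0.02)
print("MSE:", np.mean((x_hat - x_true) ** 2))
```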

3.2. Clustering prior

The entries of the sparse vector x may have some special structure (clusteredness) among themselves. This can be modeled by modifying the previous prior distribution.[1] We use another penalizing parameter γ to represent clusteredness in the data. For that, we define a clustering measure using the distance between consecutive entries of the sparse vector x,

D(x) = \sum_{i=1}^{N} |x_i - x_{i-1}|,   (39)

and we use a regularizing parameter γ. Hence, we define the clustering prior to be

q(x) = \frac{e^{-\gamma D(x)}}{\int_{x \in R^N} e^{-\gamma D(x)}\, dx}.   (30)

The new posterior involving this prior under the Bayesian framework is proportional to the product of the three pdfs:

p(x \mid y) \propto p(y \mid x)\, p(x)\, q(x).   (31)

By similar arguments as used in Section 3.1, we arrive at the clustered LASSO estimator

\hat{x}_{CluLASSO} = \arg\min_{x \in R^N} \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_1 + \gamma \sum_{i=1}^{N} |x_i - x_{i-1}|.   (32)

Here λ and γ are our tuning parameters for the sparsity in x and for the way the entries are clustered, respectively.
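For illustration only, the sketch below minimizes the clustered LASSO objective (32) by plain subgradient descent with a diminishing step size; this is a simple, slow stand-in for the solvers used in the chapter's experiments, and the toy clustered signal, step sizes, and tuning parameters are all illustrative (x_0 is taken as 0 in the difference term).

```python
# Hedged sketch: subgradient descent on the clustered LASSO objective (32),
# 0.5*||y - A x||_2^2 + lam*||x||_1 + gamma * sum_i |x_i - x_{i-1}|  (with x_0 = 0).
import numpy as np

def clustered_lasso_subgrad(A, y, lam, gamma, n_iter=3000, step=0.5):
    """Illustrative (not efficient) solver for the clustered LASSO estimator."""
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2
    for t in range(1, n_iter + 1):
        grad = A.T @ (A @ x - y)                      # gradient of the quadratic term
        g_l1 = lam * np.sign(x)                       # subgradient of lam*||x||_1
        d = np.sign(np.diff(x, prepend=0.0))          # d_i = sign(x_i - x_{i-1})
        g_clu = gamma * (d - np.append(d[1:], 0.0))   # subgradient of the clustering term
        x = x - (step / (L * np.sqrt(t))) * (grad + g_l1 + g_clu)
    return x

# toy clustered-sparse signal: the non-zeros form one contiguous block
rng = np.random.default_rng(5)
N, M = 128, 64
x_true = np.zeros(N)
x_true[40:52] = 1.0
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ x_true + 0.01 * rng.standard_normal(M)
x_hat = clustered_lasso_subgrad(A, y, lam=0.02, gamma=0.05)
print("MSE:", np.mean((x_hat - x_true) ** 2))
```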

4. Bayesian inference in CS applications

The compressed sensing paradigm has been applied to many signal processing areas [31–41]. However, at this time, hardware that can translate CS theory into practical use is still limited. Nonetheless, the demand for cheaper, faster, and more efficient devices will motivate the use of the CS paradigm in real-time systems in the near future.

So far, in image processing, one can mention single-pixel imaging via compressive sampling [31], magnetic resonance imaging (MRI) for reducing scan time and improving image quality [32], seismic imaging [33], and radar systems, for simplifying hardware design and obtaining high resolution [34, 35]. In communications and networks, CS theory has been studied for sparse channel estimation [36], for underwater acoustic channels, which are inherently sparse [37], for spectrum sensing in cognitive radio networks [38], for large wireless sensor networks (WSNs) [39], as a channel coding scheme [40], for localization [41], and so on. A good review of the CS application literature is provided in [21], which summarizes the bulk of the literature listed at http://dsp.rice.edu/cs.

This chapter gives examples of CS applications using Bayesian inference in imaging, namely magnetic resonance imaging (MRI); in communications, namely multiple-input multiple-output (MIMO) systems; and in remote sensing. First, let us see the impact of the estimators derived above, LASSO and clustered LASSO, on MRI.

4.1. Magnetic resonance imaging (MRI)

MRI signals are usually very weak due to the presence of noise and the inherently weak nature of the signal itself. The compressed sensing (CS) paradigm can be applied to boost the recovery of such signals. We applied the CS paradigm via the Bayesian framework, that is, incorporating prior information such as sparsity and the special structure among the sparse entries, to different MRI images.

4.1.1. Angiogram image

Angiogram images are already sparse in the pixel representation. An angiogram image taken from the University Hospital Rechts der Isar, Munich, Germany [42] is used for our analysis. The image we took is sparse and clustered even in the pixel domain. The original signal after vectorization is x of length N = 960. Taking M = 746 measurements, with a maximum number of non-zero elements k = 373, we applied different reconstruction schemes; the results are shown in Figure 6.

Figure 6.

Comparison of reconstruction schemes together with performance comparison using mean square error (MSE) in dB: (a) original image x; (b) LMMSE (−35.1988 dB); (c) LASSO (−53.6195 dB); and (d) clustered LASSO (−63.6889 dB).

4.1.2. Phantom image

Another MRI image considered is the Shepp-Logan phantom, which is not sparse in the spatial domain. However, we sparsified it in k-space by zeroing out small coefficients. We then measured the sparsified image and added noise. The original signal after vectorization is x of length N = 200. Taking M = 94 measurements, that is, y of length M = 94, with a maximum number of non-zero elements k = 47, we applied the different reconstruction algorithms used above. The result shows that clustered LASSO does well compared to the others, as can be seen in Figure 7.

Figure 7.

Comparison of reconstruction schemes together with performance comparison using mean square error (MSE) in dB: (a) original image x; (b) sparsified image; (c) least squares (LS) (−21.3304 dB); (d) LMMSE (−27.387 dB); (e) LASSO (−37.9978 dB); and (f) clustered LASSO (−40.0068 dB).

4.1.3. fMRI image

Another example of applying the clustered LASSO-based image reconstruction using the Bayesian framework to medical images is functional magnetic resonance imaging (fMRI), a non-invasive technique of brain mapping which is crucial in the study of brain activity. Taking many slices of fMRI data, we saw how these data sets are sparse in the Fourier domain, as shown in Figure 8. We observed the whole data set in this domain for the whole brain image. All slices share the characteristics on which we have based our analysis, i.e., sparsity and clusteredness. We then took some slices that are consecutive in the slice order and used different N, k, and M = 2k on these slices. The two numbers at the top of each panel in Figure 9 represent k and N, respectively.

Figure 8.

The five columns of images represent the real and imaginary parts of the Fourier transform representation of the data set chosen for further presentation, which in general shows that fMRI images have a sparse and clustered representation.

Figure 9.

Application of the sparse and cluster priors, LASSO and clustered LASSO (CL. LASSO), to an fMRI data analysis for N = 80, k > 50, with σ² = 0.1 and λ = 0.1, where LMMSE is the l_2-regularized estimator.

In fMRI, results are compared using image intensity, which gives a health practitioner good grounds to observe and decide in accordance with the available information. The more prior knowledge one has about how brain regions work in human beings or pets, the better the priors one can incorporate to analyze the data. This makes the approach an interesting tool for future research.

4.2. MIMO systems

Multiple-input multiple-output (MIMO) systems are integrated into modern wireless communications due to the performance gains they offer with respect to many metrics. One such advantage is the ability to transmit multiple streams using spatial multiplexing, but channel state information (CSI) at the transmitter is needed to obtain optimal system performance.

Consider a frequency division duplex (FDD) MIMO system consisting of N_t transmit and N_r receive antennas. Assume that the channel is a flat-fading, temporally correlated channel denoted by a matrix H[n] ∈ C^{N_r × N_t}, where n indicates a channel feedback time index, with block fading assumed during the feedback interval. The singular value decomposition (SVD) of H[n] gives

H[n] = U[n]\, \Sigma[n]\, V^H[n],   (37)

where U ∈ C^{N_r × r} and V ∈ C^{N_t × r} are unitary matrices and Σ ∈ C^{r × r} is a diagonal matrix consisting of the r = min(N_t, N_r) singular values. In the presence of perfect channel state information (CSI), the MIMO system model can be given by the equation

\tilde{y} = U^H[n]\, H[n]\, V[n]\, \tilde{x} + U^H[n]\, n,   (33)

where \tilde{x} ∈ C^{r × 1} is the transmitted vector, V[n] is used as the precoder at the transmitter, U^H[n] is used as the decoder at the receiver, n ∈ C^{N_r × 1} denotes a noise vector whose entries are i.i.d. and distributed according to CN(0, 1), and \tilde{y} ∈ C^{r × 1} is the received vector.
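The sketch below illustrates the SVD precoding and decoding in (37) and (33) on a toy 2×2 channel with perfect CSI: after applying V[n] at the transmitter and U^H[n] at the receiver, the effective channel is the diagonal matrix of singular values. The i.i.d. complex Gaussian channel and the specific symbol values are illustrative assumptions.

```python
# SVD precoding/decoding per (37) and (33) for a toy 2x2 flat-fading channel.
import numpy as np

rng = np.random.default_rng(6)
Nr, Nt = 2, 2
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)

U, s, Vh = np.linalg.svd(H)                  # H = U diag(s) V^H
V = Vh.conj().T

x_tilde = np.array([1 + 1j, -1 - 1j]) / np.sqrt(2)           # two transmit streams
n = 0.05 * (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr))
y_tilde = U.conj().T @ (H @ (V @ x_tilde) + n)               # Eq. (33)

# the effective channel U^H H V is (approximately) diag(s), so the streams decouple
print(np.round(U.conj().T @ H @ V, 3))
print(np.round(s, 3))
```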

Channel-adaptive transmission requires knowledge of channel state information at the transmitter. In temporally correlated MIMO channels, the correlation can be utilized to reduce the feedback overhead and improve performance. CS methods and rotative quantization are used to compress and feed back the CSI for MIMO systems [43], as an extension of the work in [44]. It is shown by simulation that the CS-based method reduces the feedback overhead while delivering the same performance as the direct quantization scheme.

Three methods are compared in the simulations: perfect CSI, feedback without CS, and feedback with CS, using matched filter (MF) and minimum mean square error (MMSE) receivers for total feedback bits B = 10 and B = 5. In Figure 10, sum rates are compared against the signal-to-noise ratio (SNR). Using CS, half of the number of bits can be saved. In Figure 11, where the bit error rate is plotted against SNR, the CS method has a better bit error rate performance using the same number of bits as the case without CS. These two figures demonstrate the clear advantage of using CS for the feedback of singular vectors in the rotation-based method. Details are deferred to [43].

Figure 10.

Sum rate vs. SNR for a 2×2 MIMO system with and without CS with two streams. We can observe that the performance of the CS method is almost equal to that of the method without using CS while saving half the number of bits.

Figure 11.

Bit error rate vs. SNR using a matched filter receiver for a 2×2 MIMO system with one stream.

4.3. Remote sensing

Remote sensing satellites provide a repetitive and consistent view of the Earth, and they offer a wide range of spatial, spectral, radiometric, and temporal resolutions. Image fusion is applied to extract all the important features from various input images; these images are integrated to form a fused image which is more informative and more suitable for human visual perception or computer processing. Sparse representation has been applied to image fusion to improve the quality of the fused image [45].

To improve the quality of the fused image, a remote sensing image fusion method based on sparse representation is proposed in [46]. In this method, the source images are first represented with sparse coefficients. Then, the larger values of the sparse coefficients of the panchromatic (Pan) image are set to 0. Thereafter, the coefficients of the panchromatic (Pan) and multispectral (MS) images are combined with a linear weighted-averaging fusion rule. Finally, the fused image is reconstructed from the combined sparse coefficients and the dictionary. The proposed method is compared with the intensity-hue-saturation (IHS), Brovey transform (Brovey), discrete wavelet transform (DWT), principal component analysis (PCA), and fast discrete curvelet transform (FDCT) methods on several pairs of multifocus images. The method using sparse representation outperforms the usual methods listed here (see Figure 12). We believe that our method of clustered compressed sensing could further improve this result.

Figure 12.

Comparison of image fusion methods for remote sensing applications using Brovey, DWT, PCA, FDCT, and the sparse representation method [46].

5. Conclusions

In this chapter, a Bayesian way of analyzing data under the CS paradigm is presented. The method assumes prior information, such as the sparsity and clusteredness of signals, in the analysis of the data. Among the different reconstruction methods, the convex relaxation methods are redefined using Bayesian inference. Further, three CS applications are presented: MRI imaging, MIMO systems, and remote sensing. For MRI imaging, the two different priors are incorporated, while for MIMO systems and remote sensing, only the sparse prior is applied in the analysis. We suggest that the special structure among the sparse elements of the data can be included in the analysis to further improve the results.

Notes

• In [30], a hierarchical Bayesian generative model for sparse signals is found, in which the authors apply a full Bayesian analysis by assuming prior distributions on each parameter appearing in the analysis. We follow a different approach.
