This chapter presents the use of Bayesian inference in compressive sensing (CS), a signal processing paradigm. Among the recovery methods in the CS literature, the convex relaxation methods are reformulated within the Bayesian framework, and the resulting approach is applied to several CS applications such as magnetic resonance imaging (MRI), remote sensing, and wireless communication systems, specifically multiple-input multiple-output (MIMO) systems. The chapter demonstrates the strength of the Bayesian method in incorporating prior information, such as sparsity and structure among the sparse entries.
- Bayesian inference
- compressive sensing
- sparse priors
- clustered priors
- convex relaxation
In order to estimate parameters of a signal, one can apply the wisdom of two schools of thought in statistics: the classical (also called frequentist) school and the Bayesian school. These approaches are at times in competition with each other. The basic difference between them arises from the definition of probability: the frequentists define the probability of an event as its long-run relative frequency over repeated trials, whereas the Bayesians interpret probability as a degree of belief that is updated as evidence accumulates.
After presenting the theoretical frameworks, Bayesian theory, CS, and convex relaxation methods, in Section 2, Section 3 shows how Bayesian inference is used in the CS problem by considering two priors that model sparsity and clusteredness. Section 4 presents three example applications that illustrate the connection between the Bayesian and compressive sensing theories. Section 5 concludes the chapter briefly.
2. Theoretical framework
2.1. Bayesian framework
For two random variables $x$ and $y$, the product rule of probability gives $p(x, y) = p(x \mid y)\, p(y) = p(y \mid x)\, p(x)$, and the famous Bayes' theorem provides
$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}.$$
Using the same framework, consider the linear measurement model $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}$, where $\mathbf{y}$ is the observed measurement vector, $\mathbf{A}$ is a known sensing matrix, $\mathbf{x}$ is the unknown parameter vector to be estimated, and $\mathbf{w}$ is additive noise.
Now, we first find the maximum of the posterior distribution, called the maximum a posteriori (MAP) estimate. It gives the most probable value of the parameters, denoted $\hat{\mathbf{x}}_{\mathrm{MAP}}$. MAP is related to Fisher's method of maximum likelihood estimation (MLE), $\hat{\mathbf{x}}_{\mathrm{MLE}} = \arg\max_{\mathbf{x}} p(\mathbf{y} \mid \mathbf{x})$. If the prior is flat (non-informative), the two estimates coincide.
But under Bayesian inference, let a prior distribution $p(\mathbf{x})$ be assigned to the unknown vector, so that the posterior is proportional to $p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x})$, and the maximum a posteriori estimate of $\mathbf{x}$ is
$$\hat{\mathbf{x}}_{\mathrm{MAP}} = \arg\max_{\mathbf{x}} p(\mathbf{x} \mid \mathbf{y}) = \arg\max_{\mathbf{x}} p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x}).$$
Inference based on the posterior is not an easy task, since it involves multiple integrals that are often cumbersome to solve. However, it can be computed in several ways: numerical optimization (e.g., the conjugate gradient method or Newton's method), modifications of the expectation-maximization algorithm, and others. As we can see from (22) and (8), the difference between MLE and MAP is the prior distribution; the latter can be considered a regularization of the former. Here we summarize the posterior distribution by the value of the best-fit parameters.
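To make this regularization view concrete, the following minimal sketch (illustrative only; the dimensions, noise level, and variable names are assumptions, not values from the chapter) compares the MLE, which is ordinary least squares under Gaussian noise, with the MAP estimate under a zero-mean Gaussian prior, which reduces to ridge-regularized least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model y = A x + w with Gaussian noise.
n, p = 30, 10
A = rng.standard_normal((n, p))
x_true = rng.standard_normal(p)
y = A @ x_true + 0.1 * rng.standard_normal(n)

# MLE: ordinary least squares, maximizes the Gaussian likelihood.
x_mle = np.linalg.lstsq(A, y, rcond=None)[0]

# MAP with a zero-mean Gaussian prior on x:
# maximizing the posterior is equivalent to ridge regression.
lam = 1.0
x_map = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)

print("MLE error:", np.linalg.norm(x_mle - x_true))
print("MAP error:", np.linalg.norm(x_map - x_true))
```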
2.2. Compressive sensing
Compressive sensing (CS) is a paradigm to capture information at lower rate than the Nyquist-Shannon sampling rate when signals are sparse in some domain [10–13]. CS has recently gained a lot of attention due to its exploitation of signal sparsity. Sparsity, an inherent characteristic of many natural signals, enables the signal to be stored in a few samples and subsequently be recovered accurately.
As a signal processing scheme, CS follows the usual pipeline: encoding, transmission/storing, and decoding. Focusing on the encoding and decoding of such a system with noisy measurements, the block diagram is given in Figure 2. At the encoding side, CS combines the sampling and compression stages of traditional signal processing into one step by measuring a few samples that contain maximum information about the signal. This measurement/sampling is done by linear projections using random sensing transformations, as shown in the landmark papers by the authors mentioned above. Having said this, let us define the CS problem formally as follows: given a measurement vector $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w} \in \mathbb{R}^{m}$, obtained through a sensing matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ with $m \ll n$ and corrupted by noise $\mathbf{w}$, recover the sparse (or sparsely representable) vector $\mathbf{x} \in \mathbb{R}^{n}$.
One can ask two questions here in relation to the standard CS problem. First, how should we design the sensing matrix $\mathbf{A}$ so that it captures the information in the sparse signal with as few measurements as possible? Second, how can the original signal be recovered from these few measurements? A widely used criterion for the first question is the restricted isometry property (RIP), which requires the matrix to act almost like an isometry on all sparse vectors.
An equivalent description of the RIP is to say that all subsets of $s$ columns taken from the sensing matrix are, in fact, nearly orthogonal. This means that, if the matrix satisfies the RIP of a suitable order, distinct sparse signals cannot be mapped to nearly identical measurements, so stable recovery of the sparse signal from the compressed measurements is possible.
Under the conventional sensing paradigm, the dimension of the measurement should be at least equal to that of the original signal. In CS, however, the measurement vector can be far shorter than the original. At the decoding side, reconstruction is done using nonlinear schemes, so the reconstruction is more cumbersome than the encoding, which consists only of projections from a large space onto a smaller one. Moreover, finding a unique solution that satisfies the constraint that the signal itself is sparse, or sparse in some domain, is complex in nature. Fortunately, there are many algorithms to solve the CS problem: greedy iterative methods such as matching pursuit and orthogonal matching pursuit (OMP), iterative thresholding methods, and the convex relaxation methods discussed next.
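As an illustration of a greedy iterative reconstruction, the sketch below encodes a synthetic sparse signal with a random Gaussian sensing matrix and decodes it with orthogonal matching pursuit (OMP). All dimensions and parameters are assumptions chosen for the example, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse signal of length n with k nonzero entries, measured by m < n random projections.
n, m, k = 256, 80, 8
x = np.zeros(n)
support = rng.choice(n, k, replace=False)
x[support] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
y = A @ x + 0.01 * rng.standard_normal(m)      # noisy compressive measurements

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily pick the column most correlated
    with the residual, then re-fit the coefficients on the selected support."""
    residual, idx = y.copy(), []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
        residual = y - A[:, idx] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[idx] = coef
    return x_hat

x_hat = omp(A, y, k)
print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```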
2.3. Convex relaxation methods for CS
Various methods exist for estimating the sparse vector $\mathbf{x}$ from the measurements $\mathbf{y}$. The most immediate candidate is the minimum-energy (least-squares) solution
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_2 \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x},$$
which performs very badly for the CS estimation problem we are considering, since it spreads energy over all entries instead of concentrating it on a few. In order to introduce the methods called convex relaxation, let us first define an important concept.
The exact solution for the noiseless CS problem is given by the combinatorial $\ell_0$ minimization
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}, \tag{14}$$
which is NP-hard in general. Its convex relaxation replaces the $\ell_0$ pseudo-norm with the $\ell_1$ norm, giving the basis pursuit problem
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}. \tag{15}$$
A classical recovery theorem states that, under RIP-type conditions on the sensing matrix, the solution of (15) coincides with that of (14).
Justified by this theorem, (15) is an optimization problem that can be solved in polynomial time, and the fact that it gives the exact solution to problem (14) under some circumstances has been one of the main reasons for the recent developments in CS. There is a simple geometric intuition for why such an approach gives good approximations: among the points of the affine set $\{\mathbf{x} : \mathbf{A}\mathbf{x} = \mathbf{y}\}$, the $\ell_1$ ball, whose vertices lie on the coordinate axes, typically first touches the set at a sparse point, whereas the $\ell_2$ ball touches it at a point that is generally not sparse.
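The $\ell_1$ problem (15) can be rewritten as a linear program by splitting $\mathbf{x}$ into its positive and negative parts. The toy sketch below (noiseless, with assumed dimensions) solves it with scipy.optimize.linprog and typically recovers the sparse vector exactly.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

n, m, k = 60, 25, 4
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n))
y = A @ x                                    # noiseless measurements

# Basis pursuit: min ||x||_1 s.t. Ax = y, written as a linear program
# using the split x = u - v with u >= 0, v >= 0.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]

print("exact recovery:", np.allclose(x_hat, x, atol=1e-4))
```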
Usually, real-world systems are contaminated with noise, so the equality constraint is relaxed and one instead solves the regularized problem
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \; \tfrac{1}{2}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2 + \lambda\|\mathbf{x}\|_1,$$
as will be shown later to be the LASSO, where $\lambda > 0$ is a regularization parameter that balances data fidelity against sparsity.
Further, this chapter presents the use of the Bayesian framework in compressive sensing by incorporating two different priors, modeling the sparsity and the possible structure among the sparse entries of a signal. Basically, it is a summary of the recent works [2, 25–27].
3. Bayesian inference used in the CS problem
Under Bayesian inference, consider again the two random vectors $\mathbf{y}$ and $\mathbf{x}$ related through the noisy measurement model $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}$, so that the posterior satisfies $p(\mathbf{x} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x})$. Further, the maximum a posteriori (MAP) estimate, $\hat{\mathbf{x}}_{\mathrm{MAP}}$, is defined as
$$\hat{\mathbf{x}}_{\mathrm{MAP}} = \arg\max_{\mathbf{x}} p(\mathbf{x} \mid \mathbf{y}) = \arg\max_{\mathbf{x}} p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x}). \tag{21}$$
MAP is related to Fisher's method of maximum likelihood estimation (MLE), $\hat{\mathbf{x}}_{\mathrm{MLE}}$:
$$\hat{\mathbf{x}}_{\mathrm{MLE}} = \arg\max_{\mathbf{x}} p(\mathbf{y} \mid \mathbf{x}). \tag{22}$$
As we can see from (21) and (22), the difference between MAP and MLE is the prior distribution; the former can be considered a regularized form of the latter. Since we apply Bayesian inference, we further assume different prior distributions on the unknown sparse vector $\mathbf{x}$, as described next.
3.1. Sparse prior
The estimators of $\mathbf{x}$ considered here are obtained by assuming a prior of the form $p(\mathbf{x}) \propto \exp\big(-\beta f(\mathbf{x})\big)$, where the regularizing function $f(\cdot)$ is chosen such that, for sufficiently large $\beta$, the normalizing constant $\int \exp\big(-\beta f(\mathbf{x})\big)\,d\mathbf{x}$ is finite. Furthermore, let the assumed variance of the noise be given by $\sigma^2$. Since the pdf of the noise $\mathbf{w}$ is Gaussian, the likelihood is $p(\mathbf{y} \mid \mathbf{x}) \propto \exp\big(-\tfrac{1}{2\sigma^2}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2\big)$, and the MAP estimator, (21), becomes
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \; \tfrac{1}{2\sigma^2}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2 + \beta f(\mathbf{x}). \tag{26}$$
Now, as we choose different regularizing functions $f(\cdot)$, we get different estimators, as listed below:
Linear estimators: when $f(\mathbf{x}) = \|\mathbf{x}\|_2^2$, (26) reduces to
$$\hat{\mathbf{x}} = \big(\mathbf{A}^{T}\mathbf{A} + 2\sigma^2\beta\,\mathbf{I}\big)^{-1}\mathbf{A}^{T}\mathbf{y},$$
which is the LMMSE estimator. But we ignore this estimator in our analysis, since its results are not sparse. The following two estimators are more interesting for CS problems, since they enforce sparsity in the vector $\mathbf{x}$.
LASSO estimator: when $f(\mathbf{x}) = \|\mathbf{x}\|_1$, corresponding to a Laplacian prior on $\mathbf{x}$, (26) becomes the LASSO (basis pursuit denoising) estimator
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \; \tfrac{1}{2\sigma^2}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2 + \beta\|\mathbf{x}\|_1.$$
Zero-norm regularization estimator: when $f(\mathbf{x}) = \|\mathbf{x}\|_0$, the number of nonzero entries of $\mathbf{x}$, (26) becomes the zero-norm regularized estimator
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \; \tfrac{1}{2\sigma^2}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2 + \beta\|\mathbf{x}\|_0. \tag{29}$$
As mentioned earlier, (29) is, in principle, the best formulation for estimating the sparse vector $\mathbf{x}$, but it is computationally intractable; in practice, the LASSO estimator serves as its convex relaxation.
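To illustrate the difference between the linear and LASSO estimators above, the following sketch solves the LASSO objective by iterative soft-thresholding (ISTA) and compares the number of nonzero coefficients with an $\ell_2$-regularized (ridge-type) solution. The problem sizes and regularization values are assumptions for the example only.

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    """Iterative soft-thresholding for min 0.5*||y - A x||_2^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L      # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return x

rng = np.random.default_rng(3)
n, m, k = 200, 60, 6
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true + 0.01 * rng.standard_normal(m)

x_lasso = ista(A, y, lam=0.05)
x_ridge = np.linalg.solve(A.T @ A + 0.05 * np.eye(n), A.T @ y)   # linear (l2) estimator
print("nonzeros  lasso:", np.sum(np.abs(x_lasso) > 1e-3),
      " ridge:", np.sum(np.abs(x_ridge) > 1e-3))
```

The LASSO solution is genuinely sparse, whereas the ridge-type solution keeps essentially all coefficients nonzero, which is why the linear estimator is ignored in the CS analysis above.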
3.2. Clustering prior
The entries of the sparse vector are often not arbitrarily scattered; they tend to appear in clusters, and this additional structure can also be exploited. In  a hierarchical Bayesian generative model for sparse signals is presented, in which full Bayesian analysis is applied by assuming prior distributions for each parameter appearing in the analysis. We follow a different approach.
and we use a regularizing parameter to control how strongly the clustering structure is enforced.
The new posterior involving this prior under the Bayesian framework is proportional to the product of three pdfs: the Gaussian likelihood of the measurements, the sparsity prior, and the clustering prior.
By similar arguments to those used in Section 3.1, we arrive at the clustered LASSO estimator.
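The exact form of the clustered LASSO penalty follows the referenced works [2, 25–27]. Purely as a rough sketch, the code below assumes a fused-LASSO-style clustering term that penalizes differences between neighboring coefficients, and minimizes the resulting objective by plain subgradient descent; it is not the chapter's own algorithm.

```python
import numpy as np

def clustered_objective(x, A, y, lam1, lam2):
    """Assumed illustrative objective: data fit + sparsity term + clustering term
    that encourages neighboring coefficients to be active together."""
    return (0.5 * np.sum((y - A @ x) ** 2)
            + lam1 * np.sum(np.abs(x))
            + lam2 * np.sum(np.abs(np.diff(x))))

def clustered_subgradient(A, y, lam1, lam2, n_iter=3000, step=1e-3):
    """Plain subgradient descent on the objective above (simple, not fast)."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ x - y) + lam1 * np.sign(x)
        d = np.sign(x[:-1] - x[1:])          # subgradient of the clustering term
        g[:-1] += lam2 * d
        g[1:] -= lam2 * d
        x = x - step * g
    return x
```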
4. Bayesian inference in CS applications
The compressed sensing paradigm has been applied to many signal processing areas [31–41]. However, at this time, hardware that can translate CS theory into practical use is still very limited. Nonetheless, the demand for cheaper, faster, and more efficient devices will motivate the use of the CS paradigm in real-time systems in the near future.
So far, in image processing, one can mention single-pixel imaging via compressive sampling , magnetic resonance imaging (MRI) for reducing scan time and improving image quality , seismic imaging , and radar systems, where CS simplifies hardware design and yields high resolution [34, 35]. In communication and networks, CS theory has been studied for sparse channel estimation , for underwater acoustic channels, which are inherently sparse , for spectrum sensing in cognitive radio networks , for large wireless sensor networks (WSNs) , as a channel coding scheme , for localization , and so on. A good review of CS applications is provided in , which summarizes the bulk of the literature collected at http://dsp.rice.edu/cs.
In this chapter, we present examples of CS applications using Bayesian inference in imaging (magnetic resonance imaging, MRI), in communication (multiple-input multiple-output, MIMO, systems), and in remote sensing. First, let us see the impact of the estimators derived above, LASSO and clustered LASSO, on MRI.
4.1. Magnetic resonance imaging (MRI)
MRI images are usually of low quality due to the presence of noise and the weak nature of the signal itself. The compressed sensing (CS) paradigm can be applied to boost the recovery of such signals. We applied the CS paradigm via the Bayesian framework, that is, incorporating prior information such as sparsity and the special structure found among the sparse entries, to different MRI images.
4.1.1. Angiogram image
Angiogram images are already sparse in the pixel representation. An angiogram image taken from the University Hospital Rechts der Isar, Munich, Germany , is used for our analysis. The image is sparse and clustered even in the pixel domain, and the original signal is obtained by vectorizing it.
4.1.2. Phantom image
Another MRI image considered is the Shepp-Logan phantom, which is not sparse in the spatial domain. However, we sparsified it in K-space by zeroing out small coefficients, then measured the sparsified image and added noise. The original signal is obtained by vectorizing the sparsified image.
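The steps described above can be sketched as follows, assuming the scikit-image Shepp-Logan phantom, an arbitrary 64 × 64 size, and a 90% thresholding level; these choices are illustrative and not the chapter's actual settings.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import resize

rng = np.random.default_rng(4)

# Small phantom for illustration; the chapter's actual image size is not reproduced here.
img = resize(shepp_logan_phantom(), (64, 64), anti_aliasing=True)

# Sparsify in K-space (2-D Fourier domain) by zeroing out small coefficients.
F = np.fft.fft2(img)
thresh = np.quantile(np.abs(F), 0.90)           # keep the largest 10% of coefficients
F_sparse = np.where(np.abs(F) >= thresh, F, 0)
img_sparse = np.real(np.fft.ifft2(F_sparse))

# Vectorize and take noisy random measurements y = A x + w with m << n.
x = img_sparse.ravel()
n, m = x.size, x.size // 4
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x + 0.01 * rng.standard_normal(m)
print(A.shape, y.shape)
```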
4.1.3. fMRI image
Another example of applying the clustered LASSO based image reconstruction within the Bayesian framework to medical images is functional magnetic resonance imaging (fMRI), a non-invasive brain-mapping technique that is crucial in the study of brain activity. Taking many slices of fMRI data, we saw that these data sets are sparse in the Fourier domain; this is shown in Figure 8. We inspected the whole data set in this domain for the whole brain image, and all slices share the characteristics on which we have based our analysis, i.e., sparsity and clusteredness. We then took some slices that are consecutive in the slice order and reconstructed them from compressive measurements.
In fMRI, results are compared using image intensity, which gives a health practitioner good grounds to observe and decide in accordance with the available information. The more prior knowledge one has about how brain regions work in human beings or pets, the better the priors one can incorporate to analyze the data. This makes the approach an interesting tool for future research.
4.2. MIMO systems
Multiple-input multiple-output (MIMO) systems are integrated in modern wireless communications due to their ability to improve performance with respect to many metrics. One such advantage is the ability to transmit multiple streams using spatial multiplexing, but channel state information (CSI) at the transmitter is needed to achieve optimal system performance.
Consider a frequency division duplex (FDD) MIMO system consisting of $N_t$ transmit antennas and $N_r$ receive antennas.
The channel matrix $\mathbf{H}$ admits the singular value decomposition $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{H}$, where $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices and $\boldsymbol{\Sigma}$ is a diagonal matrix of singular values. The received signal is $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$, where $\mathbf{x}$ is the transmitted vector, $\mathbf{H}$ is the channel matrix, and $\mathbf{n}$ is additive noise.
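A minimal numerical sketch of this model, with an assumed 4 × 4 i.i.d. complex Gaussian channel (not the chapter's setup), shows how precoding with $\mathbf{V}$ and combining with $\mathbf{U}^{H}$ decouples the MIMO channel into parallel subchannels whose gains are the singular values.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative 4x4 MIMO channel with i.i.d. complex Gaussian entries.
nt = nr = 4
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

# SVD: H = U diag(s) V^H with U, V unitary.
U, s, Vh = np.linalg.svd(H)

# Transmit with precoder V, receive, and combine with U^H:
# the channel decouples into parallel subchannels with gains s.
x = (rng.standard_normal(nt) + 1j * rng.standard_normal(nt)) / np.sqrt(2)   # data symbols
noise = 0.01 * (rng.standard_normal(nr) + 1j * rng.standard_normal(nr))
y = H @ (Vh.conj().T @ x) + noise
r = U.conj().T @ y
print(np.allclose(r, s * x, atol=0.1))   # r is approximately diag(s) x plus noise
```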
Channel adaptive transmission requires knowledge of channel state information at the transmitter. In temporally correlated MIMO channels, the correlation can be utilized to reduce feedback overhead and improve performance. CS methods and rotative quantization are used to compress and feed back the CSI for MIMO systems . This was done as an extension of the work in . Simulations show that the CS-based method reduces feedback overhead while delivering the same performance as the direct quantization scheme.
Three methods are compared in the simulations: perfect CSI, feedback without CS, and feedback with CS, using matched filter (MF) and minimum mean square error (MMSE) receivers for different numbers of total feedback bits.
4.3. Remote sensing
Remote sensing satellites provide a repetitive and consistent view of the Earth, and they offer a wide range of spatial, spectral, radiometric, and temporal resolutions. Image fusion is applied to extract all the important features from various input images; these images are integrated to form a fused image that is more informative and suitable for human visual perception or computer processing. Sparse representation has been applied to image fusion to improve the quality of the fused image .
To improve the quality of the fused image, a remote sensing image fusion method based on sparse representation is proposed in . In this method, the source images are first represented with sparse coefficients. Then, the larger values of the sparse coefficients of the panchromatic (Pan) image are set to 0. Thereafter, the coefficients of the panchromatic (Pan) and multispectral (MS) images are combined with a linear weighted-averaging fusion rule. Finally, the fused image is reconstructed from the combined sparse coefficients and the dictionary. The method is compared with intensity-hue-saturation (IHS), Brovey transform (Brovey), discrete wavelet transform (DWT), principal component analysis (PCA), and fast discrete curvelet transform (FDCT) methods on several pairs of multifocus images, and the sparse-representation method outperforms the other methods listed here (see Figure 12). We believe that our method of clustered compressed sensing can further improve this result.
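As a rough sketch of the coefficient-combination step described above (the dictionary learning stage and the actual weights of the cited method are not reproduced), the following hypothetical helper zeroes the largest Pan coefficients and then merges the Pan and MS coefficients by linear weighted averaging.

```python
import numpy as np

def fuse_coefficients(pan_coef, ms_coef, weight=0.5, keep_ratio=0.9):
    """Illustrative version of the fusion rule described above:
    the largest Pan coefficients are set to 0, then Pan and MS coefficients
    are merged by linear weighted averaging. Parameters are assumptions."""
    pan = pan_coef.copy()
    thresh = np.quantile(np.abs(pan), keep_ratio)
    pan[np.abs(pan) >= thresh] = 0.0          # suppress the largest Pan coefficients
    return weight * pan + (1.0 - weight) * ms_coef
```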
5. Conclusion

In this chapter, a Bayesian way of analyzing data within the CS paradigm is presented. The method incorporates prior information, such as the sparsity and clusteredness of signals, into the analysis of the data. Among the different reconstruction methods, the convex relaxation methods are redefined using Bayesian inference. Further, three CS applications are presented: MRI, MIMO systems, and remote sensing. For MRI, the two different priors are incorporated, while for MIMO systems and remote sensing, only the sparse prior is applied in the analysis. We suggest that the special structure among the sparse elements of the data can also be included in those analyses to further improve the results.