Open access

Introductory Chapter: A Strong Come-Back of Bayesian Inference

Written By

İhsan Ömür Bucak

Submitted: 30 October 2023 Published: 17 January 2024

DOI: 10.5772/intechopen.1003754

From the Edited Volume

Bayesian Inference - Recent Trends

İhsan Ömür Bucak


1. Introduction

The Oxford Reference dictionary defines Bayesian inference as a statistical inference technique in which the probability of an event is estimated from the frequency with which the event occurs, and notes that it is based on Bayes' theorem [1].

Bayesian methods allow the uncertainty of predictions to be quantified. They can also make better use of prior knowledge and bring robustness to a model, in principle, because they handle the uncertainty in parameter estimates explicitly [2].

As a generative approach, Bayesian methods can learn priors, including priors over hyper-parameters, directly from the training data without requiring expensive cross-validation techniques, while still providing uncertainty assessments for both the predictions and the parameter estimates [2].

How fast, accurate, reliable, and scalable inference algorithms can be designed using current optimization and stochastic approximation techniques is a state-of-the-art question that has been the subject of much work in recent years [2].

Bayes' theorem addresses the harder inverse problem (the posterior) underlying any machine learning process, as opposed to the easier forward problem (the likelihood). In other words, the objective is to estimate or infer a model given the observed data, which is exactly the task of machine learning. The Bayesian update rests on the assumption of a prior (knowledge) and on the availability of the estimated distribution of the data, represented by the denominator of Bayes' theorem, which discriminative models do not take into consideration at all [2]. In general, however, overfitting remains one of the main challenges shared by machine learning and inverse problems [2].
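For concreteness, a standard statement of the theorem for model parameters w and observed data D (the notation below is ours; the chapter does not spell it out) is:

```latex
% Bayes' theorem for parameters w and observed data D:
% posterior = likelihood * prior / evidence
\[
  P(w \mid D) = \frac{P(D \mid w)\, P(w)}{P(D)},
  \qquad
  P(D) = \int P(D \mid w)\, P(w)\, \mathrm{d}w .
\]
```

The denominator P(D), the evidence, is precisely the term that discriminative models leave out.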

A prior can be placed both on parametric models, which consist of a fixed number of unknown parameters, and on non-parametric models, which consist of functions and/or a number of unknown parameters that grows with the size of the data set. In practice, a specific prior is chosen by balancing the strength of the prior itself against the difficulty it poses for the derivation of inference algorithms [2].
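As a minimal illustration of that trade-off (our own toy example, not the chapter's), a conjugate prior is the extreme case in which the prior is chosen precisely so that inference becomes trivial. The sketch below updates a Beta prior on a coin's bias with Bernoulli observations:

```python
import numpy as np

# Beta(a, b) prior on the unknown bias theta of a coin.
a, b = 2.0, 2.0          # prior pseudo-counts: weakly favors theta ~ 0.5

# Observed data: 1 = heads, 0 = tails.
data = np.array([1, 1, 0, 1, 1, 1, 0, 1])

# Conjugacy makes the posterior another Beta with updated counts,
# so "inference" reduces to closed-form bookkeeping.
a_post = a + data.sum()
b_post = b + len(data) - data.sum()

posterior_mean = a_post / (a_post + b_post)
print(f"posterior: Beta({a_post:.0f}, {b_post:.0f}), mean = {posterior_mean:.3f}")
```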

An important feature of Bayesian learning is its posterior distribution, which can be used both to generate a point prediction and to quantify the uncertainty of that prediction.
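A minimal sketch of that use (our illustration, with simulated draws standing in for a real posterior): the point prediction is typically the posterior mean, and the uncertainty a credible interval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for draws from some posterior distribution over a quantity
# of interest (simulated here as a normal purely for illustration).
posterior_samples = rng.normal(loc=1.5, scale=0.3, size=10_000)

point_prediction = posterior_samples.mean()              # posterior mean
lo, hi = np.percentile(posterior_samples, [2.5, 97.5])   # 95% credible interval

print(f"prediction = {point_prediction:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```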

In non-parametric models, the number of parameters learned directly from the data during training is often very large initially, and the algorithm then takes on the task of reducing them to a limited set. This is where priors over an infinite number of parameters enter the scene, with the respective posteriors attempting to recover the "true" number of parameters [2].


2. The place and importance of Bayesian techniques in machine learning

Machine learning is just one part of the field of artificial intelligence, and today many developments make use of machine learning and its models. Bayesian and statistical algorithms form one subsection of machine learning.

Measurements of dependent and independent variables are inherently noisy and imprecise, and the relationship between them is invariably non-deterministic. Consequently, the only consistent and principled way to reason meaningfully in the presence of this uncertainty is through probability theory, and especially through Bayes' rule, which captures the logic of uncertainty [3]. If we call the independent and dependent variable measurements A and B respectively, one of the main tasks of machine learning is to approximate the conditional probability P(B | A) with a suitably specified model, based on given sets of examples of these measurements.

Bayesian inference is most prominent in the modeling procedure. We have a parameterized model of the conditional probability, defined as follows, where w represents a vector of all "tunable" parameters in the model:

P(B | A) = f(A; w)   (1)

The task of predicting the unknown B based on A is carried out by evaluating f(A; w) with the parameters w set to their optimum. Bayesian inference puts parameters such as w in the same category of random variables as A and B. However, if the model f is made too complex, we run the risk of overtraining on the observed data D and, as a result, arriving at a poor model of the distribution P(B | A) [3]. Here, the given set D consists of N examples of the variables A and B. Overtraining, or overspecialization, occurs in training algorithms when only a limited number of training examples is available. If a training data set contained enough samples to represent the test set perfectly, no difference would be expected between training-set and test-set performance. In practice, however, data sets are limited, so it is normal for test-set performance to be worse than training-set performance: the model becomes overly specialized and cannot generalize well [4].
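A compact sketch of this point (our own toy example, not the chapter's): placing a Gaussian prior on w turns plain least-squares fitting into a regularized posterior computation, which both tempers overtraining on a small D and attaches uncertainty to each prediction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y = 0.5 + 2x + noise, with only a few examples (small N),
# exactly the regime in which an unregularized fit overtrains.
N = 8
x = rng.uniform(-1, 1, N)
y = 0.5 + 2.0 * x + rng.normal(0, 0.2, N)

Phi = np.column_stack([np.ones(N), x])   # design matrix [1, x]
alpha, beta = 1.0, 25.0                  # prior precision, noise precision

# Gaussian prior w ~ N(0, alpha^-1 I) gives a closed-form Gaussian posterior.
S = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)  # posterior covariance
m = beta * S @ Phi.T @ y                                   # posterior mean

# Predictive mean and variance at a new input x* = 0.5.
x_star = np.array([1.0, 0.5])
pred_mean = x_star @ m
pred_var = 1.0 / beta + x_star @ S @ x_star

print(f"w posterior mean = {m}, prediction = {pred_mean:.3f} "
      f"+/- {np.sqrt(pred_var):.3f}")
```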

Machine learning methods have been used extensively over the decades to extract useful information from data and then use it to make predictions. Such methods mostly rely on a parametric model to describe the given data; estimates of the unknown model parameters are then derived through an inference or estimation technique.

Machine learning from the Bayesian point of view has received a great deal of attention in recent years, owing to its relative advantages and to the additional knowledge provided by posterior distributions.


3. The place of Bayesian techniques in deep neural networks (DNNs)

We have witnessed the return of Bayesian methods as a new source of inspiration in the design of deep neural networks, with strong ties to Bayesian models and unsupervised learning [2].

The inference steps of Bayesian DNNs differ from those of their deterministic peers in two ways. First, when the unknown synaptic parameters, or weights, are defined in the form of parameterized distributions, the cost function to be optimized is expressed not in terms of the synaptic weights but in terms of the hyper-parameters describing the relevant distributions. Second, the evidence function to be maximized is not tractable.
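A minimal sketch of the first point (our illustration in plain NumPy, not an architecture from the chapter): each weight is replaced by a mean mu and a standard deviation derived from rho, a forward pass samples the weights, and the trainable quantities are the distribution parameters rather than the weights themselves.

```python
import numpy as np

rng = np.random.default_rng(2)

def bayesian_linear(x, mu, rho):
    """Forward pass of a linear layer with Gaussian weight distributions.

    The learnable parameters are mu and rho (sigma = softplus(rho) > 0),
    i.e. the hyper-parameters of the weight distributions, not fixed weights.
    """
    sigma = np.log1p(np.exp(rho))          # softplus keeps sigma positive
    eps = rng.standard_normal(mu.shape)    # reparameterization trick
    w = mu + sigma * eps                   # one sample of the weights
    return x @ w

# 3 inputs -> 2 outputs; the parameter set is (mu, rho), twice the size
# of a deterministic layer's weight matrix.
mu = rng.normal(0, 0.1, (3, 2))
rho = np.full((3, 2), -3.0)               # small initial sigma

x = rng.standard_normal((4, 3))           # a batch of 4 examples
print(bayesian_linear(x, mu, rho))        # stochastic: differs on each call
```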

Since 2010, DNNs have dominated the machine learning field thanks to their tremendous representation capability and their extraordinary predictive achievements on many learning tasks. Their most important design problem has become pruning, which removes unnecessary nodes and links and thus plays a major role in reducing the number of associated parameters as well as the DNN size.
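Bayesian posteriors provide a natural pruning criterion. One common heuristic (our sketch, not a method prescribed by the chapter) removes weights whose posterior signal-to-noise ratio |mu|/sigma is low, i.e., weights the posterior cannot distinguish from zero.

```python
import numpy as np

rng = np.random.default_rng(3)

# Posterior means and standard deviations of a layer's weights,
# e.g. as learned by the variational scheme sketched above.
mu = rng.normal(0, 1, (256, 128))
sigma = rng.uniform(0.1, 2.0, (256, 128))

snr = np.abs(mu) / sigma            # posterior signal-to-noise ratio
mask = snr > 1.0                    # keep only confidently non-zero weights

pruned_mu = np.where(mask, mu, 0.0) # zero out the pruned connections
print(f"kept {mask.mean():.1%} of weights after SNR pruning")
```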


4. Applications of Bayesian learning models

A wide variety of applications of Bayesian learning models attract attention today. Bayesian learning models are especially widespread around large and complex wireless communication systems such as 6G, automated systems that constantly face fast-changing environments, and critical decision-making processes.

In this context, we will try to briefly convey the diversity of applications in this field by giving some examples of innovative studies selected from recent years, ordered from the present to the past.

The first of these studies was chosen from communication theory. Bayesian inference has been carried out for simulating radio channels with a newly proposed stochastic multipath model, approximating the analytically intractable posterior distribution by introducing a novel Markov Chain Monte Carlo technique that improves the efficiency of the Monte Carlo computations [5].

Bayesian inference can also be applied in machine learning techniques for system identification. For example, it has found application in system identification by modeling the posterior distributions of the model parameters. To this end, Bayesian inference has been used to improve the performance of black-box high-precision modeling through fine-tuning the training of neural networks for unknown non-linear systems, with seismic data used to evaluate the methods. Likewise, such machine learning structures can also be used, in their simplest Bayesian-inference form, as a prediction tool for time series [6].

In another, similar application, Bayesian inference has been used in a data-driven training method that trains on-line on the historical seismic data of Central Italy between 2014 and 2016, treated as prior knowledge, to predict through the likelihood three important seismological parameters of future events, namely time, magnitude, and location, and to update the posterior distribution with more recent data in order to improve forecasting in the presence of such uncertainty [7].

Bayesian statistics supports detailed and rigorous data analysis for experimental software engineering, whereas frequentist statistics has many fundamental issues. Under principled data analysis processes, results are interpreted more naturally and are directly related to practically important measures. This points to closing the gap between statistical and practical significance, which is what makes the real difference in practice. Bayesian statistics can help experimental software engineering establish solid foundations and achieve solid results. Some Bayesian analysis techniques can even be used to simulate realistic scenarios different from those measured; this capability accelerates decision-making in real-world software engineering scenarios [8].

In the semiconductor industry, Bayesian inference has been used to propose a new version of a cell-aware diagnosis flow that precisely identifies defect candidates in customer returns, and to contain the growing number of defects that accompanies the unending push for higher density within the cell structures of digital ICs. Cell-aware test is today the only way to achieve the low defective-parts-per-million rates required for critical applications, in an effort to ensure that no similar defects occur in the future [9].

Bayesian inference has been used as part of statistical modeling to predict the mean theoretical housing price from a given dataset and to validate the model, with a view to applying it further to foresee the theoretical asset prices of similar large items. This helps economists calculate the difference between the theoretical asset price and the asset price subjected to an abrupt change under the impact of COVID-19 [10].

The rapid development of machine learning has spurred a joint effort by computer scientists on workload characterization, particularly for acceleration-related workloads in deep learning, the leading approach for learning patterns from large masses of labeled data (supervised learning), and on performance bottlenecks that lend themselves to optimization. In deep learning, models are trained using labeled data, while uncertainty or noise is not explicitly modeled. When we do not have enough labeled data, or are trying to establish an uncertain relationship between different types of observed data, we can build a Bayesian model and use Bayesian inference to extract what we want from the data. Bayesian inference is thus a particularly important machine learning technique that complements deep learning in many fields, and Bayesian methods can even outperform deep learning in supervised learning. In particular, Bayesian inference and modeling are more successful at producing interpretable models, especially with unlabeled or limited data, by exploiting informative priors to their full extent. One of the techniques suggested in this context claims to improve Bayesian inference performance by 5.8x on average compared to naive assignment and execution of the workloads. The authors even claim that the proposed techniques can facilitate the deployment of Bayesian inference as a generic web service [11].

Policy evaluation in on-line reinforcement learning has been approached from a probabilistic perspective using Gaussian processes to represent the action value function. What is meant by a probabilistic perspective is the sequential update, by Bayesian inference during policy evaluation, of the action value function with respect to the state and reward functions of the observed variables. With the collected samples modeled as observed variables and the action value function modeled as a latent variable, policy iteration is transformed into the problem of inferring the latent variables from the observed ones on the basis of probability theory. In other words, in a Bayesian reinforcement learning algorithm based on SARSA, the action value function is modeled as a stochastic variable through Gaussian processes in the first stage and updated according to Bayesian inference in the second stage [12].
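To make the probabilistic update concrete, the sketch below shows a generic Gaussian-process regression posterior over action values, conditioned on observed (state, action) samples; this is our illustration of the underlying machinery, not the exact algorithm of [12].

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential kernel between two sets of (state, action) points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

# Observed (state, action) pairs and noisy value targets.
X = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 0.0]])
y = np.array([0.2, 0.8, 0.5])
noise = 1e-2

# GP posterior at a query point: mean and variance of the action value.
X_star = np.array([[0.7, 0.5]])
K = rbf(X, X) + noise * np.eye(len(X))
k_star = rbf(X_star, X)

mean = k_star @ np.linalg.solve(K, y)
var = rbf(X_star, X_star) - k_star @ np.linalg.solve(K, k_star.T)
print(f"Q estimate = {mean[0]:.3f} +/- {np.sqrt(var[0, 0]):.3f}")
```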


5. Conclusion

Bayes’ Theorem is extremely successful in providing an intuitively clear alternative method for estimating parameters as well as determining the degree of confidence or uncertainty in these estimates.

Since (statistical) inference is known to be the process of extracting features of a population or probability distribution from data, it is quite natural that Bayesian inference is defined, by analogy, as extracting features of a population or probability distribution from data using Bayes' theorem.

In Bayesian statistics, the information necessary to make inferences is found in observed data, while in frequentist statistics, it is found in other unobserved quantities.

The main advantages of using such Bayesian tools include the fact that complex Bayesian concepts and procedures can, in most cases, be understood without mathematical formulas, and that these tools allow the researcher to reveal aspects of Bayesian inference hidden in the mathematical formula of Bayes' theorem.

References

  1. Daintith J. Bayesian inference. In: A Dictionary of Physics [Internet]. Oxford: OUP Oxford; 2014. Available from: https://www.oxfordreference.com/display/10.1093/oi/authority.20110810104430938 [Accessed: 27 October 2023]
  2. Cheng L, Yin F, Theodoridis S, Chatzis S, Chang TH. Rethinking Bayesian learning for data analysis: The art of prior and inference in sparsity-aware modeling. IEEE Signal Processing Magazine. 2022;39(6):18-52. DOI: 10.1109/MSP.2022.3198201
  3. Tipping ME. Bayesian inference: An introduction to principles and practice in machine learning. In: Bousquet O, von Luxburg U, Rätsch G, editors. Advanced Lectures on Machine Learning. ML 2003. Lecture Notes in Computer Science, Vol. 3176. Berlin, Heidelberg: Springer; 2004. DOI: 10.1007/978-3-540-28650-9_3
  4. Purnell DW, Botha EC. Improved performance and generalization of minimum classification error training for continuous speech recognition. In: Proceedings of the Sixth International Conference on Spoken Language Processing, ICSLP 2000/INTERSPEECH 2000; 16-20 October 2000. Beijing, China: ISCA; 2000. pp. 165-168. DOI: 10.21437/ICSLP.2000-777
  5. Hirsch C, Bharti A, Pedersen T, Waagepetersen R. Bayesian inference for stochastic multipath radio channel models. IEEE Transactions on Antennas and Propagation. 2023;71(4):3460-3472. DOI: 10.1109/TAP.2022.3215820
  6. Morales J, Yu W. Bayesian inference for neural network based high-precision modeling. In: 2022 IEEE Symposium Series on Computational Intelligence (SSCI). Singapore: IEEE; 2022. pp. 442-447. DOI: 10.1109/SSCI51031.2022.10022075
  7. Morales J, Yu W. A novel Bayesian inference based training method for time series forecasting. In: 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Melbourne, Australia: IEEE; 2021. pp. 909-913. DOI: 10.1109/SMC52423.2021.9659009
  8. Torkar R, Furia CA, Feldt R. Bayesian data analysis for software engineering. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). Madrid, Spain: IEEE; 2021. pp. 328-329. DOI: 10.1109/ICSE-Companion52605.2021.00140
  9. Mhamdi S, Girard P, Virazel A, Bosio A, Ladhar A. Cell-aware diagnosis of customer returns using Bayesian inference. In: 2021 22nd International Symposium on Quality Electronic Design (ISQED). Santa Clara, CA, USA: IEEE; 2021. pp. 48-53. DOI: 10.1109/ISQED51717.2021.9424337
  10. Shu H. Inference in census-house dataset. In: 2021 International Conference on Signal Processing and Machine Learning (CONF-SPML). Stanford, CA, USA: IEEE; 2021. pp. 282-285. DOI: 10.1109/CONF-SPML54095.2021.00061
  11. Wang YE, Zhu Y, Ko GG, Reagen B, Wei GY, Brooks D. Demystifying Bayesian inference workloads. In: 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Madison, WI, USA: IEEE; 2019. pp. 177-189. DOI: 10.1109/ISPASS.2019.00031
  12. Xia Z, Zhao D. Online reinforcement learning by Bayesian inference. In: 2015 International Joint Conference on Neural Networks (IJCNN). Killarney, Ireland: IEEE; 2015. pp. 1-6. DOI: 10.1109/IJCNN.2015.7280437
