1. Introduction
Bayesian inference derives from Bayes' theorem, which describes the probability of an event given some prior information. Owing to huge advances in computational and modeling techniques, Bayesian inference has increasingly become an important tool for data analysis and has been widely applied in various fields, including social science, engineering, philosophy, medicine, sport, law, and psychology, for parametric/nonparametric estimation, hypothesis testing, and prediction. Various Bayesian methods, including Markov chain Monte Carlo (MCMC), objective Bayesian methods, subjective Bayesian methods, approximate Bayesian computation, and variational Bayesian methods, have been developed to make Bayesian inference on problems such as large-scale image classification and cluster analysis of microarray data, and on models that are parametric, nonparametric, or semiparametric, as well as on more complicated models such as joint models of survival and longitudinal data, graphical models, computer models, neural network models, and spatial econometric models.

In particular, in the big data era, various Bayesian advances, including theories, methods, and computational algorithms, have been developed in recent years to accommodate applications in AI and data science [1], for example, prior learning, Bayes factor evaluation, Bayesian variable selection, robust Bayesian inference, variational Bayesian inference, resampling, approximation of posterior distributions, approximate Bayesian computation, and debiasing methods for high-/ultrahigh-dimensional data, multisource heterogeneous data, imbalanced data, missing data, and data streams. However, some challenging problems remain to be addressed, for example: how to balance computational time against statistical efficiency; how to design efficient Bayesian computational algorithms and robust sampling schemes for big/massive, distributed, and streaming data under privacy protection and defense against malicious attacks; and how to build models that adapt to the development of AI and the requirements of data mining. In what follows, we introduce recent developments and some topics of interest in Bayesian inference.
2. Bayesian estimation
For statistical models, Bayesian estimation is usually obtained from the posterior distribution derived via Bayes' theorem. In general, Bayesian estimation covers both parameters and nonparametric functions. For parametric Bayesian estimation, we first specify the prior distribution of the parameter and then evaluate its posterior mean or median (i.e., the Bayesian estimate of the parameter) from its posterior distribution, according to whether the quadratic or the absolute loss function is used. For nonparametric Bayesian estimation, we first approximate the nonparametric function via some proper method such as B-splines or P-splines [2], i.e., a parameterized approximation of the nonparametric function, which leads to a parameterized model, and then employ the Bayesian machinery of parametric models to evaluate Bayesian estimates of the parameters and the nonparametric function; a small sketch of this parameterization idea is given below. In what follows, we introduce how to evaluate Bayesian estimates of parameters or nonparametric functions in a relatively complicated model (e.g., a random effects model/latent variable model).
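To fix ideas, the following minimal sketch (in Python, with scipy as an assumed dependency and illustrative knot choices) shows how a B-spline basis turns an unknown smooth function into a finite coefficient vector, so that a prior on the coefficients induces a prior on the function:

```python
import numpy as np
from scipy.interpolate import BSpline

# Parameterize an unknown smooth function f(x) by a B-spline basis,
# f(x) ~ sum_k beta_k B_k(x), so that Bayesian inference on f reduces
# to parametric inference on the coefficient vector beta.
x = np.linspace(0.0, 1.0, 200)
degree = 3
interior_knots = np.linspace(0.1, 0.9, 8)
# Clamped knot sequence: boundary knots repeated (degree + 1) times.
knots = np.concatenate(([0.0] * (degree + 1), interior_knots,
                        [1.0] * (degree + 1)))
n_basis = len(knots) - degree - 1

# Design matrix B: column k is the k-th B-spline basis evaluated at x.
B = np.column_stack([BSpline(knots, np.eye(n_basis)[k], degree)(x)
                     for k in range(n_basis)])

# A prior on beta (standard normal here, purely for illustration)
# induces a prior on the function; B @ beta is one prior draw of f.
rng = np.random.default_rng(0)
beta = rng.normal(size=n_basis)
f_draw = B @ beta
```

With this design matrix in hand, Bayesian estimation of the nonparametric function reduces to Bayesian estimation of the coefficient vector, exactly as described above.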
In a latent variable model with missing response data, we assume the following form:

$$ y_i = \mu + \Lambda \omega_i + \epsilon_i, \quad i = 1, \ldots, n, \tag{1} $$

where $y_i$ is a $p \times 1$ vector of observed variables, $\mu$ is a $p \times 1$ intercept vector, $\Lambda$ is a $p \times q$ factor loading matrix, $\omega_i$ is a $q \times 1$ vector of latent variables, and $\epsilon_i$ is a $p \times 1$ vector of measurement errors.

In general, a simple and standard assumption for the distributions of $\omega_i$ and $\epsilon_i$ is that $\omega_i \sim N(0, \Phi)$ and $\epsilon_i \sim N(0, \Psi)$ with a diagonal matrix $\Psi$, where $\omega_i$ and $\epsilon_i$ are mutually independent.

To introduce missing data, let $\delta_{ij} = 1$ if $y_{ij}$ is observed and $\delta_{ij} = 0$ if $y_{ij}$ is missing, and write $\delta_i = (\delta_{i1}, \ldots, \delta_{ip})^\top$.
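As a concrete illustration of this setup, the following sketch simulates data from the generic form above and generates missing indicators; the dimensions, loading values, and the missing-completely-at-random mechanism are all illustrative assumptions rather than part of the model itself:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 500, 6, 2                          # sample size, observed dim, latent dim

mu = np.zeros(p)                             # intercept vector
Lam = rng.uniform(0.5, 1.0, size=(p, q))     # factor loading matrix (illustrative)
Phi = np.eye(q)                              # covariance of latent variables omega_i
Psi = 0.3 * np.eye(p)                        # diagonal covariance of errors eps_i

omega = rng.multivariate_normal(np.zeros(q), Phi, size=n)
eps = rng.multivariate_normal(np.zeros(p), Psi, size=n)
y = mu + omega @ Lam.T + eps                 # model (1): y_i = mu + Lam omega_i + eps_i

# Missing indicators: delta_ij = 1 if y_ij is observed (here, missing
# completely at random with 20% missingness, purely for illustration).
delta = rng.binomial(1, 0.8, size=(n, p))
y_obs = np.where(delta == 1, y, np.nan)
```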
To make Bayesian inference on the considered model (1), we need to specify a prior distribution for the unknown parameters or for the coefficients used in approximating unknown nonparametric functions. A standard assumption for unknown parameters is some proper parametric distribution family, such as the normal, gamma, inverse gamma, inverse Gaussian, Wishart, or Beta distribution, whose hyperparameters are prespecified by the user. Their misspecification or improper application may lead to unreasonable or even misleading parameter estimates. Moreover, Bayesian inference based on these assumptions does not utilize historical data, which limits its popularity, in that the usage of historical data may improve the efficiency of parameter estimation. To address this issue, some relaxed priors have been considered; for example, see the power prior, the g-prior, the normalized power prior [13], the calibrated power prior, the dynamic power prior, the power-expected-posterior prior, and the scale transformed power prior [14]. For a high-dimensional sparse parametric model, we can assume a spike-and-slab prior for the parameter, which can be hierarchically expressed as a mixture of a normal distribution and an exponential distribution, as sketched below.
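As an illustration of this last point, the following sketch draws once from one common hierarchical spike-and-slab representation, in which the slab is a Laplace distribution written as a normal distribution with an exponentially distributed variance; the inclusion probability and scale values are assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10             # number of regression coefficients
pi_slab = 0.2      # prior inclusion probability (assumed value)
v_spike = 1e-4     # small spike variance, concentrates beta_j near zero
lam = 1.0          # slab rate parameter (assumed value)

# Hierarchy: gamma_j ~ Bernoulli(pi_slab);
#   spike:  beta_j | gamma_j = 0  ~  N(0, v_spike)
#   slab:   tau2_j ~ Exp(lam^2 / 2),  beta_j | tau2_j ~ N(0, tau2_j)
# Integrating out tau2_j makes the slab a Laplace distribution, so the
# prior mixes a normal spike with an exponential-mixed-normal slab.
gamma = rng.binomial(1, pi_slab, size=d)
tau2 = rng.exponential(scale=2.0 / lam**2, size=d)
sd = np.where(gamma == 1, np.sqrt(tau2), np.sqrt(v_spike))
beta = rng.normal(0.0, sd)     # one draw from the spike-and-slab prior
```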
Let $\theta$ denote the vector of all unknown parameters in model (1), including the coefficients used in approximating the nonparametric functions, and let $Y$ denote the observed data. Given a prior density $p(\theta)$, the posterior density of $\theta$ is

$$ p(\theta \mid Y) \propto p(Y \mid \theta)\, p(\theta), \tag{2} $$

and under the quadratic loss function its posterior mean (i.e., Bayesian estimate) can be evaluated by

$$ \hat{\theta} = E(\theta \mid Y) = \int \theta\, p(\theta \mid Y)\, d\theta, \tag{3} $$

where $p(Y \mid \theta)$ is the likelihood function of the observed data.
From Eq. (3), it is easily seen that evaluating $\hat{\theta}$ involves a high-dimensional integration that generally has no analytic solution. Hence, Markov chain Monte Carlo (MCMC) methods, such as the Gibbs sampler and the Metropolis-Hastings algorithm, are usually employed to draw observations from the posterior distribution $p(\theta \mid Y)$, and the convergence of the resulting Markov chains can be monitored by standard diagnostics [15].
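For concreteness, the following random-walk Metropolis-Hastings sketch targets a toy posterior (a normal likelihood with a normal prior, both assumed purely for illustration) and then computes the sample-mean estimate and standard deviation formalized next:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data and log-posterior: y_i ~ N(theta, 1), prior theta ~ N(0, 10^2).
y = rng.normal(1.5, 1.0, size=50)

def log_post(theta):
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * theta**2 / 100.0

# Random-walk Metropolis-Hastings.
M, burn, step = 5000, 1000, 0.5
theta, draws = 0.0, []
lp = log_post(theta)
for m in range(M + burn):
    prop = theta + step * rng.normal()       # propose a local move
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp: # accept with prob min(1, ratio)
        theta, lp = prop, lp_prop
    if m >= burn:
        draws.append(theta)

draws = np.asarray(draws)
theta_hat = draws.mean()        # Bayesian estimate: posterior sample mean
theta_sd = draws.std(ddof=1)    # posterior standard deviation estimate
print(f"posterior mean = {theta_hat:.3f}, sd = {theta_sd:.3f}")
```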
Let $\{(\theta^{(m)}, \omega^{(m)}) : m = 1, \ldots, M\}$ be observations drawn from the joint posterior distribution by the above MCMC algorithm. Bayesian estimates of $\theta$ and the latent quantities $\omega$ can then be computed as the sample means

$$ \hat{\theta} = \frac{1}{M} \sum_{m=1}^{M} \theta^{(m)}, \qquad \hat{\omega} = \frac{1}{M} \sum_{m=1}^{M} \omega^{(m)}, $$

respectively. Their corresponding standard deviations can be computed from the sample covariance matrices of these observations; for details, refer to [7]. The above argument on Bayesian inference is a classical method. However, for a high-dimensional parametric or nonparametric model, one needs new approaches to address computing time as well as the efficiency and stability of the algorithm. In fact, when the dimension of the covariate matrix is large and the sample size is relatively small, i.e., the well-known "large $p$, small $n$" problem, classical MCMC algorithms converge slowly and can become computationally prohibitive.
To solve this issue for a high-dimensional regression model, some novel approaches have been developed for parameter/nonparametric function estimation in the Bayesian framework, for example, the Bayesian Lasso, the Bayesian adaptive Lasso, the Bayesian elastic net, and related Bayesian shrinkage priors; variational Bayesian inference has also been developed for high-dimensional linear mixed models [16].
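As one example of these shrinkage approaches, the following sketch implements a compact Gibbs sampler based on the Park-Casella hierarchical representation of the Bayesian Lasso; fixing the error variance at one and the choice of the penalty parameter are simplifying assumptions made here for brevity:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy sparse regression data (illustrative).
n, d = 100, 10
X = rng.normal(size=(n, d))
beta_true = np.array([2.0, -1.5] + [0.0] * (d - 2))
y = X @ beta_true + rng.normal(size=n)

# Bayesian Lasso hierarchy with sigma^2 = 1 fixed for brevity:
#   beta | tau2 ~ N(0, D_tau),  D_tau = diag(tau2),  tau2_j ~ Exp(lam^2 / 2).
lam, M = 1.0, 2000
tau2 = np.ones(d)
XtX, Xty = X.T @ X, X.T @ y
draws = np.empty((M, d))
for m in range(M):
    # beta | tau2, y ~ N(A^{-1} X'y, A^{-1}) with A = X'X + D_tau^{-1}.
    A = XtX + np.diag(1.0 / tau2)
    A_inv = np.linalg.inv(A)
    beta = rng.multivariate_normal(A_inv @ Xty, A_inv)
    # 1/tau2_j | beta ~ InverseGaussian(mean = lam/|beta_j|, shape = lam^2).
    inv_tau2 = rng.wald(lam / np.abs(beta), lam**2)
    tau2 = 1.0 / inv_tau2
    draws[m] = beta

beta_hat = draws[M // 2:].mean(axis=0)   # posterior mean after burn-in
```

The exponential prior on each variance $\tau_j^2$ makes the marginal prior on each coefficient a Laplace distribution, which is what gives the posterior its Lasso-type shrinkage behavior.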
3. Model comparison
Model comparison is widely used to select a plausible model for a given dataset among all the candidate models under consideration. Various methods have been developed for Bayesian model comparison over the past years, covering many models such as linear/nonlinear regression models, structural equation models, multilevel models, machine learning models, and pattern recognition models.
To select the best model among all the candidate models, we can adopt well-known model selection criteria such as the Akaike information criterion (AIC), Bayesian information criterion (BIC), deviance information criterion (DIC), generalized information criterion (GIC), minimum description length (MDL), Hannan-Quinn information criterion (HIC), and the log scoring criterion (also called the conditional predictive ordinate, i.e., CPO), which trade off a measure of model plausibility against a measure of model complexity. Also, the Bayes factor [17] has been developed to conduct Bayesian model comparison and is widely utilized to investigate the strength of the evidence in favor of one of two candidate models. The Bayes factor for two competing models $M_1$ and $M_2$ is defined as

$$ B_{12} = \frac{p(Y \mid M_1)}{p(Y \mid M_2)} = \frac{\int p(Y \mid \theta_1, M_1)\, p(\theta_1 \mid M_1)\, d\theta_1}{\int p(Y \mid \theta_2, M_2)\, p(\theta_2 \mid M_2)\, d\theta_2}, $$

where $p(Y \mid M_k)$ denotes the marginal likelihood of the observed data $Y$ under model $M_k$, $\theta_k$ is the parameter vector of model $M_k$, and $p(\theta_k \mid M_k)$ is its prior density for $k = 1, 2$; a value of $B_{12}$ greater than one indicates evidence in favor of $M_1$.
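For intuition, the Bayes factor can be computed in closed form in conjugate settings; the following sketch compares a normal-prior model against a point-null model for normal data, with the data-generating values assumed purely for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(5)

# Toy data: y_i ~ N(theta, 1). Compare
#   M1: theta ~ N(0, tau^2)   vs   M2: theta = 0 (point null).
n, tau2 = 30, 1.0
y = rng.normal(0.4, 1.0, size=n)

# Marginal likelihoods are available in closed form here:
#   under M1, y ~ N(0, I_n + tau^2 * 11'),  under M2, y ~ N(0, I_n).
ones = np.ones((n, 1))
log_m1 = multivariate_normal.logpdf(y, mean=np.zeros(n),
                                    cov=np.eye(n) + tau2 * (ones @ ones.T))
log_m2 = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=np.eye(n))

B12 = np.exp(log_m1 - log_m2)   # Bayes factor in favor of M1
print(f"Bayes factor B12 = {B12:.3f}")
```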
One serious defect of the Bayes factor for model comparison is that it is not well defined when improper priors are placed on $\theta_k$, because the marginal likelihood $p(Y \mid M_k)$ is then determined only up to an arbitrary multiplicative constant; variants such as the intrinsic Bayes factor and the fractional Bayes factor have been developed to circumvent this difficulty.
References
1. Tang N, Liu C, Shi JQ, Huang Y. Editorial: Bayesian inference and AI. Frontiers in Big Data. 2022;5:1-2
2. Tang AM, Tang NS. Semiparametric Bayesian inference on skew-normal joint modeling of multivariate longitudinal and survival data. Statistics in Medicine. 2015;34:824-843
3. Lee SY, Tang NS. Bayesian analysis of structural equation models with mixed exponential family and ordered categorical data. British Journal of Mathematical and Statistical Psychology. 2006;59:151-172
4. Lee SY, Tang NS. Analysis of nonlinear structural equation models with nonignorable missing covariates and ordered categorical data. Statistica Sinica. 2006;16:1117-1141
5. Kim S, Dahl DB, Vannucci M. Spiked Dirichlet process prior for Bayesian multiple hypothesis testing in random effects models. Bayesian Analysis. 2009;4:707-732
6. Tang N, Wu Y, Chen D. Semiparametric Bayesian analysis of transformation linear mixed models. Journal of Multivariate Analysis. 2018;166:225-240
7. Gallant AR, Nychka DW. Semiparametric maximum likelihood estimation. Econometrica. 1987;55:363-390
8. Wright WA. Bayesian approach to neural-network modeling with input uncertainty. IEEE Transactions on Neural Networks. 1999;10:1261-1270
9. Kim JK, Yu CL. A semiparametric estimation of mean functionals with nonignorable missing data. Journal of the American Statistical Association. 2011;106:157-165
10. Tang NS, Zhao PY, Zhu H. Empirical likelihood for estimating equations with nonignorable missing data. Statistica Sinica. 2014;24:723-747
11. Wang ZQ, Tang NS. Bayesian quantile regression with mixed discrete and nonignorable missing covariates. Bayesian Analysis. 2020;15:579-604
12. Liu M, Zhang Y, Zhou D. Double/debiased machine learning for logistic partially linear model. The Econometrics Journal. 2021;24:559-588
13. Ibrahim JG, Chen MH, Sinha D. On optimality properties of the power prior. Journal of the American Statistical Association. 2003;98:204-213
14. Nifong B, Psioda MA, Ibrahim JG. The scale transformed power prior for use with historical data from a different outcome model. DOI: 10.48550/arXiv.2105.05157
15. Gelman A. Inference and monitoring convergence. In: Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov Chain Monte Carlo in Practice. London: Chapman and Hall; 1996. pp. 131-143
16. Yi JY, Tang N. Variational Bayesian inference in high-dimensional linear mixed models. Mathematics. 2022;10:463
17. Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90:773-795