Open access peer-reviewed chapter

Bayesian Analysis for Random Effects Models

By Junshan Shen and Catherine C. Liu

Submitted: May 6th 2019. Reviewed: September 27th 2019. Published: June 16th 2020.

DOI: 10.5772/intechopen.88822



Random effects models have been widely used to analyze correlated data sets, and Bayesian techniques have emerged as a powerful tool to fit the models. However, there has been scarce literature that systematically reviews and summarizes the recent advances of Bayesian analyses of random effects models. This chapter reviews the use of the Dirichlet process mixture (DPM) prior to approximate the distribution of random errors within general semiparametric random effects models with parametric random effects, treating the longitudinal data setting and the failure time setting separately. In a survival setting with clusters, we propose a new class of nonparametric random effects models motivated by accelerated failure time models. We employ a beta process prior to tackle clustering and estimation simultaneously. We analyze a new data set integrated from an Alzheimer’s disease (AD) study to illustrate the presented model and methods.


  • beta process
  • Dirichlet process mixture
  • clustered data
  • longitudinal data
  • random effects
  • survival outcome
  • nonparametric transformation model

1. Introduction

Random effects models have been widely used as a powerful tool for analyzing correlated data [1, 2]. The model features a finite number of random terms acting as latent variables to model unobserved factors; see [3] for a comprehensive review. Some authors have further proposed semiparametric mixed effect models by allowing for infinite dimensional random effects [4, 5]. Most of the aforementioned works draw inferences using frequentist approaches, while Bayesian approaches have been largely ignored because of the lack of computational feasibility and expediency. With the advent of the “supercomputer” era, Bayesian analyses have recently sparked much interest in the setting of random effects models for clustered data or longitudinal settings. However, there is scarce literature that has systematically reviewed the Bayesian works in the area.

By extending the traditional random effects models, recent research focus has shifted to the study of heterogeneous random effects or nonparametric distributions of random effects, which arise because of skewness of data, missing covariates, or unmeasurable subject-specific covariates [6]. The extended random effects models, termed semiparametric random effects models, improve statistical performance with added interpretability. Bayesian techniques, which provide a convenient means to model non-Gaussian distributions, have recently been proposed for semiparametric random effects models in a variety of settings ([7, 8], among others). Because realizations of the Dirichlet process are discrete, the process cannot be used directly as a prior for a density. As a remedy, the Dirichlet process mixture, obtained by convolving the Dirichlet process with a kernel, plays an important role [9].

For censored outcome data, transformation models, which transform the time-to-event responses using a monotone function and link them to the covariates of interest, have emerged as a strong competitor to the Cox model [10]. Moreover, the transformation model framework is fairly general. The Cox model and the proportional odds model [11] can be viewed as nonparametric transformation linear models with some specific error terms; see [12, 13, 14]. For correlated data, the transformation model naturally extends the semiparametric random effects model by directly incorporating random effects into the transformation functions, treating them as realizations of an underlying random function. Bayesian analyses have found much use in this new area. For example, the beta process has been found to be a reasonable candidate for the prior of the monotone transformation function [15, 16, 17].

This chapter focuses on the Bayesian analysis of the transformed linear model with censored data and in a clustered setting. In many biomedical studies, the observations are naturally clustered. For example, patients in observational studies can be grouped in analysis according to a variety of factors, such as age, race, gender, and hospital, in order to reduce the confounding effects. Following Mallick and Walker [18], we explore using a mixture of beta distributions and the beta process as the candidates for the prior distribution of the random transformation function [17, 19, 20].

The rest of this chapter is structured as follows. Section 2 reviews the use of the Bayesian approach to infer parametric random effects models. In the setting of survival analysis, Section 3 proposes a beta process prior to fit random effects models with nonparametric transformation functions, and Section 4 applies the method to study the progression of Alzheimer’s disease (AD). Section 5 concludes the chapter with future research directions.


2. Dirichlet process mixture prior

In parametric random effects models, we consider the situation in which the distributional form of the random error term is unknown. A Dirichlet process mixture (DPM) is used as the prior for the baseline distribution because the error terms are continuous random variables in most situations.

2.1 Linear mixed effects model

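With a longitudinal data set $\{(y_i, x_i, z_i),\ i = 1, \ldots, m\}$, we posit a mixed effects model with an AR(1) serial correlation structure:

$$y_i = x_i \beta + z_i b_i + w_i, \qquad w_{ij} = \rho\, w_{i,j-1} + \epsilon_{ij},$$

where $y_i = (y_{i1}, \ldots, y_{in_i})^T$ with $y_{ij}$ being the $j$th response of the $i$th subject for $i = 1, \ldots, m$, $\beta$ is a $p \times 1$ vector of fixed effect parameters, $b_i$ is a $q \times 1$ Gaussian random vector representing the subject-specific random effects, $x_i$ and $z_i$ are $n_i \times p$ and $n_i \times q$ design matrices linking $\beta$ and $b_i$ to $y_i$, respectively, $w_i = (w_{i1}, \ldots, w_{in_i})^T$ is an $n_i \times 1$ vector of model errors, $\rho$ is the autoregressive coefficient, and the $\epsilon_{ij}$'s are i.i.d. noises. When $\epsilon_{ij}$ is non-normal, we assume a mixture model:

$$\epsilon_{ij} \sim f(\epsilon) = \int \varphi(\epsilon; u, \sigma^2)\, dG(u),$$

where $\varphi(\cdot\,; u, \sigma^2)$ is the probability density function for a normal random variable with mean $u$ and variance $\sigma^2$ and $G$ is an unspecified probability distribution of $u$ satisfying $\int u\, dG(u) = 0$, which ensures that $\epsilon$ comes from a mean-zero mixture distribution.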
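As a hedged illustration of the data-generating mechanism described above, the following Python sketch simulates the responses of one subject. It assumes the model takes the form y_i = x_i β + z_i b_i + w_i with w_ij = ρ w_{i,j-1} + ε_ij, and it replaces the unspecified distribution G by a hypothetical two-atom distribution whose mean is zero. The function names and all numerical values are illustrative assumptions and are not taken from [8].

```python
import random

def draw_mixture_noise(rng, atoms, weights, sigma):
    """Draw eps from the normal mixture sum_k weights[k] * N(atoms[k], sigma^2)."""
    u = rng.choices(atoms, weights=weights, k=1)[0]
    return rng.gauss(u, sigma)

def simulate_subject(rng, x, z, beta, b, rho, atoms, weights, sigma):
    """Simulate y_i = x_i beta + z_i b_i + w_i with AR(1) errors w_ij."""
    y, w_prev = [], 0.0
    for x_row, z_row in zip(x, z):
        eps = draw_mixture_noise(rng, atoms, weights, sigma)
        w = rho * w_prev + eps              # AR(1) recursion, started at w_i0 = 0
        fixed = sum(xk * bk for xk, bk in zip(x_row, beta))
        rand = sum(zk * bk for zk, bk in zip(z_row, b))
        y.append(fixed + rand + w)
        w_prev = w
    return y

# Hypothetical two-atom mixing distribution G; its mean must be zero.
atoms, weights = [-1.0, 3.0], [0.75, 0.25]
mixture_mean = sum(a * p for a, p in zip(atoms, weights))

rng = random.Random(0)
x = [[1.0, float(t)] for t in range(5)]     # intercept and time: n_i = 5, p = 2
z = [[1.0] for _ in range(5)]               # random intercept: q = 1
b_i = [rng.gauss(0.0, 1.0)]                 # Gaussian random effect
y_i = simulate_subject(rng, x, z, [2.0, 0.5], b_i, 0.6, atoms, weights, 0.5)
```

Because the atoms average to zero under their weights, the simulated noise is skewed and non-normal while still having mean zero, which is the role of the moment constraint on G.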

Replacing the Dirichlet process by an equivalent Pólya urn representation, [8] employed an empirical likelihood approach with the moment constraints and developed a posterior adjusted Gibbs sampler for more precise estimation. The algorithm is computationally feasible.
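To make the Pólya urn representation concrete, here is a minimal Python sketch of the standard urn scheme for a Dirichlet process with mass parameter α and base distribution G0. It shows only the urn mechanism; it does not implement the moment constraint, the empirical likelihood step, or the posterior adjusted Gibbs sampler of [8]. The function name and the choice of a standard normal G0 are assumptions made for illustration.

```python
import random

def polya_urn_sample(n, alpha, base_draw, rng):
    """Draw u_1, ..., u_n from the Polya urn scheme induced by DP(alpha, G0)."""
    draws = []
    for k in range(n):
        # With probability alpha / (alpha + k) take a fresh draw from G0;
        # otherwise copy one of the k earlier draws uniformly at random.
        if rng.random() < alpha / (alpha + k):
            draws.append(base_draw(rng))
        else:
            draws.append(rng.choice(draws))
    return draws

rng = random.Random(1)
u = polya_urn_sample(200, alpha=1.0,
                     base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
n_distinct = len(set(u))   # number of clusters among the 200 draws
```

The repeated values produced by the urn are what make the Dirichlet process discrete, which is why a kernel is needed before it can model a continuous error density.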

2.2 Accelerated failure time model

We shift gears to study survival outcomes with a cluster structure. Denote the data set by $\{(T_{ij}, X_{ij}),\ i = 1, \ldots, K,\ j = 1, \ldots, n_i\}$, where $T_{ij}$ is the failure time of the $j$th subject in the $i$th cluster and $X_{ij}$ is a vector of associated covariates. To accommodate such data, we utilize a general accelerated failure time model:

$$\log T_{ij} = \beta^T X_{ij} + \varepsilon_{ij},$$

where $\beta$ is a $p$-dimensional vector of regression coefficients of interest and $\varepsilon_{ij}$ are independent random errors following the distribution with density $f_i$. Li et al. [7] imposed an exponential tilt on the distributions of the error terms to incorporate the cluster heterogeneity. That is,

$$f_i(t) = f_1(t) \exp\{\theta_{0i} + \theta_i^T q(t)\},$$

where $q(t)$ is a $q$-dimensional vector of prespecified functions containing potential covariate information and $\theta_i$ is the corresponding parameter vector, with $\theta_{0i} = \log\{\int \exp(\theta_i^T q(t)) f_1(t)\, dt\}^{-1}$ chosen so that $f_i$ integrates to one. Thus, $\theta_i$ represents the parametric random effects in the model. Li et al. [7] place the DPM prior on the baseline density $f_1$ to develop a set of procedures which improve estimation efficiency through information pooling.
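The normalizing role of θ_0i can be checked numerically. The sketch below assumes the tilt has the form f_i(t) = f_1(t) exp{θ_0i + θ_i^T q(t)} and uses a hypothetical example that is not from [7]: f_1 is the standard normal density and q(t) = t, in which case the tilted density is normal with mean θ_i and θ_0i = -θ_i²/2. The integration limits and grid size are illustrative choices.

```python
import math

def normal_pdf(t, mean=0.0, sd=1.0):
    return math.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def tilt_constant(theta, q, f1, lo=-12.0, hi=12.0, n=4000):
    """theta_0 = -log of the integral of exp(theta' q(t)) f1(t), trapezoid rule."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        t = lo + k * h
        g = math.exp(sum(th * qk for th, qk in zip(theta, q(t)))) * f1(t)
        total += g if 0 < k < n else 0.5 * g   # half weight at the two endpoints
    return -math.log(total * h)

def tilted_density(t, theta, theta0, q, f1):
    """f_i(t) = f_1(t) * exp(theta_0 + theta' q(t))."""
    return f1(t) * math.exp(theta0 + sum(th * qk for th, qk in zip(theta, q(t))))

theta = [0.8]                       # hypothetical cluster-specific tilt parameter
q = lambda t: [t]                   # hypothetical one-dimensional q(t)
theta0 = tilt_constant(theta, q, normal_pdf)
```

In this example the tilt simply shifts the baseline normal density, so a cluster with a larger θ_i has systematically larger errors while sharing the baseline shape f_1 with every other cluster.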

3. Beta process prior

We now present a nonparametric random effects model for the clustered survival data with nonparametric monotone link functions. We employ a beta process as the prior for the baseline function.

Let $T_{ij}$ denote the failure time of the $j$th subject in the $i$th cluster, $X_{ij}$ be the covariate vector for the subject, and $C_{ij}$ be the potential censoring time for the $j$th subject in the $i$th cluster. Assume that $C_{ij}$ is independent of the failure time $T_{ij}$. Let $Z_{ij} = \min(T_{ij}, C_{ij})$ and let $\delta_{ij} = I(T_{ij} < C_{ij})$ be the censoring indicator. Then the observed data can be described as

$$\{(Z_{ij}, \delta_{ij}, X_{ij}),\ i = 1, \ldots, n,\ j = 1, \ldots, n_i\}.$$

Within each cluster, $T_{ij}$ is linked to $X_{ij}$ via the following transformation model:


where $\varepsilon_{ij}$ are i.i.d. variables with a known density function $f_\varepsilon$ and $H_i(t)$ are unknown cluster-specific monotone functions, which are i.i.d. realizations of a random function and can be viewed as a nonparametric version of random effects for independent clusters. In a parametric setting, if we set $H_i(t) = t \exp(b_i)$ with $b_i$ being a cluster-specific random effect, Eq. (6) reduces to a classical random effects model, which has been discussed in Section 2.2. The challenge, however, lies in how to draw inferences in such a nonparametric setting.

To proceed, let the coefficient vector $\beta$ be a $p$-dimensional unknown vector of interest. We further assume that the $H_i$'s are differentiable with derivative $h_i(t) = H_i'(t)$, and then the likelihood based on the observed data is




Here $S_\varepsilon$ is the survival function of $\varepsilon$, defined by $S_\varepsilon(s) = P(\varepsilon > s)$.

We develop a Bayesian inference procedure based on model (6). We assume that the regression coefficient $\beta$ follows a normal prior:


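where $I_p$ is the $p \times p$ identity matrix. Since $H_i$ is assumed differentiable, we model it with a kernel convolution:

$$H_i(t) = \int \Phi_\sigma(t - s)\, dB_i(s),$$

where $B_i$ is an increasing function and $\Phi_\sigma$ is the distribution function of the zero-mean normal distribution with variance $\sigma^2$. Hence, the derivative of $H_i$ is

$$h_i(t) = \int \phi_\sigma(t - s)\, dB_i(s),$$

with $\phi_\sigma(t) = \frac{1}{\sigma}\phi\!\left(\frac{t}{\sigma}\right)$. This mimics the idea of the DPM: the beta process is smoothed by convolution.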
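The smoothing step can be illustrated with a short Python sketch. It assumes that B is a pure-jump increasing function with jump sizes p_l at locations θ_l, so that the convolution with the normal distribution function reduces to a finite sum H(t) = Σ_l p_l Φ((t - θ_l)/σ) with derivative h(t) = Σ_l p_l φ((t - θ_l)/σ)/σ. The jump sizes, locations, and σ below are hypothetical values chosen for illustration.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def smooth_H(t, jumps, locations, sigma):
    """H(t): convolution of a pure-jump increasing B with the N(0, sigma^2) cdf."""
    return sum(p * norm_cdf((t - th) / sigma) for p, th in zip(jumps, locations))

def smooth_h(t, jumps, locations, sigma):
    """h(t) = H'(t): the same jumps smoothed by the N(0, sigma^2) density."""
    return sum(p * norm_pdf((t - th) / sigma) / sigma
               for p, th in zip(jumps, locations))

# Hypothetical jump sizes and locations of a step function B.
jumps, locations, sigma = [0.5, 0.3, 0.2], [1.0, 2.5, 4.0], 0.4
grid = [0.1 * k for k in range(61)]
H_vals = [smooth_H(t, jumps, locations, sigma) for t in grid]
```

The resulting H is smooth and nondecreasing even though B itself is a step function, and a smaller σ makes H follow the steps of B more closely.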

We are now in a position to select an appropriate stochastic process as the prior of $B_i$. The beta process, as studied by [16, 17], is an ideal candidate for the prior of a monotone function. Specifically, the beta process $BP(\gamma, B_0)$ with concentration parameter $\gamma$ and base measure $B_0$ is an increasing Lévy process with independent increments of the form

$$dB(t) \sim \mathrm{Beta}\big(\gamma\, dB_0(t),\ \gamma\,(1 - dB_0(t))\big).$$

Teh et al. [20] showed that a sample from $BP(\gamma, B_0)$ could be represented as

$$B_i(t) = \sum_{l=1}^{\infty} p_{il}\, I(\theta_{il} \le t),$$

where $p_{il} = \prod_{j=1}^{l} \nu_{ij}$ and $(\theta_{il}, \nu_{il})$ follows


In practice, we need to approximate samples of $BP(\gamma, B_0)$ with a finite-dimensional form. Since the beta process $BP(\gamma, B_0)$ can be represented by the stick-breaking process defined in Eq. (9), a natural approximation is obtained by retaining its first $L$ components. That is,

$$B_i^L(t) = \sum_{l=1}^{L} p_{il}\, I(\theta_{il} \le t),$$

with $p_{il} = \prod_{j=1}^{l} \nu_{ij}$, $l = 1, \ldots, L$. Denote $\xi_i = (\nu_{i1}, \ldots, \nu_{iL}, \theta_{i1}, \ldots, \theta_{iL})^T$ and define


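The truncation described above can be sketched in Python as follows. The sketch rests on two assumptions that the text does not confirm: the stick variables ν_l are drawn from Beta(α, 1), as in the stick-breaking construction of Teh et al. [20], and the atoms θ_l are drawn from a normalized base measure, taken here to be uniform on (0, 10) purely for illustration. The function names are hypothetical.

```python
import random

def truncated_stick_breaking(L, alpha, base_draw, rng):
    """Return (nu, theta, p) with p_l = nu_1 * ... * nu_l for l = 1, ..., L."""
    nu = [rng.betavariate(alpha, 1.0) for _ in range(L)]
    theta = [base_draw(rng) for _ in range(L)]
    p, running = [], 1.0
    for v in nu:
        running *= v                # each weight is a product of stick variables
        p.append(running)
    return nu, theta, p

def step_B(t, theta, p):
    """Truncated sample path: sum of the weights p_l whose atom theta_l <= t."""
    return sum(pl for pl, th in zip(p, theta) if th <= t)

rng = random.Random(2)
nu, theta, p = truncated_stick_breaking(
    L=20, alpha=1.0, base_draw=lambda r: r.uniform(0.0, 10.0), rng=rng)
xi = nu + theta                     # xi_i = (nu_1, ..., nu_L, theta_1, ..., theta_L)
```

Because the weights are cumulative products of numbers in (0, 1), they decrease with l, which is what makes discarding the components beyond L a reasonable approximation.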
The approximated posterior based on the truncated BP is




The samples for $\beta$ and $\xi_1, \ldots, \xi_n$ based on the posterior can be obtained with Markov chain Monte Carlo (MCMC) [21]. In our simulation, we use the R package MCMC to draw samples for $\xi_1, \ldots, \xi_n$ and $\beta$, using the Metropolis algorithm with a normal working distribution.
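For readers unfamiliar with the sampler, the following Python sketch shows a random-walk Metropolis algorithm with a normal working (proposal) distribution. It is not the R code used in the chapter, and the target is a hypothetical toy log posterior (independent normals centered at 1) standing in for the actual posterior of β and ξ_1, …, ξ_n. The step size, chain length, and burn-in are illustrative values.

```python
import math
import random

def metropolis(log_post, init, step_sd, n_iter, rng):
    """Random-walk Metropolis with a normal working (proposal) distribution."""
    current = list(init)
    current_lp = log_post(current)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = [c + rng.gauss(0.0, step_sd) for c in current]
        proposal_lp = log_post(proposal)
        # The proposal is symmetric, so the acceptance probability is
        # min(1, posterior ratio); compare on the log scale.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(list(current))
    return chain, accepted / n_iter

def toy_log_post(beta):
    """Hypothetical target: independent N(1, 1) components, up to a constant."""
    return -0.5 * sum((b - 1.0) ** 2 for b in beta)

rng = random.Random(3)
chain, acc_rate = metropolis(toy_log_post, [0.0, 0.0], 1.0, 20000, rng)
kept = chain[4000:]                 # discard early draws as burn-in
post_mean = [sum(draw[k] for draw in kept) / len(kept) for k in range(2)]
```

Only the log posterior needs to be evaluated up to an additive constant, which is why the method applies even when the normalizing constant of the posterior is unavailable.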

4. An application to Alzheimer’s disease neuroimaging initiative

The Alzheimer’s Disease Neuroimaging Initiative (ADNI) is a multisite cooperative study for the purpose of improving the prevention and treatment of Alzheimer’s disease. The subjects in the study fall into three groups: cognitively normal (CN) individuals, mild cognitive impairment (MCI) patients, and early AD patients. ADNI provides a rich array of patients’ information, including functional magnetic resonance imaging (fMRI), positron emission tomography (PET), longitudinal functional cognitive test scores, blood samples, genetics data, and censored failure time outcomes. Details of the study can be found on the ADNI website.

We focus on the MCI group. MCI is recognized as a transitional stage between normal cognition and Alzheimer’s disease. The failure time is defined to be the time at which an MCI patient is diagnosed with AD, which is censored if the patient remains at the MCI stage at the end of the follow-up time. Wide heterogeneities are exhibited among the failure times, which may be due to demographics and a variety of functional clinical biomarkers, such as the brain areas of the hippocampus, ventricles, and entorhinal cortex. The goal of the analysis is to study the impact of risk factors on progression to AD.

Using the same data as analyzed by [14], we demonstrate our methodology by modeling the failure time (the observed time from the MCI stage to AD diagnosis, in years) of 281 MCI patients on gender (0 = female, 1 = male), years of education, the number of apolipoprotein E alleles (0, 1, or 2), and the baseline hippocampal volume.

As age is a strong confounder but the functional form of its impact has not reached consensus, we elect to model its impact nonparametrically. Specifically, we use age to form two strata (below and above the median age) and use model (6) to estimate the stratum-specific transformation functions and the effects of other covariates. For comparison, we also fit model (6) with age as a continuous variable and with a common transformation function. That is, we do not assume the data are clustered. For both models, the regression errors $\varepsilon$ are assumed to follow an exponential distribution with mean 10. In our calculation, we approximate the BPs by a finite truncation with $L = 20$. We assume the precision parameter $\alpha = 1$ and place the prior $\pi(\sigma^2) \propto 1/\sigma^2$ on the scale parameter.

Figure 1 illustrates the estimated transformation function $H$ of the failure time without clustering. The posterior means (PM) and standard errors (SE) of the regression coefficients in the model are reported in Table 1. We run the MCMC for 20,000 iterations with the first 4000 draws discarded as burn-in samples and use Geweke’s statistic to ensure the convergence of the chains.
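The convergence check can be sketched in Python as below. This is a simplified variant of Geweke's statistic: it compares the mean of the first 10% of a chain with the mean of the last 50%, but it estimates the variance of each mean by batch means rather than by the spectral density estimate used in the standard diagnostic. The chain is a hypothetical stationary AR(1) series standing in for post-burn-in MCMC output, and the fractions and batch count are assumed defaults.

```python
import math
import random

def batch_means_var(x, n_batches=20):
    """Estimate Var(mean(x)) from the spread of non-overlapping batch means."""
    size = len(x) // n_batches
    means = [sum(x[b * size:(b + 1) * size]) / size for b in range(n_batches)]
    grand = sum(means) / n_batches
    s2 = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return s2 / n_batches

def geweke_z(chain, first=0.1, last=0.5):
    """Z-score comparing the means of the early and late parts of a chain."""
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return (mean_a - mean_b) / math.sqrt(batch_means_var(a) + batch_means_var(b))

rng = random.Random(4)
draws, x = [], 0.0
for _ in range(16000):              # hypothetical stationary AR(1) chain
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    draws.append(x)
z_stationary = geweke_z(draws)
# Adding a linear drift mimics a chain that has not yet converged.
z_drift = geweke_z([d + 0.001 * k for k, d in enumerate(draws)])
```

A z-score of moderate size is compatible with stationarity, whereas the drifting chain gives a z-score far from zero, signaling that the early and late segments do not share a common mean.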

Figure 1. Smoothed transformation function without clustering.


Table 1. Posterior estimates of regression coefficients with standard errors.

Figure 2 illustrates the estimated transformation functions with age-stratified data, and Table 2 summarizes the posterior means and standard errors of the other regression coefficients.

Figure 2. Smoothed transformation functions with two age strata. The left curve is the smoothed transformation function for the group aged below the average age; the right curve is the smoothed transformation function for the group aged above the average age.


Table 2. Posterior estimates of regression coefficients with standard errors.

The left curve is relatively flat, while the right curve has a steeper slope. This is consistent with the recognition that AD is an aging disease: older people above a certain age threshold tend to progress faster from MCI to AD.

Both Tables 1 and 2 show that none of the biomarkers are significant, whereas they are statistically significant in the analysis of [14]. One possible conjecture is that our nonparametric transformation functions may have well captured the effects of unobserved confounders, which may leave little to be explained by the observed covariates. More thorough investigation is warranted.

5. Future directions

Following [12], we can extend the transformation model (6) by allowing the error density $f_\varepsilon$ to be unspecified. In this case, we need to constrain the regression coefficient $\beta$, for example by $\beta_1 = 1$ or $\|\beta\| = 1$, for identifiability. We propose to model the error density using a Dirichlet process mixture model:

$$f_\varepsilon(t) = \int \varphi(t; \mu, \sigma^2)\, dG(\mu, \sigma^2),$$

where $\varphi(t; \mu, \sigma^2)$ is a normal kernel with mean $\mu$ and variance $\sigma^2$ and $G$ is a sample from a Dirichlet process $DP(\alpha_1, G_0)$ with $G_0 = N(\mu; \mu_0, \sigma_0^2) \times IG(a, b)$, where $\alpha_1$ is the mass parameter and $IG(a, b)$ is the inverse gamma distribution with shape parameter $a$ and scale parameter $b$.
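One way to picture a draw from this prior is the truncated stick-breaking sketch below, written in Python. It assumes the mixture takes the form f(t) = ∫ φ(t; μ, σ²) dG(μ, σ²) with G drawn from DP(α_1, G_0) and G_0 the product of a normal distribution for μ and an inverse gamma distribution for σ². The truncation level and all hyperparameter values are hypothetical, and the leftover stick mass is folded into the last atom so that the weights sum to one.

```python
import math
import random

def draw_dp_mixture(alpha1, mu0, sigma0, a, b, n_atoms, rng):
    """Truncated stick-breaking draw of G ~ DP(alpha1, G0), G0 = N x IG."""
    weights, atoms, remaining = [], [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha1)
        weights.append(remaining * v)       # break off a fraction v of the stick
        remaining *= 1.0 - v
        mu = rng.gauss(mu0, sigma0)
        # 1 / Gamma(shape a, rate b) is inverse gamma with shape a and scale b;
        # gammavariate takes a scale argument, hence 1 / b.
        sigma2 = 1.0 / rng.gammavariate(a, 1.0 / b)
        atoms.append((mu, sigma2))
    weights[-1] += remaining                # fold leftover mass into the last atom
    return weights, atoms

def mixture_density(t, weights, atoms):
    """Evaluate the random density sum_k w_k * N(t; mu_k, sigma2_k)."""
    return sum(w * math.exp(-0.5 * (t - mu) ** 2 / s2) / math.sqrt(2.0 * math.pi * s2)
               for w, (mu, s2) in zip(weights, atoms))

rng = random.Random(5)
weights, atoms = draw_dp_mixture(alpha1=1.0, mu0=0.0, sigma0=1.0,
                                 a=3.0, b=2.0, n_atoms=50, rng=rng)
f0 = mixture_density(0.0, weights, atoms)
```

Each call produces a different random density, and a small mass parameter concentrates most of the weight on a few normal components.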

In a slightly different context, we may also consider clustering observations by developing a new nested beta-Dirichlet process prior with companion MCMC algorithms. As there are limited works on functional random effects models that accommodate clustering structures observed, for example, in neural studies, we may propose a nested Dirichlet process [19] as the prior of the Dirichlet process to cluster cumulative distribution functions. We envision that such a nested Bayesian procedure will provide substantial computational expedience for practitioners and can be applied to studies beyond neurodegenerative and aging diseases.



Shen’s research is partially supported by Beijing Natural Science Foundation 1192006 and National Natural Science Foundation of China; Liu’s research is partially supported by General Research Fund, Research Grants Council, Hong Kong, 15327216, and the Hong Kong Polytechnic University grant YBTR. Data collection and sharing for this project were funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research Development, LLC.; Johnson & Johnson Pharmaceutical Research Development LLC.; Lumosity; Lundbeck; Merck Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found on the ADNI website.

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


Junshan Shen and Catherine C. Liu (June 16th 2020). Bayesian Analysis for Random Effects Models, Bayesian Inference on Complicated Data, Niansheng Tang, IntechOpen, DOI: 10.5772/intechopen.88822.
