Posterior estimates of regression coefficients with standard errors.
Random effects models have been widely used to analyze correlated data sets, and Bayesian techniques have emerged as a powerful tool for fitting them. However, little literature systematically reviews and summarizes recent advances in Bayesian analyses of random effects models. This chapter reviews the use of the Dirichlet process mixture (DPM) prior to approximate the distribution of random errors in general semiparametric random effects models with parametric random effects. The longitudinal data setting and the failure time setting are treated separately. For clustered survival data, we propose a new class of nonparametric random effects models motivated by the accelerated failure time model. We employ a beta process prior to tackle clustering and estimation simultaneously. We analyze a new data set integrated from an Alzheimer’s disease (AD) study to illustrate the presented model and methods.
- beta process
- Dirichlet process mixture
- clustered data
- longitudinal data
- random effects
- survival outcome
- nonparametric transformation model
Random effects models have been widely used as a powerful tool for analyzing correlated data [1, 2]. Such a model features a finite number of random terms that act as latent variables capturing unobserved factors; see  for a comprehensive review. Some authors have further proposed semiparametric mixed effects models that allow for infinite-dimensional random effects [4, 5]. Most of the aforementioned works draw inferences using frequentist approaches. Bayesian approaches were long neglected because of their computational burden. With the advent of the “supercomputer” era, Bayesian analyses of random effects models for clustered and longitudinal data have recently sparked much interest. However, little literature has systematically reviewed the Bayesian work in this area.
Building on traditional random effects models, recent research has shifted toward heterogeneous random effects and nonparametric distributions of random effects. These arise from skewed data, missing covariates, or unmeasurable subject-specific covariates . The extended models, termed semiparametric random effects models, improve statistical performance and add interpretability. Bayesian techniques provide a convenient means of modeling non-Gaussian distributions, and they have recently been proposed for semiparametric random effects models in a variety of settings ([7, 8], among others). Because its realizations are almost surely discrete, the Dirichlet process cannot serve directly as a prior for a density. As a remedy, convolving it with a continuous kernel yields the Dirichlet process mixture, which plays an important role in this literature .
For censored outcome data, transformation models have emerged as a strong competitor to the Cox model . These models transform the time-to-event response with a monotone function and link it linearly to the covariates of interest. The transformation model framework is also fairly general. The Cox model and the proportional odds model  can both be viewed as linear transformation models with a nonparametric transformation function and a specific error distribution (extreme-value and logistic errors, respectively); see [12, 13, 14]. For correlated data, the transformation model naturally extends the semiparametric random effects model by incorporating random effects directly into the transformation functions, which are treated as realizations of an underlying random function. Bayesian analyses have found much use in this area. For example, the beta process has been found to be a reasonable candidate for the prior of the monotone transformation function [15, 16, 17].
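The two special cases can be checked numerically. The following Python sketch is illustrative and not taken from the cited works. It evaluates the conditional survival function of a linear transformation model under the assumed sign convention $H(T) = \beta^\top Z + \varepsilon$. Under that convention, $\exp(-\beta^\top Z)$ plays the role of the hazard ratio (extreme-value errors) or the odds ratio (logistic errors). The function name and its arguments are hypothetical.

```python
import numpy as np

def survival_transformation(t, z, beta, H, error):
    """S(t | z) under the linear transformation model H(T) = beta^T z + eps.

    Since H is increasing, S(t | z) = P(eps > H(t) - beta^T z) = S_eps(H(t) - beta^T z).
    """
    x = H(np.asarray(t, dtype=float)) - float(np.dot(beta, z))
    if error == "extreme_value":  # S_eps(x) = exp(-e^x): proportional hazards (Cox)
        return np.exp(-np.exp(x))
    if error == "logistic":       # S_eps(x) = 1 / (1 + e^x): proportional odds
        return 1.0 / (1.0 + np.exp(x))
    raise ValueError(f"unknown error distribution: {error}")


# Usage: with H(t) = log(t), i.e. baseline cumulative hazard Lambda_0(t) = t,
# the ratio of cumulative hazards between z = 1 and z = 0 is exp(-beta) at every t.
t = np.array([0.5, 1.0, 2.0, 5.0])
beta = [0.7]
S1 = survival_transformation(t, [1.0], beta, np.log, "extreme_value")
S0 = survival_transformation(t, [0.0], beta, np.log, "extreme_value")
print(np.log(S1) / np.log(S0))   # every entry equals exp(-0.7), about 0.4966
```

With logistic errors, the same check applied to the failure odds $(1 - S)/S$ gives a constant ratio. That constant ratio is exactly the proportional odds property.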
This chapter focuses on the Bayesian analysis of the linear transformation model with censored data in a clustered setting. In many biomedical studies, the observations are naturally clustered. For example, patients in observational studies can be grouped according to a variety of factors, such as age, race, gender, and hospital, in order to reduce confounding effects. Following Mallick and Walker , we explore a mixture of beta distributions and the beta process as candidate priors for the random transformation function [17, 19, 20].
The rest of this chapter is structured as follows. Section 2 reviews Bayesian approaches to inference in parametric random effects models. In the setting of survival analysis, Section 3 proposes a beta process prior for fitting a random effects model with nonparametric transformation functions, and Section 4 applies the method to study the progression of Alzheimer’s disease (AD). Section 5 concludes the chapter with future research directions.
2. Dirichlet process mixture prior
In parametric random effects models, we consider the situation in which the distributional form of the random error term is unknown. In most situations the error terms are continuous random variables. For this reason, the Dirichlet process mixture (DPM) is used as the prior for the baseline error distribution, rather than the discrete Dirichlet process itself.
2.1 Linear mixed effects model
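With a longitudinal data set $\{(y_{ij}, X_{ij}, Z_{ij}) : i = 1, \dots, n;\ j = 1, \dots, n_i\}$, we posit a mixed effects model with an AR(1) serial correlation structure:

$$y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad \varepsilon_{ij} = \rho\, \varepsilon_{i,j-1} + e_{ij}, \qquad i = 1, \dots, n,$$

In this model, $y_i = (y_{i1}, \dots, y_{in_i})^\top$, where $y_{ij}$ is the $j$th response of the $i$th subject for $j = 1, \dots, n_i$. The vector $\beta$ is a $p$-vector of fixed effect parameters, and $b_i$ is a $q$-dimensional Gaussian random vector representing the subject-specific random effects. $X_i$ and $Z_i$ are $n_i \times p$ and $n_i \times q$ design matrices linking $\beta$ and $b_i$ to $y_i$, respectively. $\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{in_i})^\top$ is an $n_i$-vector of model errors, $\rho$ is the autoregressive coefficient, and the $e_{ij}$ are i.i.d. noises. When $e_{ij}$ is non-normal, we assume a mixture model:

$$f(e_{ij}) = \int \phi(e_{ij};\, \mu, \sigma^2)\, dG(\mu),$$

Here $\phi(\cdot;\, \mu, \sigma^2)$ is the probability density function of a normal random variable with mean $\mu$ and variance $\sigma^2$. $G$ is an unspecified probability distribution of $\mu$ satisfying $\int \mu\, dG(\mu) = 0$, which ensures that $e_{ij}$ comes from a mean-zero mixture distribution.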
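To make the data-generating process concrete, here is a minimal simulation sketch for a single subject. It assumes that the AR(1) recursion starts from $\varepsilon_{i0} = 0$. In place of a random draw of $G$, it uses a fixed finite normal mixture whose mixing distribution is mean-zero. The function `simulate_subject` and all of its arguments are hypothetical.

```python
import numpy as np

def simulate_subject(X, Z, beta, D, rho, mix_w, mix_mu, sigma, rng):
    """Draw y_i = X_i beta + Z_i b_i + eps_i, where eps_ij = rho * eps_{i,j-1} + e_ij
    and e_ij ~ sum_m mix_w[m] * N(mix_mu[m], sigma^2)."""
    mix_w = np.asarray(mix_w, dtype=float)
    mix_mu = np.asarray(mix_mu, dtype=float)
    # The mixing distribution G must be mean-zero: sum_m w_m * mu_m = 0.
    if not np.isclose(mix_w @ mix_mu, 0.0):
        raise ValueError("mixing distribution must have mean zero")
    n_i = X.shape[0]
    b = rng.multivariate_normal(np.zeros(Z.shape[1]), D)   # Gaussian random effects b_i
    comp = rng.choice(len(mix_w), size=n_i, p=mix_w)       # latent mixture labels
    e = rng.normal(mix_mu[comp], sigma)                    # non-normal innovations e_ij
    eps = np.empty(n_i)
    eps[0] = e[0]                                          # assumes eps_{i0} = 0
    for j in range(1, n_i):
        eps[j] = rho * eps[j - 1] + e[j]                   # AR(1) serial correlation
    return X @ beta + Z @ b + eps


rng = np.random.default_rng(0)
n_i = 6
X = np.column_stack([np.ones(n_i), np.arange(n_i)])  # intercept + time
Z = np.ones((n_i, 1))                                # random intercept only
y = simulate_subject(X, Z, np.array([1.0, 0.5]), np.eye(1), 0.6,
                     [0.5, 0.5], [-1.0, 1.0], 0.3, rng)
print(y.shape)   # (6,)
```

The two-component mixture with means $\pm 1$ and equal weights yields bimodal innovations while still satisfying $\int \mu\, dG(\mu) = 0$.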
By replacing the Dirichlet process with its equivalent Pólya urn representation,  employed an empirical likelihood approach to impose the mean-zero moment constraint and developed a posterior-adjusted Gibbs sampler for more precise estimation. The resulting algorithm is computationally feasible.
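The Pólya urn representation replaces a draw of $G \sim \mathrm{DP}(\alpha, G_0)$ with sequential draws of the component means. The sketch below shows the standard Blackwell-MacQueen urn. It does not enforce the constraint $\int \mu\, dG(\mu) = 0$, which the cited work handles through empirical likelihood. Function and parameter names are hypothetical.

```python
import numpy as np

def polya_urn_draws(n, alpha, base_sampler, rng):
    """Blackwell-MacQueen Polya urn: sequential draws mu_1, ..., mu_n whose joint law
    matches drawing G ~ DP(alpha, G0) and then mu_l | G ~ G independently."""
    mus = []
    for m in range(n):
        # With probability alpha / (alpha + m) draw a fresh value from G0;
        # otherwise copy one of the m previous draws, chosen uniformly.
        if rng.uniform() < alpha / (alpha + m):
            mus.append(base_sampler(rng))
        else:
            mus.append(mus[rng.integers(m)])
    return np.array(mus)


rng = np.random.default_rng(42)
mu = polya_urn_draws(200, 2.0, lambda r: r.normal(0.0, 3.0), rng)
e = rng.normal(mu, 1.0)        # mixture draws e_l ~ N(mu_l, 1)
print(len(np.unique(mu)))      # number of occupied clusters, small relative to 200
```

Ties among the $\mu_l$ induce the clustering of the errors. A larger $\alpha$ produces more distinct components.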
2.2 Accelerated failure time model
We shift gears to study survival outcomes with a cluster structure. Denote the data set by $\{(T_{ki}, X_{ki}) : k = 1, \dots, K;\ i = 1, \dots, n_k\}$, where $T_{ki}$ is the failure time of the $i$th subject in the $k$th cluster and $X_{ki}$ is a vector of associated covariates. To accommodate such data, we utilize a general accelerated failure time model:

$$\log T_{ki} = \beta^\top X_{ki} + \varepsilon_{ki},$$

Here $\beta$ is a $p$-dimensional vector of regression coefficients of interest. The $\varepsilon_{ki}$ are independent random errors, and $\varepsilon_{ki}$ follows the distribution with density $f_k$.  imposed an exponential tilt on the error distributions to incorporate cluster heterogeneity. That is,

$$f_k(e) = \exp\{\kappa_k + \eta_k^\top q(e)\}\, f(e),$$

Here $q(\cdot)$ is a $d$-dimensional prespecified function containing potential covariate information, and $\eta_k$ is the corresponding parameter vector. The normalizing constant $\kappa_k$ ensures that $\int f_k(e)\, de = 1$. Thus, $\eta_k$ represents the parametric random effects in the model. Li et al.  placed the DPM prior on the baseline density $f$ to develop a set of procedures that improve estimation efficiency through information pooling.
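The tilt is easy to evaluate numerically. The sketch below computes the normalizing constant with a Riemann sum on a uniform grid, and the caller supplies the baseline density and $q$. With a standard normal baseline and $q(e) = e$, the tilted density is $N(\eta, 1)$ and $\kappa = -\eta^2/2$. That closed form serves as a check. The function name is hypothetical.

```python
import numpy as np

def tilted_density(grid, base_pdf, q, eta):
    """Exponential tilt f_k(e) = exp{kappa_k + eta_k^T q(e)} f(e) on a uniform grid.

    q(grid) must return an array of shape (len(grid), d); kappa_k is chosen
    numerically (Riemann sum) so that f_k integrates to one over the grid.
    """
    grid = np.asarray(grid, dtype=float)
    de = grid[1] - grid[0]                                  # uniform spacing assumed
    unnorm = np.exp(q(grid) @ np.asarray(eta, dtype=float)) * base_pdf(grid)
    kappa_k = -np.log(unnorm.sum() * de)                    # normalizing constant
    return np.exp(kappa_k) * unnorm, kappa_k


grid = np.linspace(-10.0, 10.0, 20001)
std_normal = lambda e: np.exp(-0.5 * e**2) / np.sqrt(2.0 * np.pi)
f_k, kappa_k = tilted_density(grid, std_normal, lambda e: e[:, None], [0.8])
de = grid[1] - grid[0]
print(round(float((grid * f_k).sum() * de), 3), round(float(kappa_k), 3))  # 0.8 -0.32
```

A positive $\eta_k$ shifts the cluster's error distribution to the right. This is how the tilt encodes cluster heterogeneity relative to the shared baseline $f$.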
3. Beta process prior
We now present a nonparametric random effects model for the clustered survival data with nonparametric monotone link functions. We employ a beta process as the prior for the baseline function.
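Let $T_{ki}$ denote the failure time of the $i$th subject in the $k$th cluster, $Z_{ki}$ the covariate vector for the subject, and $C_{ki}$ the potential censoring time of the $i$th subject in the $k$th cluster, for $k = 1, \dots, K$ and $i = 1, \dots, n_k$. Assume that $C_{ki}$ is independent of the failure time $T_{ki}$. Let $Y_{ki} = \min(T_{ki}, C_{ki})$ and let $\delta_{ki} = I(T_{ki} \le C_{ki})$ be the censoring indicator. Then the observed data can be described as

$$\mathcal{O} = \{(Y_{ki}, \delta_{ki}, Z_{ki}) : k = 1, \dots, K;\ i = 1, \dots, n_k\}.$$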
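Forming the observed pairs $(Y_{ki}, \delta_{ki})$ from latent failure and censoring times takes only a few lines. In this minimal sketch, the helper name `observe` is hypothetical.

```python
import numpy as np

def observe(T, C):
    """Right censoring: Y = min(T, C) and delta = I(T <= C), elementwise."""
    T = np.asarray(T, dtype=float)
    C = np.asarray(C, dtype=float)
    return np.minimum(T, C), (T <= C).astype(int)


# Hypothetical example: three subjects in one cluster.
Y, delta = observe([1.0, 5.0, 3.0], [2.0, 4.0, 3.0])
print(Y, delta)   # [1. 4. 3.] [1 0 1]
```

Ties ($T_{ki} = C_{ki}$) are counted as observed failures, following the definition $\delta_{ki} = I(T_{ki} \le C_{ki})$.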
Within each cluster, $T_{ki}$ is linked to $Z_{ki}$ via the following transformation model:

$$H_k(T_{ki}) = \beta^\top Z_{ki} + \varepsilon_{ki}, \qquad k = 1, \dots, K,\ i = 1, \dots, n_k, \tag{6}$$

Here the $\varepsilon_{ki}$ are i.i.d. variables with a known density function $f_\varepsilon$. The $H_k$ are unknown cluster-specific monotone increasing functions. They are i.i.d. realizations of a random function and can be viewed as a nonparametric version of random effects for independent clusters. In a parametric setting, if we set $H_k(t) = \log t - b_k$ with $b_k$ being a cluster-specific random effect, Eq. (6) reduces to a classical random effects model of the accelerated failure time type discussed in Section 2.2. The challenge, however, lies in how to draw inferences in such a nonparametric setting.
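To proceed, let the coefficient vector $\beta$ be a $p$-dimensional unknown vector of interest. We further assume that the $H_k$'s are differentiable with derivatives $h_k$. The likelihood based on the observed data is then

$$L(\beta, H_1, \dots, H_K \mid \mathcal{O}) = \prod_{k=1}^{K} \prod_{i=1}^{n_k} \Big\{ f_\varepsilon\big(H_k(Y_{ki}) - \beta^\top Z_{ki}\big)\, h_k(Y_{ki}) \Big\}^{\delta_{ki}} \Big\{ S_\varepsilon\big(H_k(Y_{ki}) - \beta^\top Z_{ki}\big) \Big\}^{1 - \delta_{ki}}.$$

Here $S_\varepsilon$ is the survival function of $\varepsilon_{ki}$, defined by $S_\varepsilon(x) = \int_x^\infty f_\varepsilon(u)\, du$.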
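For given transformation functions, the log-likelihood is a sum of density terms for observed failures and survival terms for censored subjects. The sketch below assumes model (6) in the form $H_k(T) = \beta^\top Z + \varepsilon$. For illustration it uses standard normal errors (the data analysis in Section 4 uses exponential errors instead). The tuple layout of `data` and all function names are hypothetical.

```python
import math
import numpy as np

def std_normal_logpdf(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def std_normal_logsf(x):
    # log S_eps(x), with S_eps(x) = P(eps > x) = erfc(x / sqrt(2)) / 2
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))

def log_likelihood(beta, H, h, data, logf=std_normal_logpdf, logS=std_normal_logsf):
    """Observed-data log-likelihood of the clustered transformation model.

    data: iterable of (k, y, delta, z) with cluster label k, observed time y,
    event indicator delta and covariate vector z; H[k] and h[k] are callables
    for the cluster-specific transformation and its derivative.
    """
    beta = np.asarray(beta, dtype=float)
    total = 0.0
    for k, y, delta, z in data:
        r = H[k](y) - float(beta @ np.asarray(z, dtype=float))  # H_k(Y) - beta^T Z
        if delta:
            total += logf(r) + math.log(h[k](y))  # observed failure: density term
        else:
            total += logS(r)                      # censored: survival term
    return total


# Two clusters sharing H(t) = log t up to a shift, i.e. the parametric special case.
H = {0: math.log, 1: lambda t: math.log(t) - 0.5}
h = {0: lambda t: 1.0 / t, 1: lambda t: 1.0 / t}
data = [(0, 1.2, 1, [1.0]), (0, 2.0, 0, [0.0]), (1, 0.7, 1, [1.0])]
print(log_likelihood([0.3], H, h, data))
```

In the Bayesian procedure, `H[k]` and `h[k]` would be replaced by the kernel-smoothed, truncated beta process versions.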
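We develop a Bayesian inference procedure based on model (6). We assume that the regression coefficient $\beta$ follows a normal prior:

$$\beta \sim \mathcal{N}(0, \sigma_\beta^2 I_p),$$

where $I_p$ is the $p$-dimensional identity matrix. Since $H_k$ is assumed differentiable, we model it with a kernel convolution:

$$H_k(t) = \int \Phi_\sigma(t - s)\, dA_k(s),$$

Here $A_k$ is an increasing function, and $\Phi_\sigma$ is the distribution function of $\mathcal{N}(0, \sigma^2)$, the zero-mean normal distribution with variance $\sigma^2$. Hence, the derivative of $H_k$ is

$$h_k(t) = \int \phi_\sigma(t - s)\, dA_k(s),$$

with $\phi_\sigma(u) = (2\pi\sigma^2)^{-1/2} \exp\{-u^2/(2\sigma^2)\}$. This mimics the idea behind the DPM: convolving the discrete beta process with a smooth kernel yields a smooth increasing transformation function.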
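When $A_k$ is discrete, as a beta process draw is, the convolution becomes a finite weighted sum of normal distribution functions. The sketch below evaluates $H_k$ and $h_k$ for hypothetical atoms and weights, using `math.erf` to avoid a SciPy dependency. The function name is illustrative only.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)   # elementwise error function without SciPy

def H_and_h(t, weights, atoms, sigma):
    """Kernel-smoothed transformation for a discrete increasing A = sum_j w_j delta_{theta_j}:
    H(t) = sum_j w_j Phi((t - theta_j) / sigma), h(t) = sum_j w_j phi_sigma(t - theta_j)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]             # shape (n_t, 1)
    w = np.asarray(weights, dtype=float)[None, :]                       # shape (1, N)
    u = (t - np.asarray(atoms, dtype=float)[None, :]) / sigma           # standardized
    H = (w * 0.5 * (1.0 + _erf(u / math.sqrt(2.0)))).sum(axis=1)       # normal cdf terms
    h = (w * np.exp(-0.5 * u**2) / (sigma * math.sqrt(2.0 * math.pi))).sum(axis=1)
    return H, h


weights, atoms = [0.5, 0.3, 0.2], [1.0, 3.0, 6.0]   # hypothetical atoms of A
H, h = H_and_h(np.linspace(0.0, 8.0, 5), weights, atoms, sigma=0.5)
print(H)   # increasing in t, approaching sum(weights) = 1.0
```

Because every weight is nonnegative, $h_k \ge 0$ holds automatically. Monotonicity of $H_k$ therefore never has to be imposed as a separate constraint.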
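We are now in a position to select an appropriate stochastic process as the prior of $A_k$. The beta process, as studied in [16, 17], is an ideal candidate for the prior of a monotone function. Specifically, a beta process $\mathrm{BP}(c, B_0)$ with concentration parameter $c$ and base measure $B_0$ is an increasing Lévy process with independent increments of the form

$$dA(t) \sim \mathrm{Beta}\big(c\, dB_0(t),\ c\,\{1 - dB_0(t)\}\big).$$

Teh et al.  showed that a sample from $\mathrm{BP}(1, B_0)$ can be represented as

$$A = \sum_{j=1}^{\infty} \pi_j\, \delta_{\theta_j}, \tag{9}$$

where $\theta_j \overset{\text{iid}}{\sim} B_0/\gamma$, with $\gamma$ the total mass of $B_0$, and $\pi_j$ follows

$$\pi_j = \prod_{l=1}^{j} \nu_l, \qquad \nu_l \overset{\text{iid}}{\sim} \mathrm{Beta}(\gamma, 1).$$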
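The stick-breaking form gives a direct way to simulate (a truncation of) the prior. The sketch below draws the first $N$ atoms. It assumes the $c = 1$ construction with $\nu_l \sim \mathrm{Beta}(\gamma, 1)$ and evaluates the resulting monotone step path $A(t) = \sum_j \pi_j I(\theta_j \le t)$. The names and the uniform base measure on $[0, 10)$ are hypothetical choices.

```python
import numpy as np

def beta_process_stick_breaking(gamma, N, base_sampler, rng):
    """First N atoms of the stick-breaking representation:
    nu_l ~ Beta(gamma, 1), pi_j = prod_{l <= j} nu_l, theta_j ~ B0 / gamma."""
    nu = rng.beta(gamma, 1.0, size=N)
    pi = np.cumprod(nu)              # nonincreasing weights in (0, 1)
    theta = base_sampler(rng, N)     # atom locations from the normalized base measure
    return pi, theta

def A_at(t, pi, theta):
    """Evaluate the monotone path A(t) = sum_j pi_j * I(theta_j <= t)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return (pi[None, :] * (theta[None, :] <= t[:, None])).sum(axis=1)


rng = np.random.default_rng(7)
pi, theta = beta_process_stick_breaking(3.0, 50, lambda r, n: r.uniform(0.0, 10.0, n), rng)
print(A_at([2.0, 5.0, 10.0], pi, theta))   # nondecreasing in t
```

Unlike Dirichlet process weights, these $\pi_j$ do not sum to one. The total mass of $A$ is itself random, with expectation $\gamma$.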
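In practice, we need to approximate samples of $A_k$ with a finite-dimensional form. Since the beta process can be represented by the stick-breaking process defined in Eq. (9), a natural approximation is obtained by retaining its first $N$ components. That is,

$$A_k^N = \sum_{j=1}^{N} \pi_{kj}\, \delta_{\theta_{kj}},$$

with $\pi_{kj} = \prod_{l=1}^{j} \nu_{kl}$ as in Eq. (9). Denote $\nu_k = (\nu_{k1}, \dots, \nu_{kN})^\top$ and $\theta_k = (\theta_{k1}, \dots, \theta_{kN})^\top$, and define

$$H_k^N(t) = \sum_{j=1}^{N} \pi_{kj}\, \Phi_\sigma(t - \theta_{kj}), \qquad h_k^N(t) = \sum_{j=1}^{N} \pi_{kj}\, \phi_\sigma(t - \theta_{kj}),$$

where $\Phi_\sigma$ and $\phi_\sigma$ are the distribution and density functions of $\mathcal{N}(0, \sigma^2)$.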
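The stick-breaking form also suggests how to choose $N$. When $\nu_l \sim \mathrm{Beta}(\gamma, 1)$, we have $E[\nu_l] = \gamma/(\gamma + 1) =: r$, so $E[\pi_j] = r^j$. The expected mass discarded by truncation is then the geometric tail $\sum_{j > N} r^j = \gamma r^N$. The hypothetical helpers below turn this into a rule of thumb. It controls the truncation error only in expectation.

```python
import math

def expected_truncated_mass(gamma, N):
    """E[sum_{j>N} pi_j] for pi_j = prod_{l<=j} nu_l with nu_l ~ Beta(gamma, 1).

    With r = gamma / (gamma + 1), E[pi_j] = r**j and the tail sum is
    r**(N+1) / (1 - r) = gamma * r**N.
    """
    r = gamma / (gamma + 1.0)
    return gamma * r**N

def smallest_truncation(gamma, tol):
    """Smallest N whose expected discarded mass gamma * r**N is at most tol."""
    if gamma <= tol:
        return 0
    r = gamma / (gamma + 1.0)
    return math.ceil(math.log(tol / gamma) / math.log(r))


print(smallest_truncation(1.0, 1e-3))   # 10, since 0.5**10 < 1e-3 < 0.5**9
```

At $N = 0$ the formula returns $\gamma$, the expected total mass of the beta process. Larger $\gamma$ spreads mass over more atoms and requires a larger truncation level.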
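The approximated posterior based on the truncated beta process is

$$p\big(\beta, \{\nu_k, \theta_k\}_{k=1}^{K} \mid \mathcal{O}\big) \propto L^N\big(\beta, \{\nu_k, \theta_k\}_{k=1}^{K} \mid \mathcal{O}\big)\, \mathcal{N}(\beta;\, 0, \sigma_\beta^2 I_p) \prod_{k=1}^{K} \prod_{j=1}^{N} \mathrm{Beta}(\nu_{kj};\, \gamma, 1)\, \frac{b_0(\theta_{kj})}{\gamma},$$

In this expression, $L^N$ is the likelihood with $H_k$ and $h_k$ replaced by their truncated versions $H_k^N$ and $h_k^N$, and $\mathcal{N}(\beta;\, 0, \sigma_\beta^2 I_p)$ is the normal prior density of $\beta$. The function $b_0$ is the density of the base measure $B_0$. Samples of $\beta$ and $\{\nu_k, \theta_k\}$ from this posterior can be obtained with Markov chain Monte Carlo (MCMC) . In our simulation, we use the R-package MCMC (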
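The chapter relies on an R package for sampling. As a language-neutral illustration of one standard MCMC update, the Python sketch below implements a random-walk Metropolis sampler. It is not the chapter's sampler. In the application, `log_post` would combine $\log L^N$ with the log priors. Here a toy Gaussian target stands in so that the result can be checked. All names are hypothetical.

```python
import numpy as np

def rw_metropolis(log_post, x0, n_iter, step, rng):
    """Random-walk Metropolis: propose x' = x + step * N(0, I) and accept it with
    probability min(1, exp(log_post(x') - log_post(x)))."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = log_post(x)
    draws = np.empty((n_iter, x.size))
    accepted = 0
    for it in range(n_iter):
        prop = x + step * rng.standard_normal(x.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis acceptance step
            x, lp = prop, lp_prop
            accepted += 1
        draws[it] = x
    return draws, accepted / n_iter


# Toy target standing in for the posterior of beta: N(2, 1) in one dimension.
rng = np.random.default_rng(2024)
draws, acc = rw_metropolis(lambda b: -0.5 * np.sum((b - 2.0) ** 2), [0.0], 20000, 1.0, rng)
kept = draws[4000:]                      # discard burn-in draws
print(kept.mean(), acc)
```

The proposal scale `step` trades acceptance rate against mixing speed. In practice, it is tuned separately for $\beta$ and for the stick-breaking parameters.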
4. An application to the Alzheimer’s Disease Neuroimaging Initiative
The Alzheimer’s Disease Neuroimaging Initiative (ADNI) is a multisite cooperative study aimed at improving the prevention and treatment of Alzheimer’s disease. The subjects in the study fall into three groups: cognitively normal (CN) individuals, mild cognitive impairment (MCI) patients, and early AD patients. ADNI provides a rich array of patient information, including functional magnetic resonance imaging (fMRI), positron emission tomography (PET), longitudinal functional cognitive test scores, blood samples, genetic data, and censored failure time outcomes. Details of the study can be found at
We focus on the MCI group. MCI is recognized as a transitional stage between normal cognition and Alzheimer’s disease. The failure time is defined as the time at which an MCI patient is diagnosed with AD. It is censored if the patient remains at the MCI stage at the end of follow-up. The failure times exhibit wide heterogeneity, which may be due to demographics and a variety of functional clinical biomarkers, such as the sizes of the hippocampus, ventricles, and entorhinal cortex. The goal of the analysis is to study the impact of risk factors on progression to AD.
Using the same data as analyzed by , we demonstrate our methodology by modeling the failure time of 281 MCI patients, defined as the observed time, in years, from the MCI stage to AD diagnosis. The covariates are gender (0 = female, 1 = male), years of education, the number of apolipoprotein E alleles (0, 1, or 2), and the baseline hippocampal volume.
Age is a strong confounder, but there is no consensus on the functional form of its impact, so we model its impact nonparametrically. Specifically, we use age to form two strata (below and above the median age) and use model (6) to estimate the stratum-specific transformation functions and the effects of the other covariates. For comparison, we also fit model (6) with age as a continuous covariate and a common transformation function; that is, we do not assume that the data are clustered. For both models, the regression errors $\varepsilon_{ki}$ are assumed to follow an exponential distribution with mean 10. In our calculation, we approximate the BPs by a finite truncation with . We assume the precision parameter  and scale parameter .
Figure 1 illustrates the estimated transformation function of the failure time without clustering. The posterior means (PM) and standard errors (SE) of the regression coefficients in this model are reported in Table 1. We ran the MCMC for 20,000 iterations, discarded the first 4000 draws as burn-in, and used Geweke’s statistic to check the convergence of the chains.
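Geweke’s diagnostic compares the mean of an early segment of a chain (conventionally the first 10%) with that of a late segment (the last 50%). A large $|z|$ signals that the chain has not settled. Geweke’s original statistic estimates the variances from spectral densities at frequency zero. This sketch substitutes simpler batch-means estimates, so its values will differ somewhat from those of standard software. The function names are hypothetical.

```python
import numpy as np

def batch_means_var(x, n_batches=20):
    """Batch-means estimate of Var(mean(x)) for an autocorrelated chain."""
    x = np.asarray(x, dtype=float)
    m = len(x) // n_batches                      # batch length (remainder dropped)
    means = x[: m * n_batches].reshape(n_batches, m).mean(axis=1)
    return means.var(ddof=1) / n_batches

def geweke_z(chain, first=0.1, last=0.5):
    """Geweke-type z-score comparing the early and late segments of one chain."""
    chain = np.asarray(chain, dtype=float)
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[int((1.0 - last) * n):]
    return (a.mean() - b.mean()) / np.sqrt(batch_means_var(a) + batch_means_var(b))


rng = np.random.default_rng(3)
stationary = rng.normal(size=16000)                        # 20,000 draws minus 4000 burn-in
drifting = np.linspace(0.0, 10.0, 16000) + rng.normal(size=16000)
print(geweke_z(stationary), geweke_z(drifting))           # small |z| vs. very large |z|
```

Applied to each monitored parameter after burn-in, values of $|z|$ within about 2 are usually taken as no evidence against convergence.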
For the two age strata, the left curve is relatively flat, while the right curve has a sharper slope. This is consistent with the recognition that AD is an aging disease: older people above a certain age threshold tend to progress faster from MCI to AD.
Both Tables 1 and 2 show that none of the biomarkers is significant, whereas they are statistically significant in the analysis of . One possible explanation is that our nonparametric transformation functions capture the effects of unobserved confounders well, which leaves little to be explained by the observed covariates. More thorough investigation is warranted.
5. Future directions
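Following , we can extend the transformation model (6) by allowing the error distribution to be unspecified. In this case, we need to constrain the regression coefficient $\beta$, for example by  or , for identifiability. We propose to model the error density using a Dirichlet process mixture model:

$$f_\varepsilon(e) = \int \phi(e;\, \mu, \sigma^2)\, dG(\mu, \sigma^2), \qquad G \sim \mathrm{DP}(\alpha, G_0),$$

Here $\phi(\cdot;\, \mu, \sigma^2)$ is a normal kernel with mean $\mu$ and variance $\sigma^2$, and the kernel parameters $(\mu, \sigma^2)$ are samples from a Dirichlet process $\mathrm{DP}(\alpha, G_0)$. The mass parameter is $\alpha$, and the base measure $G_0$ assigns $\sigma^2$ an inverse gamma distribution $\mathrm{IG}(a, b)$ with shape parameter $a$ and scale parameter $b$.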
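To see what this prior implies, one can draw a random error density from it. The sketch below uses Sethuraman’s stick-breaking representation truncated at $N$ atoms. It assumes that the base measure factors as $G_0 = \mathcal{N}(0, \tau^2) \times \mathrm{IG}(a, b)$, where the normal part for $\mu$ is an assumption not specified above. The inverse gamma draw uses the fact that $1/\sigma^2 \sim \mathrm{Gamma}(a, \text{rate } b)$. All names are hypothetical.

```python
import numpy as np

def draw_dpm_density(grid, alpha, a, b, tau, N, rng):
    """One random density f(e) = sum_j w_j N(e; mu_j, sigma2_j) from a DP mixture,
    via Sethuraman's stick-breaking truncated at N atoms."""
    V = rng.beta(1.0, alpha, size=N)
    V[-1] = 1.0                                          # truncation: weights sum to one
    w = V * np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    mu = rng.normal(0.0, tau, size=N)                    # assumed N(0, tau^2) part of G0
    sigma2 = 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=N)   # sigma2 ~ IG(a, b)
    g = np.asarray(grid, dtype=float)[:, None]
    kernels = np.exp(-0.5 * (g - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
    return kernels @ w, w


rng = np.random.default_rng(11)
grid = np.linspace(-30.0, 30.0, 60001)
f, w = draw_dpm_density(grid, alpha=1.0, a=3.0, b=2.0, tau=2.0, N=50, rng=rng)
print(w.sum(), f.sum() * (grid[1] - grid[0]))   # both approximately 1
```

Small $\alpha$ concentrates the weight on a few components, which gives skewed or multimodal error densities. Large $\alpha$ spreads the weight over many kernels.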
In a slightly different context, we may also consider clustering observations by developing a new nested beta-Dirichlet process prior with companion MCMC algorithms. Few works on functional random effects models accommodate clustering structures such as those observed in neural studies. To address this gap, we may adopt a nested Dirichlet process  as a prior over Dirichlet processes to cluster cumulative distribution functions. We envision that such a nested Bayesian procedure will provide substantial computational expedience for practitioners and can be applied to studies beyond neurodegenerative and aging diseases.
Shen’s research is partially supported by Beijing Natural Science Foundation 1192006 and National Natural Science Foundation of China; Liu’s research is partially supported by General Research Fund, Research Grants Council, Hong Kong, 15327216, and the Hong Kong Polytechnic University grant YBTR. Data collection and sharing for this project were funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (