Bayesian Analysis for Random Effects Models

Random effects models have been widely used to analyze correlated data sets, and Bayesian techniques have emerged as a powerful tool to fit these models. However, there has been scarce literature that systematically reviews and summarizes the recent advances in Bayesian analyses of random effects models. This chapter reviews the use of the Dirichlet process mixture (DPM) prior to approximate the distribution of random errors within general semiparametric random effects models with parametric random effects, in the longitudinal data setting and the failure time setting separately. In a survival setting with clusters, we propose a new class of nonparametric random effects models motivated by accelerated failure time models. We employ a beta process prior to tackle clustering and estimation simultaneously. We analyze a new data set integrated from an Alzheimer's disease (AD) study to illustrate the presented model and methods.


Introduction
Random effects models have been widely used as a powerful tool for analyzing correlated data [1,2]. The model features a finite number of random terms acting as latent variables to model unobserved factors; see [3] for a comprehensive review. Some authors have further proposed semiparametric mixed effect models by allowing for infinite dimensional random effects [4,5]. Most of the aforementioned works draw inferences using frequentist approaches, while Bayesian approaches have been largely ignored because of the lack of computational feasibility and expediency. With the advent of the "supercomputer" era, Bayesian analyses have recently sparked much interest in the setting of random effects models for clustered data or longitudinal settings. However, there is scarce literature that has systematically reviewed the Bayesian works in the area.
By extending the traditional random effects models, recent research focus has shifted to studying heterogeneous random effects or nonparametric distributions of random effects, which arise because of skewness of data, missing covariates, or unmeasurable subject-specific covariates [6]. The extended random effects models, termed semiparametric random effects models, improve statistical performance with added interpretability. Bayesian techniques, which provide a convenient means to model non-Gaussian distributions, have recently been proposed for semiparametric random effects models in a variety of settings ( [7,8], among others).
The discreteness of the Dirichlet process makes it unsuitable as a prior for estimating a density. As a remedy, convolving the Dirichlet process with a continuous kernel yields the Dirichlet process mixture, which plays an important role in density estimation [9].
For censored outcome data, transformation models, which transform the time-to-event responses using a monotone function and link them to the covariates of interest, have surged as a strong competitor to the Cox model [10]. Moreover, the transformation model framework is fairly general. The Cox model and the proportional odds model [11] can be viewed as nonparametric transformation linear models with specific error terms; see [12][13][14]. For correlated data, the transformation model naturally extends the semiparametric random effects model by directly incorporating random effects into the transformation functions, treating them as realizations of an underlying random function. Bayesian analyses have found much use in this new area. For example, the beta process has been found to be a reasonable candidate for the prior of the monotone transformation function [15][16][17].
This chapter focuses on the Bayesian analysis of the transformed linear model with censored data and in a clustered setting. In many biomedical studies, the observations are naturally clustered. For example, patients in observational studies can be grouped in analysis according to a variety of factors, such as age, race, gender, and hospital, in order to reduce the confounding effects. Following Mallick and Walker [18], we explore using a mixture of beta distributions and the beta process as the candidates for the prior distribution of the random transformation function [17,19,20].
The rest of this chapter is structured as follows. Section 2 reviews the use of the Bayesian approach to infer parametric random effects models. In the setting of survival analysis, Section 3 proposes a beta process prior to fit random effects model with nonparametric transformation functions, and Section 4 applies the method to study the progression of Alzheimer's disease (AD). Section 5 concludes the chapter with future research directions.

Dirichlet process mixture prior
In parametric random effects models, we consider the situation in which the distributional form of the random error term is unknown. The Dirichlet process mixture (DPM) is used as the prior for the baseline distribution because the error terms are typically continuous random variables.

Linear mixed effects model
With a longitudinal data set $\{(y_i, x_i, z_i)\}$, we posit a mixed effects model with an AR(1) serial correlation structure:
$$y_i = x_i \beta + z_i b_i + w_i, \qquad w_{ij} = \rho\, w_{i,j-1} + \epsilon_{ij},$$
where $y_i = (y_{i1}, \ldots, y_{in_i})^T$, with $y_{ij}$ being the $j$th response of the $i$th subject for $i = 1, \ldots, m$; $\beta$ is a $p \times 1$ vector of fixed effect parameters; $b_i$ is a $q \times 1$ Gaussian random vector representing the subject-specific random effects; $x_i$ and $z_i$ are $n_i \times p$ and $n_i \times q$ design matrices linking $\beta$ and $b_i$ to $y_i$, respectively; $w_i = (w_{i1}, \ldots, w_{in_i})^T$ is an $n_i \times 1$ vector of model errors; $\rho$ is the autoregressive coefficient; and the $\epsilon_{ij}$'s are i.i.d. noises. When the $\{\epsilon_{ij}\}$ are non-normal, we assume a mixture model:
$$\epsilon_{ij} \sim f(\epsilon) = \int \varphi(\epsilon \mid u, \sigma^2)\, dG(u),$$
where $\varphi(\cdot \mid u, \sigma^2)$ is the probability density function of a normal random variable with mean $u$ and variance $\sigma^2$, and $G$ is an unspecified probability distribution of $u$ satisfying $\int u\, dG(u) = 0$, which ensures that $\epsilon$ comes from a mean-zero mixture distribution.
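As a concrete illustration of this data-generating mechanism, the following Python sketch simulates one subject's responses. The two-component mean-zero normal mixture for the noise, the AR(1) coefficient, and the design matrices are arbitrary illustrative choices, not part of the model itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_subject(beta, rho, sigma, x, z, mix_means, mix_weights, n_obs):
    """Simulate y_i = x_i beta + z_i b_i + w_i for one subject, where
    w_ij = rho * w_{i,j-1} + eps_ij and eps_ij follows a mean-zero
    normal mixture sum_k w_k N(u_k, sigma^2)."""
    q = z.shape[1]
    b = rng.normal(0.0, 1.0, size=q)              # subject-specific random effects
    comp = rng.choice(len(mix_weights), size=n_obs, p=mix_weights)
    eps = rng.normal(np.asarray(mix_means)[comp], sigma)  # mixture noise
    w = np.empty(n_obs)
    w[0] = eps[0]
    for j in range(1, n_obs):                     # AR(1) serial correlation
        w[j] = rho * w[j - 1] + eps[j]
    return x @ beta + z @ b + w

beta = np.array([1.0, -0.5])
n_obs = 50
x = rng.normal(size=(n_obs, 2))
z = np.ones((n_obs, 1))                           # random intercept only
# means (-1, 2) with weights (2/3, 1/3) satisfy the mean-zero constraint
y = simulate_subject(beta, rho=0.5, sigma=0.3, x=x, z=z,
                     mix_means=[-1.0, 2.0], mix_weights=[2 / 3, 1 / 3],
                     n_obs=n_obs)
```

Note that the mixture means and weights are chosen so that $\sum_k w_k u_k = 0$, mirroring the constraint $\int u\, dG(u) = 0$ above.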
Replacing the Dirichlet process by its equivalent Pólya urn representation, [8] employed an empirical likelihood approach with moment constraints and developed a posterior-adjusted Gibbs sampler for more precise estimation. The algorithm is computationally feasible.
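The Pólya urn scheme underlying such samplers is easy to simulate: each new draw either repeats an earlier draw (with probability proportional to the number of previous draws) or comes fresh from the base measure (with probability proportional to the mass parameter), which makes the discreteness of DP samples, i.e., the ties, explicit. A minimal Python sketch with an arbitrary N(0,1) base measure:

```python
import numpy as np

rng = np.random.default_rng(1)

def polya_urn_draws(n, alpha, base_sampler):
    """Draw n values from DP(alpha, G0) via the Polya urn: the k-th draw
    is a fresh atom from G0 with probability alpha/(alpha+k), otherwise
    it repeats one of the k existing draws uniformly at random."""
    draws = []
    for k in range(n):
        if rng.uniform() < alpha / (alpha + k):
            draws.append(base_sampler())            # new atom from base measure
        else:
            draws.append(draws[rng.integers(k)])    # tie with an earlier draw
    return np.array(draws)

u = polya_urn_draws(200, alpha=2.0, base_sampler=lambda: rng.normal())
n_clusters = len(np.unique(u))   # far fewer distinct values than draws
```

The small number of distinct values among the 200 draws is exactly the discreteness that motivates convolving with a kernel, as discussed above.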

Accelerated failure time model
We shift gears to study survival outcomes with a cluster structure. Denote the data set by $\{(T_{ij}, X_{ij})\}$, where $T_{ij}$ is the failure time of the $j$th subject in the $i$th cluster and $X_{ij}$ is a vector of associated covariates. To accommodate such data, we utilize a general accelerated failure time model:
$$\log T_{ij} = X_{ij}^T \beta + \varepsilon_{ij},$$
where $\beta$ is a $p$-dimensional vector of regression coefficients of interest and the $\varepsilon_{ij}$ are independent random errors following the distribution with density $f_i$. [7] posed an exponential tilt on the distributions of the error terms to incorporate the cluster heterogeneity. That is,
$$f_i(t) \propto f_1(t) \exp\{\theta_i^T q(t)\},$$
where $q(t)$ is a $q$-dimensional vector of prespecified functions containing potential covariate information and $\theta_i$ is the corresponding parameter vector, with $\theta_1 = 0$ so that $f_1$ serves as the baseline density. Thus, $\theta_i$ represents the parametric random effects in the model. Li et al. [7] place the DPM prior on the baseline density $f_1$ to develop a set of procedures that improves estimation efficiency through information pooling.
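The exponential tilt can be illustrated numerically. The sketch below tilts a standard normal baseline density on a grid and renormalizes; the choice $q(t) = (t, t^2)$ and the value of $\theta_i$ are purely illustrative, not the prespecified functions of [7].

```python
import numpy as np

def tilted_density(t_grid, f1_vals, theta):
    """Exponential tilt of a baseline density on a uniform grid:
    f_i(t) proportional to f1(t) * exp(theta^T q(t)), with q(t) = (t, t^2)
    chosen for illustration, renormalized numerically."""
    qt = np.stack([t_grid, t_grid ** 2], axis=1)   # q(t), here 2-dimensional
    unnorm = f1_vals * np.exp(qt @ theta)
    dt = t_grid[1] - t_grid[0]
    return unnorm / (unnorm.sum() * dt)            # renormalize on the grid

t = np.linspace(-6.0, 6.0, 2001)
f1 = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)      # standard normal baseline
f_i = tilted_density(t, f1, theta=np.array([0.5, 0.0]))
# tilting N(0,1) by exp(0.5 t) shifts the mean: the result is N(0.5, 1)
mean_i = (t * f_i).sum() * (t[1] - t[0])
```

This makes concrete how a single baseline $f_1$ generates a family of cluster-specific densities indexed by a low-dimensional $\theta_i$.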

Beta process prior
We now present a nonparametric random effects model for the clustered survival data with nonparametric monotone link functions. We employ a beta process as the prior for the baseline function.
Let $T_{ij}$ denote the failure time of the $j$th subject in the $i$th cluster, $X_{ij}$ the covariate vector for the subject, and $C_{ij}$ the potential censoring time for the $j$th subject in the $i$th cluster. Assume that $C_{ij}$ is independent of the failure time $T_{ij}$. Let $\Delta_{ij} = I(T_{ij} \le C_{ij})$ be the censoring indicator. Then the observed data can be described as $\{(Y_{ij}, \Delta_{ij}, X_{ij})\}$ with $Y_{ij} = \min(T_{ij}, C_{ij})$. Within each cluster, $T_{ij}$ is linked to $X_{ij}$ via the following transformation model:
$$H_i(T_{ij}) = \exp(X_{ij}^T \beta)\, \varepsilon_{ij}, \qquad (6)$$
where the $\varepsilon_{ij}$ are i.i.d. variables with a known density function $f_\varepsilon(\cdot)$ and the $H_i(t)$ are unknown cluster-specific monotone functions, which are i.i.d. realizations of a random function and can be viewed as a nonparametric version of random effects for independent clusters. In a parametric setting, if we set $H_i(t) = t \exp(-b_i)$ with $b_i$ being a cluster-specific random effect, Eq. (6) reduces to a classical random effects model, which has been discussed in Section 2.2. The challenge, however, lies in how to draw inferences in such a nonparametric setting.
To proceed, let the coefficient vector $\beta$ be a $p$-dimensional unknown vector of interest. We further assume the $H_i$'s are differentiable with derivative $h_i(t) = H_i'(t)$; the likelihood based on the observed data is then
$$L = \prod_i \prod_j \left[ f_\varepsilon\!\left\{H_i(Y_{ij}) e^{-X_{ij}^T \beta}\right\} e^{-X_{ij}^T \beta}\, h_i(Y_{ij}) \right]^{\Delta_{ij}} \left[ S_\varepsilon\!\left\{H_i(Y_{ij}) e^{-X_{ij}^T \beta}\right\} \right]^{1 - \Delta_{ij}},$$
where $S_\varepsilon$ is the survival function of $\varepsilon$, defined by $S_\varepsilon(s) = P(\varepsilon \ge s)$. We develop a Bayesian inference procedure based on model (6). We assume that the regression coefficient $\beta$ follows a normal prior, $\beta \sim N(0, \sigma_0^2 I_p)$, where $I_p$ is the $p \times p$ identity matrix. Since $H_i$ is assumed differentiable, we model it with a kernel convolution:
$$H_i(t) = \int \Phi_\sigma(t - s)\, dB_i(s),$$
where $B_i$ is an increasing function and $\Phi_\sigma$ is the zero-mean normal distribution function with variance $\sigma^2$. Hence, the derivative of $H_i$ is
$$h_i(t) = \int \varphi_\sigma(t - s)\, dB_i(s).$$
This mimics the idea of the DPM: smoothing the beta process by convolution.
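To see how the convolution produces a smooth monotone $H_i$, the following Python sketch smooths a discrete increasing measure (a few atoms with positive weights) with a normal CDF kernel; the atoms, weights, and bandwidth are made-up illustrative values.

```python
import math
import numpy as np

def smooth_H(t_grid, atoms, weights, sigma):
    """Smooth a discrete increasing process B = sum_l p_l * delta_{theta_l}
    by convolution with a normal CDF kernel:
    H(t) = sum_l p_l * Phi_sigma(t - theta_l).
    Its derivative h(t) is then the corresponding normal-mixture density."""
    Phi = np.vectorize(
        lambda s: 0.5 * (1.0 + math.erf(s / (sigma * math.sqrt(2.0)))))
    return sum(w * Phi(t_grid - a) for a, w in zip(atoms, weights))

t = np.linspace(0.0, 10.0, 500)
H = smooth_H(t, atoms=[2.0, 5.0, 7.5], weights=[0.5, 0.3, 0.2], sigma=0.4)
```

The result is differentiable and nondecreasing by construction, which is exactly what the likelihood above requires of $H_i$ and $h_i$.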
We are now in a position to select an appropriate stochastic process as the prior for $B_i$. The beta process, as studied by [16,17], is an ideal candidate for the prior of a monotone function. Specifically, the beta process $BP(\gamma, B_0)$ with concentration parameter $\gamma$ and base measure $B_0$ is an increasing Lévy process with independent increments, with Lévy measure $\nu(dp, d\theta) = \gamma\, p^{-1} (1 - p)^{\gamma - 1}\, dp\, B_0(d\theta)$. Teh et al. [20] showed that a sample from $BP(\gamma, B_0)$ can be represented in stick-breaking form as
$$B_i = \sum_{l=1}^{\infty} p_{il}\, \delta_{\theta_{il}}, \qquad (9)$$
where $p_{il} = \prod_{j=1}^{l} \nu_{ij}$ and the pairs $(\theta_{il}, \nu_{il})$ are drawn independently with $\nu_{il} \sim \mathrm{Beta}(\gamma, 1)$ and $\theta_{il} \sim B_0$. In practice, we need to approximate samples of $BP(\gamma, B_0)$ with a finite-dimensional form. Since the beta process $BP(\gamma, B_0)$ can be represented by the stick-breaking process defined in Eq. (9), a natural approximation is obtained by retaining its first $L$ components. That is,
$$B_i^{(L)} = \sum_{l=1}^{L} p_{il}\, \delta_{\theta_{il}}.$$
The approximated posterior based on the truncated beta process is proportional to the likelihood multiplied by the priors on $\beta$ and on the stick-breaking variables $\xi_i = \{(\theta_{il}, \nu_{il}): l = 1, \ldots, L\}$ of each cluster. Samples of $\beta$ and $(\xi_1, \ldots, \xi_n)$ from the posterior can be obtained with Markov chain Monte Carlo (MCMC) [21]. In our simulation, we use the R package mcmc (https://cran.r-project.org/web/packages/mcmc/index.html) to draw samples of $\xi_1, \ldots, \xi_n$ and $\beta$, and we use the Metropolis algorithm with a normal working distribution.
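For illustration, a truncated stick-breaking draw from a beta process can be sketched in a few lines of Python; the base measure $B_0$ (an exponential distribution here) and the settings $\gamma = 2$, $L = 20$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def truncated_bp_sample(gamma, base_sampler, L):
    """Truncated stick-breaking draw from a beta process BP(gamma, B0):
    nu_l ~ Beta(gamma, 1), weights p_l = prod_{j<=l} nu_j (so the weights
    decay geometrically), atoms theta_l ~ B0; only the first L terms of
    the infinite series are kept."""
    nu = rng.beta(gamma, 1.0, size=L)
    p = np.cumprod(nu)                    # p_l = nu_1 * nu_2 * ... * nu_l
    theta = np.array([base_sampler() for _ in range(L)])
    return theta, p

theta, p = truncated_bp_sample(
    gamma=2.0, base_sampler=lambda: rng.exponential(1.0), L=20)
```

The geometric decay of the weights $p_{il}$ is what makes the truncation at a moderate $L$ a reasonable approximation in practice.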

An application to Alzheimer's disease neuroimaging initiative
Alzheimer's Disease Neuroimaging Initiative (ADNI) is a multisite cooperative study for the purpose of improving the prevention and treatment of Alzheimer's disease. The subjects in the study fall into three groups, cognitively normal (CN) individuals, mild cognitive impairment (MCI) patients, and early AD patients. ADNI provides a rich array of patients' information, including functional magnetic resonance imaging (fMRI), positron emission tomography (PET), longitudinal functional cognitive tests scores, blood samples, genetics data, and censored failure time outcomes. Details of the study can be found at http://adni.loni.usc.edu.
We focus on the MCI group. MCI is recognized as a transitional stage between normal cognition and Alzheimer's disease. The failure time is defined to be the time that a MCI patient is diagnosed with AD, which will be censored if a MCI patient remains at the MCI stage at the end of the follow-up time. Wide heterogeneities are exhibited among the failure times, which may be due to demographics and a variety of functional clinical biomarkers, such as the brain areas of the hippocampus, ventricles, and entorhinal cortex. The goal of the analysis is to study the impact of risk factors on progression to AD.
Using the same data as analyzed by [14], we demonstrate our methodology by modeling the failure time (the observed time of AD diagnosis from MCI stage in year) of 281 MCI patients on gender (0 = female, 1 = male), years of education, the number of apolipoprotein E alleles (0, 1, or 2), and the baseline hippocampal volume.
As age is a strong confounder but the functional form of its impact has not reached consensus, we elect to model its impact nonparametrically. Specifically, we use age to form two strata (below and above the median age) and use model (6) to estimate the stratum-specific transformation functions and the effects of the other covariates. For comparison, we also fit model (6) with age as a continuous variable and with a common transformation function; that is, we do not assume the data are clustered. For both models, the regression errors $\varepsilon$ are assumed to follow an exponential distribution with mean 10. In our calculation, we approximate the beta processes by a finite truncation with $L = 20$. We assume the precision parameter $\alpha = 1$ and place the noninformative prior $\pi(\sigma^2) \propto 1/\sigma^2$ on the scale parameter $\sigma^2$. Figure 1 illustrates the estimated transformation function $H$ of the failure time without clustering. The posterior means (PM) and standard errors (SE) of the regression coefficients in the model are reported in Table 1. We run the MCMC for 20,000 iterations, discard the first 4000 draws as burn-in, and use Geweke's statistic to ensure the convergence of the chains. Figure 2 illustrates the estimated transformation functions with age-stratified data, and Table 2 summarizes the posterior means and standard errors of the other regression coefficients.
The left curve is relatively flat, while the right curve has a sharper slope. This is consistent with the recognition that AD is an aging disease: older people above a certain age threshold tend to progress faster from MCI to AD.
Both Tables 1 and 2 show that none of the biomarkers are significant, whereas they are statistically significant in the analysis of [14]. One possible conjecture is that our nonparametric transformation functions may have well captured the effects of unobserved confounders, which may leave little to be explained by the observed covariates. More thorough investigation is warranted.
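The Geweke diagnostic used above to check MCMC convergence compares the mean of an early segment of the chain with that of a late segment. A minimal Python sketch follows; it uses naive i.i.d. variance estimates rather than the spectral-density estimates of the full statistic, so it is a simplification for illustration only.

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke convergence diagnostic: z-score comparing the
    mean of the first 10% of the chain with the mean of the last 50%.
    A full implementation would replace the naive variance estimates
    with spectral-density estimates to account for autocorrelation."""
    chain = np.asarray(chain, dtype=float)
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

rng = np.random.default_rng(3)
z = geweke_z(rng.normal(size=20000))  # stationary chain: |z| should be O(1)
```

For a converged chain, $z$ behaves approximately like a standard normal variate, so values with $|z| > 2$ flag segments whose means disagree.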

Future directions
Following [12], we can extend the transformation model (6) by allowing the error density $f_\varepsilon$ to be unspecified. In this case, we need to constrain the regression coefficient $\beta$, for example by $\beta_1 = 1$ or $\|\beta\| = 1$, for identifiability. We propose to model the error density using a Dirichlet process mixture model:
$$f_\varepsilon(t) = \int \varphi(t \mid \mu, \sigma^2)\, dG(\mu, \sigma^2), \qquad G \sim DP(\alpha_1, G_0),$$
where $\varphi(t \mid \mu, \sigma^2)$ is a normal kernel with mean $\mu$ and variance $\sigma^2$, $G$ is a sample from a Dirichlet process with mass parameter $\alpha_1$ and base measure $G_0$, and $IG(\cdot \mid a, b)$ is the inverse gamma distribution with shape parameter $a$ and scale parameter $b$, serving as the component of the base measure for $\sigma^2$.
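Such a DPM draw of the error density can be sketched via a truncated Sethuraman stick-breaking construction. In the Python sketch below, the base measure (standard normal for $\mu$, inverse gamma with shape 2 and scale 1 for $\sigma^2$), the mass parameter, and the truncation level are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def dpm_density(t_grid, alpha1, K):
    """Truncated Sethuraman stick-breaking draw of
    f_eps(t) = integral of phi(t | mu, sigma^2) dG(mu, sigma^2),
    G ~ DP(alpha1, G0), with G0: mu ~ N(0, 1), sigma^2 ~ IG(2, 1)
    (illustrative base measure). Returns the density on a uniform grid,
    renormalized to absorb the truncation error."""
    v = rng.beta(1.0, alpha1, size=K)
    w = v * np.concatenate([[1.0], np.cumprod(1 - v[:-1])])  # stick weights
    mu = rng.normal(0.0, 1.0, size=K)
    sig2 = 1.0 / rng.gamma(2.0, 1.0, size=K)                 # IG(2, 1) draws
    dens = sum(w_k * np.exp(-(t_grid - m) ** 2 / (2 * s))
               / np.sqrt(2 * np.pi * s)
               for w_k, m, s in zip(w, mu, sig2))
    return dens / (dens.sum() * (t_grid[1] - t_grid[0]))

t = np.linspace(-10.0, 10.0, 4001)
f = dpm_density(t, alpha1=1.0, K=50)
```

Each draw is a countable (here truncated) mixture of normal kernels, so the resulting $f_\varepsilon$ is a smooth density even though $G$ itself is discrete.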
In a slightly different context, we may also consider clustering observations by developing a new nested beta-Dirichlet process prior with companion MCMC algorithms. As there are limited works on functional random effects models that accommodate clustering structures observed, for example, in neural studies, we may propose a nested Dirichlet process [19], a Dirichlet process whose base measure is itself a Dirichlet process, as a prior to cluster cumulative distribution functions. We envision that such a nested Bayesian procedure will provide substantial computational expedience for practitioners and can be applied to studies well beyond neurodegenerative and aging diseases.
Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf. © 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.