The percentage of correct classification of voxels with a threshold at 0.8722 for considering the spatial dependence in (a) and without considering spatial dependence in (b) for temporal parameters in the model.
Bayesian modelling has attracted great interest in cognitive science and offered a flexible and interpretable way to study cognitive processes using functional magnetic resonance imaging data. In this chapter, a spatial Bayesian hierarchical model is applied to an event-related fMRI study of cognitive control using the Simon test. We consider a sparse spatial generalized linear mixed-effects model to capture the spatial dependence among activated voxels and temporal parameters and to benefit computationally by reducing dimensionality. We demonstrate that the proposed model has the capability of identification of the brain areas related to cognitive tasks. Moreover, the reduction in the false positive rate is observed in the simulation study, and the relevant brain regions involved in processing cognitive control are clearly detected in a real-life fMRI example.
- functional magnetic resonance imaging
- Markov chain Monte Carlo
- spatial generalized linear mixed-effects model
Functional magnetic resonance imaging (fMRI) has increasingly become an important and popular modality that allows researchers to investigate brain activity resulting from a particular stimulus . In an fMRI experiment, a subject is asked to perform a task by responding to a series of stimuli that may involve a motor, sensory or cognitive task, then the MR machine records the changes in the blood oxygen level dependent (BOLD) of the brain across different time points, resulting in three-dimensional fMRI time-series images. Numerous statistical models have been proposed to allow researchers to detect localized regions activated during a task, to describe the networks required for a particular brain function or to assess physical characteristics elicited by cognitive processes in the brain. However, the growth in the complexity, the significant size and the hierarchical structure of fMRI data make it difficult to comprise a fully efficient computationally feasible statistical model to accurately explain the temporal and spatial characteristics of the data.
The standard approach used on fMRI data is known as statistical parametric mapping (SPM) , which applied either a voxel-wise t-test or F-test statistics. In order to obtain an activation map of the brain, the next step is to threshold the test statistics at a given overall error rate that leads to a major multiplicity problem. The most common way of solving this problem is to use Gaussian random field theory . This technique is based on the assumption of a stationary Gaussian random field, which may not be satisfied in fMRI settings. Another limitation is that most current methods ignore at least one of the spatial or temporal relationships between observations. Ignoring either spatial or temporal correlations in the model leads to seriously biased conclusions .
This article introduces a novel Bayesian modelling approach to fMRI data analysis. Bayesian approaches have great potential in applications because they allow a flexible modelling of spatial and temporal correlations in data . We consider a Bayesian spatiotemporal model in a computationally feasible manner to detect brain regions that are activated by the external stimulus in fMRI. Accurate and powerful single-subject task-related activation models are required in order to develop effective imaging biomarkers, which constitutes the primary scientific problem. Moreover, the Bayesian paradigm provides an attractive inferential framework that can directly incorporate the physical characteristics of an experiment. Our goal is to put forward a model and inferential framework by which to investigate task-specific changes in the BOLD signal. Clearly, the model must account for the spatial relationships between the voxels, but there are other possible sources of variation that should not be ignored. Ultimately, we develop a hierarchical Bayesian model, which not only takes into account the spatiotemporal and temporal drift relationships in the data under consideration, but also easily investigates the role of specific regions that integrate brain activity to coordinate cognition and behaviour.
The applied Bayesian hierarchical approach contains several characteristics [4, 6]. Latent binary variables are introduced to indicate activation/inactivation of voxels. The spatial generalized linear mixed model (SGLMM) [7, 8] is considered to capture the spatial dependence of the latent binary variables. In addition, the autoregression (AR) model is used to model the temporal dependence of signal changes. Several studies have found that spatial dependence also appears in the temporal parameters in AR models [4, 9]. Thus, neglecting the spatial dependence of temporal correlations moderates the computational intensity, but the simplification may produce a biased estimation of the temporal coefficients and consequently may result in the spurious detection of brain activities . Therefore, we also consider the spatial linear mixed-effects model to spatially regularize the AR parameters.
In fMRI data, the posterior inferences are based on the estimations of parameters; however, the posterior distribution is typically extremely large, and is unavailable in analytic form. Hence, we employ Markov chain Monte Carlo (MCMC)  sampling techniques that combine Metropolis-Hastings  schemes to generate samples from the posterior distribution for the purpose of performing inferential tasks. To reduce the computational burden and accelerate the sampling procedure, we parcel the brain so that sampling procedures can be performed in parallel. In addition, prior information on activation in the form of spatially informative variables, e.g. the grey and white areas of the brain, can be incorporated into the model.
2. Statistical modelling
In this section, we introduce the proposed statistical model in the analysis of fMRI data. To make inferences about task-related change in underlying neuronal activity, a general linear model is used to model BOLD signal changes for each voxel. In addition to the essential temporal dependence of BOLD signals in a voxel itself, the BOLD signal changes show spatially contiguous and locally homogenous among voxels . Shmuel et al.  in the visual fMRI study have demonstrated BOLD response seems to be well approximated by separable spatiotemporal model. Thus, for computational convenience, we consider a Bayesian separable spatiotemporal model for BODL signal changes to simultaneously account for the temporal dependence in nearby time points and spatial dependence in local neighbouring voxels.
Let , be a T × 1 column vector and denote the observed BOLD signals from a voxel at time t, . Following , we model the BOLD response for a particular voxel v with a linear regression model defined as
where is the design matrix, each column of which consists of values obtained from an impulse stimulus function [15, 16] with respect to a task convolved with the hemodynamic response function (HRF); , is a vector and corresponds to the effect of stimuli on the BOLD signal changes. The temporal correlation is modelled by , being an r × 1 autoregression coefficient vector, with , a matrix of lagged prediction errors [2, 14]. We assume that the error terms
For the purpose of detecting the activation of a voxel, a vector of binary random variables is introduced to indicate whether the voxel v is in response to a sequence of input stimuli. The voxel v is considered active to the stimulus j if and, on the other hand, inactive if . Given that , we assume has a spike and slab mixture prior of two normal distributions  given by
where is denoted as a normal distribution with a mean μ and a variance . In Eq. (2), we let being fixed and assume to have an inverse gamma distribution
where . We set to be large resulting in the nonzero estimate of so that the stimulus j is considered to activate the voxel v. As George et al.  suggested, should be taken less than 104 to avoid the computational problems and we find that is a reasonable choice in our simulation studies and real fMRI example.
Since the response at a particular voxel is likely to be consistent to the responses of neighbouring voxels, we apply SGLMM [7, 35] to capture the spatial relationship. We assume are independently distributed in accordance to the Bernoulli distribution with a logistic link , where and is a constant intended to incorporate the expert knowledge or anatomical information and is a vector of spatial random effects. The spatial dependence between the binary variables is implicitly captured by assuming to have
where is vth row of M, an matrix consisting of multi-resolutional spatial basis vectors that are able to explain spatial variation sources. The columns of M, consist of the q principal eigenvectors with respect to the first q largest eigenvalues from the adjacency matrix A of voxels, an matrix with the th element in Eq. (5), defined as
where indicates s and v are neighbours. In the three-dimensional fMRI data analysis, a 26-adjacent neighbourhood is considered . Typically, q is less than V/2 or equal to the number of eigenvalues greater than 0.05. This choice can reduce the dimensionality significantly but still maintains the spatial structure of the data. The graph Laplacian matrix relates spatial basis vectors to represent the image data  and 1 is a column vector of ones. We assume a conjugate prior for the smoothing parameter given by
It is also found that the temporal correlations between voxels tend similar . Such spatial dependence is modelled by a spatial linear mixed-effects model. For computational convenience and simplicity, we make a transformation of such that and assume . For the spatial random effect for the temporal parameter is assumed to be
where is the spatial random effect for the rth order of the temporal parameters, and M and Q in Eq. (8) are the same as those listed in Eq. (4). Finally, we assume that
3. Posterior inference
To explore parallel computation, which allows us to substantially speed-up expensive operations in the MCMC iteration, we partition the brain into non-overlapping areas such as rectangular three-dimensional lattices or Brodmann areas. We then carry out the statistical model in Section 2 to analyse the partitioned data. In this chapter, for simplicity, we use Brodmann’s map to parcel the brain into distinct regions before implementing the proposed model to the real fMRI data. In addition, separate regions divided by a data-drive segmentation procedure using functional clustering can be considered. Next, we introduce the likelihood function and the priors combining the parcellation procedure to form the posterior distribution for inference.
Suppose that a brain can be divided into G parcels, where the gth parcel contains Vg voxels, and where we denote the voxel-level parameters as and the parcel-level parameters as . The posterior distribution is obtained by combining the priors and the likelihood , which are defined in Eqs. (9) and (10) as follows [8, 20]:
The posterior quantities of interest are for the activation map and for the magnitude of the effect caused by the stimulus j on the BOLD signal changes for voxel v. They can be directly estimated based on MCMC samples by
where and are the mth sample from a total M MCMC samples for and , respectively. The distribution of in a map provides a way to visually inspect brain regions with peak, high, low and practically no activation. In addition, in Eq. (11) offers researchers to view the strength of response in the brain to the stimulus.
The construction of the binary activation map is obtained by thresholding the posterior probability of . A threshold value c needs to be used in the detection of brain activity, that is, a voxel v is defined as active to a stimulus j if . However, threshold determination lacks agreement among investigators. Under certain conditions, a median selection criterion, , results in the minimum prediction risk . Smith et al.  and Lee et al.  defined a threshold by matching a Bayes factor approximation to a likelihood ratio test for activation such that at a 5% level of significance. Additional to both, Kalus et al.  applied the false discovery rate (FDR)  to determine a threshold. In practice, it would be reasonable to construct activation maps for a grid of thresholds and to evaluate elicited results with neuroscience experts. In our experience, activation maps seem to be comparably robust against an exact choice of the threshold.
4. Simulation studies
We conduct stimulation studies to investigate the performance of the proposed model on the detection of brain activity. We measure the accuracy of the classification of voxels as either active/inactive and show the proposed model to be robust with regard to different spatial structures. Finally, we illustrate the implementation of the proposed model to a simulated fMRI data that mimics a study in face repetition effects in memory tests .
4.1. Benchmark example
Consider a binary activation image , , generated independently from
We let , , where M is a matrix whose columns are principal eigenvectors of the adjacency matrix, A, for the image corresponding to the first largest eigenvalues. Moreover, is the vth row of M, and .
We consider an AR(1) temporal correlation between time points within voxels. For each voxel v, let and are generated from
We let and .
We consider a stimulus and the block and event-related designs with a total number of time points spanning 400 and repetition time (TR) equal to 2s. In the block design, the duration time is 20s. In the event-related design, the stimulus is randomly assigned. The stimulus function is convolved with a HRF modelled by a double gamma function  to create the design matrix . The stimulus functions and the corresponding design matrices for the block and event-related designs are shown in Figure 1(a) and (b), respectively.
Given , the BOLD signal changes are generated from a normal distribution with a covariance , where and the th element of is , and a mean of , where
We then apply the proposed model to detect the activation areas, that is, to identify which voxel has . We ran MCMC chains with 100,000 iterations in order to ensure the largest Monte Carlo standard error (MCSE)  of all posterior probabilities of , less than 0.01.
We obtained posterior activation maps by setting the posterior probability threshold at 0.8722, that is, the voxel is categorized as active if , and it is categorized as inactive otherwise . To measure the performance of our proposed model, we calculate the true classification rate (TCR), the true positive rate (TPR) and the false positive rate (FPR), which are defined as
Table 1 shows the proposed method performs well with regard to detecting the active voxels where only a small number of active voxels are falsely identified as inactive in both designs. Over 10 replications, the model with consideration of spatial dependence performs better on the detection of activations. Especially, FPR reduces around 50%.
|Block design (%)||Event-related design (%)|
|TCR||92.44 (90.28–96.67)||91.12 (91.82–96.00)|
|TPR||99.97 (99.94–100)||99.97 (99.94–100)|
|FPR||6.98 (3.45–8.75)||6.54 (3.54–8.66)|
|Block design (%)||Event-related design (%)|
|TCR||96.33 (95.89–97.56)||95.78 (95.12–97.50)|
|TPR||99.99 (99.97–100)||99.99 (99.97–100)|
|FPR||3.42 (1.12–5.27)||3.22 (1.32–4.78)|
4.2. Structures with spatial dependence
To demonstrate that the proposed model is able to handle the different binary and temporal image spatial dependencies, we generate a dataset that is the same as the benchmark example in the paper of . We create a binary image as shown in Figure 2(a), where the areas in red indicate that the voxels are active, and otherwise, they are not. Moreover, the values of ρs are generated from a uniform distribution between –0.3 and 0.3 for the non-active areas. However, we assign the fixed values –0.5, 0.5, and 0.75 to ρ in each active area, respectively, as shown in Figure 2(b). This is designed to investigate the patterns of temporal dependencies within the different areas. Similar to the settings discussed in Section 3.1, we consider both block and event-related designs with one stimulus. First, ’s generated from U(2, 5) and from U(1, 3). The design matrix and temporal correlation matrix are defined to be the same as in Section 3.1. Given , and the BOLD signal changes are generated from a normal distribution with a mean and a covariance . We carry out the proposed approach to detect the activation and estimate the parameters of interest. For ease of visualization of the results, we provide one of the estimated binary images without thresholding and the corresponding estimated image, as shown in Figure 2(c). By visually inspecting estimated values of ρs in Figure 2(d) compared to the simulated ones in Figure 2(b), we are confident that the spatial dependences also can be captured by the model.
After thresholding by a value determined by controlling FDR as 0.05, the accuracy of the classification of voxels measured by TCR is 99.25%, and the FPR is 4.84% with consideration of the spatial dependence of the temporal correlations over 10 replications. On the other hand, when the threshold is taken to be equal to 0.8722, TCR is 99.17% and FPR is 5.01%. Therefore, the proposed model can be applied to detect the activation of brain image data even for different spatial dependence structures.
4.3. Parcellation effect
In the previous simulations, image data was analysed together. In this simulation, we partition the data and then apply the proposed model to analyse each unit of the partitioned data. Our goal is to investigate the effect of the parcellation of an image on the estimation of active voxels. Given the binary image γ shown in Figure 3(a), a unit of data is generated with the parameters being the same as the settings discussed in Section 3.1.
We consider four different parcellations, as shown in Figure 3(a), (b) and (d). Table 2 shows the values of different measurements over 10 replications. Little difference can be found on the measures between different parcellations. However, the computational time is dramatically reduced for case (d), where each parcellation contains fewer voxels. The same results in block and event-related fMRI experiments.
|TCR||99.12 (98.61–99.84) %||99.07 (98.68–99.56) %||98.94 (98.03–99.64) %||99.08 (98.76–99.51) %|
|FPR||6.35 (1.51–9.97) %||6.68 (3.67–9.36) %||7.45 (3.12–13.87) %||5.82 (2.83–9.44) %|
|Running time ratio||8.75||2.75||2.75||1|
4.4. A real-life simulation example
To further demonstrate the capability of the proposed model to accurately identify the activation regions, we simulate an fMRI dataset from an R package: neuRosim . The simulated data follows the example of event-related fMRI study . We consider four different tasks denoted by N1, N2, F1 and F2, which are presented randomly in the experiment. The onsets for each condition are schematically shown in Figure 4. We then specify three areas that are activated by the tasks shown in Figure 5. The simulated four-dimensional fMRI data consists of 351 scans, each containing three-dimensional image of size and being collected in every 2 s resulting the total time for the experiment is 11 min and 42 s.
Now we apply the proposed model to analyse the simulated data. We split the array into disjointed arrays based on the Brodmann level regions. The voxel for all regions ranges from 236 to 969. We collect 1,000,000 samples to estimate the posterior probability of activation in each voxel. We consider the following probability:
which is used to denote whether the voxel v is activated by any of applied stimuli. Once the posterior probability is greater than 0.8722, then the corresponding voxel is considered as activated by a stimulus. The activation maps registered into MNI 152 template (a reference brain map from the Montreal Neurological Institute) and the areas considered active are in yellow as shown in Figure 6. In addition to comparing specified activation regions in Figure 5 to those detected in Figure 6, we also calculate the three measures, TCR, TPR and FPR which are 97.71%, 99.34% and 3.25%, respectively. The results show that proposed model performs well on classifying the active and inactive voxels.
We apply the spatial Bayesian variable selection approach discussed in Section 2 to real fMRI data, a study of the Simon effect . In this Simon task, participants responded to one colour with the right hand, and to the other colour with the left hand. Congruence in this aspect means that a colour appears on the default side (congruent condition), or it may appear on the opposite side (incongruent condition). One participant’s brain image data is selected to be analysed for the purpose of illustration. The four-dimensional fMRI BOLD image data was pre-processed using FSL from Oxford Centre for Functional MRI of the Brain , including motion correction, realignment, slice timing correction, spatial smoothing and high-pass filtering, before being analysed using our model. Incorrect trials were removed. The four tasks in this experiment were convolved with a double gamma function. We divided the pre-processed data into 48 Brodmann areas. The statistics of interest were registered into a standard template MNI152 for ease of visualization. We used a first-order temporal autocorrelation throughout the study.
We consider a voxel to be active when the posterior probability of is greater than c, which is either a deterministic value of 0.8722  or is determined by an FDR equal to 0.05 . However, in this event-related fMRI, 0.8722 seems too conservative, so the value of c is determined corresponding to an FDR equal to 0.05. The activation maps for different tasks are shown in Figure 7(a)–(d).
One interesting question is to compare active domains corresponding to congruency. Posterior probabilities helped us find the following inequality between the incongruent and congruent tasks groups:
where is the magnitude of corresponding effect on the BOLD signal changes. The posterior activations are shown in Figures 8 and 9 when posterior probabilities are thresholded by a critical value corresponding to FDR equal to 0.05. The active regions include frontal and occipital lobes detected by the models with and without consideration of spatial dependence of temporal correlations. These results are consistent with other studies [31, 32]. However, more active areas were detected by the model without consideration of spatial dependence of temporal correlations. For something like the Simon task, it is expected that the lateral prefrontal cortex to be activated as they are parts of the cognitive control network/multiple demand network, while the medial prefrontal cortex is part of the default mode network . Therefore, it is possible that the activation in the medial part may be some spread of activity and therefore false positives. This is an important example that illustrates considering the spatial dependence of temporal correlations may improve the estimation and reduce the some spurious detection of activation.
6. Discussion and conclusions
In this work, we applied a new approach to performing Bayesian variable selection with consideration of spatial dependencies from both regression and temporal coefficients in single-subject event-related fMRI data. Through simulations, improvement in the detection of brain activity was observed for a wide range of different spatial structures, parcellations and experimental designs. We found that the proposed approach potentially decreases false positive rates as shown in Table 1 in the simulation and Figure 9 in the real example. In addition, prior information from brain structure and function for subject-level inference can be incorporated into the analysis.
It is worth noting that the proposed Bayesian approach partitions a brain into several regions before implementing the model to the fMRI data. In addition to Brodmann areas, several parcellations of the brain into distinct regions are available, such as the separate regions divided by a data-drive segmentation procedure using functional clustering. From the simulation study, we found the computational burden to be greatly reduced by the joint use of parcellation and an SGLMM. In particular, the probability of activation and activation magnitudes were readily computed without requiring an adjustment for multiple comparisons in a post-processing step. For broader goals, it would be of interest to extend the model to group studies. In addition, we are confident that Bayesian approaches represent an important direction in fMRI and in high-dimensional research in general.
This research was, in part, supported by the Ministry of Education, Taiwan, R.O.C. and National Cheng Kung University (the Aim for the Top University Project to National Cheng Kung University, NCKU). We thank the Mind Research and Imaging Center (MRIC) at NCKU for consultation and instrument availability.