## 1. Introduction

Functional magnetic resonance imaging (fMRI) has increasingly become an important and popular modality that allows researchers to investigate brain activity resulting from a particular stimulus [1]. In an fMRI experiment, a subject is asked to perform a task by responding to a series of stimuli that may involve a motor, sensory or cognitive task, then the MR machine records the changes in the blood oxygen level dependent (BOLD) of the brain across different time points, resulting in three-dimensional fMRI time-series images. Numerous statistical models have been proposed to allow researchers to detect localized regions activated during a task, to describe the networks required for a particular brain function or to assess physical characteristics elicited by cognitive processes in the brain. However, the growth in the complexity, the significant size and the hierarchical structure of fMRI data make it difficult to comprise a fully efficient computationally feasible statistical model to accurately explain the temporal and spatial characteristics of the data.

The standard approach used on fMRI data is known as statistical parametric mapping (SPM) [2], which applied either a voxel-wise *t*-test or *F*-test statistics. In order to obtain an activation map of the brain, the next step is to threshold the test statistics at a given overall error rate that leads to a major multiplicity problem. The most common way of solving this problem is to use Gaussian random field theory [3]. This technique is based on the assumption of a stationary Gaussian random field, which may not be satisfied in fMRI settings. Another limitation is that most current methods ignore at least one of the spatial or temporal relationships between observations. Ignoring either spatial or temporal correlations in the model leads to seriously biased conclusions [4].

This article introduces a novel Bayesian modelling approach to fMRI data analysis. Bayesian approaches have great potential in applications because they allow a flexible modelling of spatial and temporal correlations in data [5]. We consider a Bayesian spatiotemporal model in a computationally feasible manner to detect brain regions that are activated by the external stimulus in fMRI. Accurate and powerful single-subject task-related activation models are required in order to develop effective imaging biomarkers, which constitutes the primary scientific problem. Moreover, the Bayesian paradigm provides an attractive inferential framework that can directly incorporate the physical characteristics of an experiment. Our goal is to put forward a model and inferential framework by which to investigate task-specific changes in the BOLD signal. Clearly, the model must account for the spatial relationships between the voxels, but there are other possible sources of variation that should not be ignored. Ultimately, we develop a hierarchical Bayesian model, which not only takes into account the spatiotemporal and temporal drift relationships in the data under consideration, but also easily investigates the role of specific regions that integrate brain activity to coordinate cognition and behaviour.

The applied Bayesian hierarchical approach contains several characteristics [4, 6]. Latent binary variables are introduced to indicate activation/inactivation of voxels. The spatial generalized linear mixed model (SGLMM) [7, 8] is considered to capture the spatial dependence of the latent binary variables. In addition, the autoregression (AR) model is used to model the temporal dependence of signal changes. Several studies have found that spatial dependence also appears in the temporal parameters in AR models [4, 9]. Thus, neglecting the spatial dependence of temporal correlations moderates the computational intensity, but the simplification may produce a biased estimation of the temporal coefficients and consequently may result in the spurious detection of brain activities [10]. Therefore, we also consider the spatial linear mixed-effects model to spatially regularize the AR parameters.

In fMRI data, the posterior inferences are based on the estimations of parameters; however, the posterior distribution is typically extremely large, and is unavailable in analytic form. Hence, we employ Markov chain Monte Carlo (MCMC) [11] sampling techniques that combine Metropolis-Hastings [12] schemes to generate samples from the posterior distribution for the purpose of performing inferential tasks. To reduce the computational burden and accelerate the sampling procedure, we parcel the brain so that sampling procedures can be performed in parallel. In addition, prior information on activation in the form of spatially informative variables, e.g. the grey and white areas of the brain, can be incorporated into the model.

## 2. Statistical modelling

In this section, we introduce the proposed statistical model in the analysis of fMRI data. To make inferences about task-related change in underlying neuronal activity, a general linear model is used to model BOLD signal changes for each voxel. In addition to the essential temporal dependence of BOLD signals in a voxel itself, the BOLD signal changes show spatially contiguous and locally homogenous among voxels [11]. Shmuel et al. [13] in the visual fMRI study have demonstrated BOLD response seems to be well approximated by separable spatiotemporal model. Thus, for computational convenience, we consider a Bayesian separable spatiotemporal model for BODL signal changes to simultaneously account for the temporal dependence in nearby time points and spatial dependence in local neighbouring voxels.

Let *T* × 1 column vector and denote the observed BOLD signals from a voxel *t*, *v* with a linear regression model defined as

where *r* × 1 autoregression coefficient vector, with *T* and a covariance matrix

For the purpose of detecting the activation of a voxel, a vector of binary random variables *v* is in response to a sequence of input stimuli. The voxel *v* is considered active to the stimulus *j* if

where *μ* and a variance

where *j* is considered to activate the voxel *v*. As George et al. [17] suggested, ^{4} to avoid the computational problems and we find that

Since the response at a particular voxel is likely to be consistent to the responses of neighbouring voxels, we apply SGLMM [7, 35] to capture the spatial relationship. We assume

where *v*_{th} row of *M*, an *M*, consist of the *q* principal eigenvectors with respect to the first *q* largest eigenvalues from the adjacency matrix *A* of voxels, an _{th} element in Eq. (5), defined as

where *s* and *v* are neighbours. In the three-dimensional fMRI data analysis, a 26-adjacent neighbourhood is considered [8]. Typically, *q* is less than *V*/2 or equal to the number of eigenvalues greater than 0.05. This choice can reduce the dimensionality significantly but still maintains the spatial structure of the data. The graph Laplacian matrix **1** is a

To avoid generating artefactual spatial structure in the posterior distribution, [19] suggested *X* is defined in just below Eq. (7)

It is also found that the temporal correlations between voxels tend similar [4]. Such spatial dependence is modelled by a spatial linear mixed-effects model. For computational convenience and simplicity, we make a transformation of

where *r*th order of the temporal parameters, and *M* and *Q* in Eq. (8) are the same as those listed in Eq. (4). Finally, we assume that , and . The values of *a*’s and *b*’s in the gamma or inverse gamma distributions are determined by the user to reflect the strength of one’s prior belief before observing the data.

## 3. Posterior inference

To explore parallel computation, which allows us to substantially speed-up expensive operations in the MCMC iteration, we partition the brain into non-overlapping areas such as rectangular three-dimensional lattices or Brodmann areas. We then carry out the statistical model in Section 2 to analyse the partitioned data. In this chapter, for simplicity, we use Brodmann’s map to parcel the brain into distinct regions before implementing the proposed model to the real fMRI data. In addition, separate regions divided by a data-drive segmentation procedure using functional clustering can be considered. Next, we introduce the likelihood function and the priors combining the parcellation procedure to form the posterior distribution for inference.

Suppose that a brain can be divided into *G* parcels, where the *g*th parcel contains *V*_{g } voxels,

and

Due to the intractability of the posterior, Gibbs and Metropolis-Hastings updates [20, 34] are applied to sample the posterior distribution for estimation and inference of the model parameters.

The posterior quantities of interest are *j* on the BOLD signal changes for voxel *v*. They can be directly estimated based on MCMC samples by

where *m*th sample from a total *M* MCMC samples for

The construction of the binary activation map is obtained by thresholding the posterior probability of *c* needs to be used in the detection of brain activity, that is, a voxel *v* is defined as active to a stimulus *j* if

## 4. Simulation studies

We conduct stimulation studies to investigate the performance of the proposed model on the detection of brain activity. We measure the accuracy of the classification of voxels as either active/inactive and show the proposed model to be robust with regard to different spatial structures. Finally, we illustrate the implementation of the proposed model to a simulated fMRI data that mimics a study in face repetition effects in memory tests [24].

### 4.1. Benchmark example

Consider a

where

We let *M* is a *A*, for the image corresponding to the first *v*_{th} row of *M*, and

We consider an AR(1) temporal correlation between time points within voxels. For each voxel *v*, let

We let

We consider a stimulus and the block and event-related designs with a total number of time points spanning 400 and repetition time (TR) equal to 2s. In the block design, the duration time is 20s. In the event-related design, the stimulus is randomly assigned. The stimulus function is convolved with a HRF modelled by a double gamma function [15] to create the design matrix **Figure 1(a)** and **(b)**, respectively.

Given _{th} element of

We then apply the proposed model to detect the activation areas, that is, to identify which voxel has

We obtained posterior activation maps by setting the posterior probability threshold at 0.8722, that is, the voxel is categorized as active if

**Table 1** shows the proposed method performs well with regard to detecting the active voxels where only a small number of active voxels are falsely identified as inactive in both designs. Over 10 replications, the model with consideration of spatial dependence performs better on the detection of activations. Especially, FPR reduces around 50%.

### 4.2. Structures with spatial dependence

To demonstrate that the proposed model is able to handle the different binary and temporal image spatial dependencies, we generate a dataset that is the same as the benchmark example in the paper of [27]. We create a **Figure 2(a)**, where the areas in red indicate that the voxels are active, and otherwise, they are not. Moreover, the values of *ρ*s are generated from a uniform distribution between –0.3 and 0.3 for the non-active areas. However, we assign the fixed values –0.5, 0.5, and 0.75 to *ρ* in each active area, respectively, as shown in **Figure 2(b)**. This is designed to investigate the patterns of temporal dependencies within the different areas. Similar to the settings discussed in Section 3.1, we consider both block and event-related designs with one stimulus. First, *U*(2, 5) and *U*(1, 3). The design matrix **Figure 2(c)**. By visually inspecting estimated values of *ρ*s in **Figure 2(d)** compared to the simulated ones in **Figure 2(b)**, we are confident that the spatial dependences also can be captured by the model.

After thresholding by a value determined by controlling FDR as 0.05, the accuracy of the classification of voxels measured by TCR is 99.25%, and the FPR is 4.84% with consideration of the spatial dependence of the temporal correlations over 10 replications. On the other hand, when the threshold is taken to be equal to 0.8722, TCR is 99.17% and FPR is 5.01%. Therefore, the proposed model can be applied to detect the activation of brain image data even for different spatial dependence structures.

### 4.3. Parcellation effect

In the previous simulations, image data was analysed together. In this simulation, we partition the data and then apply the proposed model to analyse each unit of the partitioned data. Our goal is to investigate the effect of the parcellation of an image on the estimation of active voxels. Given the binary image *γ* shown in **Figure 3(a)**, a unit of data is generated with the parameters being the same as the settings discussed in Section 3.1.

We consider four different parcellations, as shown in **Figure 3(a)**, ** (b)** and **(d)**. **Table 2** shows the values of different measurements over 10 replications. Little difference can be found on the measures between different parcellations. However, the computational time is dramatically reduced for case (d), where each parcellation contains fewer voxels. The same results in block and event-related fMRI experiments.

### 4.4. A real-life simulation example

To further demonstrate the capability of the proposed model to accurately identify the activation regions, we simulate an fMRI dataset from an **R** package: **neuRosim** [28]. The simulated data follows the example of event-related fMRI study [24]. We consider four different tasks denoted by N1, N2, F1 and F2, which are presented randomly in the experiment. The onsets for each condition are schematically shown in **Figure 4**. We then specify three areas that are activated by the tasks shown in **Figure 5**. The simulated four-dimensional fMRI data consists of 351 scans, each containing three-dimensional image of size

Now we apply the proposed model to analyse the simulated data. We split the array into disjointed arrays based on the Brodmann level regions. The voxel for all regions ranges from 236 to 969. We collect 1,000,000 samples to estimate the posterior probability of activation in each voxel. We consider the following probability:

which is used to denote whether the voxel *v* is activated by any of applied stimuli. Once the posterior probability is greater than 0.8722, then the corresponding voxel is considered as activated by a stimulus. The activation maps registered into MNI 152 template (a reference brain map from the Montreal Neurological Institute) and the areas considered active are in yellow as shown in **Figure 6**. In addition to comparing specified activation regions in **Figure 5** to those detected in **Figure 6**, we also calculate the three measures, TCR, TPR and FPR which are 97.71%, 99.34% and 3.25%, respectively. The results show that proposed model performs well on classifying the active and inactive voxels.

## 5. Application

We apply the spatial Bayesian variable selection approach discussed in Section 2 to real fMRI data, a study of the Simon effect [29]. In this Simon task, participants responded to one colour with the right hand, and to the other colour with the left hand. Congruence in this aspect means that a colour appears on the default side (congruent condition), or it may appear on the opposite side (incongruent condition). One participant’s brain image data is selected to be analysed for the purpose of illustration. The four-dimensional fMRI BOLD image data was pre-processed using FSL from Oxford Centre for Functional MRI of the Brain [30], including motion correction, realignment, slice timing correction, spatial smoothing and high-pass filtering, before being analysed using our model. Incorrect trials were removed. The four tasks in this experiment were convolved with a double gamma function. We divided the pre-processed data into 48 Brodmann areas. The statistics of interest were registered into a standard template MNI152 for ease of visualization. We used a first-order temporal autocorrelation throughout the study.

We consider a voxel to be active when the posterior probability of *c*, which is either a deterministic value of 0.8722 [4] or is determined by an FDR equal to 0.05 [22]. However, in this event-related fMRI, 0.8722 seems too conservative, so the value of *c* is determined corresponding to an FDR equal to 0.05. The activation maps for different tasks are shown in **Figure 7(a)**–**(d)**.

One interesting question is to compare active domains corresponding to congruency. Posterior probabilities helped us find the following inequality between the incongruent and congruent tasks groups:

where **Figures 8** and **9** when posterior probabilities are thresholded by a critical value corresponding to FDR equal to 0.05. The active regions include frontal and occipital lobes detected by the models with and without consideration of spatial dependence of temporal correlations. These results are consistent with other studies [31, 32]. However, more active areas were detected by the model without consideration of spatial dependence of temporal correlations. For something like the Simon task, it is expected that the lateral prefrontal cortex to be activated as they are parts of the cognitive control network/multiple demand network, while the medial prefrontal cortex is part of the default mode network [33]. Therefore, it is possible that the activation in the medial part may be some spread of activity and therefore false positives. This is an important example that illustrates considering the spatial dependence of temporal correlations may improve the estimation and reduce the some spurious detection of activation.

## 6. Discussion and conclusions

In this work, we applied a new approach to performing Bayesian variable selection with consideration of spatial dependencies from both regression and temporal coefficients in single-subject event-related fMRI data. Through simulations, improvement in the detection of brain activity was observed for a wide range of different spatial structures, parcellations and experimental designs. We found that the proposed approach potentially decreases false positive rates as shown in **Table 1** in the simulation and **Figure 9** in the real example. In addition, prior information from brain structure and function for subject-level inference can be incorporated into the analysis.

It is worth noting that the proposed Bayesian approach partitions a brain into several regions before implementing the model to the fMRI data. In addition to Brodmann areas, several parcellations of the brain into distinct regions are available, such as the separate regions divided by a data-drive segmentation procedure using functional clustering. From the simulation study, we found the computational burden to be greatly reduced by the joint use of parcellation and an SGLMM. In particular, the probability of activation and activation magnitudes were readily computed without requiring an adjustment for multiple comparisons in a post-processing step. For broader goals, it would be of interest to extend the model to group studies. In addition, we are confident that Bayesian approaches represent an important direction in fMRI and in high-dimensional research in general.