Application of ICA and Dynamic Mixture Model to Identify Microvasculature Activation in fMRI

The emphasis of this work is on developing novel data-processing techniques to achieve a higher spatiotemporal resolution in dynamic functional magnetic resonance imaging (fMRI). Due to partial volume effects, a pixel in fMRI may contain signals from a mixture of micro- and macrovasculature, with very different temporal characteristics. This mixture effect provides a way to separate microvasculature from macrovasculature in fMRI. A multi-component model representing a mixture of many reference functions is used to fit the time course of pixels in fMRI. The results suggest that it may be possible to separate the micro- and macrovasculature fractional contributions to pixels by this approach. Compared to the classical single-component model, the multi-component model fits the measured fMRI time course with a higher correlation coefficient and also detects voxels with low latencies more efficiently. Spatial independent component analysis (ICA) as a preprocessing step is implemented to remove major physiological noise and artifacts. The results of mixture model fitting after ICA cleaning show better results for microvasculature detection.


Introduction
Functional magnetic resonance imaging (fMRI) is the most widely used modality to map brain function because it can be easily implemented, is noninvasive, and has a relatively high spatial resolution. The dynamic fMRI signal change is regulated by the local changes in cerebral blood flow (CBF), cerebral blood volume (CBV), and blood oxygenation. CBF studies have suggested that a local increase in oxygen delivery beyond metabolic demand occurs in active cerebral tissue, which results in a higher concentration of oxygenated blood and a decrease in deoxyhemoglobin concentration within the microvasculature of metabolically active brain regions. Due to the four unpaired electrons, deoxyhemoglobin maintains a larger observed magnetic susceptibility effect and is paramagnetic relative to oxyhemoglobin and the surrounding brain tissue. The decrement in this paramagnetic substance in the activated brain leads to an increase in the local magnetic homogeneity and reduces dephasing of spins. This increases the T2* contrast in the activated brain and results in increases of MR signal relative to the resting state. A fast MRI data acquisition sequence known as the echo-planar imaging (EPI) sequence is commonly used to acquire fMRI signals. The physiological contributors to the fMRI signal changes include the blood-oxygenation-level-dependent (BOLD) and in-flow effects such as the increase in local CBF and arterial oxygenation. The signal in the functional area reflects the local changes in the CBF and oxygen consumption rate due to the task or stimulus [1]. And finally, the quantitative fMRI image indicates the spatiotemporal mapping of the hemodynamic in response to a given task at specific brain areas.
The coupling between the BOLD hemodynamic effect and the underlying neuronal activity has been studied and emphasized recently [2][3][4]. The first question is whether the BOLD effect can reflect neuronal activation. Experiments have been done with both animals and humans to verify that the BOLD contrast directly reflects the neural responses elicited by a stimulus [5,6]. The second question is how the BOLD signal reflects the underlying neuronal activation. The exact nature of the neurovascular coupling is not known yet. The studies by Logothetis suggest that the BOLD signal is more likely to reflect the input and local neuronal processing in a given area [5], whose weighted average of dendro-somatic components is measured as the local field potential (LFP). However, because of the slow-brain hemodynamics and the draining effects of vessels and veins, the BOLD activation detected in fMRI is temporally delayed and spatially blurred from the actual site of neuronal activation. The third question is then how to detect the neuronal activations from fMRI. Because of the unknown nature of the neurovascular coupling, how to detect neuronal activation remains an open question. Since neuronal activation originates in tissue subserved by the microvasculature, the detected microvasculature will be co-localized or at least closer to neuronal activation.
The fMRI BOLD effect originates within the microvasculature but also spreads into veins that drain blood from the activated brain tissue. And fMRI-based BOLD contrast consists mainly of activations in the microvasculature, large venules, and draining veins [7][8][9][10]. Because the BOLD signal is largely contaminated by the signals in large veins and noise, extracting earlier microvasculature activation is difficult and several issues need to be resolved. One major problem is the compounding effects from the physiological cardiac and respiratory noise, random noise, and also the contamination of head and vessel motion artifacts [11]. The percentage signal changes triggered by the stimuli typically is 1-10% in 1.5-3 T scanners [7]. Averaging scans for all events can improve signal-to-noise ratio (SNR) in fMRI by canceling random noise. Low-pass and high-pass filtering for the data can also improve SNR by removing the slow physiological processes such as subject habituation, learning or fatigue, subject motion, machine calibration drift, and scan-to-scan baseline variability [12]. However, artifacts in fMRI are often correlated with the signal of interest. Thus, classical average and filtering methods are not very effective. Noise-removing methods that are based on the intrinsic structure of the measured signals are more effective.
Another challenge is the partial volume effect (PVE) within one fMRI voxel. Because of the relatively large size of the voxel at the scale of mm compared to the size of veins and microvasculature, a mixture of micro-and macrovasculatures is present in the activated voxel with different temporal characteristics. Since the actual site of neuronal activity could be masked by signals from macrovasculature, a technique to separate micro-and macrovasculature within a voxel would be of great significance to fMRI to improve spatial specificity as well.
The vascular contributions to the BOLD signal depend on magnetic field strength as well as on data acquisition methods. Many previous works have been done to enhance the detection of microvasculature. In Chen and Ugurbil [13], a higher field at 7 T was used to increase the relative contribution of microcomponent to the BOLD signal. In spin-echo fMRI [14], large vessel contributions were suppressed because the 180°radiofrequency (RF) pulse in spin-echo (SE) sequence refocused the dephasing effect of the static field inhomogeneity around large vessels. A fast response that may be attributed to an increased oxygen consumption had been observed [15,16]. This fast dip might be more sensitive to microvasculature. Also, previous approaches to separate the microvasculature have relied upon post-processing techniques that utilize the fact that the phase of the MR signal often reflects the presence of larger vessels in a voxel [17,18]. Thus, larger vessels could be removed in the frequency domain or K-space. Our group has presented a study of segmenting fMRI pixels into microvasculature, venules, and large veins using intensity, phase, and temporal delay as features [17].
Independent component analysis (ICA) was first applied to fMRI in 1998 by McKeown et al. using INFORMAX [19] and has been shown to be superior to principle component analysis (PCA) in determining the spatial and temporal extents of task-related activation. ICA can also be used to identify the nontaskrelated components, such as physiological noise and movement artifacts. Initially, ICA methods assumed that the sources were naturally occurring sources and mostly had a super-Gaussian probability density function. Later on, the super-Gaussian assumption was expanded to a combination of super-Gaussian and sub-Gaussian distribution assuming that the source distribution was either sub-Gaussian or super-Gaussian [20]. Recently, a mixture density model for the sources has been proposed that enables the unknown sources to have a flexible density distribution [21]. The advantages of ICA over PCA, the correlation of spatial ICA and temporal ICA to fMRI, and some other issues have been discussed in many papers for the past decade [22,23]. In this study, ICA is implemented as an advanced preprocessing step in fMRI activation detection to remove artifacts by identifying and then removing some unrelated noisy components. ICA can also be used to identify temporally independent sources by implementing temporal ICA to fMRI signals within the region of interest (ROI). Sources identified by temporal ICA provide extra information regarding the segmentation of microvasculature and macrovasculature mixtures within one voxel.
Temporal characteristics of the BOLD response had been investigated by using a series of time-shifted reference functions [7,24]. A better localization of the activated sites and temporal relationships among different brain regions within selected clusters of activated voxels was achieved using this dynamic correlation method. But this dynamic fitting used only a one-reference function at a time. Our method is to use a multi-component model representing a mixture of many vascular components to account for partial volume effect within one voxel [25,26]. Because of physiological and random noises in the fMRI signal, the multiple components fitting of the dynamic mixture model can be further improved with both spatial and temporal ICA methods to improve SNR. Our purpose is to implement dynamic fitting in the proposed mixture model to account for different temporal characteristics of vascular components and to improve SNR with ICA integration for better microvasculature detections and a higher spatiotemporal resolution.

Experiment
To test the methodology, an Institutional Review Board (IRB)-approved human study was conducted with fMRI on two normal subjects aged 25 and 40 years.
A 480-volume of event-related EPI was acquired on a GE 1.5 T LX system from two continuous slices (i.e., two images per volume) through the visual cortex. The stimulus was a reversing checkerboard flashing with a 2-Hz frequency for 2 s every 20 s. The pulse repetition time TR = 275 ms, effective echo time TE = 45 ms, 45°flip angle, 64 Â 128 acquisition matrix, and 20 Â 40 cm field of view. A total of seven events were acquired.

Model
A multi-component reference function with a variable latency and a variable time separation between adjacent components was fitted to the time course of each voxel within the visual cortex, as shown in Eq. (1) where y is the normalized time course of a voxel, n is fMRI noise, N is the number of component, s i is the ith component, a i is the contributions or the mixture coefficient of s i in y, T is the number of time points in the time course.
Each vascular component is modeled by a reference function with a latency parameter (2): where X(t) is the reference function to best represent BOLD response, and T i is the latency parameter for the ith component to account for delay. Since latency is the most important and influential parameter in dynamic fitting, a dual-component model was investigated in this chapter for simplicity.

Estimation algorithm
Assuming the noise in fMRI is Gaussian white noise and the components (or mixtures) can be explicitly modeled by a series of reference functions, there are several ways to estimate the mixture coefficient and the latency of each component.
A non-negative least square (NNLS) solver [27] can be used to estimate the contribution coefficients of each component after normalizing both the time course and the components. At each iteration, only the column of S where the associated entry of A > 0 was used for least square estimation as in Eq. (3) If the non-negative constraint is removed from the estimation, then a standard minimum norm method can be used to estimate the contribution coefficients of each component. The model falls in the general linear model (GLM) fitting problem [28]. Thus, the estimation of the coefficient and hypothesis testing for the estimation can be done using Eq. (4) Recently, a first-order Taylor approximation for the temporal derivative of the reference function is used to estimate the delay of the fMRI response and the latency difference in different regions [29,30]. Assuming that there is a slight time delay T 0 between the reference function and the measurement, the delay T 0 can be estimated as listed in Eq. (5) where r t ð Þ is a one-reference function and r • t ð Þ is the temporal derivative of the reference function. Both are used as two basis functions in a GLM. The betaparameters β 1 and β 2 are estimated using the GLM algorithm. In case of a dualcomponent model, the derivative of only one component or the derivatives of both components are tested.
After the mixture coefficients are estimated for any combination of two (or more) different reference functions, the combination of the two-reference functions that has the minimum fitting error or a maximum correlation coefficient with regard to the original time course of each voxel is the estimate of the two components with different latencies.
To account for the relatively small microvasculature signal compared to veins at 1.5 T, a weighting factor can be used to estimate the relative fractions of micro-and macrovasculature inside a voxel from the fitted coefficients. For two components, assume is the estimate of fraction coefficient from each component in one voxel using NNLS method, and is the weighting factor for each component. Then, the percentage contribution of each component in this voxel is computed as in Eq. (6)

Simulation
In Eq. (2), each component comes from a reference function with certain latencies. The reference function mimicking the BOLD response is represented by the convolution of the stimuli function and the hemodynamic response function (HRF), assuming that the brain response is linear to the input (7) HRF is the brain response to an impulse stimulus and is modeled as the difference between two gamma functions as in Eq. (8) [31] h t; τ 1 ; where τ 1 controls the rising time to peak, τ 2 controls the peak time of the undershoot, δ 1 , δ 2 determine the dispersion of the two peaks, and c controls the influence of the undershoot.
Firstly, the influences of the HRF parameters τ 1 , τ 2 , δ 1 , δ 2 , c and the reference function latency parameter T 0 were studied. These parameters were in the range of as listed in the study: [24]. Then, the reference function with all these parameters was fitted to one time course in the activated brain. The correlation coefficient between time course and the reference function as a function of shape parameter δ 1 and delay parameter τ 1 at one latency parameter T 0 is used as a criterion for optimization, similar to the dictionary-based finger-printing method. Except for the latency parameter T 0 , all the other parameters of HRF are found to have a minor influence on the correlation coefficient, and thus, only the latency parameter is used as a variable for each reference function in this work. And HRF parameters are the same as in SPM software [28].
Secondly, a Monte-Carlo study was conducted to test the fitting algorithm and to study the influence of noise on the latency estimations. The simulated time course was a mixture of one-or two-reference functions at different latencies from a series of reference functions. The mixture coefficient W i , i ¼ 1, 2 of each reference function (or component) had a uniform distribution of W i $ U 0; 1 ½ . A Gaussian white noise was added to the mixed time course with different SNR. The latencies of the components were estimated by different GLM and NNLS with or without derivative algorithms. The sampling step for the reference functions was dt ¼ 105 ms in the case studied based on maximal temporal resolution that fMRI could achieve. The process of adding random Gaussian noise to the mixture of one or two components with a random uniform coefficient was repeated 1000 times for each SNR. The SNRs were tested at level from 1 to 10, 20, and infinite which is noise-free. The results were obtained for a traditional one-reference function condition and a mixture of two-reference function condition.
For the simulated time course coming from one-reference function case, the tested algorithms are GLM method for one component and one derivative (i.e., two basis functions), GLM method with only one component, and NNLS method with only one component. The results show that the estimation is unbiased for both NNLS and GLM methods for all SNRs, and the standard deviation (STD) for the estimation is relatively small (less than 100 ms) for both methods at SNR larger than 3. For the GLM plus the derivative component method, the estimation error is non-zero for larger SNR. This is because the method uses the first-order derivative as an approximation, assuming that the delay is very small and the assumption is not always valid. The result is consistent with Hensen [29]. So only, the GLM and the NNLS without derivative were tested for the mixture of two components.
For the case in which the simulated time course came from two mixed reference functions, the latency of first component and separation of the two reference functions were estimated. First, only the latency of the first component was estimated and the separation of the two reference functions was initialized and fixed. Then, the separation of the two reference functions is also set as a variable. The Monte-Carlo simulation shows that both fixed and variable separations between two reference functions give a small bias in the estimation of latency as a function of SNR in case of mixture fitting. However, the NNLS estimation algorithm produces smaller bias than GLM. Also, a variable separation gives a higher STD than a fixed separation for latency estimation. Therefore, NNLS with a fixed separation is used for this work.

ICA denoise preprocessing
To improve the fitting using the multi-component model, spatial ICA (SICA) was implemented first to improve SNR. Temporal ICA (TICA) had also been applied to the cleaned data within a region of interest to extract the possible intrinsic temporally independent sources. TICA has also been used on functional MRI by several groups [32,33].
In SICA, the assumption is that all the intrinsic spatial independent components are mixed temporally and measured at different time (which has the same meaning as "channel"). In order for spatial ICA to work, the measured fMRI EPI 2D or 3D image will be transformed to 1D vector in the same order at each time. The whole fMRI data are formulated as a 2D matrix: X ij , i ¼ 1, 2, ⋯, N; j ¼ 1, 2, ⋯, V. N is the number of EPI volumes and V is the number of voxels in each volume. Assuming S MÂV are the M independent components, the independent components are mixed in the following way (9): where W À1 NÂM is the mixing matrix and W NÂM is called the unmixing matrix. In order to get a good estimation of unmixing matrix and source components, the number of samples or voxel number (V) and the number of sources (M) should satisfy V≥M * M þ 1 ð Þ=2. The number of sources should not exceed the number of channels: M≤N [19]. In the ICA algorithm, the number of sources by default is set to be the number of channels (time points in case of spatial ICA and voxel number in case of temporal ICA). The source numbers are usually very large and can increase the computational complexity and lead to unstable solution [21]. One way to solve this problem is to estimate the number of sources (or model order) using the probability PCA such as Bayesian information criterion (BIC) [34].
In this chapter, we used PCA to estimate the number of the sources (M) in the data based on the eigen decomposition of the covariance matrix of the data. The number of components is estimated to maintain >95% of non-zero eigenvalues [33] to contain a majority of data information. After PCA preprocessing, the data that maintain the first M largest components were used for the spatial ICA decomposition using the ICA INFORMAX software [35]. The unmixing matrix and independent components are obtained as the output.
Three features are extracted for each independent component (IC) in order to select the artifacts components: (1) Spatial ICA map obtained by superimposing activated voxels on the anatomy for the ith IC, S i , i ¼ 1, 2, ⋯, 30. Each IC is scaled by the variance after removing mean: The active voxels are selected such that Z j j ≥ 1:96 corresponds to statistical p = 0.05. To clean the data, the noise independent components are removed by setting the associated columns of the noise components in the mixing matrix to be zero. Data are reconstructed from the possible signal components as shown in Eq. (10) 3. Results and discussion

Microvasculature estimation before ICA cleaning
Microvasculature estimation based on the methods described was applied to the original data and the data after ICA cleaning. The histogram of voxels was detected as a function of latency in steps of TR = 275 ms for the single component ( Figure 1). The histogram was fitted by a Gaussian distribution with the estimated mean and standard deviation. Since pixels containing mostly microvasculature would have a shorter latency among all detected voxels, the time separation from the peak of the Gaussian to its baseline on the left side would be a reasonable estimate of the time separation between the micro-and macrocomponents. The peak level was 22 (number of pixels) and Gaussian baseline is chosen at 10% of peak level which was 2.2. These correspond to indexes of 20 and 12, respectively, in units of TR. Therefore, a separation of 8*TR = 2.2 s was selected between the components of the twocomponent model. Figure 2 shows the histogram of dual-component models using separation time = 2.2 s. The histogram is a combination of two Gaussian distributions. The latency boundary of micro-and macrovascular classes is chosen based on the separation between two classes. The vertical line at $15 shows the separation boundary ( Figure 2). Figure 3a shows the voxels (numbering 34) localized from fitting indexes 2-15 with earlier latency (latency up to 15, Figure 2) and has >50% fractional   Figure 3b. Figure 3c shows the distribution of voxels indexed with a high latency (after 15 shown in Figure 2) likely to be veins. The relative contributions of the two components in these voxels are plotted in Figure 3d. In Figure 3c, a large vein structure can be seen that may contain a mixture of two macrovasculature components. In Figure 3a, the microvasculature estimated in the V5 region (marked by circle) is in gray matter, though a couple of pixels are likely to be macrovasculature and thus contain two vascular components as shown in Figure 3b. For macrovasculature voxels estimated in Figure 3c, since there might still be two vascular components (venules and veins) with different latencies, the fractional contributions shown in Figure 3d were not equally distributed as in Figure 3b.

Microvasculature estimation after ICA cleaning
To further improve the mixture model, ICA is used as a preprocessing operation for denoising. PCA was used to estimate the number of the sources, and the number of components was chosen to be 30 ( Figure 4) that contains ≥95% data variation and information. After PCA preprocessing, the data that maintain the first 30 largest components were used for the spatial ICA decomposition using the ICA INFORMAX software. Figure 5 shows the features of a one-source component. The first row is the spatial map of the 15th IC. V1, V2, and V5, expected to be activated, can be seen in the spatial map. The second row is the associated time course and the averaged time courses of original data. The associated time course matches well with the averaged original time course. The correlation coefficient between the associated time course and the reference function is 0.4 with P < 0.0001. The third row is the PSD of the associated time course shown in the unit of Hz. Since the stimulus is presented every 20 s, the corresponding frequency is 1/20 s = 0.05 Hz. The peak at 0.05 Hz can be seen in the PSD; however, there are also some large peaks around 0.1 Hz and lower frequencies that may come from the alias of the physiological noise. This component is mostly likely to be task-related based on the high CC of 0.4 and a distinct peak at 0.05 Hz in PSD. Figures 6 and 7 show two examples of components attributed to physiological noise. For instance, the source that is most likely from the heart-beating with a dominant peak in 1.2 Hz is shown in Figure 6, and the source that is from breathing and heart beating activation in the ventricles with distinct frequencies at 0.27 and 1.2 Hz as in Figure 7. Figure 8 demonstrates an example of the motion artifact component. The associated time course shows a gradual drift along time. This component is likely to be movement-based lowfrequency drift. The activations have a "ring-like" spatial distribution that is coming from head movement.  Eight noise components were identified based on the three features, and data were reconstructed by removing these components. We applied both multicomponent model and TICA to the original data and the data after ICA cleaning to the visual cortex. Dynamic mixture model was used to fit the data after ICA cleaning. The same time separation, 2.2 s, of "before ICA" was used for "after ICA" fitting.  Another noisy component from both breathing and heart beating with distinct frequencies at 0.27 (from breathing) and 1.2 Hz (from heart beating). Figure 9 shows the histogram of a dual-component model using component separation time = 2.2 s after ICA cleaning. The separation of micro-and macrovascular classes was $12. The shape of the Gaussian distribution is narrowed compared to Figure 2 before ICA. This is because ICA has removed the noisy voxels and thus the distribution is less Gaussian. Figure 10a shows the voxels (numbering 50) localized from low latency (up to 12, Figure 4) and has >50% fractional contribution from the earlier component. These voxels are likely to contain a microvasculature component. Figure 10b shows the relative fractional contribution of these components. Figure 10c shows the distribution of voxels indexed with a high latency (after 12 in Figure 9) likely to be veins. The relative contributions of two components in these later voxels are plotted in Figure 10d

Comparison of results before and after spatial ICA
The average correlation coefficient for the fitting after ICA cleaning has increased around 70% compared to the original fitting ( Figure 11). The number of voxels at an earlier latency (up to 15 in Figure 2 and up to 12 in Figure 9) also increased. The number of voxels that are most likely to be microvasculature has  increased from 34 to 50 ($50%) after ICA. The regions marked by a circle in Figure 10 identified microvasculature in V5 region on the left side which was missed by the estimation before ICA.
For all the estimated microvasculature, the fractional contribution coefficients of two components after ICA (Figure 10b) are the same, suggesting all the voxels are in the microvasculature. The fractional contribution coefficients of two components in the macrovasculature are different with venules and veins.

Comparison microvasculature estimation with temporal ICA
We have implemented further temporal ICA to the data after spatial ICA cleaning in the cluster that has a higher correlation (≥0.3) to the reference function.
The assumption is that the concurrent active voxels may still be mixed with different types of temporally independent components.
The number of components was set to be 10 based on the PCA of the cleaned data within the activated cluster. There is an associated spatial map for each temporal component that reflects the spatial contribution of the component. The spatial map of each temporal IC is shown in Figure 12. Compared to the micro and macrovasculature images, temporal IC #9 and IC #1 in Figure 12 have activation patterns similar to the macrovasculature image in Figure 10c, while the spatial map

Discussion
We have described a novel multiple-component model that takes into consideration vascular mixtures in the fMRI BOLD signal and partial volume effect and developed methods to estimate the contribution of each component. Experimental studies have shown that compared to the traditional single-component model, our method achieves a better match to the original time courses of fMRI and thus reduces the fitting errors. Another advantage of the method is that it allows us to estimate microvasculature. The microvasculature is closer to the site of neuronal activation and validated with the temporal ICA method, as expected [36]. Spatial ICA has been used as a preprocessing step in the mixture model to remove noise and improve the microvasculature detection with a higher CC and more voxels with lower latencies detected. The spatial and temporal distributions of all these noisy components were consistent with the results of other studies [32,34,37].
We use a series of reference functions to model the brain vascular components. Compared to the classical single-component model, the multi-component model fits the measured fMRI time course with a higher correlation coefficient and also detects voxels with low latencies more efficiently. Different vascular components will have different HRF shapes. Therefore, how the brain vascular components can be modeled more accurately needs to be investigated in the future. Also, the multiple reference functions are not orthogonal to each other; some de-correlation methods can be further implemented to improve the robustness of the fitting. Temporal ICA decomposition in the activated regions could overcome these problems with good spatial correspondence results between temporal ICA and mixture models. One limitation is that temporal independent assumption might not be fully satisfied in fMRI data since hemodynamic responses evolve with time [29].

Conclusion
In conclusion, we had used two new methods (i.e., ICA and dynamic mixture model) to improve microvasculature detection in fMRI that is closer to true neuronal activation and therefore improve the specificity of the fMRI microvasculature detection in both functional and structural ways [38]. Further integration and validation with other modalities such as EEG and PET are warranted in the near future. Further imaging of the full dynamic spatiotemporal multi-parametric functional and neurophysiological profile including BOLD microvasculature activation, couplings between BOLD and CBF/CBV, between BOLD, and oxygen extraction/ metabolism [39] are expected in the near future [40].