Technologies based on electroencephalographic (EEG) signals have reached the general public through the development of EEG systems. For routine use in the real world, recordings should be obtained without restricting the user's movement. However, without situational constraints on behavior, technical and biological artifacts are often mixed with the EEG signals; they disguise themselves as EEG components and make every stage of signal processing difficult. EEG systems that integrate gold-standard or specialized devices into their processing strategies could become everyday tools if they were robust to such obstructions. In this chapter, we describe algorithms for artifact rejection in multi- and single-channel EEG signals. In particular, we summarize some existing single-channel artifact rejection methods, focusing on the advantages and disadvantages of each algorithm, to provide information that may help improve their performance in online EEG systems.
- electroencephalographic signal
- artifact rejection
- blind source separation
- signal decomposition
- non-negative matrix factorization
1. Introduction
The varied branching patterns and tendencies of sympathetic neurons that underlie brain function and dysfunction have yet to be fully characterized. Functional neuroimaging of the human brain has established itself as a trustworthy tool for visualizing and characterizing indeterminate patterns and for discovering new functions. Indeed, the information visualized through neuroimaging techniques has contributed to an intuitive understanding and a relative quantification of brain functions [2, 3].
The key benefits that the electroencephalographic (EEG) modality holds over other neuroimaging techniques (e.g., local field potential, near-infrared spectroscopy, and electrocorticogram) are its high temporal resolution on the order of milliseconds, the small installation space required for the operating system, and its usability in noninvasive recording. Although its spatial resolution and specificity are low because it observes volume conduction effects in the brain network, EEG has attracted attention as a viable and inexpensive modality for studying the kaleidoscopic functional states of the cerebral cortex: where, when, how, and under what conditions brain functions arise. Therefore, adapting EEG systems to real environments remains a major challenge for neuroscientists and neuroengineers in the final stage of system construction.
Using a very small number of electrodes (the single-electrode case being the extreme) for signal acquisition should make EEG systems more practical in daily life. Recently, specialized (headband-type or headset-type) devices, which have fewer electrodes than gold-standard devices with 16, 32, 64, or more channels, have been developed as compact, portable, and feasible EEG systems for use in real environments. These devices are usually implemented with dry electrodes and wireless sensor network technology. They can reduce the burden on the user caused by a feeling of pressure on the head, eliminate the discomfort of conductive gel or paste, and improve freedom of movement by doing away with wires plugged into an amplifier.
However, technical and biological artifacts, such as power line interference, eyeblinks, and muscle activity, which arise from recording mistakes, the good conductivity of the scalp, and so on, are often mixed with EEG signals whether the device is a gold-standard or a specialized one. They disguise themselves as EEG components in the observed EEG signals and cause a discrepancy between research motivation and system realization. Removing these mimetic components (artifacts), or extracting the intrinsic EEG components from the observed EEG signals, will therefore become an increasingly important process in all EEG systems for practical use, even when a specialized device integrates only a single electrode into its data acquisition module.
Disclosing the meaning of electric signals generated by various neuronal populations (sources) comes down to the EEG inverse (blind source separation (BSS)) problem. It is well known that the enormous indeterminacies in the brain make the BSS problem ill-posed; however, the statistical nature of the sources helps restore the well-posedness of the problem in biosignal processing. Owing to these properties, multivariate statistical analysis approaches such as independent component analysis (ICA) can, in theory, separate observed EEG signals into spatially and temporally distinguishable components effectively; the estimated components are then identified as neuronal or artifactual sources by a hard/soft threshold to reconstruct an artifact-free EEG matrix [10, 11]. Whereas there are several reviews of artifact rejection methods that cover the overall procedure (signal separation, component identification, and signal reconstruction) for multi-channel EEG signals [12–16], we are not aware of any review of artifact rejection methods for single-channel EEG signals. In this chapter, we therefore describe algorithms for artifact rejection in multi- and single-channel EEG signals.
2. Concise description of technical/biological artifacts
2.1. Technical artifacts
Technical artifacts such as power line interference, impedance fluctuation, and wire movement superimpose their energy on observed EEG signals because of faults in the recording setup [18, 19]. They can be prevented in simple ways: detaching the charging AC adapter from the recording device, carefully attaching the electrodes to the scalp, and using appropriate electrode wires or adhesive tape to stabilize the wires, as shown in Figure 1. The cross mark in the figure indicates that a source of technical artifacts has been detached from the setup.
2.2. Biological artifacts
Biological artifacts, which are potentials discharged by internal organs, diffuse their energy over the head and reach each electrode attached to the surface of the scalp, where they are observed as part of the EEG signal. They contaminate the observed signals because of the iron accumulation in the brain and the good conductivity of the scalp, and they can be broadly separated into four categories: (i) muscular, (ii) cardiac, (iii) eye movement, and (iv) eyeblink. EEG devices capture the entire electric field that reaches an electrode, even if the potential contains information on electrophysiological activity other than neuronal activity (see Figure 2). Because all electrical potentials are treated equally and blindly, recording only EEG components from electrodes placed on the scalp is hardly achievable. Furthermore, the frequency characteristics of biological artifacts and neuronal oscillations can overlap. Avoiding biological artifacts is therefore far more difficult than avoiding technical artifacts. If contaminated epochs are found by visual or quantitative analysis, the EEG system has to ignore them before deciding control commands. Otherwise, counterfeit EEG patterns will lead the operator to make a fatal mistake in the system [12, 17].
Alternatively, signal processing techniques can extract the EEG components from the observed signals. Through this process, EEG systems can provide correct outputs for their interfaces. Many studies on the detection, classification, and removal of artifacts within observed EEG signals continue to be reported [20–22].
3. Review of existing methods on artifact rejection
In this section, the standard assumptions about the observed cerebral signal that allow components to be separated spatially and temporally are described first, to give a sound understanding of the statistical framework before the artifact rejection methods are introduced. Then, methods of multi- and single-channel artifact rejection (principal component analysis (PCA), independent component analysis (ICA), regression, filtering, ICA-based signal decomposition, and nonnegative matrix factorization) are presented. Each algorithm has its own approach to calculating the demixing matrix, identifying the separated components, and denoising the artifactual components to complete source separation. We focus on the advantages and disadvantages of these approaches.
3.1. In multi-channel signals
3.1.1. Standard assumption of sources
The first thing that every artifact rejection method has to do is to calculate the demixing matrix W under the standard assumptions about the sources, regardless of the target. In EEG signal processing, the observed cerebral signal x(n) is considered to be the sum of the cerebral source (local-field) activity s(n) and the noise/artifact d(n). The connections of neuronal cells are limited to a short range (less than 500 μm). In addition, synchrony in local-field activities diffuses through a contiguous cortical area rather than jumping between distant and weakly connected cortical areas.
Therefore, the assumption that cerebral and non-cerebral sources are linearly combined allows the following formulation of the underlying biophysics of signal generation and propagation of the potential:

$$x(n) = A\,s(n) + d(n)$$

where $x(n) = [x_1(n), \ldots, x_P(n)]^T$ is the observed P-channel EEG data at the n-th point (superscript T denotes the transpose of a vector or matrix); $s(n) = [s_1(n), \ldots, s_Q(n)]^T$ is the Q unknown source data, in which each row represents a cerebral or non-cerebral source; A is the P × Q full-rank unknown mixing matrix; and $d(n) = [d_1(n), \ldots, d_P(n)]^T$ is the P additive zero-mean noise data. In real scenarios, there are likely to be more sources than observations (Q > P); however, treating the number of sources as equal to the number of observations (Q = P) does not normally cause a fatal problem. Thus, most algorithms extract a linear combination of sources belonging to the same subspace [26, 27].
All algorithms share a common disadvantage: they can only handle over-determined mixtures in the inverse process when they have no a priori information on the characteristics of the sources. Three additional assumptions are therefore reluctantly accepted: (i) the noise/artifact is spatially uncorrelated with the observed data and temporally uncorrelated for every lag time τ > 0; (ii) the number of sources is equal to or less than the number of observations (Q ≤ P); and (iii) the mixing matrix A is stationary.
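The linear mixing model above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function name `simulate_mixture`, the 250 Hz sampling rate, the choice of two sinusoids plus one spiky Laplacian source, and the noise level are all hypothetical and not taken from the chapter.

```python
import numpy as np

def simulate_mixture(n_samples=1000, n_channels=3, noise_std=0.05, seed=0):
    """Generate x(n) = A s(n) + d(n) with Q = 3 synthetic sources."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / 250.0  # hypothetical 250 Hz sampling rate
    # Each row of s is one source: two oscillations and one spiky,
    # artifact-like source (all synthetic, for illustration only).
    s = np.vstack([
        np.sin(2 * np.pi * 10 * t),
        np.sin(2 * np.pi * 4 * t + 1.0),
        rng.laplace(size=n_samples),
    ])
    # Mixing matrix A (P x Q); unknown in a real recording.
    A = rng.normal(size=(n_channels, s.shape[0]))
    # Additive zero-mean noise d(n), one row per channel.
    d = noise_std * rng.normal(size=(n_channels, n_samples))
    x = A @ s + d
    return x, A, s, d

x, A, s, d = simulate_mixture()
print(x.shape)  # channels x samples
```

In a real recording only `x` is available; `A`, `s`, and `d` are returned here solely so that a separation method can be checked against known ground truth.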
3.1.2. Blind source separation algorithms
Under the aforementioned assumptions, BSS approaches estimate the sources from the observed EEG data. Unsupervised learning methods such as PCA and ICA jointly estimate the demixing matrix W and the sources:

$$\hat{s}(n) = W\,x(n)$$

Each unsupervised learning method has an algorithm that relies on one or more of the following criteria: uncorrelatedness, independence, non-Gaussianity, instantaneous propagation, and linearity. The linear mixture concept of blind EEG source separation is shown in Figure 3, which presents the demixing matrix W (= W1W2) as a two-step estimator because some methods first decorrelate the observed matrix by W1 and then demix it by W2. In the figure, the mixing matrix A combines three blind cerebral sources s(n) and provides the same number of observations x(n).
PCA converts the observed matrix of possibly correlated variables into values of linearly uncorrelated variables (principal components (PCs)) using first- and second-order statistics. The algorithm performs an eigenvalue decomposition to obtain the directions u of greatest variance in the input space of the EEG data X, based on the assumptions that the data are jointly normally distributed and that the sources are uncorrelated. To satisfy these assumptions, the obtained matrix Xold should be standardized so that the samples of each dimension are centered (zero mean) and share a uniform unit (unit variance).
In the PCA algorithm, the first PC, which has the largest variance in the standardized input space, is the linear combination of X defined by the weights $u_1$:

$$y_1 = u_1^T X, \qquad \mathrm{var}(y_1) = u_1^T \Sigma\, u_1$$

where $\Sigma$ is the covariance matrix of X. The algorithm therefore formulates the given problem as an optimization problem:

$$\max_{u_1}\; u_1^T \Sigma\, u_1 \quad \text{subject to} \quad u_1^T u_1 = 1$$

It can be solved by the method of Lagrange multipliers:

$$L(u_1, \lambda_1) = u_1^T \Sigma\, u_1 - \lambda_1 \left(u_1^T u_1 - 1\right), \qquad \frac{\partial L}{\partial u_1} = 0 \;\Rightarrow\; \Sigma\, u_1 = \lambda_1 u_1$$

The covariance matrix $\Sigma$ is thus sequentially decomposed into eigenvectors $u_p$ and eigenvalues $\lambda_p$ under the assumption that the PCs are orthogonal. The eigenvector $u_p$ corresponds to a column of the inverse demixing matrix $W^{-1}$. PCA-based methods work well for stationary data; however, it is difficult for EEG data to satisfy their assumptions. On the other hand, the PCA algorithm is often incorporated as the first decorrelation or whitening step of some ICA algorithms.
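The eigenvalue decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `pca_eig` and the channels-by-samples layout of the input are assumptions made here.

```python
import numpy as np

def pca_eig(X_old):
    """PCA of a (channels x samples) matrix via eigendecomposition."""
    # Standardize each channel: zero mean and unit variance.
    X = X_old - X_old.mean(axis=1, keepdims=True)
    X = X / X.std(axis=1, keepdims=True)
    # Covariance matrix of the standardized data.
    cov = X @ X.T / X.shape[1]
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Principal components: projections onto the eigenvectors u_p.
    pcs = eigvecs.T @ X
    return eigvals, eigvecs, pcs
```

The rows of `pcs` are mutually uncorrelated, and their variances equal the returned eigenvalues, which is the property that the decorrelation (whitening) step of some ICA algorithms relies on.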
ICA is the best-known and most prevalent unsupervised learning algorithm for decomposing multi-channel EEG data X into independent components (ICs) using higher-order (spatial) moments, beyond the second-order statistics used in PCA, although some ICA algorithms also rely on second-order statistics, as PCA does. A topical review published in 2015 reported that second-order blind identification (SOBI) and information maximization (InfoMax) are the most commonly used algorithms for EEG signal processing. In this chapter, we describe the InfoMax algorithm.
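The fundamental problem tackled by InfoMax ICA is how to minimize the mutual information (MI) of the output vector $\hat{s} = [\hat{s}_1, \ldots, \hat{s}_P]^T$,

$$I(\hat{s}) = \sum_{p=1}^{P} H(\hat{s}_p) - H(\hat{s})$$

where $H(\cdot)$ denotes entropy. The probability density functions of the observed signal $p(x)$ and the estimated signal $p(\hat{s})$ have the following relationship:

$$p(\hat{s}) = \frac{p(x)}{\left|\det J(x)\right|}$$

where $J(x)$ is the Jacobian matrix. The entropy of the estimate is given by:

$$H(\hat{s}) = -E\left[\ln p(\hat{s})\right] = H(x) + E\left[\ln \left|\det J(x)\right|\right]$$

Therefore, the MI can be rewritten as follows:

$$I(\hat{s}) = \sum_{p=1}^{P} H(\hat{s}_p) - H(x) - E\left[\ln \left|\det J(x)\right|\right]$$

By partially differentiating this index with respect to the parameters W, the optimized solution for source separation is obtained.
As analytical computation of the above equation is difficult, the algorithm uses a gradient update rule based on the natural gradient and a learning rate η that is a positive constant:

$$W \leftarrow W + \eta \left[\mathbf{I} - \varphi(\hat{s})\,\hat{s}^T\right] W$$

where $\mathbf{I}$ is the identity matrix and $\varphi(\cdot)$ is a component-wise nonlinear (score) function determined by the assumed source densities.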
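A natural-gradient update of this kind can be sketched as follows. The sketch assumes the logistic nonlinearity of the original InfoMax formulation, for which the score function becomes tanh(y/2); that choice, the batch averaging over all samples, the function name `infomax_ica`, and the default values of `eta` and `n_iter` are assumptions made for illustration only.

```python
import numpy as np

def infomax_ica(X, eta=0.01, n_iter=500):
    """Batch natural-gradient InfoMax sketch.

    X is a zero-mean (channels x samples) matrix; the returned W is an
    estimate of the demixing matrix, up to scaling and permutation.
    """
    P, N = X.shape
    W = np.eye(P)            # start from the identity
    identity = np.eye(P)
    for _ in range(n_iter):
        Y = W @ X                     # current source estimates
        phi = np.tanh(Y / 2.0)        # score function for a logistic model
        # Natural-gradient step: W <- W + eta * (I - E[phi(y) y^T]) W
        W = W + eta * (identity - (phi @ Y.T) / N) @ W
    return W
```

With a fixed learning rate the iteration is simple but slow; practical implementations usually whiten the data first and adapt η, so the output of this sketch should be checked on data with known sources before it is trusted.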
3.1.3. Component identification after source separation
After source separation, the estimated sources have to be identified as neuronal or artifactual in order to reconstruct an artifact-free EEG matrix. Traditionally, identification relied on visual inspection of scalp topography and on empirical judgment [10, 14]. These widely used techniques are still applied as an expedient way of checking results, but they increase the workload. Therefore, hard/soft-threshold functions, probabilistic approaches, and machine learning algorithms trained on features of prepared material have been used to identify artifacts in the estimated sources automatically, which reduces the workload and yields more repeatable labels [34, 35]. The development of automatic and unsupervised component identification algorithms that characterize components more precisely and flexibly is still an active research area [36, 37]. Once the estimated sources have been identified, they advance to the next step, called the denoising step, and the underlying EEG matrix is then reconstructed by inverting the linear demixing process (see Figure 4).
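The identification, denoising, and reconstruction steps can be sketched as follows. The hard-threshold rule on excess kurtosis is a hypothetical criterion chosen only for illustration (the chapter does not prescribe a particular feature), and the function names `flag_by_kurtosis` and `reconstruct_without` are likewise assumptions.

```python
import numpy as np

def flag_by_kurtosis(S, threshold):
    """Hard-threshold rule (hypothetical): flag spiky, heavy-tailed sources."""
    Z = S - S.mean(axis=1, keepdims=True)
    Z = Z / Z.std(axis=1, keepdims=True)
    excess_kurtosis = (Z ** 4).mean(axis=1) - 3.0
    return [int(i) for i in np.where(excess_kurtosis > threshold)[0]]

def reconstruct_without(X, W, artifact_idx):
    """Zero the flagged sources and invert the linear demixing process."""
    S = W @ X                                   # estimated sources
    S_clean = S.copy()
    idx = np.asarray(list(artifact_idx), dtype=int)
    S_clean[idx, :] = 0.0                       # denoising step
    return np.linalg.inv(W) @ S_clean           # artifact-free EEG matrix
```

Zeroing a whole component is the simplest possible denoising step; it also removes any cerebral activity that leaked into that component, which is one reason the identification step matters so much.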
3.2. In single-channel signals
3.2.1. Discrepancy among standard assumptions about multi-/single-channel data
It is easy to see that single-channel data do not always satisfy the assumptions of BSS techniques. Calculating the demixing matrix W is especially difficult for single-channel artifact rejection methods (see Figure 5), so researchers are forced to choose between adding information from a reference channel before applying a method and separating the data using only one channel.
3.2.2. Linear regression
The observed single-channel EEG data x(n) are modeled as

$$x(n) = s(n) + a(n) + d(n)$$

where s(n), a(n), and d(n) are the intrinsic EEG data, the artifact, and the noise, respectively. It is assumed that the expected value of d(n) is 0.
The artifact is corrected by calculating propagation factors that estimate the relationship between the reference signal and the observed EEG signal, and then subtracting the regressed portion. The procedure is as follows:
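Step 1. Separately average the observed EEG signals $x_t(n)$ and the reference signals $r_t(n)$ over the T trials to estimate the artifact-related waveform variation for each channel:

$$\bar{x}(n) = \frac{1}{T}\sum_{t=1}^{T} x_t(n), \qquad \bar{r}(n) = \frac{1}{T}\sum_{t=1}^{T} r_t(n)$$

Step 2. Subtract the averages from the data of every trial to obtain the deviations:

$$\tilde{x}_t(n) = x_t(n) - \bar{x}(n), \qquad \tilde{r}_t(n) = r_t(n) - \bar{r}(n)$$

where the averages are duplicated across the T trials.
Step 3. Calculate the propagation factor C by linear least-squares regression, in which the observed EEG data are the dependent variable and the reference data are the independent variable:

$$C = \frac{\sum_{t}\sum_{n} \tilde{x}_t(n)\,\tilde{r}_t(n)}{\sum_{t}\sum_{n} \tilde{r}_t(n)^2}$$

Step 4. Correct the observed EEG data by subtracting the reference data scaled by the propagation factor C:

$$\hat{s}_t(n) = x_t(n) - C\,r_t(n)$$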
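Steps 1 to 4 can be sketched in NumPy as follows. This is one possible reading of the procedure, assuming a single EEG channel and a single reference channel stored as (trials x samples) arrays; the function name `regression_correct` and the argument names are hypothetical.

```python
import numpy as np

def regression_correct(x_trials, r_trials):
    """Regression-based artifact correction for one EEG/reference pair.

    Both inputs have shape (T trials, N samples).
    """
    # Step 1: average over trials (time-locked waveform estimates).
    x_bar = x_trials.mean(axis=0)
    r_bar = r_trials.mean(axis=0)
    # Step 2: deviations of every trial from the averages.
    x_dev = x_trials - x_bar
    r_dev = r_trials - r_bar
    # Step 3: least-squares propagation factor (EEG regressed on reference).
    C = np.sum(x_dev * r_dev) / np.sum(r_dev ** 2)
    # Step 4: subtract the scaled reference from the observed EEG.
    corrected = x_trials - C * r_trials
    return corrected, C
```

Because a single scalar C is subtracted everywhere, any cerebral activity picked up by the reference channel is removed as well, which is the bidirectional contamination problem.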
Because the averaging operator emphasizes time-locked activity in the observed EEG signals, this method requires a reference channel and is powerful only if the system deals with event-related brain potentials. Cerebral activity is usually not time-locked, which means that important non-time-locked components are lost by the averaging operation. Furthermore, this method does not take bidirectional contamination into account, so the linear subtraction also cancels cerebral information from each observed EEG signal. Despite these disadvantages, regression is still used as the “gold standard” method against which the performance of any artifact rejection algorithm may be compared.
3.2.3. Filtering
Band-pass filtering is one of the classical and simple attempts to remove artifacts from an observed EEG signal. This method is effective if the spectral distributions of the EEG component and the artifact do not overlap and the artifacts are narrow-band, such as power line noise (50/60 Hz interference). However, fixed-gain filtering is not effective for biological artifacts because it attenuates the EEG components and changes both the amplitude and the phase of the signal. To overcome these limitations, adaptive algorithms adjust the filter parameters w to minimize the error between the artifact-free EEG signal and the desired original signal x(n).
Adaptive filtering assumes that the intrinsic EEG signal and the artifact are uncorrelated; the artifact is therefore treated as additive noise within the observed signal:

$$x_t(n) = s_t(n) + n_0(n)$$

where xt(n) is the observed EEG signal of the t-th trial and n0(n) is the additive noise to be cancelled, which is uncorrelated with the intrinsic EEG signal st(n). The filter parameters w are iteratively adjusted by a feedback (recursive) process designed to make the output as close as possible to a desired response in the presence of additive noise interference [44, 45]. Figure 6 shows the noise canceller system using adaptive filtering. In this system, the primary input xt(n) and the reference input rt(n) are the observed EEG and reference signals, respectively. The reference input, a noise that is correlated with n0(n) and uncorrelated with the intrinsic EEG signal st(n), adds the information needed to minimize the error et(n) between the response yt(n) and the desired response.
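Recursive least squares (RLS)-based adaptive filtering performs better than least mean squares-based filtering. The algorithm can be implemented using the following equations:

$$g(n) = \frac{R(n-1)\,r(n)}{\lambda + r^T(n)\,R(n-1)\,r(n)}$$

$$e(n) = x(n) - w^T(n-1)\,r(n)$$

$$w(n) = w(n-1) + g(n)\,e(n)$$

$$R(n) = \lambda^{-1}\left[R(n-1) - g(n)\,r^T(n)\,R(n-1)\right]$$

where g(n) and w(n) are the gain vector and the filter parameters, r(n) is the vector of the most recent reference samples, e(n) is the error, and λ (0 < λ ≤ 1) is the forgetting factor. The matrix R(n) plays the role of the inverse correlation matrix of the reference input; its initial value R(0) is δI, where δ is a sufficiently large positive value and I is the identity matrix. The updated filter parameters yield the artifact-free EEG signal as the output.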
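An RLS noise canceller of this kind can be sketched as follows. The sketch uses the textbook RLS recursion with a forgetting factor, which is an assumption about the exact form intended here; the function name `rls_noise_canceller`, the filter order, and the default values of `lam` and `delta` are hypothetical.

```python
import numpy as np

def rls_noise_canceller(x, r, order=3, lam=0.999, delta=100.0):
    """Adaptive noise canceller based on recursive least squares.

    x: primary input (observed EEG); r: reference input (artifact-related).
    Returns the error signal e (artifact-reduced EEG) and the final weights w.
    """
    n_samples = len(x)
    w = np.zeros(order)                 # filter parameters w(0)
    R = delta * np.eye(order)           # R(0) = delta * I
    e = np.zeros(n_samples)
    for n in range(n_samples):
        # Most recent `order` reference samples, zero-padded at the start.
        u = np.array([r[n - k] if n - k >= 0 else 0.0 for k in range(order)])
        Ru = R @ u
        g = Ru / (lam + u @ Ru)         # gain vector g(n)
        y = w @ u                       # filter response: estimated artifact
        e[n] = x[n] - y                 # error: cleaned EEG sample
        w = w + g * e[n]                # update filter parameters
        R = (R - np.outer(g, Ru)) / lam # update inverse correlation matrix
    return e, w
```

The error signal, not the filter response, is the output of interest: the filter learns to predict the artifact from the reference, and what it cannot predict is kept as EEG.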
Consequently, the adaptive filtering approach has the potential to recover the “pure” EEG signal more rapidly and accurately than linear regression for ocular and cardiac artifacts. However, the filter parameters converge with difficulty if muscular or vibration artifacts have contaminated the observed EEG signal; in that situation, the algorithm sometimes does not converge at all because of their convulsive bursts.
Optimal filtering such as Kalman filtering can capture the non-stationary properties of artifacts. The framework is flexible enough for non-linear systems because it approximates the probability density function, which might lead to more effective artifact rejection methods. Many studies on filtering algorithms have developed this approach into more useful modules for real-time applications [49, 50].
3.2.4. ICA-based signal decomposition
ICA achieves artifact rejection with outstanding performance if the number of independent sources is equal to or lower than the number of observations. This method is normally applicable only to multi-channel data; however, some studies have extended the idea to single-channel data in order to unmix a set of observed signals (components) into intrinsic sources [51–53]. These methods decompose a single-channel signal into multiple components, by dividing it into a sequence of blocks or into different spectral modes, before applying ICA; we therefore call them ICA-based signal decomposition approaches (see Figure 7).
Single-channel ICA is the oldest method for single-channel data; it assumes that the stationary sources are disjoint in the frequency domain. An observed signal x(n) is split into K short segments, a sequence of contiguous blocks of length L, which are handled as a set of observations X:

$$X = [x_1, x_2, \ldots, x_K], \qquad x_k = [x((k-1)L+1), \ldots, x(kL)]^T$$

where k is the block index. A standard ICA algorithm is then applied to the matrix X to derive the demixing matrix W. Because artifacts overlap with EEG components and EEG signals contain non-periodic components, this method can be applied only in limited situations. Wavelet transform (WT)-based and empirical mode decomposition (EMD)-based ICA have been reported to remove artifacts successfully in the same setting as single-channel ICA.
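The block segmentation used by single-channel ICA can be sketched as follows. Non-overlapping blocks and the L x K orientation (one block per column) are assumptions made here, and the function name `segment_blocks` is hypothetical; samples left over at the end of the signal are simply dropped.

```python
import numpy as np

def segment_blocks(x, L):
    """Split a 1-D signal into K contiguous blocks of length L.

    Returns an (L x K) matrix whose k-th column is the k-th block, so the
    L rows can be passed to a standard ICA routine as pseudo-channels.
    """
    x = np.asarray(x, dtype=float)
    K = len(x) // L                       # number of complete blocks
    return x[:K * L].reshape(K, L).T      # drop the incomplete tail

X = segment_blocks(np.arange(10), L=3)
print(X.shape)  # (3, 3): three blocks of three samples
```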
WT-based ICA transforms an observed signal (a vector) into components with disjoint spectra (a matrix) via the discrete WT:

$$W(a, b) = \frac{1}{\sqrt{a}} \sum_{n} x(n)\, \psi\!\left(\frac{n-b}{a}\right)$$

where W(a, b) and ψ denote the wavelet representation of x(n) and the mother wavelet, with a and b defining the time scale and location. Choosing these parameters is hard if the user has no a priori knowledge of the signal of interest. Each IC obtained from the wavelet coefficients is manually identified as either neuronal or artifactual. The values of the artifactual ICs are replaced with arrays of zeros, and the ICs are then reconstructed into wavelet components. Finally, the artifact-free signal is obtained by the inverse discrete WT.
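The step that turns one signal into a matrix of wavelet components can be sketched with a Haar wavelet. The Haar choice is an assumption made purely to keep the example self-contained (the chapter does not fix a mother wavelet), and the function names `haar_components` and `_inverse_level` are hypothetical.

```python
import numpy as np

def _inverse_level(a, d):
    """One level of the inverse Haar transform."""
    out = np.empty(2 * len(a))
    out[0::2] = (a + d) / np.sqrt(2)
    out[1::2] = (a - d) / np.sqrt(2)
    return out

def haar_components(x, levels=3):
    """Split x into levels + 1 time-domain components that sum to x.

    Rows 0..levels-1 are the detail components (fine to coarse); the last
    row is the final approximation. len(x) must be divisible by 2**levels.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    comps = []
    approx = x
    for j in range(1, levels + 1):
        a = (approx[0::2] + approx[1::2]) / np.sqrt(2)
        d = (approx[0::2] - approx[1::2]) / np.sqrt(2)
        # Bring this detail band back to full resolution on its own.
        c = _inverse_level(np.zeros_like(d), d)
        for _ in range(j - 1):
            c = _inverse_level(c, np.zeros_like(c))
        comps.append(c)
        approx = a
    c = approx
    for _ in range(levels):
        c = _inverse_level(c, np.zeros_like(c))
    comps.append(c)
    return np.vstack(comps)
```

The rows of the returned matrix can be handed to an ICA routine as pseudo-channels; because the transform is linear, summing the rows recovers the original signal, so zeroing artifactual components and summing again gives the cleaned signal.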
EMD-based ICA decomposes an observed signal into K intrinsic mode functions (IMFs) hk(n),

$$x(n) = \sum_{k=1}^{K} h_k(n) + d(n)$$

where d(n) is the residue of the original data, a slowly varying function with nonzero mean and only a few or no extrema. This method can remove artifacts without a priori knowledge of the characteristics of the signal embedded in the data. Each IMF is a monocomponent of the original data and is estimated by an iterative process called the “sifting process”:
Step 1. Find the local maxima and minima of xk(n).
Step 2. Connect all of the local maxima and all of the local minima by cubic splines to form an upper and a lower envelope.
Step 3. Calculate the mean of the two envelopes.
Step 4. Obtain an improved IMF hk+1(n) by subtracting the mean of the two envelopes from the current IMF hk(n).
Step 5. Return to Step 1 until the residue falls below a stopping criterion.
This decomposition is based on three conditions: (i) the number of extrema and the number of zero-crossings must be equal or differ by at most one; (ii) the mean is zero; and (iii) all the maxima of an IMF are positive and all the minima are negative. As in WT-based ICA, each IC obtained from the IMFs is manually identified as either neuronal or artifactual. The values of the artifactual ICs are replaced with arrays of zeros. Finally, the reconstructed IMFs are simply summed to obtain the artifact-free signal.
WT-based and EMD-based ICA have been reported as excellent methods for artifact rejection [51, 58, 59], and a considerable number of researchers have adopted them in recent years. However, these approaches cannot completely separate intrinsic EEG components from artifacts because the frequency characteristics of biological artifacts and EEG components can overlap. In addition, the presence of similar oscillations in different modes, or of oscillations with disparate amplitudes in the same mode, known as “mode mixing,” degrades the performance of artifact rejection. Signal distortion or attenuation typically occurs with the above-mentioned methods because of this excessive interference. Thus, these approaches are not suitable for real-time applications.
3.2.5. Nonnegative matrix factorization
In the linear regression, filtering, and ICA-based signal decomposition approaches, the parameters W often cannot converge to a solution that perfectly demixes the mixtures. This implies that the active space should be partially restricted for single-channel signals.
Meanwhile, non-negative matrix factorization (NMF) has recently attracted attention as an effective algorithm for removing artifacts from single-channel signals because it can find the latent features underlying the interactions between EEG components and artifacts. An M-dimensional non-negative data vector xn is placed in a column of the M × N matrix X, where N is the number of data vectors. The matrix X is obtained by the short-time Fourier transform and is approximately factorized into an M × K nonnegative matrix H and a K × N nonnegative matrix W, where K is the number of “bases” optimized for the linear approximation of the input vectors. It can be represented by the following equation:

$$x_n \approx \sum_{k=1}^{K} h_k\, w_{k,n}$$

where hk denotes the k-th basis vector (column) of H and wk,n denotes an entry of W. This equation means that each non-negative EEG feature vector (power spectrum or amplitude spectrum) is approximated by a linear combination of the basis vectors hk weighted by the components wk,n. Therefore, it can be rewritten as

$$X \approx HW$$
Some studies have reported that supervised NMF can effectively factorize observed EEG signals into brain activity components and artifacts if the user has artifact data in advance [62, 63]. Before supervised learning is applied, a template matrix XArt is factorized into HArt and WArt. The matrix X is subsequently factorized into H and W, where H contains the elements of the matrix HArt. With the standard NMF algorithm, the matrix HArt has no relation to the elements of H because the initial values are set randomly and updated by multiplicative rules. In the supervised learning algorithm, the matrix HArt is used as a fixed value that partially restricts the active space. By contrast, the activity components in the matrix WArt are variable. Under this constraint, the matrix H attempts to express the EEG components in the matrix X with the remaining K′ bases, so the EEG components are stored in those bases (see Figure 8).
After this processing, the non-negative artifact-free EEG data are reconstructed from the K′ bases that are not fixed to HArt and from their corresponding activity components (Eq. (40)). Eq. (40) and the inverse Fourier transform make it possible to obtain the artifact-free signal. Although supervised NMF is still in its infancy, it has shown high performance for artifact rejection. However, an epoch detection step, which is not part of the normal artifact rejection procedure, must be embedded in this epoch-based method, and this inevitably increases the computational cost. Some low-cost (real-time) artifact detection algorithms for single-channel EEG signals [64, 65] offer a silver lining here.
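Supervised NMF with fixed artifact bases can be sketched as follows. The sketch assumes the standard multiplicative update rules for the Euclidean (Frobenius) cost, which may differ from the cost function used in the cited works; the function name `supervised_nmf`, the argument `K_free` (the K′ free bases), and the small constant `eps` are hypothetical.

```python
import numpy as np

def supervised_nmf(X, H_art, K_free, n_iter=200, eps=1e-12, seed=0):
    """Factorize X (M x N) as [H_art, H_free] @ W with H_art kept fixed.

    H_art holds artifact bases learned beforehand from a template matrix;
    only the K_free remaining bases and all activations W are updated.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    K_art = H_art.shape[1]
    H = np.hstack([H_art, rng.random((M, K_free)) + eps])
    W = rng.random((K_art + K_free, N)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative update of all activations (artifact and EEG).
        W *= (H.T @ X) / (H.T @ H @ W + eps)
        # Multiplicative update of the free (EEG) bases only.
        num = X @ W.T
        den = H @ W @ W.T + eps
        H[:, K_art:] *= num[:, K_art:] / den[:, K_art:]
        errors.append(np.linalg.norm(X - H @ W))
    # Non-negative artifact-free estimate from the free bases.
    X_eeg = H[:, K_art:] @ W[K_art:, :]
    return H, W, X_eeg, errors
```

In practice `X` would be a magnitude or power spectrogram from the short-time Fourier transform, and `X_eeg` would be combined with the original phase before the inverse transform; how the phase is handled is outside the scope of this sketch.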
4. Conclusions
Owing to the properties of artifacts, multivariate statistical analysis approaches such as PCA and ICA, which separate multi-channel EEG signals into spatially and temporally distinguishable components, are in theory useful for extracting EEG components from scalp recordings. In particular, ICA is a powerful tool for separating observed EEG signals into maximally independent activity patterns derived from cerebral or non-cerebral (artifactual) sources. However, ICA is unsuitable for analyzing EEG signals recorded by specialized EEG devices because its assumptions are not met in the single-channel (or few-channel) case. Thus, proposing a method for removing artifacts from single-channel EEG signals is currently a major challenge in EEG signal processing for the widespread use of such systems as a conventional technology.
In this chapter, we have summarized some existing artifact rejection algorithms (PCA, ICA, regression, filtering, ICA-based signal decomposition, and NMF), focusing on their advantages and disadvantages, to provide information that may help improve their performance in online EEG systems. Last but not least, muscular artifacts, which reflect body actions, are natural enemies of EEG systems, and this inevitable encounter must be handled by artifact rejection techniques. So far, unsupervised learning algorithms cannot separate the observed signal into EEG and EMG components during real-time operation of EEG systems that use specialized devices. For EEG systems to be used routinely in the future, neuroscientists and neuroengineers should carefully analyze the characteristics of artifacts and integrate them into supervised learning algorithms, so that artifacts can be rejected effectively, or intrinsic EEG components extracted from observed EEG signals, without altering the underlying brain activity.
This work was supported in part by NEDO (New Energy and Industrial Technology Development Organization) SIP Number YYN6022-111123.