InTech uses cookies to offer you the best online experience. By continuing to use our site, you agree to our Privacy Policy.

Engineering » Biomedical Engineering » "Intelligent and Biosensors", book edited by Vernon S. Somerset, ISBN 978-953-7619-58-9, Published: January 1, 2010 under CC BY-NC-SA 3.0 license. © The Author(s).

Chapter 2

Signal Processing and Classification Approaches for Brain-Computer Interface

By Tarik Al-ani and Dalila Trad
DOI: 10.5772/7032

Article top

Overview

Basic BCI layout.
Figure 1. Basic BCI layout.
Grand average ERD curves recorded during motor imagery from the left (C3) and right sensorimotor cortex (C4) (the electrodes C3 and C4 are placed according to the International 10-20 system). The ERD time courses were calculated for the selected bands in the alpha range for 16 subjects. Positive and negative deflections, with respect to baseline (second 0.5 to 2.5), represent a band power increase (ERD) and decrease (ERD), respectively. The gray bar indicates the time period of cue presentation (i.e. the imagination starts at second 3). Figure from (Pfurtscheller et al., 2000a) which is modified from (Neuper & Pfurtscheller, 1999a
						).
Figure 2. Grand average ERD curves recorded during motor imagery from the left (C3) and right sensorimotor cortex (C4) (the electrodes C3 and C4 are placed according to the International 10-20 system). The ERD time courses were calculated for the selected bands in the alpha range for 16 subjects. Positive and negative deflections, with respect to baseline (second 0.5 to 2.5), represent a band power increase (ERD) and decrease (ERD), respectively. The gray bar indicates the time period of cue presentation (i.e. the imagination starts at second 3). Figure from (Pfurtscheller et al., 2000a) which is modified from (Neuper & Pfurtscheller, 1999a ).
ERD maps for a single subject calculated for the cortical surface of a realistic head model. Figure from (Pfurtscheller et al., 2000a) which is modified from (Neuper & Pfurtscheller, 1999a
						).
Figure 3. ERD maps for a single subject calculated for the cortical surface of a realistic head model. Figure from (Pfurtscheller et al., 2000a) which is modified from (Neuper & Pfurtscheller, 1999a ).
Signal enhancement methods in BCI designs. Figure modified from (Bashashati et al., 2007) with authorisation.
Figure 4. Signal enhancement methods in BCI designs. Figure modified from (Bashashati et al., 2007) with authorisation.
Typical P300 wave. From (Hoffman et al., 2008).
Figure 5. Typical P300 wave. From (Hoffman et al., 2008).
Feature extraction methods in BCI designs based on sensorimotor activity, VEP, P300, SCP, response to mental tasks, activity of neural cells, and multiple neuromechanisms. Taken from (Bashashati et al., 2007).
Figure 6. Feature extraction methods in BCI designs based on sensorimotor activity, VEP, P300, SCP, response to mental tasks, activity of neural cells, and multiple neuromechanisms. Taken from (Bashashati et al., 2007).
Feature selection and dimensionality reduction methods in BCI designs. Figure modified from (Bashashati et al., 2007) with authorisation.
Figure 7. Feature selection and dimensionality reduction methods in BCI designs. Figure modified from (Bashashati et al., 2007) with authorisation.
The idea of the SVM: transform a nonlinear problem of separation in the representation space into a linear problem of separation in a redescription space of higher dimension.
Figure 8. The idea of the SVM: transform a nonlinear problem of separation in the representation space into a linear problem of separation in a redescription space of higher dimension.
Linear mapping: the optimal separating hyperplane is that which maximizes the margin in the redescription space.
Figure 9. Linear mapping: the optimal separating hyperplane is that which maximizes the margin in the redescription space.
Non-linear mapping: a complex shape. Figure from Schölkopf & Smola (2002).
Figure 10. Non-linear mapping: a complex shape. Figure from Schölkopf & Smola (2002).
The display used for evoking the P300. One image was flashed at a time (Hoffman et al., 2008), (Hoffman et al., 2009).
Figure 11. The display used for evoking the P300. One image was flashed at a time (Hoffman et al., 2008), (Hoffman et al., 2009).
The best configuration with 8 electrodes given by Hoffman et al. (Hoffman et al., 2008), (Hoffman et al., 2009).
Figure 12. The best configuration with 8 electrodes given by Hoffman et al. (Hoffman et al., 2008), (Hoffman et al., 2009).
Structures of the training and testing data sets for subject
							
								
									s
									,
									 1
									≤
									s
									≤
									8
								
							
						. There are six subsets of multiple observation sequences (feature matrices
							
								
									{
									
										O
										1
									
									,
									
										O
										2
									
									,
									 
									...
									,
									
										O
										L
									
									}
								
							
						,
							
								
									L
									≥
									1
								
							
						), one subset for each image (Task). Each matrix is of dimension 8 ×32.
Figure 13. Structures of the training and testing data sets for subject s ,  1 ≤ s ≤ 8 . There are six subsets of multiple observation sequences (feature matrices { O 1 , O 2 ,   ... , O L } , L ≥ 1 ), one subset for each image (Task). Each matrix is of dimension 8 ×32.
Left: Training scheme for one subject and for six images. Right: Testing scheme for one subject given one or more task-related sequences.
Figure 14. Left: Training scheme for one subject and for six images. Right: Testing scheme for one subject given one or more task-related sequences.
Classification percent rates versus task number for subject 3 (top): late-stage amyotrophic lateral sclerosis and for subject 6 (down): subject had no known neurological deficits.
Figure 15. Classification percent rates versus task number for subject 3 (top): late-stage amyotrophic lateral sclerosis and for subject 6 (down): subject had no known neurological deficits.
Means and standard deviations of the classification percent rates in all tasks versus subject number (top: HMMs case, down: SVM case).
Figure 16. Means and standard deviations of the classification percent rates in all tasks versus subject number (top: HMMs case, down: SVM case).

Signal Processing and Classification Approaches for Brain-computer Interface

Tarik Al-ani1, 2 and Dalila Trad1, 3

1. Introduction

Research on brain-computer interface (BCI) systems began in the 1970s at the University of California Los Angeles (UCLA) (Vidal, 1973; 1977). The author gave in his papers the expression "Brain Computer Interface" which is the term currently used in literature.

A BCI system is a direct communication pathway between a brain and an external artificial device. BCI systems were aimed at assisting, augmenting or repairing human cognitive or sensory-motor functions.

The BCI systems (BCIs) allow control of an artificial device based on the features extracted from voluntary electric, magnetic, or other physical manifestations of brain activity collected from epi- or subdurally from the cortex or from the scalp or in invasive electrophysiological manner, i.e. brain signals recorded intracortically with single electrode or multi-electrode arrays (Dornhege et al., 2007). There is a variety of non-invasive techniques for measuring brain activity. These non-invasive techniques include, the electroencephalography (EEG), magnetoencephalography (MEG), positron emission tomography (PET), functional magnetic resonance imaging (fMRI), and optical imaging. However, for technical, time resolution, real-time, and price constraints, only EEG monitoring and related techniques are employed in the BCI community. For more details refer to (Wolpaw et al., 2002; Mason et al., 2007; Dobkin, 2007). The neuronal electrical activity contain a broad band frequency, so the monitored brain signals are filtered and denoised to extract the relevant information (see section 3) and finally this information is decoded (see section 6) and commuted into device commands by synchronous control or more efficiently by self-paced or asynchronous control in order to detect whether a user is intending something or not (see chapter 7 in (Dornhege et al., 2007) for details), Fig. 1. For some specific BCI tasks, raw brain signal serves as stimulus as well as a control interface feedback.

The direct BCIs can be seen as a new means of communication that may be used to allow tetraplegic or individuals with severe motor or neuromuscular diseases (e.g. Amyotrophic lateral sclerosis (ALS), brainstem stroke, brain or spinal cord injury, cerebral palsy, muscular

media/image1.jpeg

Figure 1.

Basic BCI layout.

dystrophies, multiple sclerosis) to have effective control over artificial devices or external environment in order to increase or improve their communication qualities or their independence. Recent studies have demonstrated correlations between EEG signals and actual or imagined movements and between EEG signals and mental tasks (Keirn & Aunon, 1990; Lang et al., 1996; Pfurtscheller et al., 1997; Anderson et al., 1998; Altenmüller & Gerloff, 1999; McFarland et al., 2000; Wessberg et al., 2000; Pfurtscheller et al., 2000b; Nicolelis, 2001; Pfurtscheller et al., 2003). The BCIs can be used also in therapeutic applications by neurofeedback for rehabilitation or functional recovery (Birbaumer & Cohen, 2007; Dobkin, 2007; Birbaumer et al., 1999; Dornhege et al., 2007).

The BCI is a communication system that does not require any peripheral muscular activity. It has been shown by (Pfurtscheller & Aranibar, 1977; Pfurtscheller, 1999c; Neuper & Pfurtscheller, 1999a ) that the imagination of either a left or right hand movement results in an amplitude attenuation (event-related desynchronization (ERD) of μ (8-13Hz) and central β (13-30Hz) rhythms at the contra-lateral sensori-motor representation area and, in an amplitude increase (event-related synchronization (ERS) within the γ band (30-40Hz) at the ipsi-lateral hemishpere. The event related (de)synchronisation(ERD, ERS) (Pfurtscheller et al., 1999a), see Fig. 2 and Fig. 3.

media/image2.png

Figure 2.

Grand average ERD curves recorded during motor imagery from the left (C3) and right sensorimotor cortex (C4) (the electrodes C3 and C4 are placed according to the International 10-20 system). The ERD time courses were calculated for the selected bands in the alpha range for 16 subjects. Positive and negative deflections, with respect to baseline (second 0.5 to 2.5), represent a band power increase (ERD) and decrease (ERD), respectively. The gray bar indicates the time period of cue presentation (i.e. the imagination starts at second 3). Figure from (Pfurtscheller et al., 2000a) which is modified from (Neuper & Pfurtscheller, 1999a ).

media/image3.png

Figure 3.

ERD maps for a single subject calculated for the cortical surface of a realistic head model. Figure from (Pfurtscheller et al., 2000a) which is modified from (Neuper & Pfurtscheller, 1999a ).

The direct BCIs can also be seen as a new means to extend communication for healthy subjects in many fields such as multimedia communication, control of robots, virtual reality and video games (Thomas, 1977; Friedman et al., 2004; Bell et al., 2008; Lécuyer et al., 2008).

There are in general two types of BCI systems: endogenous tasks and exogenous tasks based systems (Dornhege et al., 2007).

The endogenous tasks BCI systems, which are based on spontaneous activity, use brain signals that do not depend on external stimuli and that can be influenced by concentrating on a specific mental task. In order to obtain an efficient task recognition system, several concentration trials of human are, in general, realized. The concentration constraint is a very tiring mental task especially for disabled subjects who might have difficulties in acquiring voluntary control over their brain activity and it must be reduced in order to obtain an efficient task recognition system.

The exogenous tasks BCI systems, which are based on evoked activity, use brain signals that do depend on external stimuli. Particularly interesting are systems based either on the P300 or on SSVEPs (see section 2). Advantages of these potentials are that they are relatively well understood from a neurophysiologic point of view and that they can be evoked robustly across different subjects. Moreover, feedback training is not necessary in these systems, as theses potentials appear "automatically" whenever subjects concentrate onto one out of several stimuli presented in random order (Hoffman et al., 2008). Note that the material presented in this chapter is strongly biased towards sensorimotor (Changes in brain rhythms (μ,β, andγ) ) and P300 electrophysiological activities using EEG records.

In order to improve the performance of the BCI system design, it is necessary to use a good method of signal processing to allow easier extraction of physiological characteristics and also to use a good classifier adapted to the specificities of the BCI system. This chapter presents a compact guide to different signal processing techniques that have received more attention in BCIs. We introduce then some selected feature extraction and classification approaches in the context of BCI systems. A more exhaustive and excellent surveys on signal processing and classification algorithms may be found in the papers (Bashashati et al., 2007; Lotte et al., 2007). Then this chapter describes the application of two classification approaches, hidden Markov models (HMMs) and support vector machines (SVM), in the context of exogenous tasks BCI systems based on P300 evoked potential. The chapter ends with a global conclusions and perspectives.

The methods presented in sections 3.3, 4, 5 and 6 are based on the statistical results given in the comprehensive survey of 96 BCI designs using electrical signal recordings published prior to January 2006 by (Bashashati et al., 2007). Among these methods, we give here only a brief descriptions of the most applied methods. They are introduced here without referencing all the published papers for the 96 BCI designs. The reader may refer to the paper (Bashashati et al., 2007) to find a rich bibliographical work. However, we give only the original references corresponding to each proposed method.

2. Electrophysiological control activities in BCIs

Current BCI systems fall into seven main categories, based on the neuromechanisms and recording technology they use to generate control signals (Bashashati et al., 2007). The following list give a short descriptions of these electrophysiological activities used in BCI designs. This list is borrowed and adapted (with the authorization of authors) from the paper (Bashashati et al., 2007). We omitted the references of the different approaches given in this list. Many of these references are given in (Bashashati et al., 2007).

  • - Sensorimotor activity BCI designs that use sensorimotor activity as the neural source of control can be divided into three sub-categories:

  • - Changes in brain rhythms (μ,β,andγ)

  • μ rhythms in the range of 8-12 Hz and β rhythms in the range of 13-30 Hz both originate in the sensorimotor cortex and are displayed when a person is not engaged in processing sensorimotor inputs or in producing motor outputs. They are mostly prominent in frontal and parietal locations. A voluntary movement results in a circumscribed desynchronization in the μ and lower β bands. This desynchronization is called event-related desynchronization (ERD) (Pfurtscheller & Aranibar, 1977; Pfurtscheller, 1999c; Neuper & Pfurtscheller, 1999a ) and begins in the contralateral rolandic region about 2 s prior to the onset of a movement and becomes bilaterally symmetrical immediately before execution of movement. After a voluntary movement, the power in the brain rhythms increases. This phenomenon, called event-related synchronization (ERS), is dominant over the contralateral sensorimotor area and reaches a maximum around 600 ms after movement offset. γ rhythm is a high-frequency rhythm in the EEG. Upon the occurrence of a movement, the amplitude of γ rhythm in the range of 30-40 Hz increases. Gamma γ are usually more prominent in the primary sensory area.

  • - Movement-related potentials (MRPs)

  • MRPs are low-frequency potentials that start about 1-1.5 s before a movement. They have bilateral distribution and present maximum amplitude at the vertex. Close to the movement, they become contralaterally preponderant.

  • - Other sensorimotor activities

  • The sensorimotor activities that do not belong to any of the preceding categories are categorized as other sensorimotor activities. These activities are usually not restricted to a particular frequency band or scalp location and usually cover different frequency ranges. An example would be features extracted from an EEG signal filtered to frequencies below 30 Hz. Such a range covers different eventrelated potentials (ERPs) but no specific neuromechanism is used.

  • - Slow cortical potentials (SCPs)

  • - P300 Evoked potential

  • - Visual evoked potentials (VEPs)

  • VEPs are small changes in the ongoing brain signal. They are generated in response to a visual stimulus such as flashing lights and their properties depend on the type of the visual stimulus. These potentials are more prominent in the occipital area. If a visual stimulus is presented repetitively at a rate of 5-6 Hz or greater, a continuous oscillatory electrical response is elicited in the visual pathways. Such a response is termed steady-state visual evoked potentials (SSVEP). The distinction between VEP and SSVEP depends on the repetition rate of the stimulation.

  • - Response to mental tasks

  • BCI systems based on non-movement mental tasks assume that different mental tasks (e.g., solving a multiplication problem, imagining a 3D object, and mental counting) lead to distinct, task-specific distributions of EEG frequency patterns over the scalp.

  • - Activity of neural cells (ANC)

  • It has been shown that the firing rates of neurons in the motor cortex are increased when movements are executed in the preferred direction of neurons. Once the movements are away from the preferred direction of neurons, the firing rate is decreased.

  • - Multiple neuromechanisms (MNs)

  • BCI systems based on multiple neuromechanisms use a combination of two or more of the above mentioned neuromechanisms.

3. Signal pre-processing methods in BCIs

To extract features (see section 4), it is necessary to pre-process first the data. Three steps are necessary to achieve this goal: Referencing, Temporal filtering and signal enhancement.

3.1. Referencing

(Hagemann et al., 2001) have stated that the differences between results of different studies are partly due to the differences in referencing. In the case of EEG recordings from the cortex or from the scalp, these recordings are obtained using, in general, different electrodes on different positions. Since the brain activity voltage measured by a given electrode is a relative measure, the measurement may be compared to another reference brain voltage situated on another site. This results in a combination of brain activity at the given electrode, brain activity at the reference site and noise. Because of this, the reference site should be chosen such that the brain activity at that site is almost zero. Typically, the nose, mastoids and earlobes are used (Dien, 1998). In general, there are three referencing methods

  • - Common reference

  • The common reference technique is widely used in BCIs. This method uses one common reference for all electrodes. In general, the site of this reference is situated at large distance from all electrodes. The activity at the reference site influences all measurements equally, and differences between electrode measurements still contain all information needed.

  • - Average reference

  • The average reference subtracts the average of the activity at all electrodes from the measurements. This method is based on the principle that the activity at the whole head at every moment sums up to zero. Therefore, the average of all activity represents an estimate of the activity at the reference site. Subtracting this average produces in principle a dereferenced solution. However, the relatively low density of the electrodes and the fact that the lower part of the head is not taken into account, bring some practical problems along (Dien, 1998).

  • - Current source density (CSD)

  • The current source density (CSD) is used in many BCIs. It is "the rate of change of current flowing into and through the scalp" (Weber, 2001). This quantity can be derived from EEG data, and it may be interpreted as the potential difference between an electrode and a weighted average of their surrounding electrodes. The CSD can be estimated by computing the laplacian. The laplacian computes the sum of the differences between an electrode and its neighbours. A problem with this estimation is that it is actually only valid when the electrodes are in a two dimensional plane and equally distant.

3.2. Temporal filtering in BCIs

The brain signals are naturally contaminated by many internal and external noises. They can be removed using simple filters. The relevant information in BCIs is found in the frequencies below 30Hz. Therefore, all noise with higher frequencies (e.g. noise from the electrical net has a fixed frequency of 50Hz or 60 Hz) can be removed using FIR low pass filter. Specific frequency bands may also be selected using FIR bandpass filters.

3.3. Signal enhancement methods in BCI designs

The choice of a suitable enhancement technique is dependent on several factors such as the recording technology, number of electrodes, and neuromechanism of the BCI (Bashashati et al., 2007). Among seventeen pre-processing methods given by (Bashashati et al., 2007), we describe here briefly only six methods which are the most applied in BCI designs:

  • - Spatial filters - Referencing methods

  • The proper selection of a spatial filter for any BCI is determined by the location and extent of the selected brain control signal and of the various sources of EEG or non-EEG noise.

  • - Common average referencing (CAR)

  • Common-average or "reference-free" recording has been suggested as a solution to the problem of the reference electrode (Offner, 1950; Lehmann & Skrandies, 1984; Stanny, 1989). Common-average referencing involves recording in bipolar fashion from a number of electrodes, all referred to a single site. One then calculates the grand mean EEG waveform, by averaging across electrodes, and subtracts the result pointwise from the EEG recorded at each electrode. Activity recorded by the reference electrode is theoretically of equal magnitude in the mean and individualelectrode waveforms. Consequently, the effect of the reference electrode should be eliminated from each recording electrode's output when the common-average waveform is subtracted (Stanny, 1989).

  • - Surface Laplacian (SL)

  • The SL is defined as the 2nd order spatial derivative of the surface potential. Due to its intrinsic spatial high-pass filtering characteristics, the SL can reduce the volume conduction effect by enhancing the high-frequency spatial components, therefore can achieve higher spatial resolution than surface potentials.

  • - Principal component analysis (PCA)

  • The PCA (Pearson, 1901) is a linear mapping that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. Depending on the field of application, it is also named the discrete Karhunen-Loève transform (KLT), the Hotelling transform or proper orthogonal decomposition (POD). The PCA reveals the internal structure of the data in a way which best explains the variance in the data. If a multivariate dataset is visualised as a set of coordinates in a high-dimensional data space (1 axis per variable), ICA supplies the user with a lower-dimensional representation.

  • - Independent component analysis (ICA)

  • The more important artefacts in BCIs are generated by muscles and eyes blink (Gupta & Singh, 1996). Classical automatic methods for removing such artefacts can be classified into rejection methods and subtraction methods.

  • - Rejection methods consist of discarding contaminated EEG, based on either automatic or visual detection can be used in the BCI applications framework. Their success crucially depends on the quality of the detection.

  • - Subtraction methods are based on the assumption that the contaminated EEG is a linear combination of an original EEG and other independent artefact signals generated by the muscles and eyes blink. The original EEG is hence recovered by either subtracting separately recorded artefact-related signals from the measured EEG, using appropriate weights or by applying recent approaches for artefacts rejection: such as independent component analysis (ICA) (Common, 1994; Hyvärinen & Oja, 2000), peak elimination (Nakamura et al., 1996), neural network (Urszula et al., 1999) and fixed bandpass FIR filter based approach (Gupta & Singh, 1996).

  • The ICA (Common, 1994; Hyvärinen & Oja, 2000) is the more used technique. It is a computational method for separating a multivariate signal into additive subcomponents supposing the mutual statistical independence of the non-Gaussian source signals. It is a special case of blind source separation (BSS). ICA is particularly efficient when the EEG and the artefacts have comparable amplitudes. For more details about their advantages, their limitations and their applications for the removal of eyes activity artefacts, refer to (Jung et al., 1998; 2000).

  • - Common spatial patterns (CSP)

  • The CSP (Koles, 1991; Müller-Gerking et al., 1999) is a technique used to find the common projection matrix that decomposes the different classes of single trial EEG datasets, and more specifically to find spatial structures of event-related (de)synchronization (ERD/ERS) in a EEG context. Such matrix maximizes the differences between the classes. (Guger et al., 2000) demonstrated the efficiency of the CSP method for real-time EEG analysis and concluded that only parameters that must be adjusted for the CSP are the time segment for the calculation of the CSP and, during on-line processing, the time window for the calculation of the variances. But the selection of these parameters is not very crucial. An advantage of the CSP method is that it does not require a priori selection of subject-specific frequency bands, as necessary for bandpower or frequency estimation methods (Pfurtscheller et al., 1996; McFarland et al., 1997b).

  • The CSP method is very sensitive to artefacts. A single trial containing, for example, a movement artifact can cause severe changes in the CSP (Müller-Gerking et al., 1999). The reason is the sample covariance (nonrobust estimate), which is used to estimate the covariance for the calculation of the spatial filters. However, during on-line operation of the BCI, the spatial filters perform a weighted spatial averaging of the EEG, and this reduces the influence of artefacts (Guger et al., 2000).

  • in some applications, many electrodes are needed, (e.g. more than 18 (Ramoser et al., 2000), which necessitates costly hardware.

  • since the CSP method detects spatial patterns in the EEG, any change in the electrode positions may render the improvements in the classification accuracy gained by this method useless. Therefore, this method requires almost identical electrode positions for all trials and sessions which may be difficult to accomplish (Ramoser et al., 2000). (Guger et al., 2000) recommended not to apply the electrodes anew after setting up a new CSP for the following feedback sessions. For long-term implications to analyze the EEG in real time, EEG data of several sessions can be used for the calculation of the CSP. This allows the generation of a more robust filter in order to overcome the mentioned problems.

  • - Common spatial subspace decomposition (CSSD)

  • The CSSD can extract signal components specific to one condition from multiple MEG/EEG data sets of multiple task conditions. Signal matrices or covariance matrices are decomposed using spatial factors common to multiple conditions. The spatial factors and corresponding spatial filters are then dissociated into specific and common parts, according to the common spatial subspace which exists among the data sets. Finally, the specific signal components are extracted using the corresponding spatial filters and spatial factors. (Wang et al., 1999).

  • - Frequency normalization (Freq-Norm) (Bashashati et al., 2005).

  • Other methods are given by (Bashashati et al., 2007).The study of (Bashashati et al., 2007) showed that

  • signal pre-processing algorithms have been used for EEG-based BCIs and the ANC-based BCIs, but no signal enhancement algorithms have been applied on electrocorticogram (ECoG)-based BCIs. Only PCA has been used in both groups, and

  • spatial filtering including referencing (CAR and SL) methods and CSP are among the most used techniques that have become increasingly popular in EEG-based BCIs.

Fig.4 shows the statistical results of the study realised by (Bashashati et al., 2007) concerning pre-processing methods in BCI designs. (Bashashati et al., 2007) concluded that 96 BCI designs that employ signal enhancement techniques before extracting the features from the signal, 32% use surface Laplacian (SL), 22% use either principal component analysis (PCA) or independent component analysis (ICA), 14% use common spatial patterns (CSP) and 11% use common average referencing (CAR) techniques.

media/image11.png

Figure 4.

Signal enhancement methods in BCI designs. Figure modified from (Bashashati et al., 2007) with authorisation.

In the following, we give a breif description of the two most used methods: Spatial filters and Common spatial patterns.

3.3.1. Spatial filters: (SL) and (CAR)

(McFarland et al., 1997a) showed that the variability of the EEG or non-EEG noise sources within the different BCI designs and even within individuals make difficult the application of the spatial filters. For BCIs that use the μ and β rhythms, the SL and CAR methods are superior to the ear reference method. However, it was shown that the reference method (CAR, bipolar, large Laplacian, small Laplacian, and referenced to the ear (McFarland et al., 1997a)) had minor influence on the classification accuracy (Ramoser et al., 2000). Fast and continuous feedback can also enhance the performance of the system (Guger et al., 2001; Neuper et al., 1999b). In the following, we introduce only the principles of the CSP given in (Guger et al., 2000).

3.3.2. Common spatial patterns (CSP)

As described by (Guger et al., 2000), the CSP method uses the covariance to design common spatial patterns and is based on the simultaneous diagonalisation of two covariance matrices (Fukunaga, 1972). The decomposition (or filtering) of the EEG leads to new time series, which are optimal for the discrimination of two populations (or classes). The patterns are designed such that the signal resulting from the EEG filtering with the CSP has maximum variance for population and minimum variance for the second population and vice versa. In this way, the difference between the first and second populations is maximized, and the only information contained in these patterns is where the variance of the EEG varies most when comparing two conditions.

Given N channels of EEG for each trial X of population 1 and population 2, the CSP method gives an NxN projection matrix according to (Koles, 1991; Müller-Gerking et al., 1999; Ramoser et al., 2000; Guger et al., 2000). This matrix is a set of subject-specific spatial patterns, which reflect the specific activation of cortical areas during hand movement imagination. With the projection matrix W , the decomposition of a trial X is described by

This mapping projects the variance of X onto the rows of Z and results in new time series. The columns of W1 are a set of CSPs and can be considered as time-invariant EEG source distributions. After interpolation, the patterns can be displayed as topographical maps.

By construction, the variance for population 1 is largest in the first row of Z and decreases with the increasing number of the subsequent rows. The opposite is the case for a trial with population 2.

4. Feature extraction methods in BCI designs

This section describes briefly the common BCI features extraction methods. Concerning the design of a BCI system, some critical properties of these features must be considered (Lotte et al., 2007):

  • noise and outliers: the brain signals (e.g. EEGs) have a poor signal-to-noise ratio;

  • high dimensionality: in BCI systems, feature vectors are often of high dimensionality. Several features are generally extracted from several channels and from several time segments before being concatenated into a single feature vector;

  • time information: BCI features should contain time information as brain activity patterns are generally related to specific time variations of EEG;

  • the brain signals are non-stationary in nature;

  • the brain signals are non-linear in nature;

  • non sufficient training sets: training process is time consuming and demanding for the subjects.

There are many methods used in BCI, depending of the type of the BCI systems. In the following we describe some main and specific methods. More exhaustive details are given by (Bashashati et al., 2007). The feature extraction methods described here are: Band powers (BP), Cross-correlation between EEG band powers, frequency representation (FR), time-frequency representation (TFR), Hjorth parameters, parametric modelling, inverse model and specific techniques used for P300 and VEP such as Peak picking (PP) and Slow cortical potentials calculation (SCPs).

4.1. Band powers (BP)

The features may be extracted from the EEG signals by estimating the power distribution of the EEG in predefined frequency bands. (Pfurtscheller et al., 1997) used the band powers (BP) and demonstrated that for each subject, different frequency components in the α and β band were found which provided best discrimination between left and right hand movement imagination. These frequency bands varied between 9 and 14 Hz and between 18 and 26 Hz.

4.2. Cross-correlation between EEG band powers

In the case of EEG measurements the cross-correlation coefficients between the EEG activity may be calculated to obtain some information from comparing different locations and different frequency bands (Farwell & Donchin, 1988; Musha et al., 1997; Bayliss & Ballard, 1999; 2000a;b; Wang et al., 2004a;b).

4.3. Frequency representation (FR)

Frequency representation (FR) features have been widely used in signal processing because of their ease of application, computational speed and direct interpretation of the results (Wolpaw et al., 2000; Blankertz et al., 2006). Specifically, about one-third of BCI designs have used power-spectral density (PSD) features (Bashashati et al., 2007).

4.4. Time-frequency representation (TFR)

Due to the non-linearity and non-stationarity nature of the EEG signal, the classical methods based on Fourier transform (FT) are, in general, not efficient for feature extraction because the obtained features do not provide any time domain information, i.e. these features do not analyze the time-varying spectral content of the signals.

Time-frequency methods decompose the EEGs into a series of frequency bands, and the instantaneous power is represented by the envelop of oscillatory activity, which forms the spatial patterns for a given electrode montage at a time-frequency grid (Millán & Mouriño, 2003; Wang et al., 2004a).

Wavelet-based feature extraction algorithms (Qin & He, 2005; Xu & Song, 2008; Haibin et al., 2008) necessitate the choice of a particular wavelet called mother wavelet in order to extract useful information. This choice of an appropriate mother wavelet may be simplified by the prior knowledge of the physiological activity in the brain.

(Huang et al., 1998) proposed a more fairly recent technique called the Empirical Mode Decomposition (EMD) was proposed for nonlinear and non-stationary time series data. The (EMD) is a data driven approach (i.e. one does not need to define a mother wavelet beforehand) that can be used to decompose adaptively a signal into a finite well-defined high frequency and low frequency components, which are known as intrinsic mode functions (IMFs) or modes. They consider signals at their local oscillations, but they are not necessarily considered in the sense of Fourier harmonics. Their extraction is non-linear, but their recombination for exact reconstruction of the signal is linear. We think that this approach might be useful in BCI design.

4.5. Hjorth parameters

(Hjorth, 1970) has described three parameters that characterize the temporal dynamics of EEG signal, (X(t)) , of length N in terms of amplitude, time scale and complexity. The parameters are measured in the time domain, as opposed to the other features, which are measured in the frequency domain. It has been shown that these parameters are capable of discriminating between different mental states (Vourkas et al., 2000). The parameters are:

  • Activity: a measure of the mean power of the signal (variance of X(VAR(X)) ). It is measured as the standard deviation.

  • Mobility: represents the mean frequency in the signal. The mobility can be computed as the ratio of the standard deviation of the slope and the standard deviation of the amplitude.

  • Complexity: tries to capture the deviation from the sine wave. It is expressed as the number of standard slopes actually seen in the signal during the average time required for one standard amplitude, as given by the mobility.

Complexity(X(t))=Mobility(dX(t)dt)Mobility(X(t))
(4)

These parameters are mainly used as features for the classification of motor imagery (Obermeier et al., 2001; Boostani & Moradi, 2004; Lee & Choi, 2003; Pfurtscheller & Neuper, 2001).

4.6. Parametric modelling

In statistics, a parametric model or parametric family or finite-dimensional model refers to a family of distributions which can be described using a finite number of parameters. These parameters are usually collected together to form a single k-dimensional parameter vector Θ= (θ1,θ2, ...,θk) . In system theory, parametric model assume that the time series under analysis to be the output of a given linear mathematical model. They require an a priori choice of the structure and order of the signal generation mechanism model.

Among the more used parametric modelling in BCIs are the autoregressive parameters (AR) and their variants such as multivariate AR parameters (MVAR), AR parameters with exogenous input (ARMAX) and Adaptive AR parameters (AAR) (Anderson & Sijercic, 1996; Schlogl et al., 1997; Anderson et al., 1998; Roberts & Penny, 2000; Burke et al., 2005; Vidaurre et al., 2007).

AR methods assume that a signal X(t) , measured at time t , can be modeled as a weighted sum of the values of this signal at previous time steps, to which we can add a noise term E t (generally a Gaussian white noise):

X(t)=a1X(t1)+a2X(t2)+...+apX(tp)+Et
(5)

where the weights a i are the autoregressive parameters which are generally used as features for BCI. AAR assume that the weights a i can vary over time. It seems that (AAR) parameters would give better results than (AR) parameters for motor imagery classification (Schlögl et al., 1997; Pfurtscheller et al., 1998), whereas they would give worse results for the classification of cognitive tasks such as mental computations, mental rotation of a geometric figure, etc. (Huan & Palaniappan, 2004a; Huan & Palaniappan, 2004b). It should be noted that it is possible to derive a frequential information from the a i coefficients (McFarland & Wolpaw, 2005).

4.7. Inverse model

Inverse models have shown to be promising feature extraction algorithms (Qin et al., 2004; Grave et al., 2005; Wentrup et al., 2005; Congedo et al., 2006). Such models are able to compute the activity in the whole brain volume, only using EEG and a head model that generally represents the brain as a set of volume elements (voxels). The activity thus calculated in some relevant brain regions or voxels may be used as efficient features for BCI systems.

4.8. specific techniques

4.8.1. Peak picking (PP)

Peak picking (PP) method detects a specific pattern based on its peak value in a region associated with a specific cognitive component. It is used specifically for the evoked potential P300 (or P3)-based BCI system (Meinicke et al., 2003; Garrett et al., 2003; Bayliss et al., 2004; Bayliss & Inverso, 2005; Salimi Khorshidi et al., 2007; Hoffman et al., 2008), Fig.5.

media/image32.png

Figure 5.

Typical P300 wave. From (Hoffman et al., 2008).

PP is a simple algorithm to recognize a P300 component using the difference between the minimum and maximum amplitude in a trial. A trial with a prototypical evoked potential P300 component contains a large peak from 300-400 ms and PP recognizes the P300 signal when the amplitude difference is greater than or equal to a specified voltage difference between the minimum, min(x), and maximum, max(x), voltage points within a specified time window, where x is a vector which represents the data for a single P300 response. For recognition, the time window with the best results may be between three and six hundred milliseconds citepbayliss04.

4.8.2. Slow cortical potentials (SCPs) calculation methods

The SCPs amplitudes are extracted on-line from the regular electroencephalogram, filtered, corrected for eye movement artefacts and fed back to the patient with visual, auditory or tactile feedback (Birbaumer et al., 1999; 2000; Hinterberger et al., 2004). The TFR methods are also used to extract the features of the SCPs (Hinterberger et al., 2003; Bostanov, 2004 ). Fig.6 shows the statistical results of the study realised by (Bashashati et al., 2007) concerning feature extraction methods in BCI designs. (Bashashati et al., 2007) concluded that 41% of the BCIs are based on the sensorimotor activity use PSD features, 16% rely on parametric modelling of the data, 13% use TFR methods and 6% do not employ any feature extraction methods. 74% of the SCP-based BCI designs calculate SCP signals using low-pass filtering methods, and 64% of the VEP-based BCIs use power-spectral features at specific frequencies. 26% of the BCIs based on P300 calculate the PP; 22% use TFR-based methods, 22% use no feature extraction method, and 15% use cross-correlation with a specific template. 41% of the BCI designs that use mental tasks to control a BCI use power spectral features and 37% use parametric modelling of the input signal. As most of the BCI designs that are based on neural

media/image33.png

Figure 6.

Feature extraction methods in BCI designs based on sensorimotor activity, VEP, P300, SCP, response to mental tasks, activity of neural cells, and multiple neuromechanisms. Taken from (Bashashati et al., 2007).

cortical recordings mainly try to model the direct relationship between the neural cortical recordings and movements, they do not use a feature-extraction algorithm. 45% of the BCI designs that are based on multiple neuromechanisms rely on power-spectral features, 17% use parametric modelling, and 17% use TFR methods.

5. Feature selection and dimensionality reduction methods in BCI designs

In BCI applications, several features are generally extracted from several brain activity channels (several electrodes in the case of EEG measurements) and from several time segments (or sessions), before being concatenated into a single feature vector. Hence, the BCIs are often affected by a problem known as curse of dimensionality (Bellman, 1961). It was demonstrated that the amount of data needed to properly describe the different classes increases exponentially with the dimensionality of the feature vectors (Friedman, 1997; Jain, 2000).

(Flotzinger et al., 1994) and (Pfurtscheller & Guger, 1999b) have shown that when feature selection is used, the classification accuracy is better than when all the features are used. If the number of training data is small relatively to the number of features, the classification algorithm which will use these features and data will very likely give bad results. It is recommended to use at least 5 to 10 times more training data per class than the number of features (Jain & Chandrasekaran, 1982; Raudys & Jain, 1991). Unfortunately this cannot be applied in all BCI systems as generally the dimensionality is high and the training set small.

Among fourteen feature selection and dimensionality reduction methods in BCI designs given by (Bashashati et al., 2007), we give here briefly the definitions of only three methods which are the most applied in BCI designs:

  • - Genetic algorithm (GA)

  • A Genetic algorithm (GA) (Goldberg, 1989; Flotzinger et al., 1994) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. They are a particular class of evolutionary algorithms (EA) that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover. These algorithms are based on a sequence of generations whereby the population in each generation produces the next while trying to optimize some fitness criterion (Brill, 1992), such as maximum ability to classify the training set in the classification stage. Each member of the current population is assigned a binay-valued chromosome of length n (for an n-dimensional classification problem), whereby the value of each bit within the chromosome defines whether this feature is to be used for classification or not. A chromosome 11 11 1...1, therefore, means that all parameters are to be used for evaluation of the member's fitness and a chromosome 10100... 0 means that only the first and third parameters are to be used. The accuracy which can be achieved using a specific chromosome is calculated using a clustering algorithm such as k-nearest-neighbour classifier (K-NN) or any other clustering algorithm. For k-nearest-neighbour classification the k nearest data vectors are found for a new input vector which is then classified to the label to which the majority of these k data vectors belong. For more details, refer to (Flotzinger et al., 1994; Brill, 1992).

  • - Principal component analysis (PCA)

  • Principal component analysis (PCA) may be used in pre-processing stage of BCI designs (see section 3.3). PCA may also be used as a dimensionality reduction technique in terms of capturing the variance of the data, and it accounts for correlation among variable. It gives lower-dimensional representations of the data which better generalize to data independent of the training set than using the entire dimensionality of the observation space (Scholkopf, 1999). The PCA transforms a set of m variables into another set of km uncorrelated variables, maintaining as many of the variance of original data as possible (Moghaddam, 2002).

  • - Distinctive sensitive learning vector quantization (DSLVQ)

  • The influence of distinctive features on the distance function in the standard learning vector quantisation (LVQ) (Kohonen, 1990) is equal. The Distinction Sensitive (DS) algorithm (DSLVQ) (Flotzinger et al., 1994) employs an adaptive weighted distance function where the influence of features which frequently contribute to misclassifications is reduced while the influence of features which are shown to be very significant for proper classification is increased. For the weighted distance function of DSLVQ a global weights vector w is used which stores the distinctiveness, i.e. the relevance, of every single feature. This weights vector is adapted interactively along with the codebook. The distance function used may be the Euclidean distance, or any other weighted distance functions. (Flotzinger et al., 1994) proposed the following Euclidean distance between two feature vectors x and y:

DSdist(x,y,w)=i=1N(wi(xiyi))2
(6)

The weights vector w can be seen as a scaling transformation from the original feature space into a DS-feature space. This transformation increases distances for very distinctive features and decreases distances for common features. Despite the usage of a weighted distance function, the codebook learning for DSLVQ is the same as for the LVQ3 algorithm. Additionally, the weights vector w must be updated with every learning iteration. Learning weights and codebook settings in parallel facilitate a quick approximation of these related parameters. For more details, refer to (Flotzinger et al., 1994).

Fig. 7 shows the statistical results of the study realised by (Bashashati et al., 2007) concerning feature selection and dimensionality reduction methods in BCI designs. (Bashashati et al., 2007) concluded that thirty-eight of the reported BCI designs employ feature selection and dimensionality reduction algorithms; 26% of these 38 designs use genetic algorithms (GA), 24% use distinctive sensitive learning vector quantization (DSLVQ), and 13% use PCA.

media/image37.jpeg

Figure 7.

Feature selection and dimensionality reduction methods in BCI designs. Figure modified from (Bashashati et al., 2007) with authorisation.

6. Classification in BCIs

Brain activity patterns are considered as dynamic stochastic processes due both to biological and to technical factors. Biologically, they change due to user fatigue and attention, due to disease progression, and with the process of training. Technically, they change due to amplifier noises, ambient noises, and the variation of electrode impedances (Wolpaw et al., 2002). Therefore, the time course of the generated time series signals (e.g. EEG) should be taken into account during feature extraction (Wolpaw et al., 2002). To use this temporal information, three main approaches have been proposed (Lotte et al., 2007):

The first approach is the most widely used despite that the obtained feature vectors are often of high dimensionality.

6.1. Classifier selection criteria

In order to choose the most appropriate classifier for a given set of features, the properties of the available classifiers must be chosen according to the following four classifier taxonomy and two main problems in BCI design as described by (Lotte et al., 2007):

6.1.1. Classifier taxonomy

  • - Generative or Informative classifier - Discriminative classifier

  • Generative classifiers, e.g., Bayes quadratic, learn the class models. To classify a feature vector, generative classifiers compute the likelihood of each class and choose the most likely. Discriminative ones, e.g., support vector machines (SVM), only learn the way of discriminating the classes or the class membership in order to classify a feature vector directly (Ng & Jordan, 2002; Rubinstein & Hastie, 1997);

  • - Static classifier - Dynamic classifier

  • static classifiers, e.g., multilayer perceptrons (MP), cannot take into account temporal information during classification as they classify a single feature vector. In contrast, dynamic classifiers such as hidden Markov model (HMM) (Rabiner, 1989), FIR filters-multilayer perceptrons (FIR-MLP) (Haselsteiner & Pfurtscheller, 2000) and Tree-based neural network (TBNN) (Ivanova et al., 1995), can classify a sequence of feature vectors and thus catch temporal dynamics.

  • - Stable classifier - Unstable classifier

  • Stable classifiers, e.g., linear discriminant analysis (LDA), have a low complexity (or capacity (Vapnik, 1995; 1999)). They are said to be stable as small variations in the training set do not considerably affect their performance. In contrast, unstable classifiers, e.g., multilayer perceptron, have a high complexity. As for them, small variations of the training set may lead to important changes in performance (Breiman, 1998).

  • - Regularized classifier

  • Regularization consists in carefully controlling the complexity of a classifier in order to prevent overtraining. Regularization helps limit (a) the influence of outliers and strong noise, (b) the complexity of the classifier and (c) the raggedness of the decision surface (Müller et al., 2003). A regularized classifier has good generalization performances and is more robust with respect to outliers (Duda, 2001; Jain, 2000) (see section 6.1.2).

6.1.2. Main classification problems in BCI research

While performing a pattern recognition task, classifiers may be facing two main problems related to the feature properties. These problems are: the curse-of-dimensionality and the bias-variance tradeoff. The first problem is discussed in section 5. The second problem is a general one where three inherent errors may occur.

Independently of the applications of the classification, this stage consists in association of the true class label c* corresponding to a feature vector x using a mapping (or a model f that may be learnt from a training set T. Let f* the optimal unknown theoretical mapping that has generated the labels. If the mean square error (MSE) is considered to evaluate the inherent classification errors, then these errors can be decomposed into three terms (Breiman, 1998; Friedman, 1997):

MSE=E[(cf(x))2]=E[(cf(x)+f(x)E[f(x)]+E[f(x)]f(x))2]=E[(cf(x))2]+E[(f(x)E[f(x)]2)]+E[(E[f(x)]f(x))2]=noice2+bias(f(x))2+variance(f(x))
(7)

where the expectation is with respect to the distribution over c* and the introduced term E[f(x)] is the mean value of the estimated class label. These three terms describe three possible sources of inherent classification error:

  • - noise: represents the irreducible noise within the system;

  • - bias: represents a systematic error which is the divergence between the estimated mapping (i.e. the estimated class label) and the best mapping (i.e. the true class label). This term depends strongly on the classification method that has been chosen to obtain f (linear, quadratic,... );

  • - variance: reflects the sensitivity to the used training set T.

The only terms that may be minimised are the bias and the variance terms. Simple classifiers (e.g. linear classifiers) have a high bias but low variance whereas more complex classifiers (e.g. polynomial classifiers) have a low bias but a high variance. To select the optimal model complexity, we must solve this bias-variance dilemma (Geman et al., 1992).

Actually, stable classifiers tend to have a high bias and a low variance, whereas the inverse is true for unstable classifiers. This can explain why simple classifiers sometimes outperform more complex ones. A low variance can be a solution to cope with the stochastic nature of brain signals. In the presence of strong noise and outliers, even linear systems can fail. One way of overcoming this problem is to use techniques, known as stabilization techniques. These techniques can be used to reduce the variance, e.g. combination of classifiers (Breiman, 1998) and regularisation techniques (Duda, 2001; Jain, 2000).

The nonlinear classifiers have more number of parameters than those in the linear classifiers. In addition, these parameters must be chosen appropriately. However, when there are large amounts and limited knowledge of data, nonlinear classifiers are better suited in finding the potentially more complex structure in the data. In particular, when the source of the data to be classified is not well known, using methods that are good at finding nonlinear transformation of the data is suggested. In these cases, kernel-based and neural-networks based methods can be used to determine the transformations. Kernel-based classifiers are classification methods that apply a linear classification in some appropriate (kernel) feature space. Thus, all the beneficial properties of linear classification are maintained, but at the same time, the overall classification is nonlinear. Examples of such kernel-based classification methods are support vector machines (SVMs) (Vapnik, 1999) and kernel Fisher discriminant (KFD) (Müller et al., 2003).

Better performances might also be achieved by using group (committee) of classifiers rather than using a single classifier but only a few BCI designs have employed such an approach in classifying features and achieved performance improvements (Millan et al., 2000; Peters et al., 2001; Varsta et al., 2000; Millan et al., 2002).

6.2. Classifiers used in BCI research

There are many categories of classification algorithms used to design BCIs. Among the more used Classifiers in the BCI community are: linear discriminant classifiers (LDC), neural networks (NN), nonlinear Bayesian classifiers (NBC), nearest neighbours classifiers (NNC) and combinations of classifiers. As a linear discriminant classifiers (LDC), we introduce in this section only the linear discriminant analysis (LDA) and support vector machines (SVMs), and as nonlinear Bayesian classifiers (NBC), we introduce in this section only the hidden Markov models (HMMs). For an extensive description and tutorials on these and other approaches, refer to (Lotte et al., 2007; Bashashati et al., 2007; Hung et al., 2005).

6.2.1. linear discriminant classifiers (LDC)

LDC are the most popular in BCI design. Two main kinds of LDC have been used for BCI design, namely, linear discriminant analysis (LDA) and support vector machine (SVM).

6.2.1.1. Linear discriminant analysis (LDA)

LDA or Fisher's LDA has been used with success in many of BCIs such as motor imagery based BCI (Pfurtscheller, 1999c), P300 applications (Bostanov, 2004; Hoffman et al., 2008), multiclass (Garrett et al., 2003) or asynchronous (Scherer et al., 2004) BCI. It has a very low computational requirement and it is simple to use and generally provides good results.

The idea of LDA (Fukunaga, 1990; Duda, 2001) is to find a weight vector w so that two projected clusters c1 and c2 of N1 and N2 training feature vectors {x1,x2,...,xN1}and{x1,x2,...,xN2} on w can be well separated from each other by hyperplanes while keeping small variance of each cluster. This can be done by maximizing the so-called Fisher's criterion. LDA assumes normal distribution of the data, with equal covariance matrix for both classes. The separating hyperplane is obtained by seeking the projection that maximizes the distance between the two classes' means and minimizes the between variance (Fukunaga, 1990).

j(w)=wTsbwwSww
(8)

with respect to w , where Sb is the between-class variance matrix:

Sb=(mc1mc2)(mc1mc2)T
(9)
and Sw is the within-class variance matrix:
Sw=xc1(xmc1)(xmc1)T+xc2(xmc2)(xmc2)T
(10)

where mc1 and mc2 are the cluster mean of classes c1 and c2 respectively.

The optimal weight vector w is the eigenverctor corresponding to the largest eigenvalue of Sw1Sb . Once this vector is obtained using the training data xi is obtained by means of the training data, the classification then may be done by projecting the test feature vectors on it, and then the projected test feature vectors may be classified by employing a clustering rule such as k-nearest-neighbor decision rule.

In the case of multiclass separation problem (ci,i 2) , several hyperplanes are used. The strategy generally used in this case is the one versus the rest (OVR) (Schlögl et al., 2005) which separate each class from all the others. This technique is suitable for the on-line BCIs because it has a very low computational requirement. It is simple to use and generally provides good results. The main drawback of LDA is its linearity that can provide poor results on complex nonlinear EEG data (Garcia et al., 2003). A regularized Fisher's LDA (RFLDA) has also been used in the field of BCI (Blankertz et al., 2002; Müller et al., 2004) and give better results for BCI than the non-regularized version (Blankertz et al., 2002; Müller et al., 2004).

6.2.1.2. Support vector machine (SVM)

The generalization of the pattern recognition or regression results obtained from a limited sample constitutes the essential stake of the artificial training (machine learning). It is known that the only minimization of the empirical risk (the error of training) does not guarantee a weak error on a corpus of test. Thus the techniques of regularization, used since the years 1960, allow to carrying out a compromise between the capacity of the model to be learned (related to its complexity) and its aptitude to be generalized. From the point of view conceptual, the concept of structural risk introduced by Vladimir Vapnik into the years 1990 (Vapnik, 1995; Cortes & Vapnik, 1995 ; Vapnik, 1999) gives a bound of the error of test according to the error of training and of the complexity of the model. Support vector machine (SVM) have been applied in various domains for classification and regression.

In the following, we will outline briefly the idea of SVM for binary classification and then how this method can be applied to solve multi-class classification problems.

  • - Binary classification SVM

Support vector machines were originally designed for binary classification (Cortes & Vapnik, 1995 ). The main idea of this approach is to construct a hyperplane in order to separate two classes (ci{1, 1} orci{0, 1}) so that the margin (Friedman, 1996) (the distance between the hyperplane and the nearest point(s) of each classes) is maximal and with minimal error. The theoretical principle of the SVM comprises two fundamental steps:

  • Make a nonlinear mapping Φ of the feature vectors from input space towards another space (redescription space) of higher dimension provided with a scalar product (Hilbert space), Fig. 8.

media/image59.jpeg

Figure 8.

The idea of the SVM: transform a nonlinear problem of separation in the representation space into a linear problem of separation in a redescription space of higher dimension.

  • Construct an optimal hyperplane allowing an optimal linear separation in this redescription space.

The interest of this mapping is that in the redescription space the pattern recognition can prove easy task: intuitively, more the dimension of the redescription space is hight; more the probability of being able to find a hyperplane separating between the feature vectors is high, Fig. 9.

media/image60.jpeg

Figure 9.

Linear mapping: the optimal separating hyperplane is that which maximizes the margin in the redescription space.

From the mathematical point of view, the nonlinear mapping (Φ:2n) is realized via a Kernel function (Hilbert-Schmidt Kernel) easy to calculate. In practice, some families of Kernel functions are known and it is allocated to the user of SVM to carry out test to determine that which is appropriate for its application (it is a question of translating the maximum a priori knowledge of which one lays out on the studied problem and the data). Fig. 10 shows a non-linear separation example.

media/image62.png

Figure 10.

Non-linear mapping: a complex shape. Figure from Schölkopf & Smola (2002).

Let x denote a vector drawn from the input space, assumed to be of dimension N0 and let {Φj(x)},j= 1, 2, ...,N1 denote a set of nonlinear transformations from the input space to the redescription space, where N1 is the dimension of the redescription space. Given such a set of nonlinear transformations, we may define a hyperplane acting as the decision surface which is computed in the redescription space in terms of the linear weights of the machine as follows:

j=0N1wjΦj(x)=0
(11)

where w= {w0,w1, ...,wN1} denotes a set of linear weights connecting the redescription space to the output space. And it is assumed that Φ0(x) = 1 for all x , so that w0 denotes the bias. Define the vector Φ(x) = [Φ0(x),Φ1(x), ...,ΦN1(x)]T , then we can rewrite the decision surface in the compact form:

Given the training feature vectors, Φ(xi) corresponds to the input pattern xi , and the corresponding desired response di , i= 1, 2, ...,N0 , which is either 1 or -1 (or, 0 or 1), it has been shown that (Haykin, 1994) the optimal weight vector w can be expressed as

w=i=1N0αidiΦ(xi)
(13)

where {αi} , i= 1, 2, ...,N0 is the optimal Lagrange multipliers resulted from maximizing the subject function

Q(α)=i=1N0αi12i=1N0j=1N0αiαjdidjΦT(xi)Φ(xj)
(14)

subject to the constraints

(1)
i=1N0αidi=0
(15)
and(2),

where C is a user-specified constant which may be considered as regulation parameter. Substituting equation (13) into (12), we obtain the optimal hyperplane

i=1N0αidiΦT(xi)Φ(x)=0
(17)

which will be used for linearly separating the testing data, i.e. for any testing sample x , if

i=1N0αidiΦT(xi)Φ(x)0
(18)

then x is classified into the subset having the training response di= 1 , otherwise it is classified into the other subset with di=(or 0) . The user may choose the radial basis function (RBF) in defining the inner product kernel K(x,xi) =ΦT(xi)Φ(x) .

According to equation (16), once the number of nonzero Lagrange multipliers, αi, is determined, the number of radial basis functions and their centers are determined automatically. This differs from the design of the conventional neural network, such as the back-propagation neural network or radial-basis function network (Haykin, 1994), where the numbers of hidden layers or of hidden neuron are usually determined heuristically. The kernel generally used in BCI research is the Gaussian or radial basis function (RBF) kernel:

K(x,xi)=exp(xxi22σ2)
(19)

In this case, the SVM is known as Gaussian SVM or RBF SVM (Burges, 1998; Bennett & Campbell, 2000; Garrett et al., 2003; Kaper et al., 2004).

This classifier has been applied, with success, to a relatively large number of BCIs problems (Blankertz et al., 2002; Garrett et al., 2003; Rakotomamonjy et al., 2005; Helmy et al., 2008; Trad et al., 2009).

    1. To solve a given multiclass problem it is preferable to use a combination of several binary SVM classifiers and a decision strategy to decide the class of the input pattern. The most popular methods to achieve this goal are: one-versus-all method using Winner-Takes- All strategy (WTA-SVM); One-Versus-One method implemented by Max-Wins Voting (MWV-SVM) (Duan et al., 2007); Decision Directed Acyclic Graph (DAGSVM) (Platt et al., 2000); and error correcting codes (Dietterich & Bakiri, 1995).

      6.2.2. Nonlinear Bayesian Classifiers (NBC)

      This section introduces one Bayesian classifier used for BCI: hidden Markov models (HMMs). This Classifier produces nonlinear decision boundaries. Furthermore, they are generative, which enables them to perform more efficient rejection of uncertain samples than discriminative classifiers. As brain signals used to drive BCI have specific time courses, HMM have been applied to the classification of temporal sequences of BCI features or even raw EEG (Obermeier et al., 2000; 2001; Cincotti et al., 2003; Schlogl et al., 2005; Helmy et al., 2008; Trad et al., 2009).

      HMMs are very efficient nonlinear techniques used for the classification of time series (Rabiner, 1989). HMMs have been widely applied in automatic speech recognition and now being applied to many fields, e.g. signal processing, pattern recognition, modelling and control. Their use necessitates two stages: a training stage where the stochastic process models are estimated through extensive observation corpus and decoding or detection stage where the model may be used off/on-line to obtain the likelihoods of the given testing sequence evaluated by each model.

      In its conventional definition, a HMM is defined by the following compact notation to indicate the complete parameter set of the model α= (Π,A,B) , where Π,A and B are the initial state distribution vector, matrix of state transition probabilities and the set of the observation probability distribution in each state, respectively: Π= [π1,π2, ...,πN] , with πi=P(q1=si) , A= [aij] , with aij=P(qt+1=sj|qt=si) , i , jN , si,sj2S , S= {s1,s2, ...,sN} , t{1, 2, ...,T} . The observation at time (or index) t, Ot, is considered in this paper as continuous OtK . For a continuous observation (CHMMs case), the state conditional probability of the observation bi(Ot) may be defined by a finite mixture of any log-concave or elliptically symmetric probability density function (pdf ), e.g. Gaussian pdf, with state conditional observation mean vector ¯i and state conditional observation covariance matrix Σi . In this paper we consider only a single Gaussian pdf, so B may be defined as B= {¯i,Σi},i=1, 2, ...,N . At each instant of time t, the model is in one of the states i , 1iN . It outputs Ot according to a density function bi(Ot) and then jumps to state j , jN with probability aij . The state transition matrix defines the structure of the HMM (Rabiner, 1989). The model λ may be obtained off-line by a given training procedure. In practice, given the observation sequence Ol= {O1,O2, ...,OT} , lL and a model λ , the HMMs need three fundamental problems to be solved

      1. How to calculate the likelihood P(Ol|λ) ? The solution to this problem provides a scoren of how Ol belongs to λ .

      2. How to determine the most likely state sequence that corresponds to Ol ? The solution to this problem provides the sequence of the hidden states corresponding to the given observation sequence Ol .

      3. How to adjust the model λ in order to maximize P(Ol|λ) ? This is the problem of estimating the model parameters given a corpus of training observations sequences.

      Problems 1 and 2 are solved in the decoding or detection stage using the forward or the Viterbi algorithms (Rabiner, 1989), while problem 3 is solved during the training phase using either a conventional algorithm such as the Baum-Welch algorithm (Rabiner, 1989) or another optimization based algorithm such as the simulated annealing based algorithm (Alani & Hamam, 2010).

      7. Hidden Markov models and support vector machines approaches applied to P300-based Brain-Computer interface - a comparative study

      This section reports on our contribution to investigate the use of Hidden Markov Models (HMMs) and Support Vector Machines (SVMs) approaches for tasks classification in P300-based Brain-Computer Interface (BCI) system. These approaches are applied to electroencephalogram (EEG) data sets of the École Polytechnique Fédérale de Laussane - BCI Groupe. These data are issued from five disabled subjects and four able-bodied subjects. For each subject, the HMMs and SVM are trained on different sets. Each set represents a specific P300 potential evoked task and contains multiple sequences (issued from different trials) of EEG records. For each task, the HMMs and the SVM that have been built take into account the variability of EEGs during different trials. Based on Bayesian Inference Criterion (BIC) (Schwarz, 1978), the proposed HMM training algorithm is able to select the optimal number of states corresponding to each task. This scheme makes the training procedure independent of the initialization problem and the a priori knowledge of the optimal number of HMM-related states needed in the training algorithm. The SVM is trained by introducing the different sets at the same time. We report training procedures and testing results for the HMMs and the SVM then we compare the performance of the two approaches on the same testing data sets. We also present promising preliminary comparative results based on the two classification approaches with correct classification rates for all subjects.

      7.1. Problem definition

      It has been shown by (Hoffman et al., 2008) that evoked potential (P300) in EEG data due to different images (tasks) appearing randomly on a computer screen in face of the subject may be efficiently used for exogenous mental task recognition. In Hoffman's work, extensions of LDA have been employed for pattern recognition. In our work we investigate the use of HMMs and SVM to classify the same P300 based tasks defined in (Hoffman et al., 2008) and using tasks data given in (Hoffman et al., 2009).

      7.2. Data structure and feature extraction

      For the clarity of the following sections, this section introduce briefly the experimental setup and the data given by Hoffman et al. (Hoffman et al., 2008), (Hoffman et al., 2009) that we adopted in our work.

      7.3. Hoffman's experimental setup

      In the experimental setup given by Hoffman et al., the users used a laptop screen displaying randomly six images (a television, a telephone, a lamp, a door, a window, and a radio), Fig. 11. Subjects were asked to count silently how often a prescribed image was flashed. The six images (Tasks) were displayed on the screen and a warning tone was issued.

      media/image128.png

      Figure 11.

      The display used for evoking the P300. One image was flashed at a time (Hoffman et al., 2008), (Hoffman et al., 2009).

      These images were selected in order to control some electrical appliances based on the BCI system. Each image flashed during 100 ms followed by a period of 300 ms of non flashing. The sampling rate of EEG was 2048 Hz using 32 electrodes according to the 10-20 international system.

      7.3.1. Subjects

      The recorded data given by Hoffman et al. (Hoffman et al., 2009) and used in our work are issued from five disabled subjects (subjects 1 to 5) and four able-bodied subjects (subjects 6 to 9). The disabled subjects were all wheelchair-bound but had varying communication and limb muscle control abilities. Subjects 1 and 2 were able to perform simple, slow movements with their arms and hands but they were unable to control other extremities. Spoken communication with subjects 1 and 2 was possible, although both subjects suffered from mild dysarthria. Subject 3 was able to perform restricted movements with his left hand but was unable to move his arms or other extremities. Spoken communication with subject 3 was impossible. However the patient was able to answer yes/no questions with eye blinks. Subject 4 had very little control over arm and hand movements. Spoken communication was possible with subject 4, although a mild dysarthria existed. Subject 5 was only able to perform extremely slow and relatively uncontrolled movements with hands and arms. Due to a severe hypophony and large fluctuations in the level of alertness, communication with subject 5 was very difficult, thus this subject was not taken into account by Hoffman at al. (Hoffman et al., 2008), (Hoffman et al., 2009). Subjects 6 to 9 were normal persons (all male, age 30 to 32). None of subjects 6 to 9 had known neurological deficits. In this study we use the data from all subjects.

      7.3.2. Preprocessing and feature extraction

      The data were preprocessed (referencing, bandpass-filtering, downsampling, and artefacts rejection). The EEG was downsampled from 2048 Hz to 32 Hz by selecting each 64th sample from the bandpass-filtered data. For each subject, the data were obtained during four sessions. One session comprised on average 810 trials, and the whole data for one subject consisted on average of 3240 trials. One trial duration takes 1000 ms. Single trials of duration 1000 ms were extracted from the data. Single trials started at stimulus onset, i.e. at the beginning of the intensification of an image, and ended 1000 ms after stimulus onset. Due to the Inter-stimulus interval of 400 ms, the last 600 ms of each trial were overlapped with the first 600 ms of the following trial.

      7.3.3. Electrode selection

      Hoffman et al. (Hoffman et al., 2008), (Hoffman et al., 2009) proved that the configuration with 8 electrodes gives the best result for the studying data Fig. 12.

      media/image129.jpeg

      Figure 12.

      The best configuration with 8 electrodes given by Hoffman et al. (Hoffman et al., 2008), (Hoffman et al., 2009).

      7.3.4. Feature vectors construction

      The samples from the selected electrodes were concatenated to obtain matrices of feature vectors. Each matrix is composed of Ne lines and Nt columns, where Ne denotes the number of electrodes and Nt denotes the number of temporal samples (feature vectors) in one trial. Due to the trial duration of 1000 ms and the downsampling to 32 Hz in (Hoffman et al., 2008) and (Hoffman et al., 2009), Nt is always taken equal to 32. In this work, we consider only matrices of dimension 8 × 32, i.e. each matrix O contains a sequence of 32 feature vectors O= {O1,O2, ...,O32} , OtK,K= 8 . For the purpose of our HMMs or SVM training and testing schemes, we divided the set of all matrices corresponding to each subject into two parts: training set and testing set. Each set contains six subsets of multiple observation sequences (feature matrices {O1,O2, ...,OL} , L 1 ), one subset for each image (Task) issued from different trials. Fig. 13 summarizes the training and data structures used in this work

      media/image140.jpeg

      Figure 13.

      Structures of the training and testing data sets for subject s, 1s8 . There are six subsets of multiple observation sequences (feature matrices {O1,O2, ...,OL} , L1 ), one subset for each image (Task). Each matrix is of dimension 8 ×32.

      7.4. Method

      In this section, we give the HMM and SVM training schemes and their implementations in the context of our work.

      7.5. HMM training algorithm

      Given a set of L training data sequences {O1,O2, ...,OL} , Ol= {O1,O2, ...,OTl} , lL issued from one of Z classes of stochastic processes pz , z1, 2, 3, ...,Z . The problem of the training in our work is to build Z HMMs for each process pz .

      We consider that each HMM contains an optimal state number Nopt of internal states to be determined automatically between Nmin and Nmax . In order to determine Nopt , we developed an optimal state number selection algorithm based on the Bayesian Inference Criterion. This algorithm constructs a number of HMMs with different number of states between Nmin and Nmax given by the user. For every state number, each iteration is initialized by the most appropriate model using data clustering, and by the rejection of the least linked state or the rejection of the least probable state of the previous iteration. Consequently, every training iteration begins by a more precise model. After constructing these HMMs, the algorithm selects the optimal HMM with the number of states that maximize the BIC.

      Given Nmin and Nmax , the algorithm may be summarized as follows:

      1. Initialization (clustering): estimate the initial HMM λ0 with Nmax states using k-mean clustering algorithm.

      2. While NNmin , do:

      a. run a training algorithm such a Baum-Welch algorithm (used in this work) until some convergence criterion is satisfied

      b. calculate and save the selection criterion, BIC(N), of the current model

      BIC(N)=logP(O|λN)FN2log(Tt)
      (20)

      FN is the total number of free parameters of the current estimated HMM (λN) and Tt be the total number of observations in all training observation sequences, i.e. Tt=l=1LTl

      c. determine the number of zeros in each line of the state transition matrix, A.

      d. select the best initialization method for the next HMM according to the following criterion:

      • if all the lines of A contain zeros, then make a clustering of the observation data.

      • if all the lines of A don't contain zeros but only some lines, then find and reject the state with minimal connections to other states.

      • else, calculate the least probable state and reject it.

      e. obtain a reduced model λ¯

      f. Fixe NN1 and λN   λ¯ and go to step 2

      In this work we consider only a single Gaussian pdf, so B may be defined as B= {¯i,Σi} , i= 1, 2, ...,N . In this case FN is calculated by

      FN=N(N1)+N[K+(K(K+1))/2]+(N1)
      (21)

      where OtK . In our case, we used Nmin= 2 and Nmax= 4 . These values were chosen in order to not constraint the clustering algorithm because the number of feature vectors is always equal to 32 (see section 7.2). The training and testing data sets for each subject described in section 7.2 were used for the following training and testing schemes:

      7.5.1. Training scheme

      Using the above algorithm, the goal of this scheme is to construct, for each subject, six HMMs: HMM1, HMM2,.., HMM6 corresponding to six images (Tasks). Each HMM is constructed with either 2 states, 3 states or 4 states. The training data sequences for all subjects s , s 8 and for the six images, all collected from different trials (see section 7.2), are given by the set {Tr_s_I} , s 8 , I 6 , where. Tr . and. I . denote training and image (or task) number respectively.

      7.5.2. Off-line testing scheme

      Our task is to classify six given testing feature sequences into class I , I 6 . In order to make off-line (or on-line detection in future work) of these sequences, a stochastic dynamic programming based (Viterbi decoding algorithm) (Rabiner, 1989) has been used. This algorithm gives at the same time the solution to problems 1 and 2 of HMMs discussed in section 6.2.2. In this case, the likelihoods of the data sequence are compared with respect to all six built HMMs. The selected HMM (i.e. task) is the model that maximize the likelihood. The testing data sequences for all subjects s , s 8 and for the six images, all collected from six trials (Hoffman et al., 2008), (Hoffman et al., 2009), are given by the set {Te_s_I} , s 8 , I 6 , where Te and I denote testing and image (or task) number respectively. Fig 14 show the training and testing schemes for one subject and for six images.

      7.5.3. Multiclass SVM (M-SVM)

      In our work, we used a modified version of the multiclass SVM (M-SVM). The M-SVM method adopted in this work is the one-versus-all. This method is done by winner takes-all

      media/image190.jpeg

      Figure 14.

      Left: Training scheme for one subject and for six images. Right: Testing scheme for one subject given one or more task-related sequences.

      combination strategy (Friedman, 1996), the outputs of the binary one-vs-all classifiers are compared and the most probable output among the other ones is chosen. The ith SVM is trained with all of the examples in the ith class with positive labels, and the rest with negative labels. An optimal hyperplane is constructed to separate N/M positive examples from N(M – 1)/M negative examples where M is the number of SVMs and N is the number of the training examples used to construct an SVM for a given class. The same training and testing data structures, described in sections 7.2, 7.5.1 and 7.5.2, were used for SVM training and testing. To train or to test the SVM for a one subject, a number of ×L training or testing feature matrices for all 6 tasks corresponding to this subject were concatenated and fed together to the M-SVM algorithm.

      7.6. Results

      The performance of BCI depends on different evaluation parameters. Among these evaluation parameters of interest in this work is the classification accuracy. This section reports experimental evaluations of the classification accuracy based on the testing scheme introduced in section 7.5.2. Using 60 testing feature matrices for all tasks corresponding to each subject (10 feature matrices for each task), the obtained classification results give the highest percentages of classification rates for the two applied approaches SVM and HMMs. The highest classification percent rates for all subjects are shown in the diagonals of the confusion matrices given in Table 1, see (Appendix). To conclude on the obtained results, we consider the case of comparison between the one of the more severely disabled subject, i.e. subject 3 (late-stage amyotrophic lateral sclerosis) and one of the able-bodied subjects, i.e subject 6 (subject had no known neurological deficits). Fig. 15 show the HMMs-SVM based comparative result for subject 3 and for subject 6. Fig. 15 and Table 1 show that the overall classification percent rates give a satisfactory and promising results for all the disabled subjects compared to the able-bodied ones. Fig. 16 show the HMMs-SVM based comparative results of the means and standard deviations of all the tasks generated by all subjects.

      Fig. 16 show that

      1. media/image194.jpeg

        Figure 15.

        Classification percent rates versus task number for subject 3 (top): late-stage amyotrophic lateral sclerosis and for subject 6 (down): subject had no known neurological deficits.

        media/image195.jpeg

        Figure 16.

        Means and standard deviations of the classification percent rates in all tasks versus subject number (top: HMMs case, down: SVM case).

        1. In both the HMMs and the SVM cases, the means of the classification percent rates for the disabled subjects (except subject 2) as well as for the able-bodied subjects (except subject 9) are approximately uniformly distributed.

        2. The standard deviations of the classification percent rates for all subjects in the SVM case are less than or equal to those in the HMMs case.

        These results give a way to study the ability of the different persons to learn the different P300-based BCI tasks.

        8. Appendix

        media/image196.jpeg

        Table 1.

        Confusion matrices for the eight subjects and the six tasks (subject 5 is not taken into account). Left: HMMs-based, right: M-SVM-based. All the values are rounded to make the text readable.

        9. Conclusions

        This chapter have introduced a comprehensive survey of signal processing, feature extraction/ selection and classification methods used to provide the readers with guidelines on design brain-computer interfaces (BCIs). Based on the results given by (Bashashati et al., 2007) and (Lotte et al., 2007), this chapter shows which method have received more attention by the BCI community prior to 2006. However, other existing methods, not currently applied in BCI design, could be explored. The exploration of new methods would be strongly driven by the new properties that will have to be taken into consideration in the real future applications of BCIs.

        As a contribution to the classification methods, a Hidden Markov Models (HMMs) and support vector machines (SVM) approaches for task classification in P300-based BCI system is presented. In the HMMs case, we proposed a training algorithm which is able to select automatically the optimal number of HMM-related states corresponding to each set of EEG training records. In the SVM case, we applied a multiclass SVM based on the one-versus-all method. The confusion matrices give a correct classification rates for the able-bodied subjects as well as for the disabled subjects. The comparative results demonstrate that the two approaches are promising. If the HMMs are not constructed correctly or the training data set is not sufficient, the HMM approach gives in this case a low classification rates. In such cases, a hybrid HMM-SVM may be employed, where the HMM is used as a dynamic classifier and the SVM is used as a good discriminative classifier by considering all the training examples in the training data set and train all the task-related models simultaneously. The authors are currently working on extending this work to include off-line and on-line training and classification schemes by using this strategy. The information transfer rate (bit rate) will be taken into account to include both accuracy and speed of a BCI system.

        10. Acknowledgements

        The authors would like to thank Dr. U. Hoffman and the EPFL-Brain-Computer team for the data and the software given in (Hoffman et al., 2008) that they were used in this work. The authors would like also to thank Dr. A. Bashashati for his authorization to use or modify some figures given in the paper (Bashashati et al., 2007) to illustrate some sections given in this chapter.

        References

        1 - C. W. Anderson, Z. Sijercic, 1996 Classication of EEG signals from four subjects during five mental tasks. In Solving Engineering Problems with Neural Networks. Proceedings of the International Conference on Engineering Applications of Neural Networks (EANN.96).
        2 - C. W. Anderson, E. A. Stolz, S. Shamsunder, 1998 Multivariate autoregressive models for classification of spontaneous electroencephalographic signals during mental tasks. IEEE Trans Biomed Eng, 45, 277 286 .
        3 - E. O. Altenmüller, C. Gerloff, 1999 Psychophysiology and the EEG. In: Niedermeyer E, Lopes da Silva FH, editors. Electroencephalography: basic principles, clinical applications and related fields, 4th ed. Baltimore, MD: Williams and Wilkins, 637 655 .
        4 - T. Al-ani, Y. Hamam, 2010 A low Complexity Simulated Annealing Approach For Training Hidden Markov Models. Int. J. of Operational Research (IJOR), (Accepted for publication), 8 3
        5 - J. D. Bayliss, D. H. Ballard, 1999 Single trial 300 recognition in a virtual environment. Proc. Int. ICSC Symp. on Soft Computing in Biomedicine, (Genova, Italy).
        6 - J. D. Bayliss, D. H. Ballard, 2000a Recognizing evoked potentials in a virtual environment. Advances in Neural Information Processing Systems, 12 ed S A Solla, T K Leen and K R Müller (Cambridge, MA: MIT Press), 3 9 .
        7 - J. D. Bayliss, D. H. Ballard, 2000b A virtual reality testbed for brain-computer interface research IEEE Trans. on Rehabilitation Engineering, 8 188 190 .
        8 - J. D. Bayliss, S. A. Inverso, A. Tentler, 2004 Changing the P300 Brain-Computer Interface. Journal of CyberPsychology & Behavior, 7 6 694 704 .
        9 - J. D. Bayliss, S. A. Inverso, 2005 Automatic Error Correction Using 3 Response Verification for a Brain-Computer Interface. Proc. 11th International Conference on Human- Computer Interaction, July 22-27, Las Vegas, NV. Mahwah: Lawrence Relbaum Associates.
        10 - A. Bashashati, R. K. Ward, G. E. Birch, 2005 A new design of the asynchronous brainŰ- computer interface using the knowledge of the path of features. Proc. 2nd IEEE-EMBS Conf. on Neural Engineering, (Arlington, VA), 101 104 .
        11 - A. Bashashati, M. Fatourechi, R. K. Ward, G. E. Birch, 2007 A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals, J. Neural Eng., June, 4 2 R32 R57 .
        12 - R. Bellman, 1961 Adaptive Control Processes: A Guided Tour. Princeton University Press, New Jersey.
        13 - C. J. Bell, P. Shenoy, R. Chalodhorn, R. P. N. Rao, 2008 Control of a humanoid robot by a noninvasive brainŰcomputer interface in humans, J. Neural Eng. 5, 214 220 .
        14 - K. P. Bennett, C. Campbell, 2000 Support vector machines: hype or hallelujah?, ACM SIGKDD Explor. Newslett., 21 13 .
        15 - N. Birbaumer, H. Flor, N. Ghanayim, T. Hinterberger, I. Iverson, E. Taub, B. Kotchoubey, A. . Kubler, J. Perelmouter, 1999 A spelling device for the paralyzed, Nature, 398 6725 297 298 .
        16 - N. Birbaumer, A. Kübler, N. Ghanayim, T. Hinterberger, J. Perelmouter, J. Kaiser, I. Iversen, B. Kotchoubey, N. Neumann, H. Flor, 2000 The thought translation device (TTD) for completely paralyzed patients. IEEE Trans. Rehab. Eng., 8 2 190 192 .
        17 - Birbaumer, N. , L. G. Cohen, 2007 Brain-computer interfaces: communication and restoration of movement in paralysis. The journal of Physiology, 579 3 637 642 .
        18 - B. Blankertz, G. Curio, K. R. Müller, 2002 Classifying single trial EEG: towards brain computer interfacing. Adv. Neural Inf. Process. Syst. (NIPS 01), 14 157-164.
        19 - B. Blankertz, G. Dornhege, M. Krauledat, K. R. Müller, V. Kunzmann, F. Losch, G. Curio, 2006 The Berlin brain-computer interface: EEG-based communication without subject training. IEEE Trans. on Rehabilitation Engineering, 14 2 , 147 152 .
        20 - R. Boostani, M. H. Moradi, 2004 A new approach in the BCI research based on fractal dimension as feature and adaboost as classifier. Journal of Neural Engineering, 1 4 212 217 .
        21 - V. Bostanov, 2004 BCI competition 2003-data sets ib and IIb: feature extraction from eventrelated brain potentials with the continuous wavelet transform and the t-value scalogram. IEEE Trans. Biomed. Eng., 51 1057 1061 .
        22 - L. Breiman, 1998 Arcing classifiers. Ann. Stat., 26 801 849 .
        23 - F. Z. Brill, D. E. Brown, W. N. Martin, 1992 Fast Genetic Selection of Features for Neural Network Classsers. IEEE Transactions on Neural Networks, 3 324 328 .
        24 - C. J. C. Burges, 1998 A tutorial on support vector machines for pattern recognition. Knowl. Discov. Data Min., 2 121 167 .
        25 - D. P. Burke, S. P. Kelly, P. de Chazal, R. B. Reilly, C. Finucane, 2005 A parametric feature extraction and classification strategy for brain-computer interfacing. IEEE Trans. On Rehabilitation Engineering, 13 1
        26 - F. Cincotti, A. Scipione, A. Tiniperi, D. Mattia, M. G. Marciani, J. Millán, R. del , S. Salinari, L. Bianchi, F. Babiloni, 2003 Comparison of different feature classifiers for brain computer interfaces. Proc. 1st Int. IEEE EMBS Conf. on Neural Engineering.
        27 - P. Comon, 1994 Independent Component Analysis: a new concept?. Signal Processing, Elsevier, 36 3 287 314 .
        28 - M. Congedo, F. Lotte, A. Lécuyer, 2006 Classification of movement intention by spatially filtered electromagnetic inverse solutions. Physics in Medicine and Biology, 51 8 1971 1989 .
        29 - C. Cortes, V. Vapnik, 1995 Support vector networks. Machine Learning. Machine Learning, 20 273 297 .
        30 - J. Dien, 1998 Issues in the application of the average reference: Review, critiques, and recommendations. Behavior Research Methods. Instruments, & Computers, 30 1 34 43 .
        31 - T. Dietterich, G. Bakiri, 1995 Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2 263 286 .
        32 - B. H. Dobkin, 2007 Brain-computer interface technology as a tool to augment plasticity and outcomes for neurological rehabilitation, Journal of physiology, 579 3 637 642 .
        33 - G. Dornhege, J. Millán, R. del , T. Hinterberger, D. J. Mc Farland, K. Müller-R, T. J. Sejnowski, 2007 An introduction to brain computer interfacing. In: Towards. Brain-Computer Interfacing, MIT Press, Cambridge.
        34 - R. O. Duda, P. E. Hart, D. G. Stork, 2001 Pattern Recognition. 2nd. ed. (New York: Wiley- Interscience).
        35 - K. Duan-B, J. C. Rajapakse, M. F. Nguyen, 2007 One-versus-one and one-versus-all multiclass SVM-RFE for gene selection in cancer classification. In: EvoBIO, 228 234 .
        36 - L. A. Farwell, E. Donchin, 1988 Talking off the top of your head: Towards a mental prosthesis utilizing event-related brain potentials Electroencephalogr. Clin. Neurophysiol., 80 510-523.
        37 - D. Flotzinger, M. Pregenzer, G. Pfurtscheller, 1994 Feature selection with distinction sensitive learning vector quantisation and genetic algorithms. Proc. IEEE Int. Conf. on Neural Networks, (Orlando, FL), 3448 3451
        38 - J. H. Friedman, 1996 Another approach to polychotomous classification. Technical report, Department of Statistics, Stanford University.
        39 - J. H. K. Friedman, 1997 On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Min. Knowl. Discov., 1 55 77 .
        40 - D. Friedman, M. Slater, A. Steed, R. Leeb, G. Pfurtscheller, C. Guger, 2004 Using a brain-computer-interface in a highly immersive virtual reality. In IEEE VR Workshop.
        41 - K. Fukunaga, 1972 In: Introduction to Statistical Pattern Recognition, Oxford, UK., Clarendon.
        42 - K. Fukunaga, 1990 Statistical Pattern Recognition. 2nd edn (New York: Academic).
        43 - G. Garcia, T. Ebrahimi, J. M. Vesin, 2003 Support vector EEG classification in the Fourier and time-frequency correlation domains. Proc. 1st IEEE-EMBS Conf. on Neural Engineering, Capri Island, Italy, 591 594 .
        44 - D. Garrett, D. A. Peterson, C. W. Anderson, Thaut. M. H. nd, 2003 Comparison of linear, nonlinear, and feature selection methods for EEG signal classification. IEEE Trans. On Rehabilitation Engineering, 11 141 144 .
        45 - S. Geman, E. Bienenstock, R. Doursat, 1992 Neural Networks and the Bias/Variance Dilemma. Neural Computation, 4 1 58 .
        46 - D. E. Goldberg, 1989 Genetic Algorithms in Search, Optimization. and Machine Learning. Addison-Wesley.
        47 - Peralta. Grave de, Gonzalez. Menendez, R. Andino, S. , L.. Perez, P. Ferrez, J. Millán, 2005 Non-invasive estimation of local field potentials for neuroprosthesis control. Cognitive Processing, Special Issue on Motor Planning in Humans and Neuroprosthesis Control.
        48 - S. Gupta, H. Singh, 1996 Preprocessing EEG signals for direct human-system interface. Intelligence and Systems. IEEE International Joint Symposia on, 32 37 .
        49 - C. Guger, H. Ramoser, G. Pfurtscheller, 2000 Real-Time EEG Analysis with Subject- Specific Spatial Patterns for a Brain-Computer Interface (BCI). IEEE Trans. on rehabilitation engineering, 8 4 447 456 .
        50 - C. Guger, A. Schlögl, C. Neuper, D. Walterspacher, T. Strein, G. Pfurtscheller, 2001 Rapid prototyping of an EEG-based brain-computer interface (BCI). IEEE Trans. Rehab. Eng., 9 1 49 58 .
        51 - D. Hagemann, E. Naumann, J. F. Thayer, 2001 The quest for the EEG reference revisited: A glance from brain asymmetry research. Psychophysiology, 38 847-857.
        52 - Z. Haibin, W. Xu, W. Hong, 2008 Brain-computer interface design using relative wavelet energy. The 2nd International Conference on Control and Decision, 3558 3561 , 2-4 July.
        53 - E. Haselsteiner, G. Pfurtscheller, 2000 Using time-dependant neural networks for EEG classification. IEEE Trans. on Rehabilitation Engineering, 8 457 463 .
        54 - S. Haykin, 1994 Neural Network: A Comprehensive Foundation. New York: Macmillan College Publishing Company.
        55 - S. Helmy, T. Al-ani, Y. Hamam, E. El-madbouly, 2008 300 Based Brain-Computer Interface Using Hidden Markov Models. ISSNIP 2008, Sydney, 15-18 Dec.
        56 - T. Hinterberger, A. Kubler, J. Kaiser, N. Neumann, N. Birbaumer, 2003 A brain- computer interface (BCI) for the locked-in: comparison of different EEG classifications for the thought translation device. Electroencephalogr. Clin. Neurophysiol. 114 416 425 .
        57 - T. Hinterberger, N. Weiskopf, R. Veit, B. Wilhelm, E. Betta, N. Birbaumer, 2004 An EEGDriven Brain-Computer Interface Combined With Functional Magnetic Resonance Imaging (fMRI). IEEE Transactions On Biomedical Engineering, June, 51 6
        58 - B. Hjorth, 1970 EEG analysis based on time domain properties. Electroencephalography and Clinical Neurophysiology, 29 3 306 310 .
        59 - U. Hoffmann, J. Vesin, M. , T. Ebrahimi, K. Diserens, 2008 An efficient P300-based braincomputer interface for disabled subjects, Journal of Neuroscience Methods, 167 1 115 125 .
        60 - U. Hoffmann, J. Vesin, M. , T. Ebrahimi, K. Diserens, 2009 An efficient 300 brain-computer interface for disabled subjects. Available from URL: http://mmspl.epfl.ch/page33712.html. Visited on September, 10, 2009.
        61 - N. Huan, J. , R. Palaniappan, 2004a Classification of mental tasks using fixed and adaptive autoregressive models of EEG signals. In: IEMBS.04. 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 1 5 .
        62 - N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. C. Yen, C. C. Tung, H. H. Liu, 1998 The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society, 454 903 995 .
        63 - N. Huan, J. , R. Palaniappan, 2004b Neural network classification of autoregressive features from electroencephalogram signals for brain-computer interface design. J. Neural Eng., 1 142 150 .
        64 - C. Hung-I, P. Lee-L, Y. Wu-T, T. Yeh-C, J. Hsieh-C, 2005 Recognition of Motor Imagery Electroencephalography Using Independent Component Analysis and Machine Classifiers. Annals of Biomedical Engineering, 33 8 1053 1070 .
        65 - A. Hyvärinen, E. Oja, 2000 Independent Component Analysis: A Tutorial. Neural Networks, 13 4-5 ., 411 430 .
        66 - I. Ivanova, G. Pfurtscheller, C. Andrew, 1995 AI-based classification of single-trial EEG data. Proc. 17th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society, (Montreal, Canada), 703704
        67 - A. K. Jain, B. Chandrasekaran, 1982 Dimensionality and sample size considerations in pattern recognition practice. In: Handbook of Statistics, 2 835 855 .
        68 - A. K. Jain, R. P. W. Duin, J. Mao, 2000 Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell., 22 4 37 .
        69 - T. Jung-P, C. Humphries, T. Lee-W, S. Makeig, M. J. Mc Keown, V. Iragui, T. J. Sejnowski, 1998 Removing electroencephalographic artefacts: comparison between ICA and PCA. Neural Networks Signal Processing, Vol. III, 63 72 .
        70 - T. Jung-P, S. Makeig, M. Westerfieldad, J. Townsendad, E. Courchesnead, T. J. Sejnowski, 2000 Removal of eye activity artefacts from visual event-related potentials in normal and clinical subjects. Clinical Neurophysiology, 111 1745 1758 .
        71 - M. Kaper, P. Meinicke, U. Grossekathoefer, T. Lingner, H. Ritter, 2004 BCI competition 2003-data set iib: support vector machines for the p300 speller paradigm. IEEE Trans. Biomed. Eng., 51 1073 1076 .
        72 - Z. A. Keirn, J. I. Aunon, 1990 A new mode of communication between man and his surroundings. IEEE Trans Biomed Eng, 37, 1209 1214 .
        73 - T. Kohonen, 1990 The Self-Organizing Map. Proc. IEEE. 78 9 1161 1180 .
        74 - Z. J. Koles, 1991 The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG, Electroenceph. Clin. Neurophysiol., 79 440 447 .
        75 - W. Lang, D. Cheyne, P. Hollinger, W. Gerschlager, G. Lindinger, 1996 Electric and magnetic fields of the brain accompanying internal simulation of movement, Cogn Brain Res, 3, 125 129 .
        76 - E. Lehmann, W. Skrandies, 1984 Spatial Analysis of Evoked Potentials in Man- A Review. Progress in Neurobiology, 23 227 250 .
        77 - H. Lee, S. Choi, 2003 PCA+HMM+SVM for EEG pattern classification. In: Proceedings of the Seventh International Symposium on Signal Processing and Its Applications.
        78 - S. Lemm, C. Schafer, G. Curio, 2004 BCI competition 2003-data set iii: probabilistic modelling of sensorimotor mu rhythms for classification of imaginary hand movements. IEEE Trans. Biomed. Eng., 51 1077 1080 .
        79 - A. Lécuyer, F. Lotte, R. B. Reilly, R. Leeb, M. Hirose, M. Slater, 2008 Brain-Computer Interfaces, Virtual Reality, and Videogames. IEEE Computer, 41 10 66 72 .
        80 - F. Lotte, M. Congedo, A. Lécuyer, F. Lamarcheand, B. Arnaldi, 2007 A review of classification algorithms for EEG-based brain-computer interfaces. J. Neural Eng., 4, R1 R13 .
        81 - S. Mason, A. Bashashati, M. Fatourechi, K. Navarro, G. Birch, 2007 A comprehensive survey of brain interface technology designs, Annals of Biomedical Engineering, 35 2 137 169 .
        82 - P. Meinicke, M. Kaper, F. Hoppe, M. Heumann, H. Ritter, 2003 Improving Transfer Rates in Brain Computer Interfacing: a Case Study. In: Advances in Neural Information Processing Systems, Cambridge, MA, 15, Eds.: Suzanna Becker and Sebastian Thrun and Klaus Obermayer, MIT Press.
        83 - D. J. Mc Farland, L. M. Mc Cane, S. V. . David, J. R. Wolpaw, 1997a Spatial filter selection for EEG-based communication Electroencep. Clin. Neurophysiol., 103 386 394 .
        84 - D. J. Mc Farland, A. T. Lefkowicz, J. R. Wolpaw, 1997b Design and operation of an EEGbased brain-computer interface (BCI) with digital signal processing technology. Behav. Res. Meth. Instrum. Comput., 29 337 345 .
        85 - D. J. Mc Farland, L. A. Miner, T. M. Vaughan, J. R. Wolpaw, 2000 Mu and beta rhythm topographies during motor imagery and actual movement. Brain Topogr, 3, 177 186 .
        86 - D. J. . Mc Farland, J. R. Wolpaw, 2005 Sensorimotor rhythm-based brain-computer interface (BCI): feature selection by regression improves performance. IEEE Trans. on Rehabilitation Engineering, 13 3 372 379 .
        87 - J. Millan, J. Mourino, F. Cincotti, M. Varsta, J. Heikkonen, F. Topani, M. G. Marciani, K. Kaski, F. Babiloni, 2000 Neural networks for robust classification of mental tasks. Proc. 22nd Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society, (Chicago, IL), 1380 1382 .
        88 - J. Millan, J. Mourino, M. Franze, F. Cincotti, M. Varsta, J. Heikkonen, F. Babiloni, 2002 A local neural classifier for the recognition of EEG patterns associated to mental tasks. IEEE Trans. Neural Netw., 13 678 686 .
        89 - J. R. Millán, J. Mouriño, 2003 Asynchronous BCI and local neural classifiers: an overview of the adaptive brain interface project. IEEE Trans. on Rehabilitation Engineering, 11 159-61.
        90 - B. Moghaddam, (2002, (2002).Principal Manifolds and Probabilistic Subspaces for Visual Recognition. IEEE Trans. Patern Analysis and Machine Intelligence, 24 24 6 780 788 .
        91 - J. Müller-Gerking, G. Pfurtscheller, H. Flyvbjerg, 1999 Designing optimal spatial filters for single-trial EEG classification in a movement task. Clinical Neurophysiology, 110 5 787 798 .
        92 - K. R. Müller, C. W. Anderson, G. E. Birch, 2003 Linear and nonlinear methods for brain-computer interfaces. IEEE Trans. on Rehabilitation Engineering, 11 165 169 .
        93 - T. Musha, Y. Terasaki, H. A. Haque, G. A. Ivanitsky, 1997 Feature extraction from EEGs associated with emotions. Artificial Life and Robotics, 1 1 15 19 .
        94 - K. R. Müller, M. Krauledat, G. Dornhege, G. Curio, B. Blankertz, 2004 Machine learning techniques for brain-computer interfaces. Biomed. Technol., 49 11 22 .
        95 - A. Nakamura, T. Sugi, A. Ikeda, R. Kakigi, H. Shibasaki, 1996 Clinical application of automatic integrative interpretation of awake background EEG: quantitative interpretation, report making, and detection of artefacts and reduced vigilance level. Electroenceph. Clin. Neurophysiol., 98 103 112 .
        96 - C. Neuper, G. Pfurtscheller, 1999a Motor imagery and ERD Related Desyncronization, Handbook of Electroencepalography and Clinical Neurophysiology, (Revised Edition), 303 525 , 6 Elsevier, Amsterdam.
        97 - C. Neuper, A. Schlögl, G. Pfurtscheller, 1999b Enhancement of left-right sensorimotor EEG differences during feedback-regulated motor imagery. J. Clin. Neurophysiol., 16 373 382 .
        98 - A. Y. Ng, M. I. Jordan, 2002 On generative versus discriminative classifiers: a comparison of logistic regression and naive Bayes. Neural Information Processing Systems, Ng, A.Y., and Jordan, M.
        99 - M. A. L. Nicolelis, 2001 Actions from Thoughts. In: Nature, 409 January 18, 403 407 .
        100 - B. Obermaier, C. Neuper, C. Guger, G. Pfurtscheller, 2000 Information transfer rate in a five-classes brain-computer interface. IEEE Trans. Neural Syst. Rehabil. Eng., 9 283 288 .
        101 - B. Obermeier, C. Guger, C. Neuper, G. Pfurtscheller, 2001 Hidden markov models for online classification of single trial EEG. Pattern recognition letters, 1299 1309 .
        102 - F. F. Offner, 1950 The EEG as Potential Mapping: The Value of the Average Monopolar Reference. Electroencephalography and Clinical Neurophysiology, VoI. 2, 215 216 .
        103 - K. Pearson, 1901 On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine 2 6 559 572 .
        104 - W. D. Penny, S. J. Roberts, 1999 EEG-based communication via dynamic neural network models. Proc. Int. Joint Conf. on Neural Networks.
        105 - B. O. Peters, G. Pfurtscheller, H. Flyvbjerg, 2001 Automatic differentiation of multichannel EEG signals. IEEE Trans. Biomed. Eng., 48 111 116 .
        106 - G. Pfurtscheller, J. Kalcher, C. Neuper, D. Flotzinger, M. Pregenzer, 1996 On-line EEG classification during externally-paced hand movements using a neural network-based classifier. Electroenceph. Clin. Neurophysiol., 99 416 425 .
        107 - G. Pfurtscheller, C. Neuper, D. Flotzinger, M. Pregenzer, 1997 EEG-based discrimination between imagination of right and left hand movement, Electroenceph clin Neurophysiol, 103, 642 651 .
        108 - G. Pfurtscheller, A. Aranibar, 1977 Event related cortical desynchronization detected by power measurements of scalp EEG. Electroenceph Clin Neurophysiol, 42 817 826 .
        109 - G. Pfurtscheller, C. Neuper, A. Schlogl, K. Lugger, 1998 Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters. IEEE Trans. on Rehabilitation Engineering, 6 316 325 .
        110 - G. Pfurtscheller, C. Neuper, H. Ramoserb, J. Müller-Gerking, 1999a Visually guided´lmotor imagery activates sensorimotor areas in humans. Neuroscience Letters, 269 3 153 156 .
        111 - G. Pfurtscheller, C. Guger, 1999b Brain-computer communication system: EEG-based control of hand orthesis in a tetraplegic patient. Acta Chir. Austriaca, 31 23-25.
        112 - G. Pfurtscheller, 1999c EEG Event-related Desynchronisation (ERD) and Event-related Synchronisation (ERS). In Niedermeyer, 958 967 , E., Lopes da Silva, F.H. (Eds.), Electroencephalography. Basic Principles, Clinical Applications, and Related Fields, forth ed., Williams and Wilkins, Baltimore.
        113 - G. Pfurtscheller, C. Neuper, C. Guger, W. Harkam, H. Ramoser, A. Schlogl, B. Obermaier, M. Pregenzer, 2000a Current trends in Graz brain-computer interface (BCI) research. IEEE Trans. on Rehabilitation Engineering, 8 2 216 219 .
        114 - G. Pfurtscheller, C. Guger, G. Müller, G. Krausz, C. Neuper, 2000b Brain oscillations control hand orthosis in a tetraplegic, Neurosci. Lett., 292 211 214 .
        115 - G. Pfurtscheller, C. Neuper, 2001 Motor imagery and direct brain-computer communication. proceedings of the IEEE, 89 7 1123 1134 .
        116 - G. Pfurtscheller, G. R. . Müller, J. Pfurtscheller, H. J. Gerner, R. Rupp, 2003 `Thought’- control of functional electrical stimulation to restore hand grasp in a patient with tetraplegia. Neuroscience Letters, 351 1 6 November, 33 36 .
        117 - J. Platt, N. Cristanini, J. Shawe-Taylor, 2000 Large margin DAGs for multiclass classification. Advances in Neural Information Processing Systems, 12 543 557 , MIT Press.
        118 - L. Qin, L. Ding, B. He, 2004 Motor imagery classification by means of source analysis for brain computer interface applications. Journal of Neural Engineering, 1 3 135 141 .
        119 - L. Qin, B. He, 2005 A wavelet-based time-frequency analysis approach for classification of motor imagery for brain-computer interface applications. J. Neural Eng., 2 65 72 .
        120 - L. R. Rabiner, 1989 A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77 257 286 .
        121 - S. J. Raudys, A. K. Jain, 1991 Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13 3 252 264 ,.
        122 - A. Rakotomamonjy, V. Guigue, G. Mallet, V. Alvarado, 2005 Ensemble of SVMs for improving brain computer interface 300 speller performances. Int. Conf. on Artificial Neural Networks.
        123 - H. Ramoser, J. Müller-Gerking, G. Pfurtscheller, 2000 Optimal spatial filtering of singletrial EEG during imagined hand movement. IEEE Trans. Rehab. Eng., 8 441 446 .
        124 - S. J. Roberts, W. D. Penny, 2000 Real-time brain-computer interfacing: a preliminary study using Bayesian learning. Med Biol Eng Comput., Jan, 38 1 56 61 .
        125 - Y. D. Rubinstein, T. Hastie, 1997 Discriminative versus informative learning. In: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, 49 53 .
        126 - Khorshidi. G. Salimi, A. Jaafari, Nasrabadi. A. Motie, Golpayegani. M. Hashemi, 2007 Modifying the Classic Peak Picking Technique Using a Fuzzy Multi Agent to Have an Accurate P300-based BCI. EUSFLAT Conf. (2) 143 147 .
        127 - G. Schwarz, 1978 Estimating the dimension of a model. Ann. Statist., 6 2 461 464 .
        128 - A. Schlögl, D. Flotzinger, G. Pfurtscheller, 1997 Adaptive autoregressive modelling used for single-trial EEG classification. Biomed. Technik, 42 162 167 .
        129 - R. Scherer, G. R. Müller, C. Neuper, B. Graimann, G. Pfurtscheller, 2004 An asynchronously controlled EEG-based virtual keyboard: improvement of the spelling rate. IEEE Trans. Biomed. Eng., 51 979 984 .
        130 - A. Schlögl, F. Lee, H. Bischof, G. Pfurtscheller, 2005 Characterization of four-class motor imagery EEG data for the BCI-competition. J. Neural Eng., 2 L14 L22 .
        131 - B. Scholkopf, S. Mika, C. J. C. Burges, P. Knirsch, K. R. müller, G. Ratsch, Smola, A. J. , 1999 Input space versus feature space in kernel-based methods. IEEE Trans. Neural Networks, 10 5 1000 1017 .
        132 - B. Schölkopf, A. J. Smolan, 2002 Learning with kernels. The MIT Press.
        133 - Lugger. A. Schlogl, K. , G. Pfurtscheller, 1997 Using adaptive autoregressive parameters for a brain-computer-interface experiment. In: Proceedings 19th International Conference IEEE/EMBS, 1533 1535 .
        134 - S. Solhjoo, A. M. Nasrabadi, M. R. H. Golpayegani, 2005 Classification of chaotic signals using HMM classifiers: EEG-based mental task classification. Proc. European Signal Processing Conference.
        135 - R. R. Stanny, 1989 Mapping the event related Potentials of the brain: Theoretical issues, Technical considerations and computer programs. Naval aerospace Medical Research Laboratory, Naval Air Station, Pensacola, Florida 32506 5700 .
        136 - S. Sun, M. Lan, Y. Lu, 2008 Adaptive EEG signal classification using stochastic approximation methods. Proceedings of the 33rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 413 416 .
        137 - C. Thomas, 1977 Firefox, New York, NY: Holt, Rinehart and Winston.
        138 - D. Trad, S. Helmy, T. Al-ani, S. Delaplace, 2009 Hidden Markov Models and Support Vector Machines applied to 300 Based Brain-Computer Interface- a Comparative study. ICTA 09, Hammamet, 7-9 Mai.
        139 - S. T. Urszula, M-k. Urszula, A. Kozik, 1999 Blinking Artefact Recognition in EEG Signal Using Artificial Neural Network. In: Fourth Conference Neural networks and Their Applications, Czestochowa.
        140 - V. N. Vapnik, 1995 The nature off Statistical Learning Theory. Springer-Verlag.
        141 - V. N. Vapnik, 1999 An overview of statistical learning theory. IEEE Trans. Neural Netw., 10 988 999 .
        142 - M. Varsta, J. Heikkonen, J. D. R. Millan, J. Mourino, 2000 Evaluating the performance of three feature sets for brain-computer interfaces with an early stopping MLP commitee. Int. Conf. on Pattern Recognition, (Barcelona, Spain), 907 910 .
        143 - J. J. Vidal, 1973 Toward direct brain-computer communication. In: Annual Review of Biophysics and Bioengineering, L. J. Mullins (Ed.), Annual Reviews, Inc., Palo Alto, 2 157 180 .
        144 - J. J. Vidal, 1977 Real-time detection of brain events in EEG. In: Proceedings of the IEEE, May 1977, 65 5 633 641 .
        145 - C. Vidaurre, A. Schlögl, R. Cabeza, R. Scherer, G. Pfurtscheller, 2007 Study of On-Line Adaptive Discriminant Analysis for EEG-Based Brain Computer Interfaces. IEEE Transactions On Biomedical Engineering, 54 3 March, 550 556 .
        146 - M. Vourkas, G. Papadourakis, S. Micheloyannis, 2000 Use of ann and hjorth parameters in mental-task discrimination. In: First International Conference on Advances in Medical Signal and Information Processing, 327 332 .
        147 - Y. Wang, P. Berg, M. Scherg, 1999 Common spatial subspace decomposition applied to analysis of brain responses under multiple task conditions: a simulation study. IEEE Trans. Biomed. Eng., 110 4 604 614 .
        148 - T. Wang, J. Deng, B. He, 2004a Classifying EEG-based motor imagery tasks by means of time-frequency synthesized spatial patterns. Clin. Neurophysiol., 115 2744-2753.
        149 - T. Wang, J. Deng, B. He, 2004b Classification of motor imagery EEG patterns and their topographic representation. Proc. 26th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society, (San Francisco, CA), 4359 4362 .
        150 - D. L. Weber, 2001 Scalp current density and source current modelling. http://dnl.ucsf.edu/users/dweber/dweber_docs/eeg_scd.html. Visited on September, 10, 2009.
        151 - J. Wessberg, C. R. Stambaugh, J. D. Kralik, P. D. Beck, M. Laubach, J. K. Chapin, J. Kim, S. J. Biggs, M. A. Srinivasan, M. A. Nicolelis, 2000 Real-time prediction of hand trajectory by ensembles of cortical neurons in primates, Nature, 16 Nov, 408 361 365 .
        152 - M. G. Wentrup, K. Gramann, E. Wascher, M. Buss, 2005 Eeg source localization for brain-computer-interfaces. In: 2nd International IEEE EMBS Conference on Neural Engineering, 128 131 .
        153 - J. R. Wolpaw, D. J. Mc Farland, T. M. Vaughan, 2000 Brain-computer interfaces for communication and control, IEEE Trans. on Rehabilitation Engineering, June, 8 2 222 226 .
        154 - J. R. Wolpaw, N. Birbaumer, D. J. Mc Farland, G. Pfurtscheller, T. M. Vaughan, 2002 Brain-computer interfaces for communication and control, Clin. Neurophysiol, 113, 767 791 .
        155 - B. Xu-G, A. Song-G, 2008 Pattern recognition of motor imagery EEG using wavelet transform. J. Biomedical Science and Engineering, 1 64 67 .