## 1. Introduction

Fuzzy theory provides the basis for Fuzzy Inference Systems (FIS), which are useful tools for data classification, static and dynamic process modeling and identification, decision making, and process control. These capabilities take different forms in different kinds of FIS and can be applied to Brain Computer Interface (BCI) systems, as discussed in this chapter.

The first kind of FIS is designed around the ability of fuzzy logic to model human perception. These FIS elaborate fuzzy rules originating from expert knowledge and are called fuzzy expert systems. Expert knowledge was also used, prior to FIS, to construct expert systems for simulation purposes. Those expert systems were based on Boolean algebra and were not well suited to adapting to the gradual, continuous nature of the underlying process phenomena. Fuzzy logic, in contrast, allows rules to be introduced gradually into expert simulators in a knowledge-based manner. It also accommodates the limitations of human knowledge, particularly the ambiguities that arise when formalizing interactions in complex processes. This type of FIS offers a high semantic level and good generalization ability. Unfortunately, the complexity of large systems may lead to high ambiguity and insufficient accuracy, which results in poor performance [1].

Another class of modeling tools is based on adaptive learning from data. This category relies on supervised learning from observed input-output pairs when training data are provided. A numerical performance index, usually based on the mean square error, can be defined for such simulators. Neural networks have become very popular and efficient in this field. Their main characteristic is numerical accuracy, although they behave qualitatively as black boxes. The first self-learning FIS was proposed by Sugeno and provided a way to design the second kind of FIS [2]. In this case, even if the fuzzy rules are expressed in the form of expert rules, a loss of semantics occurs because the weights are generated directly from data. Simulators of this type are usually called Adaptive Network-based Fuzzy Inference Systems (ANFIS).

There are also methods for constructing fuzzy structures by using rule-based inference. These methods extract the rules directly from data and can be considered rule generation approaches. Rule generation includes preliminary rule generation and rule adaptation according to the input and output data. Automatic rule generation methods were first applied to simple systems with a limited number of variables, for which the rule base does not need to be optimized. Complex systems are different: the number of generated rules becomes enormous, and the description of the rules becomes more complex as the number of variables grows. The simulator is easier to interpret if it is defined by the most influential variables, and the system behavior is more comprehensible when the number of rules is smaller. Therefore, variable selection and rule reduction are two important subcategories of the rule generation process, together called structure optimization. A FIS has many more parameters that can also be optimized, including the membership functions and rule conclusions. A thorough study of these fields and their respective advantages and considerations is provided in the following sections.

In this chapter, several feature extraction and classification methods which could be applied to BCI systems are discussed.

## 2. Feature extraction

In the following sections, the features used to compose the feature vectors are discussed. These features provide various views of the EEG signal and can be used in an ANFIS-based classification system.

### 2.1. Energy ratio features

EEG features in a BCI system can be obtained by frequency analysis of the observed data sequence. For example, in steady-state visual evoked potential (SSVEP) BCIs, the frequencies of the light oscillation must be detected. Frequency domain analysis gives a clear picture of such changes. Because the spectral content changes during BCI operation, energy ratios between different EEG sub-bands can be computed for each channel. It has been shown that BCI-related changes appear in the EEG depending on the brain activity and the electrode locations ([3], [4] and [5]). For instance, to capture the brain rhythms during BCI, the alpha (8-13 Hz), beta (13-35 Hz), delta (0-4 Hz), and theta (4-8 Hz) band energy ratios can be computed from the spectrogram of each channel.
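As a minimal sketch of this idea, the band energy ratios can be computed from a spectral estimate of each channel. The chapter uses the spectrogram; the single FFT periodogram, the synthetic test signal, and the 256 Hz sampling rate below are illustrative assumptions only:

```python
import numpy as np

def band_energy_ratios(x, fs=256.0, fmax=35.0):
    # FFT periodogram of the (mean-removed) signal
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    bands = {"delta": (0.0, 4.0), "theta": (4.0, 8.0),
             "alpha": (8.0, 13.0), "beta": (13.0, 35.0)}
    total = psd[freqs < fmax].sum()  # total energy in the 0-35 Hz range
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() / total
            for name, (lo, hi) in bands.items()}

# synthetic alpha-dominated (10 Hz) test signal
rng = np.random.default_rng(0)
t = np.arange(0, 5.0, 1.0 / 256.0)
x = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.standard_normal(len(t))
ratios = band_energy_ratios(x)
```

For a signal dominated by a 10 Hz oscillation, the alpha ratio dominates the dictionary, mirroring the kind of band-power shift the feature is meant to detect.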

### 2.2. Approximate entropy

Approximate entropy is a recently formulated family of parameters and statistics quantifying regularity (orderliness) in serial data [4]. It has been used mainly in the analysis of heart rate variability [7, 8], endocrine hormone release pulsatility [9], estimation of regularity in epileptic seizure time series [10], and estimation of the depth of anesthesia [2]. Approximate entropy assigns a non-negative number to a time series, with larger values corresponding to more complexity or irregularity in the data. The EEG signal exhibits a regular, uniform pattern during synchronized cooperative activity of cortical cells, and this pattern results in low entropy values. In contrast, concentrated mental functions and higher levels of brain activity lead to high entropy values. Shannon entropy is defined as

H = −Σᵢ pᵢ log pᵢ,

in which pᵢ is the probability of occurrence of the i-th possible value of the series.
### 2.3. Fractal dimension

The fractal dimension emphasizes the geometric properties of the basin of attraction. It reflects the geometry of the attractor and can be computed very quickly [15]. Our goal was to associate each 5-second data segment, taken as a trial, with its corresponding class. To do this, features were extracted from each 1-second segment with 50% overlap, and the sequence of 9 extracted features was taken as the feature vector of a 5-second segment, which was then modeled and classified. In Higuchi's algorithm, k new time series are constructed from the signal x(1), x(2), …, x(N) as

x_m^k = { x(m), x(m+k), x(m+2k), … },  m = 1, 2, …, k,

in which m is the initial time and k is the time interval. The normalized length L_m(k) of each subsampled curve is computed and averaged over m to obtain L(k), and the fractal dimension is estimated as the slope of the regression of log L(k) on log(1/k).
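A short sketch of Higuchi's algorithm as outlined above; the maximum interval kmax = 8 is an illustrative assumption:

```python
import numpy as np

def higuchi_fd(x, kmax=8):
    # Higuchi fractal dimension of a 1-D series
    x = np.asarray(x, dtype=float)
    n = len(x)
    lk = []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):                      # initial time m
            idx = np.arange(m, n, k)            # subsampled series x_m^k
            steps = (n - m - 1) // k            # number of increments
            length = np.abs(np.diff(x[idx])).sum() * (n - 1) / (steps * k)
            lengths.append(length / k)          # normalized curve length
        lk.append(np.mean(lengths))
    # dimension = slope of log L(k) against log(1/k)
    slope, _ = np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)), np.log(lk), 1)
    return slope

rng = np.random.default_rng(0)
fd_smooth = higuchi_fd(np.linspace(0.0, 1.0, 1000))   # smooth curve
fd_noise = higuchi_fd(rng.standard_normal(1000))      # white noise
```

A smooth curve yields a dimension near 1, while white noise approaches 2, which is the discriminative property exploited by this feature.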

### 2.4. Lyapunov exponent

Lyapunov exponents are a quantitative measure for distinguishing among the various types of orbits based upon their sensitive dependence on the initial conditions, and they are used to determine the stability of any steady-state behavior, including chaotic solutions [4]. Chaotic systems show aperiodic dynamics because phase-space trajectories with nearly identical initial states separate from each other at an exponentially increasing rate, captured by the so-called Lyapunov exponent. The Lyapunov exponents can be estimated from an observed time series [5]. This approach is described as follows: consider two (usually the nearest) neighboring points in phase space at time 0 and at time t, the distances of the points in the i-th direction being ‖δx_i(0)‖ and ‖δx_i(t)‖, respectively. The Lyapunov exponent λ_i is then defined by the average exponential growth rate of the initial distance,

‖δx_i(t)‖ ≈ ‖δx_i(0)‖ · e^(λ_i t)   (t → ∞),

or

λ_i = lim_(t→∞) (1/t) ln( ‖δx_i(t)‖ / ‖δx_i(0)‖ ).

Generally, Lyapunov exponents can be extracted from observed signals in two different ways [7]. The first method is based on following the time evolution of nearby points in the state space [17] and provides an estimate of the largest Lyapunov exponent only. The second method is based on the estimation of local Jacobi matrices and is capable of estimating all the Lyapunov exponents [18]; the vector of all the Lyapunov exponents of a system is often called its Lyapunov spectrum. The second method was used for Lyapunov vector extraction in this section. An optimized vector size of 7 was chosen, which led to the best mean classification rate with the support vector machine (SVM) classifier.
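The Jacobian idea behind the second method can be illustrated on a system whose derivative is known in closed form. The chapter estimates exponents from observed EEG; the logistic map below is only a toy stand-in where the local Jacobian is the scalar f′(x) = r(1 − 2x) and the answer is known analytically:

```python
import math

def logistic_lyapunov(r=4.0, x0=0.2, n=200_000, burn=1000):
    # largest Lyapunov exponent of the logistic map x -> r*x*(1-x),
    # estimated by averaging log|f'(x)| = log|r*(1-2x)| along the orbit
    x = x0
    for _ in range(burn):              # discard the transient
        x = r * x * (1.0 - x)
    acc = 0.0
    for _ in range(n):
        acc += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return acc / n

lam = logistic_lyapunov()  # analytic value for r = 4 is ln 2
```

A positive exponent, as obtained here, is the signature of the exponential divergence of nearby trajectories described in the text.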

### 2.5. Kalman feature extractor

The algorithm discussed in this section is based on Kalman estimation, which is well known in statistical estimation and control theory ([8], [9], and [10]) but perhaps less so in parameter estimation. The next paragraphs therefore explain its function in this specific context. The Kalman filter is essentially a set of mathematical expressions that implements a predictor-corrector estimator. This estimator minimizes the error covariance and is therefore optimal when appropriate initial conditions are selected. The conditions for optimal estimation are rarely satisfied in practice; however, the estimator also performs well in sub-optimal situations. Here the Kalman estimator is used for adaptive estimation of the dynamic parameters of the EEG. The estimator reduces the error variance adaptively, and after a period of time a unique estimate is reached [11].

A Kalman filter computes the response

x̂(k) = x̂⁻(k) + K(k) [ z(k) − H x̂⁻(k) ],

in which x̂⁻(k) is the a priori state estimate, z(k) is the measurement, and H is the observation matrix. The gain is

K(k) = P⁻(k) Hᵀ [ H P⁻(k) Hᵀ + R ]⁻¹,

in which P⁻(k) is the a priori error covariance and R is the measurement-noise covariance.

The measurement and process errors are modeled as independent, additive, white Gaussian noises. In practice they are time-varying processes that are assumed stationary for simplicity.

If there is no input or process noise, the process-noise covariance matrix vanishes and the prediction step reduces to propagating the previous estimate through the state-transition matrix.

Consider the system

x(k+1) = A x(k) + B u(k) + w(k),

in which x(k) is the state vector, u(k) is the deterministic input, and w(k) is the process noise, together with the observation equation

z(k) = H x(k) + v(k),

in which z(k) is the measurement and v(k) is the measurement noise,

and A, B, and H are the state-transition, input, and observation matrices, respectively.

The state-space representation of a linear system is much more flexible and useful than the transfer-function form because it covers both time-varying and time-invariant systems and encompasses stochastic as well as deterministic systems. Furthermore, it allows precise evaluation of the concepts of observability and controllability, which are useful, for instance, in determining whether the desired unknown parameters of a system can be estimated from the given observations.

A modification is applied to the state estimation algorithm in this section to overcome the lack of a deterministic and stationary input.

If the noise vectors w(k) and v(k) are white, zero-mean, and mutually uncorrelated, the optimal recursive estimate can be obtained, where Q and R denote the covariance matrices of the process and measurement noises, respectively.

The recursive Riccati equations are used to estimate the parameters [14, 15]. The Kalman feature extractor updates the estimate of the dynamic system state with the new information contained in each incoming sample, and the resulting parameter estimates form the feature vector.
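To make the idea concrete, the following sketch tracks the coefficients of an AR model with a Kalman filter whose state is the coefficient vector under a random-walk model. The model order, noise covariances, and test signal are illustrative assumptions, not the chapter's settings:

```python
import numpy as np

def kalman_ar(x, order=3, q=1e-6, r=0.01):
    # Kalman filter whose state is the AR coefficient vector (random walk)
    a = np.zeros(order)                 # state estimate
    P = np.eye(order)                   # state error covariance
    Q = q * np.eye(order)               # process-noise covariance (assumed)
    for t in range(order, len(x)):
        h = x[t - order:t][::-1]        # regressor: last `order` samples
        P = P + Q                       # time update (random-walk model)
        s = h @ P @ h + r               # innovation variance
        k = P @ h / s                   # Kalman gain
        a = a + k * (x[t] - h @ a)      # measurement update
        P = P - np.outer(k, h) @ P
    return a

# AR(2) test signal with known coefficients (third coefficient is zero)
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 1.5 * x[t - 1] - 0.8 * x[t - 2] + 0.1 * rng.standard_normal()
a_hat = kalman_ar(x)
```

The recovered coefficient vector is the kind of "dynamic parameter" estimate that serves as the EEG feature in this section; a small Q lets the same filter track slowly varying parameters.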

## 3. Classification

The support vector machine (SVM) method has been used extensively for the classification of EEG signals [16]. It has been shown that the EEG signal has separable intrinsic feature vectors that can be used by an SVM classifier. SVM classifiers use discriminant hyper-planes for classification; the selected hyper-planes are those that maximize the classification margin. The distance from the nearest training points is usually measured through a non-linear kernel that maps the problem into a space where a linear solution exists [17]. A Radial Basis Function (RBF) kernel-based SVM is proposed here, in which the Lagrangian optimization is tuned by an adjustable ANFIS algorithm. It will be shown that, given the conceptual nature of BCI use by patients, the proposed method leads to an adjustable soft-decision classification.

### 3.1. Classification using nonlinear SVM with RBF kernel

Training the SVM is a quadratic optimization problem in which the decision hyperplane is defined as [18]:

f(x) = sign( Σᵢ αᵢ yᵢ K(xᵢ, x) + b ),

in which xᵢ are the training samples, yᵢ ∈ {−1, +1} are their class labels, K(·,·) is the kernel function, and b is the bias. The coefficients αᵢ are found by maximizing the dual objective

W(α) = Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ),

subject to

0 ≤ αᵢ ≤ C for all i,

and,

Σᵢ αᵢ yᵢ = 0.

Nonlinear kernels usually provide classification hyper-planes that cannot be achieved by linear weighting [20]. With an appropriate kernel, the SVM offers an efficient tool for flexible classification with a highly nonlinear decision boundary. The RBF kernel was used in this section, defined as:

K(x, x′) = exp( −‖x − x′‖² / (2σ²) ),

in which σ is the kernel standard deviation. The regularization constant C and σ are usually chosen by n-fold cross-validation. It has also been suggested that the parameters can be chosen by optimizing an upper bound of the generalization error based solely on the training data [26]. The fraction of support vectors, i.e., the quotient between the number of support vectors and the number of training samples, gives an upper bound on the leave-one-out error estimate, because the resulting decision function changes only when support vectors are omitted. Therefore, a low fraction of support vectors can be used as a criterion for parameter selection.
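A short sketch of this selection procedure using scikit-learn (an assumption; the chapter names no library), with C and the kernel width chosen by cross-validation and the support-vector fraction inspected afterwards; the synthetic data and parameter grid are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# stand-in feature matrix: 9 features per trial, echoing the 9-feature
# vectors used earlier (synthetic; real inputs would be the EEG features)
X, y = make_classification(n_samples=300, n_features=9, n_informative=5,
                           random_state=0)

# C and the RBF width chosen by 5-fold cross-validation (grid is assumed)
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.01, 0.1]},
                    cv=5)
grid.fit(X, y)
clf = grid.best_estimator_

# fraction of support vectors: upper bound on the leave-one-out error
sv_fraction = clf.n_support_.sum() / len(X)
```

Comparing `sv_fraction` across candidate (C, σ) pairs implements the parameter-selection criterion described above.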

### 3.2. Adjustable ANFIS optimization

An ANFIS with a first-order Sugeno fuzzy model can be used for fine adjustment of the SVM classification kernels. Such a framework makes ANFIS modeling more systematic and less reliant on expert knowledge, and therefore facilitates learning and adjustment. The ANFIS structure is shown in Figure 3. In the first layer, all the nodes are adaptive nodes. The outputs of layer 1 are the fuzzy membership grades of the inputs, which are given by [21]:

O¹ᵢ = μ_Aᵢ(x),  O¹ᵢ = μ_Bᵢ(y),

in which μ_Aᵢ and μ_Bᵢ are the membership functions of the input fuzzy sets. In the second layer, the nodes are fixed nodes whose outputs are the products of the incoming membership grades, the rule firing strengths wᵢ = μ_Aᵢ(x) · μ_Bᵢ(y). In the third layer, each fixed node computes the ratio of one rule's firing strength to the sum of all firing strengths, w̄ᵢ = wᵢ / (w₁ + w₂), which are the so-called normalized firing strengths. In the fourth layer, the nodes are adaptive nodes. The output of each node in this layer is simply the product of the normalized firing strength and a first-order polynomial for the first-order Sugeno model. The outputs of this layer are given by:

O⁴ᵢ = w̄ᵢ fᵢ = w̄ᵢ ( pᵢ x + qᵢ y + rᵢ ),

in which w̄ᵢ is the normalized firing strength from layer 3 and {pᵢ, qᵢ, rᵢ} is the consequent parameter set of node i.

It can be observed that there are two adaptive layers in this ANFIS architecture, namely the first layer and the fourth layer. In the first layer there are three modifiable parameters {aᵢ, bᵢ, cᵢ} per membership function, the so-called premise parameters, which govern the shape of the membership functions. In the fourth layer, the modifiable parameters {pᵢ, qᵢ, rᵢ} are the so-called consequent parameters.

The task of the learning algorithm for this architecture is to tune all the above-mentioned modifiable parameters so that the ANFIS output matches the training data. When the premise parameters of the membership functions are fixed, the output of the ANFIS model can be written as:

f = w̄₁ f₁ + w̄₂ f₂ = (w̄₁ x) p₁ + (w̄₁ y) q₁ + w̄₁ r₁ + (w̄₂ x) p₂ + (w̄₂ y) q₂ + w̄₂ r₂.

This is a linear combination of the modifiable consequent parameters p₁, q₁, r₁, p₂, q₂, and r₂, which can therefore be identified by the least squares method.

A hybrid algorithm combining the least squares method and the gradient descent method was adopted to identify the optimal values of these parameters [22]. The hybrid algorithm is composed of a forward pass and a backward pass. The least squares method (forward pass) was used to optimize the consequent parameters with the premise parameters fixed. Once the optimal consequent parameters are found, the backward pass starts immediately. The gradient descent method (backward pass) was used to optimally adjust the premise parameters corresponding to the fuzzy sets in the input domain. The output of the ANFIS was calculated by employing the consequent parameters found in the forward pass. The output error was used to adapt the premise parameters by means of a standard back-propagation algorithm. It has been shown that this hybrid algorithm is highly efficient in training the ANFIS [21].
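The forward pass can be illustrated for a one-input, two-rule first-order Sugeno model with the Gaussian premise parameters held fixed; the target function, rule centres, and noise level are all illustrative assumptions:

```python
import numpy as np

def gauss(x, c, s):
    # Gaussian membership function with centre c and width s
    return np.exp(-0.5 * ((x - c) / s) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 200)
y = 0.5 * x + np.tanh(x) + 0.05 * rng.standard_normal(200)  # toy target

# premise parameters (centres/width of the two fuzzy sets) held fixed
w1, w2 = gauss(x, -1.0, 1.0), gauss(x, 1.0, 1.0)   # firing strengths
wn1, wn2 = w1 / (w1 + w2), w2 / (w1 + w2)          # normalized

# forward pass: the consequents (p_i, r_i) of rules f_i = p_i*x + r_i
# enter the output linearly, so least squares identifies them directly
A = np.column_stack([wn1 * x, wn1, wn2 * x, wn2])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
rmse = np.sqrt(np.mean((A @ theta - y) ** 2))
```

In the full hybrid algorithm, a gradient (backward) pass on the centres and widths would alternate with this least squares step.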

A classification method based on an ANFIS-adapted SVM was proposed in this section. It was shown that non-linear features, as effective components, improve the classification rate. Through the feature space constructed using approximate entropy and fractal dimension in addition to conventional spectral features, different stages of EEG signals can be reliably distinguished from each other. Successful implementations of ANFIS-SVM for EEG signal classification have been reported in this context. The results confirmed that this method provides better performance on datasets of lower dimension, and this performance was achieved with the RBF kernel of the SVM modified by ANFIS. This section can serve as a strong base for improved methods in the field of BCI cognition and analysis for therapeutic applications.

### 3.3. Recursive least square

In this method a third-order AR model is used that directly estimates the EEG parameters without using any intrinsic model. Such linear methods can be used to estimate nonlinear systems in a piecewise-linear manner.

Use of a parametric model provides a time-variant estimate of the state equations of the system at the working point, in accordance with the highly variable nature of EEG signals. In fact, modeling a nonlinear and time-variant signal with a feature vector of limited dimension acts as a filter on the data: the abstract characteristics of the signal are extracted, which leads to lower variance compared with frequency-based feature extraction methods. The dimension of the feature vector is a critical choice, to avoid overfitting when the feature vector is large and lack of convergence when it is small.
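A sketch of the recursive least squares update for a third-order AR model; the forgetting factor, which gives the piecewise/time-varying tracking behavior described above, and the test signal are assumptions:

```python
import numpy as np

def rls_ar(x, order=3, lam=0.999, delta=100.0):
    # recursive least squares for AR(order) coefficients; the forgetting
    # factor lam (< 1) lets the estimate track time-varying dynamics
    a = np.zeros(order)
    P = delta * np.eye(order)
    for t in range(order, len(x)):
        phi = x[t - order:t][::-1]            # last `order` samples
        k = P @ phi / (lam + phi @ P @ phi)   # gain
        a = a + k * (x[t] - phi @ a)          # coefficient update
        P = (P - np.outer(k, phi) @ P) / lam  # covariance update
    return a

# stable AR(3) test signal with known coefficients
rng = np.random.default_rng(1)
x = np.zeros(3000)
for t in range(3, 3000):
    x[t] = (1.2 * x[t - 1] - 0.5 * x[t - 2] + 0.1 * x[t - 3]
            + 0.1 * rng.standard_normal())
a_hat = rls_ar(x)
```

With lam = 1 this reduces to ordinary least squares; values slightly below 1 trade estimation variance for the ability to follow a drifting operating point.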

### 3.4. Coupled hidden Markov models

Use of multimodal approaches improves the robustness of classification under disturbances and can extend the margin of security of BCI systems. The two feature sets are integrated through the *a posteriori* probability function [25], where each stream contributes the likelihood computed from its own model.

The parameters are nonlinear, as can be seen in equations 3 and 4. The system identification is based on a linear ARMA model, which means the parameters are computed accurately only around the operating point of the system. The operating point is time-varying, and therefore the parameters vary with time; consequently, the feature vector parameters are computed at each frame during both training and testing of the HMM.

Training of the above-mentioned multistream model can be done by synchronous or asynchronous methods. In this section the multistream approach is based on a pseudo-synchronous coupled hidden Markov model. As will be shown, it is an interesting option for multimodal continuous speech identification because it provides 1) synchronous multimodal continuous speech identification and 2) consideration of the asynchrony between streams. Some resynchronization points are defined at the beginning and end of BCI segments, including phonemes or words.

The combination of the independent likelihoods is done by multiplying the segment likelihoods of the two streams, thus assuming conditional independence of the streams, according to:

The weighting factor balances the contributions of the two streams and is chosen according to the signal-to-noise ratio of each stream.
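The weighted product combination can be sketched as follows, working in the log domain; the per-class likelihoods and weight values are toy numbers:

```python
import numpy as np

def combine_streams(loglik_a, loglik_b, gamma=0.5):
    # weighted product of stream likelihoods, in the log domain:
    # log P = gamma*log P_a + (1-gamma)*log P_b (conditional independence)
    return gamma * np.asarray(loglik_a) + (1.0 - gamma) * np.asarray(loglik_b)

# toy per-class segment likelihoods from two streams
ll_stream1 = np.log([0.7, 0.2, 0.1])   # stream 1 favours class 0
ll_stream2 = np.log([0.3, 0.6, 0.1])   # stream 2 favours class 1
best_a = int(np.argmax(combine_streams(ll_stream1, ll_stream2, gamma=0.7)))
best_b = int(np.argmax(combine_streams(ll_stream1, ll_stream2, gamma=0.3)))
```

Shifting the weight from one stream to the other changes the decision, which is why the optimum weight depends on the signal-to-noise ratio of each stream.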

BCI with dynamic parameters was studied in this section. The adopted algorithm is based on pseudo-synchronous hidden Markov chains to model the asynchrony between events. The proposed combination of an ARMA model and Kalman filtering for feature extraction resulted in the best identification rates compared with the usual methods. The complementary effect of video information and dynamic parameters on BCI was studied. The effectiveness of the proposed identification system was most pronounced at low signal-to-noise ratios, which reveals the robustness of the adopted algorithm. Owing to its high information rate, the voice of the speakers has a significant effect on the identification rate; however, this effect degrades rapidly with environmental noise. It was also shown that a specific combination weight of the voice and video information provides the optimum identification rate; this weight depends on the signal-to-noise ratio and provides low sensitivity to environmental noise. The phonetic content of the spoken phrases was evaluated, and the phonemes were sorted by their influence on the BCI rate. The identification rate of the proposed model-based system was compared with other parameter extraction methods, including Kalman filtering, neural networks, the ANFIS system, and the autoregressive moving average model; the combination of the proposed model with the Kalman filter led to the best identification performance. The composition of the feature vector plays a large role in the identification rate; therefore, more efficient methods than the pseudo-synchronized hidden Markov chain could be used for better results. Feature extraction methods such as sample entropy, fractal dimension, and nonlinear model-based approaches have shown appropriate performance in BCI processing and could lead to better identification rates in this area. Lip-shape extraction is a critical point in this identification method, and more robust algorithms would provide more accurate and precise results.

## 4. Conclusions

Some promising methods for feature extraction and classification of the EEG signal were described in this chapter. The aim of these methods is to overcome the ambiguities encountered in BCI applications.

A feature extraction algorithm based on Kalman estimation was discussed. This estimator minimizes the error covariance and is therefore optimal when appropriate initial conditions are selected. The Kalman estimator is used for adaptive estimation of the dynamic parameters of the EEG; it reduces the error variance adaptively, and after a period of time a unique estimate is reached.

The SVM has been used extensively for the classification of EEG signals. It has been shown that the EEG signal has separable intrinsic feature vectors that can be used by an SVM classifier. SVM classifiers use discriminant hyper-planes for classification; the selected hyper-planes are those that maximize the classification margin. The distance from the nearest training points is usually measured through a non-linear kernel that maps the problem into a space where a linear solution exists. An RBF kernel-based SVM was proposed here, in which the Lagrangian optimization is performed using an adjustable ANFIS algorithm; this method leads to an adjustable soft-decision classification.

The combination of the proposed model with the Kalman filter can lead to the best identification performance. The composition of the feature sets plays a large role in the classification rate; therefore, more efficient methods than the pseudo-synchronized hidden Markov chain could be used for better results. Feature extraction methods such as sample entropy, fractal dimension, and nonlinear model-based approaches have shown appropriate performance and could lead to better identification rates in this area.

Through the feature space constructed using approximate entropy and fractal dimension in addition to conventional spectral features, different stages of EEG signals can be reliably distinguished from each other. These methods provide better performance on datasets of lower dimension. The RBF kernel of the SVM modified by ANFIS can serve as a strong base for improved methods in the field of BCI cognition and analysis for therapeutic applications.