6 Relaxed Linear Separability (RLS) Approach to Feature (Gene) Subset Selection



Introduction
Feature selection is one of the active research areas in pattern recognition and data mining (Duda et al., 2001). The importance of feature selection methods becomes apparent in the context of the rapidly growing amount of data collected in contemporary databases (Liu & Motoda, 2008). Feature subset selection procedures aim at neglecting as large a number as possible of those features (measurements) which are irrelevant or redundant for a given problem. The feature subset resulting from a feature selection procedure should allow a model to be built, on the basis of the available learning data sets, that generalizes better to new (unseen) data. For the purpose of designing classification or prediction models, feature subset selection procedures are expected to produce higher classification or prediction accuracy. The feature selection problem is particularly important and challenging when the number of objects represented in a given database is low in comparison with the number of features used to characterise these objects. Such a situation typically appears in the exploration of genomic data sets, where the number of features can be thousands of times greater than the number of objects.
Here we consider the relaxed linear separability (RLS) method of feature subset selection (Bobrowski & Łukaszuk, 2009). This approach to the feature selection problem refers to the concept of linear separability of the learning sets (Bobrowski, 2008). The term "relaxation" means here a deterioration of the linear separability due to the gradual neglect of selected features. The considered approach to feature selection is based on repetitive minimization of convex and piecewise-linear (CPL) criterion functions. These CPL criterion functions, which have their origins in the theory of neural networks, include the costs of particular features (Bobrowski, 2005). Increasing the cost of individual features makes these features fall out of the feature subspace. The quality of the reduced feature subspaces is assessed through the accuracy of the CPL optimal classifiers built in these subspaces. The article contains new theoretical and experimental results related to the RLS method of feature subset selection. The experimental results have been achieved through the analysis of, inter alia, two sets of genetic data.

Linear separability of two learning sets
Suppose that m objects O_j described in the database are represented by feature vectors x_j[n] = [x_j1, ..., x_jn]^T (j = 1, ..., m). The feature vectors x_j[n] can be treated as points in the n-dimensional feature space F[n] (x_j[n] ∈ F[n]). The component x_ji of the vector x_j[n] is the numerical value of the i-th feature x_i of the object O_j. For example, in the case of a clinical database, the components x_ji can be the numerical results of the i-th diagnostic examination of a given patient O_j. Consider two learning sets G+ and G- built from n-dimensional feature vectors x_j[n]. The positive set G+ contains m+ feature vectors x_j[n] and the negative set G- contains m- vectors x_j[n]:

G+ = {x_j[n] : j ∈ J+}  and  G- = {x_j[n] : j ∈ J-}   (1)

where J+ and J- are disjoint sets (J+ ∩ J- = ∅) of indices j. Lemma 2 can be proved by using arguments related to the construction of bases in the feature space F[n] (Bobrowski, 2005). A base of the feature space F[n] can be created by any n feature vectors x_j[n] which are linearly independent. Such n vectors x_j[n] can be separated by a hyperplane H(w[n], θ) = {x[n] : w[n]^T x[n] = θ} (3) for any division into the subsets G+ and G- (1).
It can be seen that the linear separability (2) can be formulated equivalently as (Bobrowski, 2005):

(∃ v[n+1]) (∀ x_j[n] ∈ G+)  v[n+1]^T y_j[n+1] ≥ 1   and   (∀ x_j[n] ∈ G-)  v[n+1]^T y_j[n+1] ≤ -1   (4)

where y_j[n+1] are the augmented feature vectors and v[n+1] is the augmented weight vector (Duda et al., 2001):

y_j[n+1] = [x_j[n]^T, 1]^T   and   v[n+1] = [w[n]^T, -θ]^T   (5)

The inequalities (4) are used in the definition of the convex and piecewise-linear (CPL) criterion functions.
The perceptron criterion function Φ(v[n+1]) is defined on the sets G+ and G- (1) as the weighted sum of the penalty functions φ_j+(v[n+1]) and φ_j-(v[n+1]) (Bobrowski, 2005):

Φ(v[n+1]) = Σ_{j ∈ J+} α_j φ_j+(v[n+1]) + Σ_{j ∈ J-} α_j φ_j-(v[n+1])   (8)

where the nonnegative parameters α_j determine the prices of particular feature vectors x_j[n]. We are interested in finding the minimum Φ* = Φ(v_k*[n+1]) of the criterion function Φ(v[n+1]):

Φ* = Φ(v_k*[n+1]) = min_v Φ(v[n+1])   (9)

It has been proved that the minimal value Φ* is equal to zero (Φ* = 0) if and only if the sets G+ and G- (1) are linearly separable (4) (Bobrowski, 2005).
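As a concrete illustration, the sketch below (a minimal Python sketch, not the authors' implementation) augments the feature vectors as in (5) and evaluates a CPL perceptron criterion of the form (8); the explicit hinge form of the penalty functions with margin 1 is an assumption chosen to be consistent with the separability inequalities (4).

```python
import numpy as np

def augment(X):
    """Append the constant 1 to every feature vector, as in (5)."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def perceptron_cpl(v, Yp, Ym, alpha=1.0):
    """CPL perceptron criterion (8): weighted sum of piecewise-linear penalties.

    Assumed penalty forms: phi+_j(v) = max(0, 1 - v^T y_j) for y_j from G+,
    phi-_j(v) = max(0, 1 + v^T y_j) for y_j from G-.  The value is zero
    exactly when v fulfils the separability inequalities (4)."""
    plus = np.maximum(0.0, 1.0 - Yp @ v)
    minus = np.maximum(0.0, 1.0 + Ym @ v)
    return alpha * (plus.sum() + minus.sum())

# Tiny usage example: two separable learning sets in a 2-D feature space.
Yp = augment(np.array([[2.0, 1.0], [3.0, 2.0]]))      # augmented vectors of G+
Ym = augment(np.array([[-1.0, -1.0], [-2.0, 0.0]]))   # augmented vectors of G-
v = np.array([1.0, 0.5, -1.0])                        # v = [w, -theta]
print(perceptron_cpl(v, Yp, Ym))                      # 0.0: v separates the sets
```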
Remark 1: The number n_1 of the linearly independent vectors y_j[n+1] in the basis (matrix) B_k[n+1] (16) cannot be greater than the rank r of the data set G+ ∪ G- (1), so the number n_0 of the unit vectors e_i[n+1] in the basis B_k[n+1] is at least n + 1 - r. The vertex v_k[n+1] (17) can be computed by using the basis B_k[n+1] and the margin vector δ_k[n+1]. The criterion function Ψ_λ(v[n+1]) (12) is convex and piecewise-linear (CPL), and its minimum is located in one of the vertices v_k[n+1] (17). The basis exchange algorithms allow the optimal vertex of the criterion function (12) to be found efficiently, even in the case of large, multidimensional data sets G+ and G- (1) (Bobrowski, 1991).
The components w_ki of the optimal vertex v_k^[n+1] (18) which are related to the unit vectors e_i[n+1] (i ∈ I_k^0) in the basis B_k[n+1] (16) are equal to zero (w_ki = 0) (15). The n_0 features x_i (i ∈ I_k^0) (15) with the weights w_i equal to zero in the optimal vertex v_k^[n+1] (18) can be reduced without changing the separating hyperplane. The following rule of feature reduction has been proposed on this basis: the features x_i related to the unit vectors e_i[n+1] (i ∈ I_k^0) in the basis B_k^[n+1] (16) linked to the optimal vertex v_k^[n+1] (18) can be omitted (Bobrowski, 2005). An arbitrary number n_0 of features x_i can be omitted and the feature space F[n] can be reduced to the subspace F_k^[n - n_0] by using an adequate value λ_k of the parameter λ in the criterion function Ψ_λ(v[n+1]) (12). For example, the value λ = 0 means that the optimal vertex v_k^[n+1] (18) constitutes the minimum of the perceptron criterion function Φ(v[n+1]) (8) defined in the full feature space F[n]. On the other hand, a sufficiently large value of the parameter λ results in the optimal vertex v_k^[n+1] (18) equal to zero (v_k^[n+1] = 0). Such a solution is not constructive, because it means that all the features x_i have been reduced (19) and the separating hyperplane H(w[n], θ) (3) cannot be defined. For a given parameter value λ = λ_k (12), the optimal vertex v_k^[n+1] (18) is determined unambiguously as the minimum (18) of the convex and piecewise-linear function Ψ_λ(v[n+1]). This vertex is characterized by the subset of those n - n_0 features x_i which are not related to the unit vectors e_i[n+1] in the basis B_k^[n+1] (16); the separating hyperplane can also be determined by these n - n_0 features x_i. The quality of the feature subspace F_k^[n_k] can be determined on the basis of the quality of the optimal linear classifier designed in this subspace of dimensionality n_k. The optimal feature subspace F_k*[n_k] can be identified as the one that enables creation of the best linear classifier. The RLS method of feature subset selection is based on this scheme (Bobrowski, 2008; Bobrowski & Łukaszuk, 2009).
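The reduction rule itself can be mimicked in a few lines. The sketch below is a simplification: it assumes an optimal vertex v is already available and uses a numerical tolerance in place of exactly zero weights; it is not the basis exchange procedure used by the authors.

```python
import numpy as np

def reduce_features(X, v, tol=1e-9):
    """Drop every feature x_i whose weight w_i is (numerically) zero in the
    vertex v = [w, -theta]; return the reduced data and surviving indices."""
    w = v[:-1]
    keep = np.flatnonzero(np.abs(w) > tol)   # indices of retained features x_i
    return X[:, keep], keep
```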
Comparing our approach with the approach based on the least-squares criterion, we can conclude that the discriminant function based on the least-squares criterion can be linked to the Euclidean distance L2, whereas our method, based on the convex and piecewise-linear (CPL) criterion function, can be linked to the L1 norm distance function.

Characteristics of the optimal vertices in the case of linear separability
Let us consider the case of long feature vectors in exploratory data analysis. In this case, the dimensionality n of the feature vectors x_j[n] is much greater than the number m (n ≫ m) of these vectors (j = 1, ..., m). We may expect in this case that the vectors x_j[n] are linearly independent (Duda et al., 2001). In accordance with Lemma 2, arbitrary sets G+ and G- (1) of linearly independent vectors x_j[n] are linearly separable (6). The minimal value Φ* (9) of the criterion function Φ(v[n+1]) (8) defined on linearly separable sets G+ and G- (1) is always equal to zero (Φ* = 0) (Bobrowski, 2005). The minimum is reached in an optimal vertex v_k*[n+1] for which the equations (15) hold, where n' = n - n_0 is the dimensionality of the reduced feature vectors y_j[n'] obtained from y_j[n+1] (5) after neglecting the n_0 features x_i related to the set I_k^0 (15), and v_k*[n'] is the reduced vertex obtained from v_k*[n+1] (9) by neglecting the n_0 components w_i equal to zero (w_i = 0). The vectors y_j[n'] belong to the reduced feature subspace F_k[n']. We can remark that if the learning sets G+[n'] and G-[n'] constituted from the vectors y_j[n'] are linearly separable (4) in a given feature subspace F_k[n'], there may be more than one optimal vertex v_k*[n'] creating the minimum (9) of the function Φ_k(v[n']). In this case, each optimal vertex v_k*[n'] linearly separates (4) the sets G+[n'] and G-[n'] (Bobrowski, 2005):

(∀ y_j[n'] ∈ G+[n'])  v_k*[n']^T y_j[n'] ≥ 1   and   (∀ y_j[n'] ∈ G-[n'])  v_k*[n']^T y_j[n'] ≤ -1   (21)

It can be proved that the modified criterion function Ψ_λ(v[n+1]) (12) with a sufficiently small cost level λ (λ → 0) has its minimal value (18) in the same vertex v_k*[n+1] (9) as the perceptron criterion function Φ(v[n+1]) (8) (Bobrowski, 2005). The modified criterion function Ψ_λ(v[n+1]) (12) gives the possibility to introduce different feature costs γ_i (γ_i ≥ 0) related to particular features x_i. As a result, the outcome of the feature subset selection process can be influenced by the feature costs γ_i (12).
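The modified criterion function (12) is referenced but not reproduced in this excerpt. Assuming it takes the regularised CPL form used in Bobrowski & Łukaszuk (2009), Ψ_λ(v) = Φ(v) + λ Σ_i γ_i |w_i| with the threshold component left unpenalised, a minimal sketch could look as follows (all names and defaults are illustrative).

```python
import numpy as np

def modified_cpl(v, Yp, Ym, lam, gamma=None, alpha=1.0):
    """Assumed form of (12): Psi_lambda(v) = Phi(v) + lam * sum_i gamma_i * |w_i|.

    Yp, Ym are the augmented vectors of G+ and G-; w = v[:-1] are the feature
    weights, so raising lam (or a single gamma_i) pushes weights towards zero
    and drives the corresponding features out of the feature subspace."""
    w = v[:-1]
    if gamma is None:
        gamma = np.ones_like(w)                              # equal feature costs
    phi = alpha * (np.maximum(0.0, 1.0 - Yp @ v).sum()
                   + np.maximum(0.0, 1.0 + Ym @ v).sum())    # perceptron part (8)
    return phi + lam * np.sum(gamma * np.abs(w))             # feature-cost part
```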

Relaxed linear separability (RLS) approach to feature selection
The relaxed linear separability (RLS) approach to feature selection is based on a descending sequence of feature subspaces F_k[n_k]:

F[n] ⊃ F_1[n_1] ⊃ F_2[n_2] ⊃ ...   (24)

The sequence (24) of the feature subspaces F_k[n_k] is generated in a deterministic manner on the basis of the data sets G+ and G- (1), in accordance with the relaxed linear separability (RLS) method (Bobrowski & Łukaszuk, 2009).
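A rough end-to-end sketch of the generation of the sequence (24) is given below. It assumes the regularised form of (12) introduced above and minimises it as a linear program with scipy; the authors use basis exchange algorithms instead, so this is only an illustration of the idea, and the grid of λ values, the tolerance and the solver are arbitrary choices.

```python
import numpy as np
from scipy.optimize import linprog

def minimize_modified_cpl(Yp, Ym, lam, alpha=1.0):
    """Exact LP reformulation of the assumed criterion (12) with gamma_i = 1.

    Variables: v (n+1 components, last = -theta), one slack xi_j per learning
    vector, and t_i = |w_i| for the n feature weights."""
    n1 = Yp.shape[1]                       # n + 1
    n = n1 - 1
    mp, mm = len(Yp), len(Ym)
    c = np.concatenate([np.zeros(n1), alpha * np.ones(mp + mm), lam * np.ones(n)])
    # xi_j >= 1 - v^T y_j (G+)  ->  -y_j^T v - xi_j <= -1
    A1 = np.hstack([-Yp, -np.eye(mp), np.zeros((mp, mm)), np.zeros((mp, n))])
    # xi_j >= 1 + v^T y_j (G-)  ->   y_j^T v - xi_j <= -1
    A2 = np.hstack([Ym, np.zeros((mm, mp)), -np.eye(mm), np.zeros((mm, n))])
    # |w_i| <= t_i  (the threshold component is not penalised)
    W = np.hstack([np.eye(n), np.zeros((n, 1))])
    A3 = np.hstack([W, np.zeros((n, mp + mm)), -np.eye(n)])
    A4 = np.hstack([-W, np.zeros((n, mp + mm)), -np.eye(n)])
    A = np.vstack([A1, A2, A3, A4])
    b = np.concatenate([-np.ones(mp + mm), np.zeros(2 * n)])
    bounds = [(None, None)] * n1 + [(0, None)] * (mp + mm + n)
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    return res.x[:n1]                      # the optimal vertex v

def rls_sequence(X_plus, X_minus, lambdas, tol=1e-8):
    """Generate a descending sequence of feature subsets by raising lambda."""
    Yp = np.hstack([X_plus, np.ones((len(X_plus), 1))])
    Ym = np.hstack([X_minus, np.ones((len(X_minus), 1))])
    active = np.arange(X_plus.shape[1])    # currently retained feature indices
    subsets = []
    for lam in lambdas:                    # an increasing grid of cost levels
        cols = np.append(active, Yp.shape[1] - 1)   # active features + constant
        v = minimize_modified_cpl(Yp[:, cols], Ym[:, cols], lam)
        active = active[np.abs(v[:-1]) > tol]       # zero-weight features drop out
        subsets.append(active.copy())
        if active.size == 0:
            break
    return subsets
```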
One of the problems in applying the RLS method is how to assess the quality of the successive subspaces F_k[n_k] (24). In this approach, the quality of a given subspace F_k[n_k] is evaluated on the basis of the optimal linear classifier designed in this subspace: the better the optimal linear classifier, the better the feature subspace. The linear classifier allocates a vector y[n_k] according to the decision rule (25): if v[n_k]^T y[n_k] ≥ 0, then y[n_k] is allocated to the category ω+; otherwise y[n_k] is allocated to the category ω-, where y[n_k] ∈ F_k[n_k], the category (class) ω+ is represented by the elements x_j[n] of the learning set G+ (1), and the category ω- is represented by the elements of the set G-. Definition 3: The CPL optimal linear classifier LC(v*[n_k]) is defined in the feature subspace F_k[n_k] by the optimal vector v*[n_k] constituting the minimum of the perceptron criterion function Φ_k(v[n_k]) defined on the reduced vectors y_j[n_k] (similarly as in (10)) (Bobrowski, 2005).
It has been proved that if the learning sets G+[n_k] and G-[n_k] (26) are linearly separable (4), then the decision rule (25) based on the optimal vector v_k*[n_k] (9) allocates correctly all elements y_j[n_k] of these learning sets (Bobrowski, 2005). It means that (21):

(∀ y_j[n_k] ∈ G+[n_k])  v_k*[n_k]^T y_j[n_k] ≥ 1   and   (∀ y_j[n_k] ∈ G-[n_k])  v_k*[n_k]^T y_j[n_k] ≤ -1   (27)

Evaluation of linear classifiers
The quality of the linear classifier LC(v*[n_k]) (25) is evaluated by means of the estimated error rate (28). For the purpose of reducing the classifier bias, cross-validation procedures are applied (Lachenbruch, 1975). The term p-fold cross-validation means that the learning sets G+[n_k] and G-[n_k] (26) have been divided into p parts G_i, where i = 1, ..., p (for example p = 10). The vectors y_j[n_k] contained in p - 1 parts G_i are used for the definition of the criterion function Φ_k(v[n_k]) (8) and for computing the parameters v*[n_k]. The remaining vectors y_j[n_k] are used as the test set (one part G_i) for computing (evaluating) the error rate e(v*[n_k]) (28). Such an evaluation is repeated p times, and each time a different part G_i is used as the test set. The cross-validation procedure allows different vectors y_j[n_k] (1) to be used for designing and for evaluating the classifier (25), and as a result, it reduces the bias of the error rate estimation (28). The error rate (28) estimated during the cross-validation procedure will be called the cross-validation error (CVE).
The CVE error rate e_CVE(v*[n_k]) (28) of the linear classifier (25) is used in the relaxed linear separability (RLS) method as the basic criterion for evaluating particular feature subspaces F_k[n_k] in the sequence (24) (Bobrowski & Łukaszuk, 2009). The feature subspace F_k[n_k] linked to the linear classifier LC(v*[n_k]) (25) with the lowest CVE error rate e_CVE(v*[n_k]) is considered the optimal one in accordance with the RLS method of feature selection.
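A p-fold cross-validation error in the sense of (28) can be sketched as follows; the fitting routine is passed in as a parameter because the authors' basis exchange implementation is not reproduced here, and the random fold assignment is an illustrative choice.

```python
import numpy as np

def cross_validation_error(Yp, Ym, fit, p=10, seed=0):
    """p-fold cross-validation error (CVE) of the linear rule sign(v^T y).

    Yp, Ym : augmented vectors of G+ and G- (last component = 1)
    fit    : any routine (Yp_train, Ym_train) -> v returning an optimal vertex,
             e.g. a CPL minimiser such as the LP sketch given earlier."""
    rng = np.random.default_rng(seed)
    Y = np.vstack([Yp, Ym])
    labels = np.concatenate([np.ones(len(Yp)), -np.ones(len(Ym))])
    folds = rng.integers(0, p, size=len(Y))           # random fold assignment
    errors = 0
    for i in range(p):
        test = folds == i
        train_p = Y[(~test) & (labels > 0)]
        train_m = Y[(~test) & (labels < 0)]
        v = fit(train_p, train_m)
        pred = np.where(Y[test] @ v >= 0, 1.0, -1.0)  # decision rule (25)
        errors += np.sum(pred != labels[test])
    return errors / len(Y)                            # e_CVE = m_e / m
```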

Toy example
The data set used in this experiment was generated by the authors. Seven points were selected in a two-dimensional space. Four of them were assigned to the positive set G+, three to the negative set G-. The allocation of the points to the sets G+ and G- was made in a way that preserved the linear separability of the sets. After that, each point was extended to 10 dimensions. The values of the remaining coordinates were drawn from the distribution N(0, 1). Table 1 contains the complete data set. Features x_2 and x_7 constitute the coordinates of the points in the initial two-dimensional space.
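A data set with the same structure can be generated along the following lines; the exact coordinates used in Table 1 are not reproduced here, so the seed and the 2-D point positions below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)              # arbitrary seed

# Seven linearly separable points in a 2-D plane (features x2 and x7).
plane_plus = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 3.0], [4.0, 2.5]])
plane_minus = np.array([[-1.0, -2.0], [-2.0, 0.5], [0.0, -1.5]])

def embed(points_2d, n=10, informative=(1, 6)):
    """Embed 2-D points into n dimensions; remaining coordinates ~ N(0, 1).

    informative holds the 0-based positions of features x2 and x7."""
    X = rng.standard_normal((len(points_2d), n))
    X[:, list(informative)] = points_2d
    return X

G_plus, G_minus = embed(plane_plus), embed(plane_minus)
```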
The previously described RLS method was applied to the data set presented in Table 1. Table 2 shows the sequence of feature subsets examined by the method, together with the values of the apparent error (28) and the cross-validation error obtained in the particular subsets of features. The best subset of features designated by the method is the subset F_k[2] = {x_7, x_2}. It is characterized by the lowest value of the cross-validation error.

Experiment on synthetic data
The data set used in this experiment contained 1000 objects, each described by 100 features. The data were drawn from a multivariate normal distribution. The values of each feature had a mean equal to 0 and a standard deviation equal to 1. All the features were independent of each other (diagonal covariance matrix). The objects were divided into two disjoint subsets G+ and G- (1) in accordance with the values of the following linear combination:

3x_4 + 4x_10 - 7x_17 + 2x_28 - 6x_36 + 3x_41 + 3x_58 - 8x_63 + x_75 - x_92 + 5   (30)

Objects corresponding to a value of expression (30) greater than 0 were assigned to the subset G+. Objects corresponding to a value of expression (30) less than 0 were assigned to the subset G-. The result was two linearly separable subsets G+ and G- (1) containing 630 and 370 objects, respectively.
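The preparation of such synthetic data can be sketched as follows (the random seed is arbitrary, so the resulting class sizes will differ from the 630/370 split reported above).

```python
import numpy as np

rng = np.random.default_rng(0)                      # arbitrary seed
X = rng.standard_normal((1000, 100))                # 1000 objects, 100 N(0,1) features

# Key (30): 3*x4 + 4*x10 - 7*x17 + 2*x28 - 6*x36 + 3*x41 + 3*x58 - 8*x63 + x75 - x92 + 5
idx = np.array([4, 10, 17, 28, 36, 41, 58, 63, 75, 92]) - 1   # 1-based -> 0-based
coef = np.array([3, 4, -7, 2, -6, 3, 3, -8, 1, -1], dtype=float)
score = X[:, idx] @ coef + 5.0

G_plus, G_minus = X[score > 0], X[score < 0]        # two linearly separable subsets
```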
The RLS method of feature selection was applied to the analysis of the so-prepared synthetic data. The expected result was that the method would prefer the subset of features used in expression (30).
Figure 5 shows the apparent error (AE) and cross-validation error (CVE) values in the various feature subspaces tested by the RLS method. Each subspace larger than 10 features contains all 10 features used in expression (30). The subspace of size 10 consists of exactly the features used in expression (30).

Experiments on genetic data sets

The Leukemia (Golub et al., 1999) data set contains the expression levels of 7129 genes taken over 72 samples. The labels of the objects indicate which of two variants of leukemia is present in the sample: acute myeloid leukemia (AML, 25 samples) or acute lymphoblastic leukemia (ALL, 47 samples).
The Breast cancer (van't Veer et al., 2002) data set describes patients tested for the presence of breast cancer. The data contains 97 patient samples, 46 of which are from patients who developed distant metastases within 5 years (labelled as "relapse"); the remaining 51 samples are from patients who remained free of the disease for an interval of at least 5 years after their initial diagnosis (labelled as "non-relapse"). The number of genes is 24481.

Conclusion
The problem of feature selection is usually resolved in practice through the evaluation of the usefulness (the validity) of individual features (attributes, factors) (Liu & Motoda, 2008). In this approach, the resulting feature subsets are composed of such features which have the strongest individual influence on the analysed outcome. Such an approach relies on the assumption that the factors are independent. However, in a complex system, such as a living organism, these factors are often related. An advantage of the relaxed linear separability (RLS) method is that one may directly and efficiently identify a subset of features that influences the outcome, and assess the combined effect of these features as prognostic factors.
In accordance with the RLS method, the feature selection process involves two basic actions. The first of these actions is to generate the descending sequence (24) of feature subspaces F_k[n_k]. The second of these actions is to evaluate the quality of the individual feature subspaces F_k[n_k] in the sequence (24). Generation of the descending sequence (24) of feature subspaces F_k[n_k] is done in a deterministic manner by multiple minimization of the criterion function Ψ_λ(v[n+1]) (12) combined with a gradual increase of the value of the parameter λ. The criterion function Ψ_λ(v[n+1]) (12) depends on three kinds of nonnegative parameters: α_j, the prices of particular feature vectors x_j[n] (1); γ_i, the feature costs; and λ, the cost level. The composition of the consecutive feature subspaces F_k[n_k] (24) depends on the choice of these parameters. For example, costly features x_i should have sufficiently large values of the parameter γ_i; a high value of the parameter γ_i increases the chance of elimination of a given feature x_i. Evaluation of the quality of the individual feature subspaces F_k[n_k] is based, in the RLS method, on the cross-validation of the CPL optimal (Definition 3) linear classifier (25) designed in this subspace. The optimal linear classifier (25) is designed in the feature subspace F_k[n_k] through the multiple minimization of the perceptron criterion function Φ_k(v[n_k]) (8) defined on the reduced feature vectors y_j[n_k] (y_j[n_k] ∈ F_k[n_k]), or of the modified criterion function Ψ_k(v[n_k]) (12) with a small value (22) of the cost level λ.

This article also contains a description of the experiments with feature selection based on the RLS method. Experiments of the first group were carried out on synthetic data. The multivariate synthetic data were generated randomly and deterministically divided into two learning sets according to a predetermined key. The key had the form of a linear combination of an unknown number of selected features. The aim of the experiment was to find the unknown key on the basis of the available sets of multidimensional data. The experiment confirmed this possibility. Experiments of the second group were carried out on the genetic data sets Leukemia (Golub et al., 1999) and Breast cancer (van't Veer et al., 2002). These experiments have shown, inter alia, that the RLS method enables finding interesting and not too large subsets of features, even if the initial number of features is huge. For example, in the case of the Breast cancer set, the feature space was reduced from the dimensionality n = 24481 to n_k = 11 while the linear separability (27) of the learning sets G+[n_k] and G-[n_k] (26) was preserved. The results of the calculations described in this paper were obtained by using our own implementation of the basis exchange algorithms (http://irys.wi.pb.edu.pl/dmp).

Remark 3: A sufficiently large increase of the cost level λ (λ > 0) in the criterion function Ψ_λ(v[n+1]) (12) results in an increase of the number n_0 of unit vectors e_i[n+1] in the basis B_k^[n+1] (21). Moreover, in the case of long feature vectors there may exist many such feature subspaces F_k[n'] of a given feature space F[n] (F_k[n'] ⊂ F[n]) which can assure the linear separability (21). Therefore, a question arises as to which of the vertices v_k*[n'] constituting the minimum (9) of the perceptron criterion function Φ(v[n+1]) (8) is the best one. The answer to such a question can be given on the basis of the minimization of the modified criterion function Ψ_λ(v[n+1]) (12). In contrast to the perceptron criterion function Φ(v[n+1]) (8), the modified criterion function Ψ_λ(v[n+1]) (12) has only one optimal vertex v_k^[n+1] (16). The vertex v_k^[n+1] (16), which constitutes the minimum (18) of the function Ψ_λ(v[n+1]) (12), is unambiguously determined and can be treated as the optimal one.

In other words, the replacement of the perceptron criterion function Φ(v[n+1]) (8) by the modified criterion function Ψ_λ(v[n+1]) (12) does not necessarily mean changing the position of the minimum. For such points v[n+1] which linearly separate (4) the sets G+ and G- (1), the perceptron part Φ(v[n+1]) (8) vanishes, so the modified criterion function Ψ_λ(v[n+1]) (12) reduces to the feature-cost term alone (23). Thus, the minimization of the criterion function Ψ_λ(v[n+1]) (12) can be replaced by the minimization of the function (23) under the condition that the point v[n+1] linearly separates (4) the sets G+ and G- (1). Remark 5: If the sets G+ and G- (1) are linearly separable, then the vertex v_k*[n+1] constituting the minimum of the function (23) with equal feature costs γ_i has the lowest L1 norm ||v_k*[n+1]||_L1 = Σ_i |v_ki| among all such vectors v[n+1] which linearly separate (4) these sets. Remark 5 points out a possible similarity between the CPL solution v_k*[n+1] (22) and the optimal vector v*[n+1] obtained in the Support Vector Machines (SVM) approach (Vapnik, 1998). However, the use of the CPL function Ψ_λ(v[n+1]) (12) also allows other types of solutions v_k*[n+1] (22) to be obtained through another specification of the feature costs γ_i and of the cost level λ.

The modified criterion function Ψ_λ(v[n+1]) (12) allows the feature space to be reduced to a subspace F_k[n_k]. The feature subspace F_k[n_k] can be obtained from the initial feature space F[n] by reducing n - n_k features x_i. Such a reduction can be based on the optimal vertex v_k^[n+1] (18) with the related basis B_k^[n+1] (16). The optimal vertex v_k^[n+1] (18) appoints the minimum of the criterion function Ψ_λ(v[n+1]) (12) with an adequate value λ_k of the cost level λ. Definition 2: The reduced feature vectors y_j[n_k] (y_j[n_k] ∈ F_k[n_k]) are obtained from the feature vectors y_j[n+1] = [x_j[n]^T, 1]^T (5) after neglecting the n - n_k features x_i related to the set I_k^0 (15) of the optimal vertex v_k^[n+1] (18). The reduced vertex (parameter vector) v_k^[n_k] = [w_k^[n_k-1]^T, -θ_k^]^T (5) is obtained from the optimal vertex v_k^[n+1] (18) by neglecting those n - n_k components w_i which are equal to zero (w_i = 0). The reduced parameter vector v[n_k] = [w[n_k-1]^T, -θ]^T (5) defines the linear classifier LC(v[n_k]) in the feature subspace F_k[n_k]. The linear classifier LC(v[n_k]) can be characterized by the decision rule (25), applied in the subspace F_k[n_k] to the reduced learning sets G+[n_k] and G-[n_k] (26). Remark 6: The minimal value Φ_k* of the criterion function Φ_k(v[n_k]) (8) defined on the reduced feature vectors y_j[n_k] is equal to zero (Φ_k* = 0) if and only if the sets G+[n_k] and G-[n_k] are linearly separable (4) in the feature subspace F_k[n_k] (similarly as in (10)).
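Remark 5 can be illustrated with a small linear program: among all vectors that fulfil the separability inequalities (4), find the one with the smallest L1 norm. The sketch below uses scipy instead of the basis exchange algorithm, and it includes the threshold component in the minimised norm, which is an assumption.

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_separating_vector(Yp, Ym):
    """Find a separating vector v with the smallest L1 norm sum_i |v_i|.

    Variables are v (n+1) and t (n+1) with |v_i| <= t_i; the constraints
    enforce v^T y_j >= 1 on G+ and v^T y_j <= -1 on G-.  Infeasibility of
    the LP means the sets are not linearly separable."""
    n1 = Yp.shape[1]
    c = np.concatenate([np.zeros(n1), np.ones(n1)])      # minimise sum_i t_i
    A_sep = np.vstack([-Yp, Ym])                          # margin constraints
    A_sep = np.hstack([A_sep, np.zeros((A_sep.shape[0], n1))])
    b_sep = -np.ones(len(Yp) + len(Ym))
    A_abs = np.vstack([np.hstack([np.eye(n1), -np.eye(n1)]),
                       np.hstack([-np.eye(n1), -np.eye(n1)])])
    b_abs = np.zeros(2 * n1)
    res = linprog(c, A_ub=np.vstack([A_sep, A_abs]),
                  b_ub=np.concatenate([b_sep, b_abs]),
                  bounds=[(None, None)] * n1 + [(0, None)] * n1,
                  method="highs")
    return res.x[:n1] if res.success else None
```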

Fig. 4. Points in the feature space selected by the RLS method; the hyperplane separates the points falling within the sets G+ (denoted by circles) and G- (denoted by squares)

Fig. 5. The apparent error (AE) and the cross-validation error (CVE) in different feature subspaces F_k[n_k] of the synthetic data set

Fig. 6. The apparent error (AE) and the cross-validation error (CVE) in different feature subspaces F_k[n_k] of the Leukemia data set
The positive set G+ usually contains vectors x_j[n] of only one category. For example, the set G+ may contain feature vectors x_j[n] representing patients with cancer, and the set G- may represent patients without cancer.
Definition 1: The sets G+ and G- (1) are linearly separable if and only if there exist such a weight vector w[n] = [w_1, ..., w_n]^T (w[n] ∈ R^n) and a threshold θ (θ ∈ R) that all the below inequalities are fulfilled:

(∀ x_j[n] ∈ G+)  w[n]^T x_j[n] > θ   and   (∀ x_j[n] ∈ G-)  w[n]^T x_j[n] < θ   (2)

Lemma 1: Sets G+ and G- (1) which are linearly separable (2) in the feature space F[n] are also linearly separable in any greater feature space F[n'], where F[n] ⊂ F[n']. The proof of Lemma 1 is self-evident. Lemma 1 shows, inter alia, that for any constant c the sets G+ = {x_j[n] : x_ji > c} and G- = {x_j[n] : x_ji ≤ c} are linearly separable in each feature space F[n]. Lemma 2: The sets G+ and G- (1) constructed of linearly independent feature vectors x_j[n] are always linearly separable (2) in the feature space F[n].
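Lemma 2 can be checked numerically: with more features than objects, randomly drawn vectors are (almost surely) linearly independent, and a vector v satisfying y_j^T v = +1 or -1 exactly can be obtained from the resulting linear system, which is the essence of the basis construction mentioned above. The seed and the sizes in the sketch below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 12                                   # long vectors: n >> m
X = rng.standard_normal((m, n))                 # almost surely linearly independent
labels = rng.choice([1.0, -1.0], size=m)        # arbitrary split into G+ / G-

Y = np.hstack([X, np.ones((m, 1))])             # augmented vectors y_j[n+1]
# Solve y_j^T v = +1 for G+ and -1 for G-; an exact solution exists whenever
# the y_j are linearly independent, so the two sets are linearly separable.
v, *_ = np.linalg.lstsq(Y, labels, rcond=None)

margins = (Y @ v) * labels
print(np.all(margins > 0.999))                  # True: every inequality is fulfilled
```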
If the learning sets G+[n_k] and G-[n_k] (26) are not linearly separable (4), then not all but only a majority of the vectors y_j[n_k] fulfil the inequalities (27). According to the considerations of the previous paragraphs, if the learning sets G+[n_k] and G-[n_k] (26) are linearly separable (4), then there is more than one vertex v_i*[n_k] forming a minimum of the function Φ_k(v[n_k]) (8). To avoid such ambiguity, the criterion function Φ_k(v[n_k]) (8) can be replaced by the modified criterion function Ψ_k(v[n_k]) (12) with a small value (22) of the parameter λ.
The quality of the linear classifier LC(v*[n_k]) (25) can be evaluated by using the error estimator (apparent error rate) e_a(v*[n_k]), defined as the fraction of wrongly classified elements y_j[n_k] of the learning sets G+[n_k] and G-[n_k] (26):

e_a(v*[n_k]) = m_e(v*[n_k]) / m   (28)

where m is the number of all elements y_j[n_k] of the learning sets G+[n_k] and G-[n_k] (26), and m_e(v*[n_k]) is the number of elements y_j[n_k] wrongly allocated by the rule (25). The parameters v*[n_k] of the linear classifier LC(v*[n_k]) (25) are estimated from the learning sets G+[n_k] and G-[n_k] (26) through the minimization of the perceptron criterion function Φ_k(v[n_k]) (8) determined on the elements y_j[n_k] of these sets. It is known that if the same data y_j[n_k] are used for classifier design and for classifier evaluation, then the evaluation results are too optimistic (biased). The error rate (28) evaluated on the elements y_j[n_k] of the learning sets is called the apparent error (AE). For example, if the learning sets G+[n_k] and G-[n_k] (26) are linearly separable (4), then the relation (27) holds and, as a result, the apparent error (28) evaluated on the elements y_j[n_k] is equal to zero (e_a(v*[n_k]) = 0). But it is observed in practice that the error rate of the classifier (25) evaluated on new vectors y[n_k] is usually greater than zero.
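For completeness, the apparent error (28) under the decision rule (25) can be computed as in the short sketch below (a minimal illustration operating on the augmented vectors of the two learning sets).

```python
import numpy as np

def apparent_error(v, Yp, Ym):
    """Apparent error rate (28): fraction of learning vectors misallocated
    by the rule (25), evaluated on the same data used for training."""
    wrong = np.sum(Yp @ v < 0) + np.sum(Ym @ v >= 0)   # rule: sign of v^T y
    return wrong / (len(Yp) + len(Ym))                  # e_a = m_e / m
```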

Table 1. Feature vectors x_j[10] constituting the sets G+ and G-

Table 2. Subsets of features evaluated by the RLS method, with the apparent error rate (AE) and the cross-validation error rate (CVE) obtained in particular subsets of features

Table 3. Features x_i constituting the optimal subspace F_k[7], characterised by the lowest cross-validation error (CVE), and features x_i constituting the smallest subspace F_k[3] with apparent error (AE) equal to 0, for the Leukemia data set

feature name     F_k[7] weights w_i    F_k[3] weights w_i
attribute4951    -0.99614              -1.71845
attribute1882    -0.73666              -11.6251
attribute3847    -0.55316              -
attribute6169    -0.47317              -
attribute4973    0.41573               -
attribute6539    -0.25898              -
attribute1779    -0.1519               -1.69028
threshold θ      -0.55316              2.53742

Original data sets come with training and test samples that were drawn from different conditions. Here we combine them for the purpose of cross-validation. The data have also been standardized before the experiment.
Fig. 7. The apparent error (AE) and the cross-validation error (CVE) in different feature subspaces F_k[n_k] of the Breast cancer data set

Table 4. Features x_i constituting the optimal subspace F_k[14], characterised by the lowest cross-validation error (CVE), and features x_i constituting the smallest subspace F_k[11] with apparent error (AE) equal to 0, for the Breast cancer data set

Figures 6 and 7 show the apparent error (AE) and cross-validation error (CVE) obtained in different feature subspaces generated by the RLS method. Full separability of the data subsets is preserved in feature subsets much smaller than the initial, very large sets of genes.