Classification of Soft Tissue Tumors by Machine Learning Algorithms



Introduction
MR imaging is currently regarded as the standard diagnostic tool for the detection and grading of soft tissue tumors (STT) (De Schepper et al. (2005)). Soft tissue is a term describing all the supporting, connecting, or surrounding tissues of other structures and organs of the body, such as fat, muscle, blood vessels, deep skin tissues, nerves, and the tissues around joints (synovial tissues). Soft tissue tumors can grow almost anywhere in the human body. Soft tissue sarcomas, which are the malignant type of STT, are grouped together because they share certain microscopic characteristics, have similar symptoms, and are generally treated in similar ways.

Radiologists often look for certain features in the MR image to differentiate benign from malignant STT (Juan et al. (2004); Mutlu et al. (2006)). Although the signal characteristics of benign and malignant tumors frequently overlap, some MR image features are more highly correlated with either the benign or the malignant type of STT; see De Schepper et al. (2000) and De Schepper & Bloem (2007). For example, the most commonly used individual parameters for predicting malignancy are the inhomogeneity (texture) and the intensity (gray level) of the MRI signal with different pulse sequences (De Schepper et al. (2005); Hermann et al. (1992)). Inhomogeneity of the tumor region on T1-weighted MR images is a very good indicator of malignancy because 90% of malignant tumors are inhomogeneous and show a disorganized textured pattern of the MRI signal intensity (Weatherall (1995)). This pattern results from the loss of tissue structure and the changes in the extracellular matrix (ECM) caused by cancer. The study by Hermann et al. (1992) reported a sensitivity of 72% and a specificity of 87% in predicting malignancy based on visual comparison of texture in the tumor regions in T1-MR images. The reason for the large difference between the sensitivity and the specificity in that study is the difficulty of perceiving texture in some of the malignant tumors. The limited ability of humans to perceive and discriminate between textures has been well known for quite some time (Julesz (1975); Julesz et al. (1973)).

Computer aided diagnostic systems can improve the radiologists' performance in identifying the pathological type (i.e., benign or malignant) of a soft tissue tumor from MR images (Meinel et al. (2007)). Even though a visual comparison of the textures of a benign tumor and a malignant tumor sometimes shows no difference, the numerical values extracted by texture analysis are quite different. Figure 1 shows subimages of a benign and a malignant tumor and the values of some of the extracted texture features. Such an example shows that texture analysis can be used for obtaining information that is not visible to the human eye. The reader can refer to (Materka & Strzelectky (1998); Tuceryan & Jain (1998); Wagner (1999)) as excellent references on texture analysis.

In the last few years there has been growing interest in the use of machine learning classifiers for analyzing MRI data. The main aim of this chapter is to train and test several machine learning classifiers with texture analysis features extracted from MR images of soft tissue tumors. The present chapter will also serve as an introductory tutorial by providing a systematic procedure to build and evaluate a machine learning classifier that can be used for practical applications. The typical steps to build a machine learning classifier consist of feature extraction, feature selection, classifier training, and evaluation of the results. Several studies have tackled the problem of texture analysis for discriminating between benign and malignant tumors for a specific type of malignancy, for example, the brain (Mahmoud-Ghoneim et al. (2003)), the liver (Jirák et al. (2002)), and the breast (Huang et al. (2006)). However, most papers did not follow the recommended approach for building machine learning systems (for an example see Salzberg (1997)) and left some questions unanswered. This research aims at answering some questions related to the problem of texture analysis of STT, such as the classifier's complexity, the effect of the training data set on the classifier's behaviour, and the appropriate size of the training data that can be used to train a machine learning classifier and obtain good generalization performance.

In the following sections, we will go through the process of building and testing several machine learning classifiers as shown in Fig. 2. We warn the reader that the training dataset is not meant to train the classifier per se, as the name implies, but should be considered as a representative statistical sample from the population of STT. We assume that the training and testing data samples are randomly, identically, and independently sampled from the population of STT (i.e., it is an i.i.d. sample). The process of training and testing the classifier is a statistical parameter estimation problem, where the parameter of interest is the error rate of the classifier on unseen data. As such, all the experiments in the following sections study how the classifier performs on other unseen data from the same STT population. To put a classifier into real practice, the classifier should be trained and tested with several datasets sampled from the same population with the same procedure as outlined in the following sections. Once the classifier evaluation is finished, all the available data can be used to train the final classifier. The classifier should be comprehensively tested in a prospective study before it is used. A shorter preliminary version of this chapter was published in Juntu et al. (2010).

Patients data set and the MR images
A large database of multicenter, multimachine MR images was collected by the University Hospital Antwerp (UZA) from different radiology centers for the purpose of conducting scientific research. At the start of this study, there was a real concern that texture features could be more sensitive to image variation caused by imaging with different MRI systems or by changes in MRI acquisition parameters than to variation in texture caused by pathological changes. However, a study by Mayerhoefer et al. (2005) clearly showed that the difference in texture features extracted from MR images obtained with different machine units seems to have only a small impact on the results of tissue discrimination. In this retrospective study, we used a database of T1-MR images of 86 patients with benign soft tissue tumors and 49 patients with malignant tumors. All malignant and benign masses were histologically confirmed. We discarded all MR images that showed severe imaging artifacts or that were corrupted by a high level of bias field inhomogeneity signal. From the tumor regions in the MR images, we cut square subimages of size 50 × 50 pixels for texture feature computation. The physical size of that area is not fixed but depends on the image acquisition parameters. However, the actual size of that area will not affect the values of the extracted features. To increase the size of the training dataset, we selected several tumor regions from the MR images of every patient. Hence, the total dataset available for training consisted of 253 benign and 428 malignant subimages of size 50 × 50 pixels each. In order to preserve texture information, we avoided preprocessing the subimages. However, histogram equalization was applied to all the tumor subimages since some texture features, such as the first order texture features, are sensitive to gray-level variation.

Fig. 1. An example of benign and malignant tumors texture
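As a rough illustration of the subimage extraction and histogram equalization described above, the following Python sketch may help. It is not the code used in the study: the function names `crop_subimage` and `equalize_histogram`, the list-of-lists image representation, and the default of 256 gray levels are assumptions made for this example.

```python
def crop_subimage(image, top, left, size=50):
    """Cut a square size x size subimage whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def equalize_histogram(image, levels=256):
    """Histogram-equalize a 2D list of integer gray levels in [0, levels)."""
    flat = [pixel for row in image for pixel in row]
    n = len(flat)

    # Gray-level histogram.
    hist = [0] * levels
    for pixel in flat:
        hist[pixel] += 1

    # Cumulative histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to spread out, return a copy.
        return [row[:] for row in image]

    # Map every gray level so that the cumulative histogram becomes roughly linear.
    scale = (levels - 1) / (n - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[pixel] for pixel in row] for row in image]
```

For a 50 × 50 region one would call `equalize_histogram(crop_subimage(image, top, left))`, where `top` and `left` are chosen inside the tumor region.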

Texture computation
Texture can be characterized and described in different ways using various sets and combinations of parameters. Most texture features were computed with the software package MaZda 3.20, which supports texture features based on statistical, wavelet filtering, and model-based methods of analyzing texture (Castellano et al. (2004)). We also wrote Matlab programs to calculate some texture features, such as Haralick's texture features, to have finer control over the parameters that affect the extracted features. To ensure the consistency of the calculated texture features across all the tumor subimages, we wrote a MaZda macro script that reads the tumor subimages and calculates tumor texture with the same texture analysis parameter settings. The extracted texture features were saved in a text file for feature selection and classification. The following is a short description of the texture features that were computed from the tumor subimages, which are also summarized in Table 1 for easy reference:
• First order statistics: extract texture statistics based on a function of a single pixel.
• Second order statistics: Haralick's texture features and the absolute gradient distribution are used in this study. In this method of texture analysis, the correlation between two or more neighboring pixels is taken into account. Since complex texture patterns are formed by the interaction between more than one pixel, second order statistics might provide extra texture information that cannot be extracted from first order statistics of the texture. Haralick's texture analysis (Haralick et al. (1973)) is probably the best-known second order texture analysis technique. It is based on the calculation of statistics from a function of two variables that measures the probability of occurrence of a pair of pixels that are separated by d pixels at an angle θ. We calculated 11 different Haralick features from the co-occurrence matrix. The co-occurrence matrix is calculated for every two pixels inclined by an angle θ and separated by a distance d. To take the scaling and rotation of texture into account, we calculated the Haralick features from the co-occurrence matrices calculated with angles {0°, 45°, 90°, 135°} and distances of {1, 2, 3, 4, 5} pixels. The absolute gradient texture features are also included to incorporate texture features that are invariant to gray-level scaling caused by bias field inhomogeneity. Every pixel in the image was replaced by the absolute gradient, which was calculated from a window of size 3 × 3 around the pixel from the squared differences between the two pixels above and below the center pixel and between the two pixels to its right and left. Doing that for all pixels resulted in a gradient image from which several statistical parameters could be obtained: the mean, the variance, the skewness, and the kurtosis.
• Higher order statistics: used to capture texture information that depends on the interaction between several neighboring pixels. We selected two different approaches:
  - The run-length gray-level matrix approach, where consecutive sets of pixels with the same gray-level value are counted and the result is stored in a 2D matrix indexed by the gray-level value and the length of the gray-level run. Several statistics are calculated from the 2D matrix.
  - The model-based approach, which writes a mathematical function or model that describes the texture, for example the autoregressive texture model. The basic idea of autoregressive models for texture is to express the gray level of a pixel as a function of the gray levels of its neighboring pixels (Mao & Jain (1992)). The model parameters for one image are calculated using a least squares technique and are used as texture features. This approach is similar to Markov random fields.
• Filtering method: The image is split into subbands with bandpass filters such as the wavelet transform. The energies of the subbands are used as texture features.
After the texture analysis step, each tumor subimage is encoded by a feature vector as shown in Fig. 3. The texture features are labeled as $\{f_1, f_2, \ldots, f_{290}\}$ (see Table 1).
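To make the co-occurrence idea more concrete, the following Python sketch builds a symmetric, normalized gray-level co-occurrence matrix for one distance d and one angle θ, and derives four commonly used Haralick-type statistics from it. It is only an illustration and not the MaZda or Matlab code used in the study: the function names, the choice of a symmetric matrix, and the restriction to four features (instead of the 11 computed in the chapter) are assumptions made for brevity.

```python
import math

# Pixel offsets (row step, column step) for the four directions named in the text.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence_matrix(image, distance, angle, levels):
    """Symmetric, normalized co-occurrence matrix of a 2D list of gray levels."""
    dr, dc = OFFSETS[angle]
    dr, dc = dr * distance, dc * distance
    rows, cols = len(image), len(image[0])
    counts = [[0.0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1  # count each pair in both orders (symmetric)
                counts[b][a] += 1
                total += 2
    if total == 0:  # no pixel pair fits inside the image
        return counts
    return [[v / total for v in row] for row in counts]


def haralick_subset(p):
    """Return [energy, contrast, homogeneity, entropy] of a co-occurrence matrix."""
    n = len(p)
    energy = sum(v * v for row in p for v in row)
    contrast = sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))
    entropy = -sum(v * math.log(v) for row in p for v in row if v > 0)
    return [energy, contrast, homogeneity, entropy]


def texture_feature_vector(image, levels,
                           angles=(0, 45, 90, 135), distances=(1, 2, 3, 4, 5)):
    """Concatenate the statistics over all angle/distance combinations."""
    features = []
    for angle in angles:
        for distance in distances:
            p = cooccurrence_matrix(image, distance, angle, levels)
            features.extend(haralick_subset(p))
    return features
```

With four statistics, four angles, and five distances, `texture_feature_vector` returns 80 values per subimage. In practice the subimage would usually be quantized to a small number of gray levels first so that the matrix stays well populated.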

Feature selection
Feature selection was used to remove redundant features. This step is very important because it improves the performance of the learning models and reduces the effect of the curse of dimensionality. Feature selection also speeds up the learning process and improves model interpretability. Deciding which features to keep because they are relevant, and which ones to discard, is largely dependent on the context. To perform an unbiased feature selection, we tested several feature selection techniques. We experimented with the following feature selection methods:
• Unsupervised feature selection techniques: these methods do not use the class labels, and the selected features are strongly dependent on the sample distribution of the pixels' gray-level values. We selected texture feature subsets by forward, backward, bidirectional, and greedy stepwise search methods and by two feature ranking methods, namely, the chi-square statistic and the information gain criterion.
• Supervised selection techniques: these techniques use class labels for guiding the feature selection process; thus, the selected features are the ones that improve the discrimination between benign and malignant tumors. We used the C4.5 decision tree algorithm and support vector machines as wrappers.

Table 1. Texture analysis methods used in this study and the corresponding texture features
Table 2 lists all the feature selection techniques that were tested in this study and their selected feature subsets. It is not surprising that the 8 feature selection methods selected different feature subsets because each one has a different measure of feature relevance. However, feature selection methods that belong to the same group generally selected similar features. The selected feature subsets were used as input to a simple Bayes classifier to evaluate the efficacy of the texture feature subsets. The results of the classification are listed in Table 2. We also listed the classification accuracy (Acc%), the True Positive (TP), the True Negative (TN), and the Area Under the Curve (AUC) of the ROC. The AUC is the generally recommended measure, since it is a global measure and is insensitive to the data distribution. In the last row of Table 2, we included the performance of the Bayes classifier using the full texture feature set for comparison. Looking at Table 2, one can notice that the classification results with the feature subsets selected by the two feature ranking methods are worse than classification using the full texture feature set: their AUC values are 0.72 and 0.75, respectively, while classification with the full texture feature set has an AUC value of 0.78. The best texture feature subset was taken to be the one with the highest AUC value. This was the subset selected by the forward selection method, which was used for training and testing the classifiers.
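The evaluation loop described above (pick a feature subset, score it, compare AUC values) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the tooling used in the study: `roc_auc` computes the AUC from classifier scores with the rank-based (Mann-Whitney) formulation, and `forward_selection` is a generic greedy search that takes a hypothetical, user-supplied `score_fn` (for example, the cross-validated AUC of a Bayes classifier on the candidate subset).

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case (label 1) scores
    higher than a random negative case (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def forward_selection(n_features, score_fn):
    """Greedy forward search: repeatedly add the single feature that improves
    score_fn(subset) the most, and stop when no feature improves it."""
    selected = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    while remaining:
        candidate, candidate_score = None, best_score
        for feature in sorted(remaining):
            score = score_fn(selected + [feature])
            if score > candidate_score:
                candidate, candidate_score = feature, score
        if candidate is None:  # no remaining feature improves the score
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score
```

Because the score is re-evaluated for every candidate subset, the cost of the search is dominated by `score_fn`; with 290 features and a cross-validated classifier this can be slow, which is one reason cheaper ranking methods are often tried alongside it.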

The trained classifiers
The main purpose of the training data is to infer a mathematical decision function or an algorithm for making predictions. A given training data set is used to optimize the parameters of a machine learning classifier, which then results in a simple mathematical function or expression that can be used for making predictions. From several groups of machine learning algorithms, we selected the following classifiers:

Linear classifier:
This classifier assumes that the benign and the malignant classes have the same covariance matrix but different means. It estimates the covariance matrix from the full training data and assigns a new case to the class with the highest probability. Such a classifier is able to separate benign and malignant tumors by a simple linear decision surface. The full training dataset is assumed to be normally distributed.

Quadratic classifier:
This classifier is more complex than the linear classifier since it estimates a separate mean vector and covariance matrix for the benign and the malignant classes. Such a classifier is able to separate the benign and the malignant tumors by a quadratic nonlinear decision surface. The probability distributions of the benign and the malignant classes are assumed to be normal but not necessarily with the same covariance matrices.
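For readers who prefer formulas, the two Gaussian classifiers can be summarized by their discriminant functions. The expressions below are the standard textbook forms and are given only as an illustration; the symbols (feature vector $x$, class mean $\mu_k$, shared covariance $\Sigma$, class covariance $\Sigma_k$, and prior $P(\omega_k)$ for class $k$, benign or malignant) are introduced here and do not appear in the original text. A new case is assigned to the class with the largest $g_k(x)$.

$$
g_k^{\text{linear}}(x) = \mu_k^{T}\Sigma^{-1}x - \tfrac{1}{2}\,\mu_k^{T}\Sigma^{-1}\mu_k + \ln P(\omega_k)
$$

$$
g_k^{\text{quadratic}}(x) = -\tfrac{1}{2}\ln\lvert\Sigma_k\rvert - \tfrac{1}{2}\,(x-\mu_k)^{T}\Sigma_k^{-1}(x-\mu_k) + \ln P(\omega_k)
$$

With a shared covariance matrix the quadratic term in $x$ is the same for both classes and cancels, which is why the first decision surface is linear while the second is quadratic.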

Back-propagation neural network:
The NN classifier separates the tumors by a highly nonlinear decision surface. The neural network uses an iterative optimization algorithm to find the weights of the network from the training data.

Support vector machine classifier:
The SVM classifier simplifies the classification problem by transforming the input space into a high dimensional space such that the classification problem becomes a linear one and is easier to solve. The SVM classifier does not depend on the probability distribution of the training dataset and has the ability to generalize quite well for classification problems of varied degrees of complexity. During the training process, a quadratic optimization algorithm is used to iteratively adjust the complexity of the decision function to adapt to the problem domain.
In the following sections, we describe several tests that were performed to study the effect of the size of the training data set on the classifier performance. Additionally, we tested the complexity of the decision function, analyzed the classifier performance, and statistically compared the performance of two classifiers. Finally, we tested the classifier performance against the radiologists' performance.

The size of the training data and the classifiers performance
The classifier learns the classification function from the training data. The training data represents a small sample from the population of soft tissue tumors, and hence the size of the training data has an impact on the trained classifier. We ran the learning curve test to study the effect of the size of the training data set on the classifier performance. Using a small subset of the training data, we tuned the parameters for each classifier as follows.
The back-propagation neural network has two hidden layers, an input layer of 12 nodes (i.e., the number of texture features selected by the forward selection method), and an output layer with two nodes corresponding to the benign and the malignant classes. The SVM classifier is trained with an RBF kernel that was tuned with a grid search algorithm, which resulted in σ = 10000 and a cost coefficient C = 1.0. We used the PRTOOLS 4.0 Matlab toolbox to run this experiment. We left the parameters of the decision trees and the Parzen classifier at their default values, which forces the PRTOOLS toolbox to tune them automatically to their best values. We trained the 7 classifiers with different sizes of the training data set. At each specific size of the training data set, we measured the error rate of all the classifiers. For each specific size of the training data, we repeated the experiment 10 times and calculated the average error rate. Figure 4 shows the learning curves of the 7 trained classifiers. The learning curves show some interesting facts about the problem domain. First, the learning curves are smooth, which is a good indicator of the classifiers' stability against changes in the training data distribution. The smoothness of the learning curves is also a necessary condition for carrying out some of the statistical tests that we used to compare the classifiers' performance (Dietterich (1998)).
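The learning curve procedure (train on a random subset of a given size, test on the rest, repeat 10 times, and average) can be sketched in Python as follows. This is a simplified illustration and not the PRTOOLS experiment itself: a nearest-mean classifier is used as a stand-in for the 7 classifiers of the study, and the function names and the `seed` parameter are assumptions made for this example.

```python
import random


def nearest_mean_fit(X, y):
    """Stand-in classifier: store the mean feature vector of every class."""
    means = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def nearest_mean_predict(means, x):
    """Assign x to the class whose mean is closest in squared Euclidean distance."""
    return min(means, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, means[lab])))


def learning_curve(X, y, sizes, repeats=10, seed=0):
    """Average test error for each training size (every size must be < len(X))."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    curve = []
    for size in sizes:
        errors = []
        for _ in range(repeats):
            rng.shuffle(indices)  # new random train/test split
            train, test = indices[:size], indices[size:]
            means = nearest_mean_fit([X[i] for i in train], [y[i] for i in train])
            wrong = sum(nearest_mean_predict(means, X[i]) != y[i] for i in test)
            errors.append(wrong / len(test))
        curve.append((size, sum(errors) / repeats))
    return curve
```

Plotting the returned `(size, average error)` pairs gives one learning curve; repeating the call with a different classifier in place of the nearest-mean stand-in gives the curves for the other classifiers.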

The complexity of the decision function
The learning curves from the last section showed that classifiers which produce simple decision functions generalize better, since they have the smallest error rate on the testing samples. To check that conclusion, we ran a test using an SVM classifier with a polynomial kernel, which produces a polynomial decision function with a varied degree of complexity. We varied the degree of the polynomial kernel gradually from 1 to 20, and at each degree of the polynomial we ran the experiment 10 times using a crossvalidation procedure. Each point in the curve is the average of the error rates of ten different experiments. Figure 5 shows the error rate of the polynomial classifier versus the degree of the polynomial kernel function. The plot clearly shows that the error rate is at its minimum for a polynomial decision function of the 4th degree. The error rates for the linear classifier (a 1st degree polynomial) and the quadratic classifier (a 2nd degree polynomial) are large since they underfit the training data. A polynomial classifier of degree higher than 4 also has a high error rate since it overfits the training data. This explains why, in Fig. 4, the simple linear classifier and the neural network classifier both have high error rates compared to the other classifiers: the linear classifier is too simple and the neural network classifier is too complex for the problem domain. It also explains why the SVM classifier has a good classification performance: it is very flexible and can adapt to classification problems of varied complexity.

Analyzing the classifiers performance
To gain more insight into the classifiers' performance, we trained the 7 classifiers using the full data set with a 10-fold crossvalidation procedure. In Fig. 6 and Fig. 7, we plotted the ROC curves and the Cost curves of the 7 classifiers. In the ROC curves plot, the best curves are at the top of the plot. In the ROC curves, we see that the classifiers are ranked, in order of increasing performance, as follows: the decision trees, the neural networks, the linear classifier, the quadratic classifier, and the k-NN classifier. However, there is an ambiguity about the ranking of the Parzen and SVM classifiers because their ROC curves intersect. In the Cost-curve plot, the classifiers are ranked in the same order as in the ROC curves. However, this time the curves of the best classifiers are at the bottom of the plot. The Cost curves of the Parzen classifier and the SVM classifier have the same normalized expected cost value for a probability cost function (PCF) between 0.45 and 0.75, where both curves intersect. For PCF < 0.45, the SVM classifier performs better than the Parzen classifier, while for PCF > 0.75 the Parzen classifier performs better. In other words, both classifiers perform equally well if the cost of classifying benign and malignant tumors is kept the same. However, if we would like to change the cost of classifying benign and malignant tumors, for example by assigning a higher cost to missing malignant tumors than to missing benign tumors, then the two classifiers perform differently (see Holte & Drummond (2011)). The latter observation explains why the SVM and Parzen classifiers have an overlapping performance which is easy to explain from the ROC curves.

... and test the classifiers using multiple training and testing data. This procedure is repeated several times and then some statistical tests should be applied to assess their performance. Dietterich (1998) described a 5 × 2 cv algorithm that can be used to statistically compare the performance of two machine learning classifiers on the same classification problem. The name of the test is an abbreviation for "5 iterations 2-fold crossvalidation paired t-test". The same test can be used to check whether one classifier outperforms another classifier on a specific classification task. Let D be a dataset which is divided into five folds $F_1, F_2, \ldots, F_5$ and let A and B be the two classifiers whose performance will be compared. Let $p_i^{(j)}$ stand for the difference in errors between the two classifiers on fold j of replication i. Then, the steps of the algorithm are as follows:
• Divide the first fold $F_1$ into two equal-sized parts $t_1$ and $t_2$. Train both classifiers A and B using $t_1$ and test them using $t_2$ to obtain two error estimates $e_A^{(1)}$ and $e_B^{(1)}$ and their difference $p^{(1)} = e_A^{(1)} - e_B^{(1)}$.
• Swap the roles of $t_1$ and $t_2$ to obtain $e_A^{(2)}$, $e_B^{(2)}$ and the difference $p^{(2)} = e_A^{(2)} - e_B^{(2)}$.
• Estimate the mean $\bar{p} = (p^{(1)} + p^{(2)})/2$ and the variance $s^2 = (p^{(1)} - \bar{p})^2 + (p^{(2)} - \bar{p})^2$.
• Repeat the same procedure for the remaining folds $\{F_2, \ldots, F_5\}$.
Let $p_1^{(1)}$ denote the difference $p^{(1)}$ from the first run, and let $s_i^2$ denote the estimated variance for run i, i = 1, ..., 5. Calculate the t-statistic using:

$$
t = \frac{p_1^{(1)}}{\sqrt{\frac{1}{5}\sum_{i=1}^{5} s_i^2}} \qquad (1)
$$

Note that only one of the ten differences is used in the above expression. Dietterich (1998) has shown that, under the null hypothesis, t approximately follows a t-distribution with 5 degrees of freedom. The test can be used to check whether two constructed classifiers have a similar error rate on new examples. The null hypothesis states that the two classifiers have the same error rate, and the alternative hypothesis states that their error rates differ. We reject the null hypothesis with 95 percent confidence if t is larger than the tabulated t-statistic. Note that there are 10 different values that can be placed in the numerator of Eq. (1), leading to 10 possible statistics. Selecting a different value for the numerator of Eq. (1) should not affect the results of the test. In practice, this is not always the case, as shown in Alpaydin (1999), which proposed a modified test called the combined 5 × 2 cv test. The modified Dietterich test combines the results of the 10 possible statistics and uses more degrees of freedom, which promises to be more robust and to have better statistical power than the original Dietterich test. The new test calculates:

$$
f = \frac{\sum_{i=1}^{5}\sum_{j=1}^{2}\left(p_i^{(j)}\right)^2}{2\sum_{i=1}^{5} s_i^2} \qquad (2)
$$

and tests the estimated f against an F-statistic with 10 and 5 degrees of freedom. Reject the null hypothesis if f is larger than the tabulated F-statistic value (i.e., F = 4.74); otherwise, accept the null hypothesis.

Table 3. Error rates, differences and variances $s^2$ of the SVM classifier (A) and the Parzen (B) using 5 × 2-fold crossvalidation on tumors' texture.
We selected two classifiers from Fig. 7, namely, the SVM and the neural network classifiers. We ran the test to check whether the two classifiers have similar or different performance. The results of running the 5-iterations 2-fold crossvalidation algorithm are summarized in Table 3. Using Eq. (2), we calculated f = 5.58, which is larger than the theoretical F-statistic value. Hence, the null hypothesis that both classifiers have similar error rates was rejected. Therefore, according to the combined 5 × 2 cv test, the SVM classifier had better performance than the neural network classifier with 95% statistical confidence.
In conclusion, the test shows that some classifiers can perform better than others when trained with the same training dataset.
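As a worked illustration of the two statistics, the following Python sketch computes Dietterich's 5 × 2 cv t value and the combined 5 × 2 cv f value from ten error differences. It follows the definitions given by Dietterich (1998) and Alpaydin (1999); the function name and the input layout (five pairs, one pair per replication) are assumptions made for this example, and the numbers in the usage note are made up rather than taken from Table 3.

```python
import math

# Tabulated F value with 10 and 5 degrees of freedom quoted in the text.
F_CRITICAL_10_5 = 4.74


def five_by_two_cv_statistics(differences):
    """differences: five pairs (p_i1, p_i2) of error differences between
    classifiers A and B, one pair per replication of 2-fold crossvalidation.
    Returns (t, f, variances)."""
    if len(differences) != 5 or any(len(pair) != 2 for pair in differences):
        raise ValueError("expected five pairs of differences")

    variances = []
    for p1, p2 in differences:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)

    total_variance = sum(variances)
    if total_variance == 0:
        raise ValueError("all variances are zero; the statistics are undefined")

    # Dietterich's t uses only the first of the ten differences.
    t = differences[0][0] / math.sqrt(total_variance / 5)
    # The combined test uses all ten squared differences.
    f = sum(p * p for pair in differences for p in pair) / (2 * total_variance)
    return t, f, variances
```

For example, `five_by_two_cv_statistics([(0.1, 0.3)] * 5)` gives t ≈ 0.707 and f = 2.5; since 2.5 is smaller than `F_CRITICAL_10_5`, the null hypothesis of equal error rates would not be rejected for these made-up differences.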

Machine learning versus radiologists performance
An important question is how machine learning classifiers perform compared to radiologists.
In the previous section, we used the modified 5 × 2 cv Dietterich test to compare two classifiers. However, we cannot use the same test to compare a classifier's performance against the radiologists' diagnosis since the radiologists' results cannot be repeated. Instead, we applied McNemar's test (Alpaydin (2001)). We construct two hypotheses: the null hypothesis $H_0$ is that there is no difference between the error rates or accuracies of the radiologists and the classifier, and the alternative hypothesis $H_1$ is that the radiologists and the classifier have different performance. If the null hypothesis is correct, then the expected counts for both off-diagonal entries in Table 4 are $\tfrac{1}{2}(N_{01} + N_{10})$. The discrepancy between the expected and the observed counts is measured by the following statistic:

$$
\chi^2 = \frac{\left(\lvert N_{01} - N_{10}\rvert - 1\right)^2}{N_{01} + N_{10}} \qquad (3)
$$

which is approximately distributed as $\chi^2$ with 1 degree of freedom. First, we ran several experiments to find an optimal classifier. The best classifier so far was the SVM classifier.
The results of the SVM classifier against the radiologists are summarized in Table 5. Using Eq. (3), we obtained $\chi^2 = 12.85$, which is larger than the tabulated $\chi^2 = 3.84$. Hence, we rejected the null hypothesis that the radiologists and the SVM classifier have similar error rates. Therefore, the SVM seems to perform slightly better than the radiologists. This last conclusion should, however, be taken with a grain of salt because it is based on a statistical analysis of the SVM classifier with a limited training data set that does not represent the full distribution of soft tissue tumors. McNemar's test does not tell us about the strength of the agreement or disagreement between the radiologists and the SVM classifier. To validate the previous test, we therefore evaluated the kappa statistic (κ = 0.5), which is larger than 0 and supports the result of McNemar's test. Finally, the confusion matrix of the SVM classifier is shown in Fig. 8. The radiologists' performance is also shown in Fig. 8.
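The two statistics used in this comparison are simple enough to compute by hand; the Python sketch below shows one way to do it. It assumes the continuity-corrected form of McNemar's statistic and Cohen's definition of kappa, and the function names and the layout of the agreement table are assumptions made for this example. The counts in the usage note are made up and are not those of Table 4 or Table 5.

```python
def mcnemar_statistic(n01, n10):
    """Continuity-corrected McNemar statistic from the two off-diagonal counts:
    n01 = cases misclassified by the first rater only,
    n10 = cases misclassified by the second rater only."""
    if n01 + n10 == 0:
        raise ValueError("the raters never disagree; the statistic is undefined")
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)


def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater A, columns: rater B).
    Undefined when the agreement expected by chance equals 1."""
    n = sum(sum(row) for row in table)
    size = len(table)
    observed = sum(table[i][i] for i in range(size)) / n
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(size)
    ) / (n * n)
    return (observed - expected) / (1 - expected)
```

For example, `mcnemar_statistic(5, 20)` returns 7.84 and `cohen_kappa([[20, 5], [10, 15]])` returns 0.4. The McNemar value is then compared with the tabulated chi-square value for 1 degree of freedom at the chosen confidence level.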

Conclusions
We demonstrated that texture analysis of soft tissue tumors combined with machine learning algorithms can be used as a tool for objective evaluation of MR images and that the results correlate well with the laboratory results. We ran several tests and came up with some interesting observations related to the problem of texture analysis of soft tissue tumors. First, texture features combined with machine learning algorithms seem to perform as well as radiologists, since a computer can extract more information related to signal homogeneity in T1-MRI than a human can based only on visual perception. Second, we do not need a large training data set to train a machine learning classifier and obtain good classification performance, since texture features correlate very well with the pathology of the tumor. Moreover, simple classifiers such as a Parzen classifier or an SVM classifier can effectively separate benign from malignant tumors.

Acknowledgments
Thanks to the University Hospital Antwerp (UZA), Dept. of Radiology for providing the MR images. The authors would like to thank Prof. Robert Holte for providing the Cost Curve software.

Fig. 2. Block diagram of the chapter

number of pixels in the image. A set of statistical parameters, such as the mean, the variance, the skewness, and the kurtosis, is calculated from the probability density function.
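A minimal sketch of these first-order statistics is given below. It assumes integer gray levels and uses the normalized histogram (gray-level counts divided by the number of pixels) as the estimate of the probability density function. The function name, the non-excess kurtosis convention and the handling of a constant image are assumptions of this sketch.

```python
from collections import Counter


def histogram_features(pixels):
    """Mean, variance, skewness and kurtosis of the gray-level histogram."""
    n = len(pixels)
    # Normalized histogram: p[g] is the fraction of pixels with gray level g.
    p = {g: count / n for g, count in Counter(pixels).items()}
    mean = sum(g * pg for g, pg in p.items())
    variance = sum((g - mean) ** 2 * pg for g, pg in p.items())
    if variance == 0:
        # Constant image: higher moments are undefined; 0.0 is a convention here.
        return mean, 0.0, 0.0, 0.0
    third = sum((g - mean) ** 3 * pg for g, pg in p.items())
    fourth = sum((g - mean) ** 4 * pg for g, pg in p.items())
    skewness = third / variance ** 1.5
    kurtosis = fourth / variance ** 2  # non-excess form (3 is not subtracted)
    return mean, variance, skewness, kurtosis


# Tiny hypothetical region of interest, flattened to a list of gray levels.
roi = [0, 1, 1, 2]
mean, variance, skewness, kurtosis = histogram_features(roi)
```

These features depend only on the distribution of gray levels, not on the spatial arrangement of the pixels.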

Fig. 3. Texture analysis features

First, we trained several classifiers that belong to different families of machine learning algorithms on the same texture feature data. The selected classifiers were trained with a cross-validation procedure to make better use of the training data. The cross-validation procedure also tries to minimize the effect of the probability distribution of a specific training data set on the classifier performance. Second, we studied the effect of changing the size of the training data set on the classifiers' performance by plotting the learning curves, which show the error rate of the trained classifiers as a function of the size of the training data set. Third, we used statistical tests to compare the classifiers' performance. We also plotted the ROC (Receiver Operating Characteristic) and the Cost curves to analyze the classifiers' performance. Finally, we applied McNemar's statistical test to compare the performance of the best classifier against the radiologists' performance.
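The cross-validation and learning-curve steps can be sketched as follows. This is only an illustration under stated assumptions: a nearest-centroid rule stands in for the classifiers used in the chapter, the data are synthetic, and all function names are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Toy stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def error_rate(model, X, y):
    return sum(predict(model, x) != t for x, t in zip(X, y)) / len(X)


def cv_error(X, y, k=10, seed=0):
    """Pooled test error over k folds; every sample is tested exactly once."""
    wrong = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        wrong += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return wrong / len(X)


def learning_curve(X_train, y_train, X_test, y_test, sizes):
    """Test error after training on the first m training samples, for each m."""
    return [(m, error_rate(fit_centroids(X_train[:m], y_train[:m]), X_test, y_test))
            for m in sizes]


# Synthetic two-class data, for illustration only (not the tumor features).
rng = random.Random(1)
X = [[rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)] for mu in (0.0, 2.0) for _ in range(60)]
y = [label for label in (0, 1) for _ in range(60)]

err = cv_error(X, y, k=10)

# Hold out a fixed test set, then train on growing subsets of the rest.
order = list(range(len(X)))
random.Random(2).shuffle(order)
train, test = order[:80], order[80:]
curve = learning_curve([X[i] for i in train], [y[i] for i in train],
                       [X[i] for i in test], [y[i] for i in test],
                       sizes=[5, 10, 20, 40, 80])
```

Plotting the second element of each pair in `curve` against the first gives a learning curve of the kind described above.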
Parzen classifier and k-nearest neighbor (k-NN) classifier: Both classifiers estimate the empirical probability density function of the benign and the malignant classes from the training data instead of assuming a certain probability distribution function, as the linear and quadratic classifiers do.
Decision tree classifier: Such a classifier uses logical rules to separate the benign from the malignant tumors regardless of the probability distribution of the training data.
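The two density-based rules can be sketched as follows. The sketch assumes a Gaussian kernel with a single fixed width `h` for the Parzen estimate and a plain majority vote for k-NN. The function names, the two-dimensional feature values and the class labels are hypothetical.

```python
import math
from collections import Counter


def parzen_density(x, samples, h):
    """Parzen-window estimate of p(x | class) with an isotropic Gaussian kernel."""
    d = len(x)
    norm = (2.0 * math.pi) ** (d / 2.0) * h ** d
    total = 0.0
    for s in samples:
        dist2 = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-dist2 / (2.0 * h * h))
    return total / (len(samples) * norm)


def parzen_classify(x, class_samples, h=1.0):
    """Pick the class with the largest prior times estimated density."""
    n = sum(len(s) for s in class_samples.values())
    scores = {c: (len(s) / n) * parzen_density(x, s, h)
              for c, s in class_samples.items()}
    return max(scores, key=scores.get)


def knn_classify(x, class_samples, k=3):
    """Majority vote among the k training samples nearest to x."""
    neighbours = [(sum((a - b) ** 2 for a, b in zip(x, s)), c)
                  for c, samples in class_samples.items() for s in samples]
    neighbours.sort(key=lambda pair: pair[0])
    votes = Counter(c for _, c in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training samples, for illustration only.
training = {
    "benign": [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0)],
    "malignant": [(3.0, 3.1), (2.8, 3.0), (3.2, 2.9)],
}

label_parzen = parzen_classify((0.1, 0.0), training, h=0.5)
label_knn = knn_classify((3.0, 3.0), training, k=3)
```

Neither rule assumes a parametric form for the class distributions; the kernel width `h` and the neighborhood size `k` control how smooth the resulting decision boundary is.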
Fig. 4. The learning curves of the 7 trained classifiers

Fig. 5. The error rate versus the complexity of a polynomial classifier


Table 2. Bayes classifier results for the best selected texture features subsets

… | f6, f7, f8, f66, f169, f255, f263, f274, f279, f282, f286 | 76.80 | 0.80 | 0.74 | 0.87
Backward selection | f4, f6, f7, f8, f114, f253, f263, f274, f279, f281, f282, f286 | 77.70 | 0.80 | 0.74 | 0.85
Bidirectional search | f4, f6, f7, f8, f66, f169, f255, f263, f274, f279, f282, f286 | 77.10 | 0.79 | 0.73 | 0.86
Greedy stepwise search | f4, f6, f7, f8, f66, f253, f, f263, f274, f279, f282, f286 | 78.00 | 0.83 | 0.69 | 0.83
Ranking with chi-squares statistics | f7, f16, f37, f45, f46, f52, f251, f253, f255, f263, f265, f268 | 67.99 | 0.65 | 0.73 | 0.72
Ranking with information gain | f7, f16, f37, f45, f46, f52, f251, f253, f254, f255, f268, f282, f286 | 65.34 | 0.56 | 0.81 | 0.75
C4.5 decision tree wrapper | f6, f21, f38, f49, f56, f64, f118, f164, f253 | 70.77 | 0.70 | 0.73 | 0.78
Best features with SVM wrapper | f5, f6, f13, f98, f172, f178, f216, f217, f…

If the same classifier is trained on a different training data set drawn independently and identically from the same problem domain, we expect to obtain a decision function with a similar performance. If the classifier performance stays the same regardless of the specific training data set, then the classifier has learned how to differentiate benign from malignant tumors from the training data. However, if the classifier performance changes considerably when the training data set changes, then that classifier cannot be used for prediction. In principle, however, the decision function (i.e. the classifier) cannot be made completely independent of the structure of the training data and the complexity of the learning algorithm. To isolate the contributing factors that might interfere with training the classifier and to minimize the bias in the stated results, we systematically applied several machine learning evaluation strategies.
• Swap $t_1$ and $t_2$ so that the classifiers are trained with $t_2$ and tested with $t_1$. Re-train both classifiers and calculate the new errors and the new difference in errors $p^{(2)} = e_A^{(2)} - e_B^{(2)}$.
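The swap step above yields the second error difference of one replication. As a hedged sketch, the five pairs of differences from a 5 × 2 cv run can be combined into Dietterich's original t statistic and into the combined F statistic proposed by Alpaydin. Which variant the "modified" test in this chapter refers to is an assumption here, and the function name and the numbers below are hypothetical.

```python
import math


def five_by_two_cv_statistics(diffs):
    """Return (t, f) from five (p1, p2) pairs of error differences.

    p1 is the difference in error rates on the first fold of a replication,
    p2 the difference after swapping the training and test halves.
    """
    if len(diffs) != 5:
        raise ValueError("expected exactly five replications")
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    total_variance = sum(variances)
    if total_variance == 0:
        raise ValueError("all variance estimates are zero; statistics undefined")
    # Dietterich's 5x2 cv paired t statistic (t distribution, 5 degrees of freedom).
    t_stat = diffs[0][0] / math.sqrt(total_variance / 5.0)
    # Combined 5x2 cv F statistic (F distribution, 10 and 5 degrees of freedom).
    f_stat = sum(p1 ** 2 + p2 ** 2 for p1, p2 in diffs) / (2.0 * total_variance)
    return t_stat, f_stat


# Hypothetical error differences, for illustration only.
differences = [(0.02, 0.04), (0.03, 0.01), (0.05, 0.03), (0.02, 0.02), (0.04, 0.02)]
t_stat, f_stat = five_by_two_cv_statistics(differences)

# Tabulated 5% critical values: two-sided t with 5 degrees of freedom,
# and F with 10 and 5 degrees of freedom.
T_CRITICAL = 2.571
F_CRITICAL = 4.74
reject_by_t = abs(t_stat) > T_CRITICAL
reject_by_f = f_stat > F_CRITICAL
```

The t statistic uses only the first of the ten differences in its numerator, whereas the F statistic pools all ten, so the two variants need not reach the same decision on the same data.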

Table 4. A table used to perform McNemar's test.

To apply McNemar's test, we first have to express the results of the radiologists and the SVM classifier as depicted in Table 4. Second, we

Table 5. A table constructed for the McNemar's test