Fusion of Two-View Information: SVD-Based Modeling for Computerized Classification of Breast Lesions on Mammograms

1 Universidade do Grande ABC, 2 Universidade Federal do ABC, 3 Universidade Federal do Vale do São Francisco, Brazil


Introduction
Over the past few years, cancer has been one of the leading causes of death worldwide and could become responsible for an even larger share of deaths in the coming decades. According to the World Health Organization, cancer accounted for about 13% of all deaths in 2008; this number is increasing significantly and is estimated to reach approximately 12 million deaths per year by 2030 (Tang et al., 2009).
Breast cancer is the second most common cancer and the leading cause of cancer death among women in Brazil. The National Cancer Institute (INCA) reports more than 50,000 new cases of the disease, with a risk of 51 cases per 100,000 women. In the southeastern region of the country, this number is about 34% higher than the national average, with an estimate of 68 new cases per 100,000 women. For this reason, studies have shown that early detection is the key to improving breast cancer prognosis (Brasil, 2009). Several works have shown that early detection of the disease is a crucial factor for reducing mortality from breast cancer. Among the most important medical imaging methods (for example, MRI, ultrasonography and screen/film mammography), screen/film mammography is the most easily accessible and has proved to be an effective aid for radiologists in the early detection of breast cancers and in the reduction of mortality rates. In this examination, four images are obtained, two for the right breast and two for the left breast, in the cranio-caudal (CC) and medio-lateral oblique (MLO) projections. This acquisition protocol improves visualization of breast tissue and increases the chances of detecting signs that characterize the presence of non-palpable lesions such as nodules, calcifications, bilateral asymmetry and architectural distortion (Rangayyan et al., 2007).
Retrospective evaluations of previous screening films of cancers detected between screening rounds (interval cancers) show evidence of abnormality in 16% to 27% of cases. Cancers are missed for many reasons: low disease prevalence, breast structure complexity, subtlety of the findings, and radiologist fatigue. To improve the accuracy of mammography, radiologists have employed double reading of the same screening mammogram to increase the sensitivity rate (Kinoshita et al., 2007). A second reading of the screening mammograms by a human reader can increase cancer detection rates (Thurfjell et al., 1994), (Warren & Duffy, 1995). However, this procedure is expensive, complex, and time consuming, particularly in screening programs where a huge number of mammographic images have to be read (Mencattini et al., 2010).
The development of computerized systems acting as second readers represents an alternative. Computer-Aided Detection (CADe) and Computer-Aided Diagnosis (CADx) systems have been applied to mammographic images to assist radiologists in the analysis of lesions such as microcalcifications, masses and architectural distortions. Image processing algorithms, together with artificial intelligence techniques such as neural networks and fuzzy logic, are used to enhance, segment, extract features from, and classify abnormalities (Jiang et al., 2007), (Balleyguier et al., 2007), (Doi et al., 1999). CADe schemes automatically detect suspicious lesions in mammograms, performing a localization task. CADx systems extend the computer analysis to yield as output the characterization of a region or the estimated probability of lesion malignancy. The present chapter is focused on the classification task.
Although CADe and CADx systems have motivated a large body of research and achieved high sensitivity rates, the majority of these works analyzes the MLO and CC views independently. In some situations, such a system detects an abnormality in only one of the views. Radiologists perceive an inconsistency when a particular lesion is visible in both views but the system is not capable of finding it in both. Studies have shown that these limitations have changed radiologists' impressions, leading them to ignore the results provided by these systems (Doi et al., 1999).
Several computer algorithms identify abnormalities in the breast by extracting features directly from digitized mammograms. Typically, two classes of features are extracted: morphological and non-morphological. Morphological features describe information related to the morphology of a lesion, such as its size and shape. Image texture analysis represents gray level properties of images and is used to describe non-morphological features that are not easily interpreted by humans. This information can be obtained through a variety of statistical, structural and spectral techniques, including co-occurrence matrices, fractal dimensions and multiresolution techniques. Mapping the data to a different space through a transform such as the Fourier or wavelet transform can help to isolate components that contain specific characteristics. Multiresolution analysis allows for the representation of an image at a chosen level of resolution, i.e., it allows for zooming in and out on the underlying texture structure. Therefore, the texture extraction is not affected by the size of the pixel neighborhood (Eltoukhy et al., 2010).
In this chapter, we present a method for the extraction and selection of texture-related attributes and for classification using the fusion of information from the CC and MLO views. In the extraction stage, the wavelet transform was applied to provide texture-related attributes for the considered images. Next, the singular value decomposition (SVD) technique was used to reduce the number of attributes. Analysis of variance (ANOVA) was then applied for further reduction of the attributes after the SVD step. In the final step, we used the Random Forest and SVM classifiers to analyze mammogram lesions. The overall performance of the proposed method was evaluated by means of the area under the ROC curve.

Data set
The database used in this work comprises digitized screen/film mammographic images taken from the Digital Database for Screening Mammography (DDSM) (Balleyguier et al., 2007). The DDSM project is a joint effort of researchers from the Massachusetts General Hospital (D. Kopans, R. Moore), the University of South Florida (K. Bowyer), and the Sandia National Laboratories, USA (P. Kegelmeyer). The DDSM database has been widely used as a benchmark in numerous articles in the mammographic area, for being free of charge and having a vast and diverse quantity of cases. It consists of mammographic images and their corresponding technical and clinical information, including exam dates, age of patients, digitization equipment (as well as resolution, number of rows, pixels per row and bits per pixel of the acquired images), lesion types (according to BI-RADS®), and existent pathologies.
For the evaluation of the algorithms, a data set comprising 480 mammographic images from 160 patients (80 with no lesions and 80 with malignant lesions) was randomly selected from the DDSM. The lesions have different sizes, densities, and margin types. For each case, we used four mammograms, taken from the left and right breasts in the CC and MLO views. We selected images digitized with a Lumisys laser film scanner at a 50 μm pixel size. Each image has 4096 gray levels. The location and size of a mass, when it exists, were taken from the chain code of the ".ics" file available in the DDSM project and were used to automatically extract square sub-images called regions of interest (ROIs) from the original image. The images used in the experiments were 128×128 pixel crops of these sub-images, centered on the presented abnormalities. An example of the cropping process, which eliminates the image label and background, is given in Figure 1. To obtain sub-images with no mass, we followed the same procedure, except that the location was randomly taken from a healthy part of the mammogram. With this approach, a total of 480 sub-images was acquired, and each of these cropped images was used for texture feature extraction and subsequent classification.
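The ROI cropping step described above can be sketched as follows. This is a minimal illustration only: the function name `extract_roi`, the image size and the lesion center coordinates are hypothetical stand-ins, since the actual centers come from the DDSM annotation files.

```python
import numpy as np

def extract_roi(image, center_row, center_col, size=128):
    """Crop a size x size ROI centered on a lesion center (or on a
    randomly chosen healthy location); edge-pads so that centers near
    the image border still yield a full-size crop."""
    half = size // 2
    padded = np.pad(image, half, mode="edge")    # guard against borders
    r, c = center_row + half, center_col + half  # shift into padded coords
    return padded[r - half:r + half, c - half:c + half]

# toy 12-bit mammogram (4096 gray levels), center near two borders
img = np.random.randint(0, 4096, size=(512, 512))
roi = extract_roi(img, 40, 500)
print(roi.shape)  # (128, 128)
```

The same helper serves both classes of samples: abnormal ROIs use the annotated mass center, normal ROIs use a random healthy location.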

Multiresolution analysis
Image analysis at multiple scales allows the image resolution to be modified so that as little data as possible is processed, by selecting only the details relevant for a given task (Mallat, 1996). The basic idea is to represent the image at several resolutions and analyze them in the frequency domain through the application of transforms. In this chapter, we analyze the performance of wavelet transforms in a multiresolution environment. The basics of wavelet transforms are presented as follows.

Wavelet transforms
For a given function f(x) ∈ L²(ℝ), the wavelet transform Wf(a, b) is obtained from the inner product of f(x) with a wavelet family, i.e.:

Wf(a, b) = ∫_{−∞}^{∞} f(x) ψ_{a,b}(x) dx,  where  ψ_{a,b}(x) = (1/√a) ψ((x − b)/a)

are wavelets obtained from scaling and shifting operations on a mother wavelet ψ(x), centered in the neighborhood of x = 0, with a and b being the scaling and translation factors, respectively. The mother wavelet is an integrable function with zero mean, i.e., ∫_{−∞}^{∞} ψ(x) dx = 0. Examples of wavelets can be found in (Mallat, 1998). The scale parameter a is referred to as the transform resolution level. When Wf(a, b) is known only for a < a₀, recovering f(x) requires a complement of information corresponding to Wf(a, b) for a > a₀. This is obtained by introducing a scaling function φ(x), from which an approximation of f(x) at scale a is achieved:

Lf(a, b) = ∫_{−∞}^{∞} f(x) φ_{a,b}(x) dx,  with  φ_{a,b}(x) = (1/√a) φ((x − b)/a).

In this way, Wf(a, b) and Lf(a, b) are defined as the detail and approximation coefficients of the wavelet transform, respectively.
To allow fast numerical implementations, it is common to impose that the scale and translation parameters vary only over discrete values, the most common choice being the dyadic wavelet decomposition, obtained when a = 2^j and b = k·2^j, for integers j and k. One can then construct wavelet and scaling families given, respectively, by:

ψ_{j,k}(x) = 2^{−j/2} ψ(2^{−j} x − k)  and  φ_{j,k}(x) = 2^{−j/2} φ(2^{−j} x − k),

which are orthonormal bases of subspaces of L²(ℝ) related to the resolution 2^j. Many families of wavelets have been developed, such as Haar, Daubechies, Coiflet and cubic splines, among others.
For discrete signals, the discrete wavelet transform (DWT) is obtained by the discretization of time as well as of the translation and scale parameters. Mallat proved that the dyadic DWT of a signal is equivalent to its decomposition through banks of high-pass and low-pass filters, with as many filter stages as the desired resolution. The DWT achieves a multiresolution decomposition of a discrete signal f[n] over J octaves (resolutions), labeled j = 1, 2, ..., J, from which the signal is reconstructed recursively by:

a_j[n] = Σ_k h[n − 2k] a_{j+1}[k] + Σ_k g[n − 2k] d_{j+1}[k],  with a_0[n] = f[n].

The sequences h[n] and g[n] are the low-pass and high-pass synthesis filters, respectively; they are called reconstruction filters.
The wavelet coefficients, namely the approximation coefficients a_{j,k} and the detail coefficients d_{j,k}, are obtained by the convolution operations:

a_{j+1}[k] = Σ_n a_j[n] h[n − 2k],  d_{j+1}[k] = Σ_n a_j[n] g[n − 2k],

where the discrete analysis filters are related to the approximation and detail coefficients, respectively; they are called decomposition filters. Given two filters, a low-pass h[n] and a high-pass g[n] with N coefficients each, the analysis filters for the deeper levels can be obtained iteratively by cascading these filters along the decomposition. As can be seen, the filter h[n] plays an important role in wavelet transforms.
Different wavelet and scaling families can be constructed from the wavelet and scaling filters through the two-scale relations φ(x) = √2 Σ_n h[n] φ(2x − n) and ψ(x) = √2 Σ_n g[n] φ(2x − n). As an example, the Daubechies wavelets are distinguished from each other by their number of vanishing moments. The Daubechies wavelet with 3 vanishing moments, referred to as Db3, was used in this work; its non-zero filter coefficients are shown in Table 1. For images, which are two-dimensional (2D) signals, the DWT can be computed as shown in Figure 3, which illustrates an image decomposition from the j-th level to the (j + 1)-th level of the DWT. The coefficients LL_j represent the pixel values of the image at level j. First, each row of the image is passed through a pair of filters, followed by downsampling by 2. The results are used as inputs to two filter banks applied to the columns of the image, again followed by downsampling. Four sub-images are generated in this process: the approximation LL_{j+1}, which represents the original image at a smaller resolution, and the details LH_{j+1}, HL_{j+1} and HH_{j+1}, which represent the horizontal, vertical and diagonal directions, respectively. For each ROI, the 2D-DWT was applied using three different wavelet functions, Coiflet 5, Daubechies 3 and Symlet 4, with 2 resolution levels. The first decomposition level yields the coefficient matrices LL_1, LH_1, HL_1 and HH_1. The second decomposition level, applied to subband LL_1, results in the coefficient matrices LL_2, LH_2, HL_2 and HH_2. Since, for the present purposes, the relevant information is contained in the detail coefficients, only the detail subimages are evaluated in this work.
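The two-level 2D-DWT decomposition described above can be sketched with the PyWavelets library; this is a sketch under the assumption that `pywt` is available, using the Db3 wavelet and periodization mode so that each level halves the subband size exactly, as in the nonexpansive scheme used in this work.

```python
import numpy as np
import pywt

roi = np.random.rand(128, 128)  # stand-in for a 128x128 ROI

# Level 1: approximation LL1 and details LH1 (horizontal),
# HL1 (vertical), HH1 (diagonal)
LL1, (LH1, HL1, HH1) = pywt.dwt2(roi, "db3", mode="periodization")

# Level 2: decompose the approximation subband LL1 again
LL2, (LH2, HL2, HH2) = pywt.dwt2(LL1, "db3", mode="periodization")

# Only the six detail subbands carry the texture features used here:
# 3 x (64x64) = 12,288 plus 3 x (32x32) = 3,072 -> 15,360 coefficients
details = [LH1, HL1, HH1, LH2, HL2, HH2]
print([d.shape for d in details])
```

Swapping the wavelet name ("coif5", "sym4") reproduces the three mother wavelets compared in the chapter.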

Nonlinearity operator
The implementation of the discrete wavelet transform involves the linear convolution of images with the filter coefficients associated with the chosen wavelet basis function (Selvan & Ramakrishnan, 2007). However, linear convolution increases the size of the subband images, which causes distortions at the boundaries of the image when it is reconstructed.
To overcome this problem, we applied the method proposed by Selvan and Ramakrishnan to make the subband coefficients less sensitive to local variations (Selvan & Ramakrishnan, 2007). In this technique, each row and column of a Q × Q image is periodically extended by R/2 pixels on both sides, where R is the number of filter coefficients, resulting in a (Q + R) × (Q + R) image. Adding fewer than R/2 pixels would not yield the required core samples after removing the excess samples at the boundaries, while adding more than R/2 pixels would lead to more samples than required; hence, exactly R/2 pixels are added on all four sides of the image. The convolved samples are then decimated, which yields the required core samples. Thus, after one level of wavelet decomposition, a Q × Q image yields four (Q/2) × (Q/2) subbands, i.e., the decomposition is nonexpansive. In our case, the coefficient matrices have size 64 × 64 after one decomposition level and 32 × 32 after two decomposition levels.
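The periodic extension step can be sketched in a few lines; `periodic_extend` is a hypothetical helper name, and NumPy's "wrap" padding mode is assumed to implement the periodic extension described above.

```python
import numpy as np

def periodic_extend(img, R):
    """Periodically extend a Q x Q image by R//2 pixels on all four
    sides, giving a (Q+R) x (Q+R) image, so that filtering followed by
    decimation returns exactly Q//2 core samples per subband."""
    pad = R // 2
    return np.pad(img, pad, mode="wrap")  # "wrap" = periodic extension

img = np.arange(16).reshape(4, 4)  # toy Q = 4 image
ext = periodic_extend(img, R=6)    # e.g. R = 6 filter taps (Db3)
print(ext.shape)  # (10, 10)
```

After convolution, the excess boundary samples are discarded, keeping only the core, so the subbands stay nonexpansive.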
To make the subband coefficients less sensitive to local variations, nonlinearity and smoothing operators must be applied to the wavelet coefficients before extracting the parameters. For these operations, the total energy of the wavelet coefficients in each subband was calculated as:

E_i = Σ_{j=1}^{P} Σ_{k=1}^{Q} [w_i(j, k)]²,

where E_i is the overall energy of the i-th subband, w_i(j, k) is the wavelet coefficient at location (j, k) of the i-th subband, and P and Q are the numbers of rows and columns of the i-th subband, respectively. Here, P and Q have the same value.
Following (Selvan & Ramakrishnan, 2007), we also consider 3 × 3 neighborhoods, compute their average, and normalize the average. At each location (j, k), the local energy was computed over a 3 × 3 neighborhood as:

L_i(j, k) = (1/9) Σ_{m=−1}^{1} Σ_{n=−1}^{1} [w_i(j + m, k + n)]²,
and the local energies L_i(j, k) were normalized by the subband energy:

L̂_i(j, k) = L_i(j, k) / E_i.

The central wavelet coefficient of each 3 × 3 neighborhood was then replaced by the corresponding normalized energy.
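The nonlinearity and smoothing steps above can be sketched in NumPy; `normalized_local_energy` is a hypothetical name for this illustration, and edge padding at the subband borders is an assumption of the sketch.

```python
import numpy as np

def normalized_local_energy(w):
    """Square the subband coefficients (nonlinearity), average the
    squared values over 3x3 neighborhoods (smoothing), and normalize
    by the total subband energy E_i."""
    sq = w.astype(float) ** 2
    E = sq.sum()                      # total subband energy E_i
    padded = np.pad(sq, 1, mode="edge")
    local = np.zeros_like(sq)
    for dr in (-1, 0, 1):             # 3x3 moving average of squares
        for dc in (-1, 0, 1):
            local += padded[1 + dr:1 + dr + sq.shape[0],
                            1 + dc:1 + dc + sq.shape[1]]
    local /= 9.0
    return local / E                  # normalized local energies

sub = np.random.rand(32, 32)          # stand-in for a 32x32 subband
L = normalized_local_energy(sub)
print(L.shape)  # (32, 32)
```

Each normalized value replaces the central coefficient of its 3 × 3 neighborhood before the SVD step.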
Figure 4 represents the process of wavelet decomposition for each ROI to obtain the wavelet coefficient features.This process produced 12,288 wavelet coefficients for the first decomposition (64 × 64 in 3 subimages) and 3,072 coefficients for the second decomposition (32 × 32 in 3 subimages).
Gathering all detail coefficients from the subbands resulted in a feature vector of 15,360 attributes.

Singular value decomposition
In this work, the number of wavelet coefficients is very large and the estimation of model parameters is computationally demanding. Singular value decomposition (SVD) is a powerful mathematical tool, used here mainly to reduce a large dataset to one with significantly fewer values that still retains a large fraction of the variability present in the original data. SVD has many applications, such as data analysis, signal processing, pattern recognition and image compression (Pedrini & Schwartz, 2008). It is a linear algebra tool that decomposes a matrix into the product of three simpler matrices (Selvan & Ramakrishnan, 2007), (Ramakrishnan & Selvan, 2007a), (Ramakrishnan & Selvan, 2007b).
To explain the application of the SVD, consider the matrix I_i of size P × Q whose entries are the subband wavelet coefficients after the introduction of the nonlinearity. The SVD decomposes the matrix I_i into the product of three matrices:

I_i = U_i S_i V_i^T,

where U_i, of size P × Q, and V_i, of size Q × Q, are orthogonal matrices whose columns are the eigenvectors of the matrices I_i I_i^T and I_i^T I_i, respectively, and S_i, of size Q × Q, is a diagonal matrix whose non-zero entries are the singular values (square roots of the eigenvalues) of the matrix I_i I_i^T:

S_i = diag(σ_1, σ_2, ..., σ_Q),

where the σ_n are the singular values, with σ_1 ≥ σ_2 ≥ ... ≥ σ_Q. Since the SVD is unique for each matrix, the singular values completely represent the subband images.
A method of truncation of the lower singular values, which is equivalent to a filter-based approach, was applied to the matrix S_i for dimensionality reduction in the presence of noise. In (Selvan & Ramakrishnan, 2007), the authors have shown that the effect of noise is stronger on the singular values of lower magnitude. Therefore, the diagonal matrix can be truncated to a dimension K × K, where K is determined empirically as the largest index n for which the ratio σ_n/σ_1 remains above a chosen threshold, σ_n being the n-th singular value and σ_1 the highest one.
Since the wavelet transform was computed over 2 resolution levels, there are subband images of sizes 64 × 64 and 32 × 32 pixels, which leads to a different number of truncated singular values (a different K) for each mammogram. Therefore, the truncation criterion above was used to obtain a value K_r for each resolution level, and the overall K was obtained by averaging the numbers of truncated singular values. Having defined the average value K, K values were extracted from each of the six detail subimages, resulting in a feature vector of 6K elements representing the texture characteristics of the original mammogram. Figure 5 shows an example of the diagonal matrix S_i obtained from the subband LH_2 of a Coiflet mother wavelet used for texture analysis. The values of the diagonal matrix were truncated at the average value K, which corresponds to the attributes in the region marked in green, representing the texture features.
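The SVD truncation can be sketched as follows. The relative threshold `ratio` is an assumed stand-in for the empirical criterion of the original paper, and `truncated_singular_values` is a hypothetical name.

```python
import numpy as np

def truncated_singular_values(subband, ratio=1e-2):
    """Compute the singular values of a subband matrix and keep only
    the leading ones whose magnitude is at least `ratio` times the
    largest singular value (noise lives in the small ones)."""
    s = np.linalg.svd(subband, compute_uv=False)  # sorted descending
    K = int(np.sum(s / s[0] >= ratio))
    return s[:K]

sub = np.random.rand(64, 64)  # stand-in for a 64x64 detail subband
feats = truncated_singular_values(sub)
print(len(feats))  # K <= 64
```

Repeating this for the six detail subbands and concatenating the truncated values yields the 6K-element texture vector.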

Attributes reduction
To verify the relevance of the attributes obtained after the SVD procedure, we applied the analysis of variance (ANOVA) technique. ANOVA is a statistical model that compares the means of two or more groups, assessing whether the observed differences between the means are significant (Vieira, 2006). In our experiments, this comparison was carried out between the mass and normal mammographic images: mean and standard deviation were calculated for the texture data sets of the mass and normal images, and this information was used to evaluate whether each attribute discriminates between the two classes. Attributes that did not differ between the classes were eliminated, because they do not serve as a reference for defining the classes in the classification stage (Pedrini & Schwartz, 2008), (Susomboon et al., 2008).
One-way ANOVA was first applied to the singular value attributes generated by the SVD procedure to determine their statistical significance. In this method, the null hypothesis is that all attribute means are the same, and the alternative hypothesis is that at least two of them differ. An F-test is applied with a significance level of p = 0.05. The set of singular values was tested to evaluate whether each attribute differs between the classes. If the two groups are statistically the same for a given attribute, that attribute is discarded, since it does not contribute to the classification step.
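The per-attribute F-test can be sketched in pure NumPy for the two-group case (mass vs. normal); `anova_f` is a hypothetical helper, and the attribute values below are synthetic. `scipy.stats.f_oneway` computes the same statistic plus its p-value.

```python
import numpy as np

def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for two groups: ratio of the
    between-group mean square to the within-group mean square."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    data = np.concatenate([a, b])
    grand = data.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in (a, b))
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in (a, b))
    df_between = 1                       # 2 groups - 1
    df_within = len(data) - 2
    return (ss_between / df_between) / (ss_within / df_within)

rng = np.random.default_rng(0)
mass = rng.normal(1.0, 0.2, 240)    # one attribute over 240 mass ROIs
normal = rng.normal(0.4, 0.2, 240)  # same attribute over 240 normal ROIs
F = anova_f(mass, normal)           # well-separated means -> large F
print(F > 100)
```

Attributes whose F statistic falls below the critical value at the chosen significance level are discarded.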
In many cases, the alternative hypothesis is quite general, and one needs information about which pairs of means are significantly different and which are not. This is done with a multiple comparison procedure. The multiple comparison method was applied repeatedly to determine a minimum number of reduced attributes, progressively lowering the p value down to p = 1 × 10⁻¹⁶.
The selection of sets for the ANOVA evaluation and a graph with the analysis of the attributes are illustrated in Figure 6.

Fig. 6. Application of the ANOVA method on attribute 2.

Fusion of textural information
Several studies have demonstrated that the fusion of information extracted from the two views, CC and MLO, allows for a reduction of false positives compared to the use of a single image (Paquerault et al., 2002), (Gupta et al., 2006).
For the fusion of information from the two views, we applied the ANOVA technique to identify the characteristics that are relevant in both images, i.e., information that was statistically different between the classes in both views. The procedure resulted in an array of textural data with 50% of the information coming from the CC view and 50% from the MLO view. Only the characteristics common to the two views were used in the classification stage.
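A minimal sketch of this fusion rule follows, assuming the per-view ANOVA step yields a set of retained attribute indices for each view; `fuse_two_views` and the index sets are hypothetical illustrations, not the chapter's exact implementation.

```python
import numpy as np

def fuse_two_views(cc_feats, mlo_feats, cc_keep, mlo_keep):
    """Keep only the attribute indices flagged as relevant (e.g. by
    ANOVA) in BOTH views, then concatenate them: half of the fused
    vector comes from CC, half from MLO."""
    common = sorted(set(cc_keep) & set(mlo_keep))
    return np.concatenate([cc_feats[common], mlo_feats[common]])

cc = np.arange(10, dtype=float)        # toy CC texture vector
mlo = np.arange(10, 20, dtype=float)   # toy MLO texture vector
fused = fuse_two_views(cc, mlo, cc_keep=[0, 2, 4, 6], mlo_keep=[2, 4, 6, 8])
print(fused)  # [2, 4, 6] from CC followed by [12, 14, 16] from MLO
```

The resulting vector is balanced by construction: the same number of attributes is drawn from each view.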
Figure 7 shows the block diagram of the algorithm that fuses the information from the mammograms obtained in the CC and MLO views.

Fig. 7. Block diagram of the algorithm for obtaining the vector with the fusion of information from mammograms.

Classification stage
As the final stage of the computerized algorithm, after the information fusion step, a classification procedure was performed on the resulting feature vectors obtained from both views. In this work, we used the Random Forest and SVM classifiers, separately, to categorize the mammogram ROIs as either normal or abnormal tissue. These algorithms were run in the WEKA (Waikato Environment for Knowledge Analysis) software (Vibha et al., 2006).
The Random Forest algorithm builds a collection of classification trees, each induced from a bootstrap sample of the training data and using a randomly selected subset of the features during tree induction (Ramos & Nascimento, 2008).
The Support Vector Machine, implemented here with the Sequential Minimal Optimization (SMO) algorithm, is a machine learning technique based on statistical learning theory that seeks a hyperplane with maximum separation between the classes, assuming that the data are linearly separable. If they are not, the SVM maps the data through a kernel function into a feature space of higher dimension, where the data become linearly separable (Osta et al., 2008).
To train and test the proposed computerized method, a cross validation procedure was performed on a dataset comprising 240 normal ROIs and 240 abnormal ROIs. To obtain the performance of each classifier we implemented a 10-fold cross validation procedure, in which the dataset was split into N = 10 parts: N − 1 parts served as training data to fit the classification model, and the remaining part was used as test data for the estimation of the performance measures. Each of the N parts was used as test data in turn, and the resulting N estimates were averaged to obtain the final expected value. Figure 8 presents the scheme of division into parts for the application of the cross-validation procedure.
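The original experiments ran in WEKA; as an illustration in Python, the same 10-fold protocol can be sketched with scikit-learn (assumed available), using synthetic stand-ins for the 480 fused texture vectors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# stand-in for the fused texture vectors: 240 normal + 240 mass ROIs
X = np.vstack([rng.normal(0.0, 1.0, (240, 12)),
               rng.normal(0.8, 1.0, (240, 12))])
y = np.array([0] * 240 + [1] * 240)   # 0 = normal, 1 = mass

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross validation
print(round(scores.mean(), 2))              # mean accuracy over folds
```

Each of the 10 folds serves once as the test set, and the 10 accuracy estimates are averaged, exactly as described above.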

System evaluation
To support the performance evaluation of the proposed method, we computed sensitivity (true positive rate), specificity (true negative rate) and accuracy from the confusion matrices; the most commonly used formulas are shown in Table 3, and Table 2 shows the confusion matrix for L = 2 classes. Sensitivity measures the percentage of positive instances that were predicted as positive, while specificity measures the percentage of negative instances that were predicted as negative. Overall accuracy is the probability that the diagnostic test gives a correct result.
Performance evaluation was also accomplished by means of receiver operating characteristic (ROC) curves. An ROC curve is a two-dimensional graph of test sensitivity, plotted on the y-axis, versus the false positive rate (or 1 − specificity), plotted on the x-axis. An ROC graph depicts the relative trade-offs between benefits (TP) and costs (FP), and is produced by varying the decision threshold of the classifier. Although it can sometimes lose subtle information about the classification results, the area under the ROC curve (AUC) is a good summary measure of a classifier's performance: an AUC of 1.0 corresponds to a perfect test, while an AUC of 0.0 corresponds to a perfectly inaccurate test.
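The three confusion-matrix descriptors can be computed directly from the cell counts; `sens_spec_acc` and the counts below are illustrative only.

```python
def sens_spec_acc(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy from a 2-class
    confusion matrix (TP, FN, FP, TN counts)."""
    sensitivity = tp / (tp + fn)  # fraction of positives found
    specificity = tn / (tn + fp)  # fraction of negatives found
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy

# hypothetical counts for 240 mass and 240 normal ROIs
sens, spec, acc = sens_spec_acc(tp=200, fn=40, fp=30, tn=210)
print(sens, spec, acc)
```

Note the symmetry: sensitivity uses only the row of actual positives, specificity only the row of actual negatives, while accuracy mixes both.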

Extraction and reduction from wavelet transform
In the proposed method, texture information was extracted by applying the wavelet transform, and the SVD method was employed to reduce the number of coefficients before the classification stage. These methods were applied to each ROI obtained in the CC and MLO views. Table 4 shows the number of truncated singular values (the K parameter) for the first and second levels of each subband image of each view, as well as the resulting number of attributes after the SVD application. The resulting number of coefficients is obtained by multiplying the second and third rows of the table by 3 and summing the results, since the two-level decomposition yields 3 detail subimages per level. Since the wavelet transform is a filtering process based on distinct discrete filters, the use of different mother wavelets led to a different number of coefficients, reflected in the truncated K parameter, for each analyzed image. In summary, the SVD method can effectively reduce the number of coefficients obtained from the wavelet subimages. As stated before, each wavelet-transformed ROI initially comprised a total of 15,360 wavelet coefficients extracted from the two-level decomposition procedure. The application of the SVD method provided an average reduction of 99.3% in the number of attributes necessary for the evaluation of the texture information. Comparing the different mother wavelets evaluated, it can be observed that the Coiflet 5 wavelet provided the smallest number of coefficients.
To further reduce the number of attributes and provide a better separation of the diseased and normal classes, the analysis of variance (ANOVA) with a significance level of 5% was applied to the coefficients resulting from the SVD method. Table 5 shows the number of coefficients resulting from the application of the ANOVA method with a significance level of 0.05; a significant reduction in the number of coefficients can be noticed. These reductions are even more expressive when the ANOVA is applied recursively with the limiting significance level set to p = 1 × 10⁻¹⁶, which led to an average reduction of 81.34% compared to the first application of ANOVA.

Classification of CC and MLO views independently
Tables 7 and 8 show the mean values of sensitivity, specificity and accuracy, with standard deviations (σ), for the Random Forest and SVM classifiers, respectively, for the CC and MLO views taken independently after the wavelet coefficient reduction process. Sensitivity and specificity are the basic measures of precision of a diagnostic test. Observing the data in Tables 7 and 8, we can see a considerable variation in the results for the different wavelets considered. From Table 7, concerning the Random Forest classifier, it can be noticed that the MLO view provided the best results for all three mother wavelets used. Among the wavelet functions considered, there is little difference in the results (approximately 1%), and the best results were obtained with the Symlet 4 mother wavelet.
A similar behavior can be observed for the SVM classifier: Table 8 also shows better results for the MLO view. Comparing the values for the MLO and CC views, the former resulted in average values 3% higher than the latter. These results show that an inconsistency can occur when a particular lesion is similar in both views and the system is not capable of finding it in both. As stated in (Doi et al., 1999), studies have shown that such limitations lead to mistrust, and radiologists tend to ignore the results provided by these systems.
When comparing the two classifiers, this inconsistency can also be noted, since there is a large variability in the results. The Random Forest presented better false positive rates than the SVM classifier, but the latter, on average, performed better on the false negative tests.
Considering the accuracy results (overall system performance), the SVM classifier presented the better results on average.
These results are corroborated when considering the AUC as a measure of system performance. Figure 9 shows the AUC results for the Random Forest classifier applied to the CC and MLO views independently, after the reduction procedure, parameterized by the different wavelet functions. Figure 10 shows the corresponding AUC results for the SVM classifier. As can be noticed, the SVM performed better in almost all cases, the exception being the Daubechies 3 wavelet in the CC view.

Classification of two-view information fusion (CC and MLO)
Table 9 shows the number of texture attributes after the application of the ANOVA technique to each view separately, and the number of attributes after the two-view fusion, for each type of mother wavelet. Tables 10 and 11 show the mean values of sensitivity, specificity and accuracy, with standard deviations (σ), for the two-view information fusion process, using the Random Forest and SVM classifiers, respectively. As seen in Tables 10 and 11, the results for the two classifiers using the fusion of CC and MLO information improved for all wavelet functions considered. For example, for the Coiflet 5 wavelet, the fusion procedure increased the sensitivity (fraction of the regions detected as positive among those that actually contain a mass) by about 5% compared to the result of the same wavelet for the individual views; the same occurred for the other two wavelets. Furthermore, the results also show an increase in the specificity and accuracy due to the two-view information fusion procedure. The results in the tables also show that the Daubechies 3 wavelet function reached the best results compared to the other two wavelets. Since accuracy reflects the relation between sensitivity and specificity, this descriptor shows the similar overall performance of the two classifiers. This result is also visible when evaluating the AUC performance of the classifiers: Figure 11 shows the AUC results after the CC and MLO information fusion for the three types of mother wavelet and the Random Forest and SVM classifiers.

Conclusion
In this work, we developed a method for the extraction and selection of attributes using the fusion of information from the CC and MLO views for the classification of breast cancer in mammogram images. Comparing the results of Tables 7 and 10, and of Tables 8 and 11, we observe that the use of textural information obtained from the fusion of the CC and MLO views can raise the rates of the evaluated descriptors.
The experimental results showed that, for all three types of mother wavelet, the SVM classifier had superior performance in terms of the area under the ROC curve. The best classification rates were obtained using the Daubechies wavelet, giving an AUC of 0.831 for the Support Vector Machine classifier. Future experiments may use other types of feature descriptors to evaluate possible improvements in image classification.

Fig. 1. Procedure applied for the selection of ROIs: images with a malignant lesion and images with no lesions.

Fig. 2. DWT analysis filter bank. When the analysis and reconstruction filters h[n] are considered to be equal, this defines conjugate mirror filters, and h[n] and g[n] completely characterize the dyadic DWT, with g[n] = (−1)^n h[N − 1 − n], where N is the number of filter coefficients.

Fig. 8. Division scheme for the application of the training and test classification steps.

A confusion matrix shows the predicted and actual classifications accomplished by a classifier (Tsai & Lee, 2002). It has dimensions L × L, where L is the number of classes evaluated by the classifier. In our case, L = 2 and the confusion matrix can be stated as shown in Table 2:

                    Predicted Positive      Predicted Negative
Actual Positive     True Positive (TP)      False Negative (FN)
Actual Negative     False Positive (FP)     True Negative (TN)

Fig. 9 .
Fig. 9. Results of AUC after application of the reduction methods for different wavelets and the Random Forest classifier.

Fig. 11 .
Fig. 11. Results of AUC after the two-view information fusion process for the Random Forest and SVM classifiers.

Table 1 .
Daubechies filter for a wavelet with 3 vanishing moments.

Table 4 .
Value of truncated K parameter and resulting number of coefficients after reduction for each subband image of each CC and MLO views.

Table 5 .
Number of wavelet coefficients after application of the ANOVA method with p = 0.05. The significance procedure was then applied again to determine a minimum number of attributes: Table 6 presents the number of coefficients obtained after applying the ANOVA method to the attributes resulting from the SVD method, with the limiting value set to p = 1 × 10⁻¹⁶.

Table 6 .
Number of the wavelet coefficients after application of ANOVA method with p = 1 × 10 −16 .

Table 7 .
Measurements obtained with the Random Forest classifier for each view of mammogram.

Table 8 .
Measurements obtained with the SVM classifier for each view of mammogram.

Table 9 .
Number of relevant textural attributes after application of the ANOVA technique with p = 1 × 10⁻¹⁶.

Table 10 .
Results for two-view fusion obtained using the Random Forest classifier.

Table 11 .
Results for two-view fusion obtained using the SVM classifier.