Tumor Malignancy Characterization in Clinical Environments: An Approach Using the FYC-Index of Spiculation and Artificial Intelligence

According to the World Health Organization, cancer is the second leading cause of death in the world. The myriad of variations, paths of development, and mutations make this abnormality challenging to treat. With the advent of medical imaging, complex qualitative information is collected with the aim of characterizing the pathology; however, the uncomfortable and time-consuming histology remains the standard of care within hospitals. This manuscript presents a strategy to extract quantifiable features from the images. The method captures shape perturbation as variations with respect to a perfect circle that is used in a standardized dimensional space. A multifeatured scheme is created when the quantification is applied to all slices produced by imaging modalities such as computed tomography, magnetic resonance imaging, and tomosynthesis. The numbers obtained by the introduced algorithm are then used in an artificial intelligence pipeline that correlates spicularity with aggressiveness, using the histology as the supervising factor.


Introduction
Classifying cancer lesions in form and intensity from the images is of interest in radiology units [1][2][3]. Currently, histology is the gold standard to define cancer type, stage, and grade; nevertheless, histology comes with its associated costs and delays and has been reported to increase morbidity [4,5]. When diagnosing from the images, the desired classification is accurate and repeatable only if the operator adds quantitative tools to the available set, which is mostly qualitative. The quantification starts by separating the neomass from the anatomical parts in the image through segmentation.
Regarding segmentation, authors have proposed assisting techniques that partially or fully accomplish the tasks with different levels of accuracy [6][7][8].
After segmentation, the challenge is finding a repeatable and performant method for all kinds of cancer manifestations. Some quantifying approaches target cancer in specific parts of the body [9,10], while others focus on particular kinds of cancer [11,12].
Although technology is pervasive in medical facilities, current assisting tools are not yet of help in diagnosing cancer. The tasks are still performed by human experts employing purely qualitative judgment. There is a need to quantify and thus remove the uncertainty produced by human variability.
In practice, qualitative features suggested by X-RADS [13,14], such as roughness and stiffness, are difficult to conceptualize with mathematical models; therefore, indexes based on these features are complicated to model [15]. However, the shape of the captured objects is a stable feature in the field of view [16,17] and, conveniently, has the required sensitivity across all cancer manifestations because it captures the core manifestation of the disease: the disordered growth pattern [18,19]. More importantly, tumor shape is quantified in a feature-enriched scheme to favor further machine-learning implementation. In this document, we employ the FYC-Index of spiculation [20] to quantify the edges of breast tumors imaged with tomosynthesis [21,22]. The numbers yielded by the FYC-Index strategy are fed to an artificial intelligence classifier that initially differentiates between benign and malignant neomasses, showing a high degree of accuracy in supervised experiments. The presented strategy is equally performant in all imaging techniques that generate volumetric representations by slicing, including MRI, CT, and tomosynthesis.

Clinical data
A cohort of 48 breast tomosynthesis images underwent segmentation performed by an expert radiologist. Histology was performed on the 48 masses yielding 29 malignant cases and 19 benign. The resulting masks hold the specifications of the original images regarding the field of view and spatial resolution. Since the algorithm explained in Section 2.2 is immune to resolution changes and the field of view is standardized, records of the images' specifications are not provided in this document.

The FYC-Index of spiculation
The reader is invited to refer to Figure 1. Recall that the procedure explained below is used on both views, axial and sagittal.
Block I. Images from CT, MRI, or tomosynthesis are suitable for this processing due to their slicing nature. The FYC-Index resamples all masks to isometric voxels of 1 mm before selecting the biggest mask through area calculation. Next, the dimensions of the biggest bounding box are used as a dimensional template. Then, the other slices in the study (including those of other tumors when working with a population) are scaled to the dimensions of the biggest bounding box. This process also centers the masks. Distortion in the mask growing process is avoided by using the adaptive supersampling method [23,24]. After scaling, all the images share the same field of view (FOV) and, therefore, the same planar coordinates for the center point.
Block II. The edges of the masks are detected using the Canny edge detector [25].
Block III. As the Canny detector does not create single-pixel edges, the system detects two paths corresponding to the outer and inner edges. The Euclidean distance from the artificially created center of coordinates to each point in the edge is saved in two arrays, one for the outer edge and the other for the inner edge. The two arrays are averaged into an ordered array of distances (AoD). The run along the edge that creates the AoD is standardized by starting the distance calculation at the top center of the image and taking the edge points in a clockwise fashion until the starting point is located at a distance of √2 mm or less from the current point. Recall that all voxels are set to 1 mm.
Block IV. The AoD is Gaussian filtered, creating the FAoD. This filtering is intended to eliminate the high-frequency components produced by the digital grid. The filter is implemented in the frequency domain, keeping 80% of the original spectral power. According to [26], maintaining 80% of the spectral power ensures that the important content of the signal is kept.
Block V. A five-point differentiation is applied to find the regions of rapid change; next, a second five-point differentiation is executed to recover the inflection points. These operations are generalized to each point in the FAoD as shown in Eq. (1). The second derivative, obtained with the second pass of Eq. (1), is where peaks are detected. Raising the result to the fourth power emphasizes the peak elements, while regions of low dynamics in the same array are diminished.
Block VI. A moving window integration selects peaks in the exponentiated second derivative. Given points p(s) = (s, d), the area A under the curve section with a width N is calculated for step s, as shown in Eq. (2). If A(s) > T for a chosen value of T, s is added to the list of peak locations.
Block VII. The spiculation quantifying process is executed in the axial and sagittal views.
Block VIII. The locations of the detected peaks are then crossed; only the points that coincide in both views are kept.
Block IX. These points, when mapped back onto the images, highlight the regions where the tumor presents a highly disorganized growth pattern.
Block X. Each slice in the study contributes to the histogram signature of the tumor. The FYC-Index defines the span of the histograms by using the maximum and minimum number of points found in the slices when working on a single mass, or among all analyzed tumors when working on populations.
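The signal-processing core of Blocks IV to VI can be illustrated with a minimal Python sketch that takes an already computed AoD as input. Several choices here are assumptions and not taken from the source: a hard frequency cutoff stands in for the Gaussian filter, the mean radius is removed before measuring spectral power, the stencil coefficients are the textbook central five-point formula (Eq. (1) may use different ones), and the function names, window width, and threshold are hypothetical.

```python
import numpy as np

def keep_spectral_power(aod, fraction=0.80):
    """Low-pass the AoD, keeping roughly `fraction` of its non-DC spectral power.
    Assumption: a hard cutoff replaces the Gaussian filter described in Block IV."""
    aod = np.asarray(aod, dtype=float)
    mean = aod.mean()
    spectrum = np.fft.rfft(aod - mean)
    power = np.abs(spectrum) ** 2
    if power.sum() == 0.0:            # perfectly circular contour: nothing to filter
        return aod.copy()
    cumulative = np.cumsum(power) / power.sum()
    cutoff = int(np.searchsorted(cumulative, fraction)) + 1
    spectrum[cutoff:] = 0.0           # discard the high-frequency bins
    return np.fft.irfft(spectrum, n=aod.size) + mean

def five_point_derivative(x):
    """Central five-point stencil with wrap-around, since the contour is closed.
    Assumption: the coefficients of the source's Eq. (1) may differ."""
    return (-np.roll(x, -2) + 8.0 * np.roll(x, -1)
            - 8.0 * np.roll(x, 1) + np.roll(x, 2)) / 12.0

def integrated_curve(aod, width=5):
    """Filter, differentiate twice, raise to the fourth power, then integrate."""
    faod = keep_spectral_power(aod)
    second = five_point_derivative(five_point_derivative(faod))
    emphasized = second ** 4          # all positive; small values shrink further
    # Moving-window sum A(s); the window does not wrap around the array ends.
    return np.convolve(emphasized, np.ones(width), mode="same")

def detect_peaks(aod, threshold, width=5):
    """Return the positions s whose windowed area A(s) exceeds the threshold T."""
    return np.flatnonzero(integrated_curve(aod, width) > threshold).tolist()
```

Under these assumptions, a constant AoD (a perfect circle) yields no peaks for any positive threshold, whereas a localized bump in the AoD produces detections around the bump.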
Under the FYC-Index, the more spiculated the mass, the more populated the right side of the histogram profile.
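As a hedged illustration of the histogram signature, the sketch below bins the per-slice peak counts of one mass between the minimum and maximum counts. The function name and the optional `span` argument (for population-wide limits) are hypothetical; the default of seven bins mirrors the bin count stated in the classification section, and the normalization to unit sum is an assumption.

```python
import numpy as np

def histogram_signature(peak_counts, n_bins=7, span=None):
    """Normalized histogram of the number of peaks detected in each slice.

    peak_counts: one peak count per slice of the mass.
    span: optional (min, max) shared across a population; defaults to the
          minimum and maximum counts of this mass.
    """
    counts = np.asarray(peak_counts, dtype=float)
    low, high = span if span is not None else (counts.min(), counts.max())
    if high == low:                   # avoid a zero-width histogram range
        high = low + 1.0
    hist, edges = np.histogram(counts, bins=n_bins, range=(low, high))
    total = hist.sum()
    signature = hist / total if total > 0 else hist.astype(float)
    return signature, edges
```

With this layout, slices holding many detected points fall in the rightmost bins, so a highly spiculated mass concentrates its signature on the right side.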

Proof of concept on synthetic data
Testing on extremes is a common practice in engineering. Unfortunately, finding extremes in clinical data is cumbersome. The difficulty lies in the nature of the information: in the clinic, where patients are imaged on the presumption that some abnormality is present, the images often yield moderately spiculated masses, posing a problem especially for the lower extreme reference. Regarding the highly spiculated reference, one can use the histogram signature to pinpoint the slice yielding the most right-filling pattern. However, a sound proof of concept should reflect the complexity commonly found in the clinic, where two masses can have similar volumes yet differ in malignancy; thus, conventional methods are unable to detect differences. To overcome this problem, we have created a synthetic framework where the lower reference is created by stacking the least spiculated slice among all the data analyzed. A mildly spiculated mass is created by stacking a mildly spiculated slice from the study, and, analogously, the extremely spiculated sample is created by stacking the most spiculated slice found in the study. For the three samples, the stacking is done so that the masses end up with similar volumes.
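The stacking idea can be sketched as follows, assuming each reference slice is a 2D binary mask with 1 mm voxels and that "similar volume" is approximated by choosing the number of copies that brings the voxel count closest to a common target. The function name and the target value are hypothetical.

```python
import numpy as np

def stack_to_volume(slice_mask, target_voxels):
    """Build a synthetic mass by repeating one 2D mask along a new slice axis.

    The number of copies is chosen so that the total voxel count is as close
    as possible to `target_voxels` (at least one slice is always produced).
    """
    slice_mask = np.asarray(slice_mask, dtype=bool)
    area = int(np.count_nonzero(slice_mask))
    if area == 0:
        raise ValueError("the reference slice contains no mask voxels")
    n_slices = max(1, round(target_voxels / area))
    return np.repeat(slice_mask[np.newaxis, ...], n_slices, axis=0)
```

Calling the function with the least, a mildly, and the most spiculated slice and the same target yields three volumes that differ in edge behavior but not materially in size.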

Artificial intelligence (AI) implementation
Every column in the histogram signature created by the procedure in Section 2.2 is seen as a feature in a classification problem that aims to distinguish between malignant and benign samples. This is possible because the peaks are counted independently, slice by slice. In general, the perturbations on slice n do not have any correlation with the perturbations on slice m; therefore, orthogonality is granted. In addition to the bin counts, three more features are used: the number of bins filled (some bins may end up empty), the number filled from the middle bin to the right, and the number filled from the middle bin to the left.
Every tumor population has a different span in the histogram signature; however, the number of peak-counting-derived features has been set constant by forcing seven equally spaced bins regardless of the peak-counting range. Thus, the experiments always create an analysis matrix containing 11 columns: 10 columns for the features and 1 column to register the supervising factor provided by the histology. The current exercise presents a binary support vector machine (SVM) classifier, where the machine is trained to provide a benign or malignant verdict.
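A minimal sketch of how the 10 features could be assembled from a seven-bin signature is given below. The function name is hypothetical, and counting the middle bin in both the "right" and "left" tallies is an assumption, since the text does not state whether that bin is included.

```python
import numpy as np

def signature_features(bin_counts):
    """Return the bin counts followed by three occupancy features.

    Extra features: number of non-empty bins, non-empty bins from the middle
    bin to the right, and non-empty bins from the middle bin to the left
    (assumption: the middle bin is counted on both sides).
    """
    bins = np.asarray(bin_counts, dtype=float)
    middle = bins.size // 2
    filled = bins > 0
    extras = [filled.sum(), filled[middle:].sum(), filled[:middle + 1].sum()]
    return np.concatenate([bins, extras])
```

For seven bins this produces a vector of length 10; appending the histology label as an eleventh value gives one row of the analysis matrix.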
The data matrix is scaled and normalized using Python-Pandas [27,28]. As shown in Listing 1, the SVM classification is done after progressively adding features, which are taken from the mdata matrix using the indexes saved in the features array. The accuracy records presented in Section 4 correspond to the experiment that yielded the highest accuracy values per fold.
Figure 2 shows how the algorithm yields two different outcomes based on the tortuosity of the two analyzed shapes. The small shape is a mostly rounded region of interest (ROI) and, therefore, does not present abrupt changes in the distances from the edge points to the center of the FOV. In contrast, the same measure yields rapidly changing distances in the big ROI. Those rapidly changing distances are captured by the first derivative and framed at their inflection points by the second derivative. Later, those points are amplified and made all positive by the fourth power function, while the same function diminishes changes for which the derivative yielded values in the range (−1, 1). As the moving window adds up all values encountered in its domain, the regions of rapid change, represented by large values, add up to higher numbers within the window, and that is where the enhanced points appear in the plot. As all the points are mapped with their original coordinates, a crossing of 3D positions among the points selected in the two image views filters out erroneously selected positions. Finally, the presented procedure adds one count to a histogram whose bins contain ranges of point counts. Naturally, highly spiculated slices contribute mostly to the right bins of the histogram. When all slices in a tumor have been analyzed, the operator can be confident that the histogram describes the degree of homogeneity of the mass, which is also associated with aggressiveness (see Figure 3).
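The progressive feature addition described for the SVM classifier can be sketched with scikit-learn as shown below. This is not the source's Listing 1: the RBF kernel, the standardization step, the stratified five-fold split, and the function name are assumptions, while `mdata` and `features` reuse the names mentioned in the text.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def progressive_svm(mdata, labels, features, n_folds=5, seed=0):
    """Cross-validate an SVM while adding one feature column at a time.

    mdata: 2D array (samples x feature columns); labels: benign/malignant flags;
    features: column indexes, in the order in which they are added.
    Returns a list of (number_of_features, mean_accuracy) pairs.
    """
    mdata = np.asarray(mdata, dtype=float)
    labels = np.asarray(labels)
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    results = []
    for k in range(1, len(features) + 1):
        columns = list(features[:k])
        # Scaling lives inside the pipeline so that it is fitted per fold.
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        scores = cross_val_score(model, mdata[:, columns], labels, cv=folds)
        results.append((k, float(scores.mean())))
    return results
```

Keeping the scaler inside the pipeline is a design choice that prevents information from the test folds leaking into the training folds; whether the original experiments scaled the matrix before or after splitting is not stated.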

FYC-Index extraction
A sample of the process, showing the 3D reconstruction of the masses together with the respective normalized FYC-Index histograms, is presented in Figure 4.

Analysis of synthetic data
As explained in Section 2.3, extreme references are created to demonstrate the span of the method and the capacity to deliver a representation of easy interpretation. The synthetic creations are shown in Figure 5.
The results obtained on synthetic data corroborate that the FYC-Index is sensitive to the changes in the edges that distinguish between malignant and benign masses. In contrast, commonly used geometrical indexes are not sensitive to these changes. In this exercise, we have isolated the spiculation by equalizing the volumes of the studied software objects. A complete set of 3D geometrical functions is also applied to the clinical data in use, with the aim of comparing the performance of the standard-of-care tools in the clinic with that of the FYC-Index; the comparison is shown in Figure 6.

Verdicts dictated by (AI) implementation
The five-fold SVM exercise proposed in Section 2.4 was executed using a Python-Pandas dataframe and the Scikit-Learn SVM. The results are registered in Table 1.
The brute-force algorithm presented in Section 2.4 executed the supervised classification with a high degree of accuracy. The experiments are designed so that the classification measures the capacity to differentiate whether a mass is benign or malignant.

Discussion
The proposed method is sensitive to slight changes in the edges of the masses that are characteristically malignant. The same method includes a stage of quantification that has proven to be descriptive at a glance, even for nonspecialized operators. Since the procedure has been automated, it is compliant with the confidentiality regulations and, therefore, can be easily implemented in hospitals and clinics. The FYC-Index is a flexible method, equally performant when analyzing masses in individuals and in populations. The method presents a signature which results in a measure of lobularity. This strategy works regardless of factors such as size and spatial resolution. Moreover, the results are direct and easy to interpret. The specifications of the FYC-Index make it suitable to analyze all sorts of cancer manifestations, regardless of localization or pathogenic roots. The presented strategy uses a machine-learning classifier to rapidly characterize the malignancy of a mass. However, the real challenge consists of defining malignancy together with aggressiveness. Such an approach requires more rounds of training/testing sessions with sufficient samples across the whole grading range. This multilevel classification should be designed to follow the classification directives presented in the X-RADS standards; thus, the existing automatic tools can also provide insights for selecting more accurate treatments. To the best of our knowledge, no other authors are integrating the tools as we have proposed. The use of the features we have proposed is a novel view of the solution; therefore, we do not include in this report a comparison with other methods.
[Figure caption] The two boxes per colored column correspond to the clinical data detailed in Section 2.1. Normality was rejected by the Kolmogorov test [30], performed on the two groups separately. As normality was not met, the nonparametric Kruskal-Wallis test [31] was employed. The p-values are mapped back and forth in the chi-square distribution.
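The statistical procedure mentioned in the figure caption (a normality check on each group followed by a Kruskal-Wallis comparison) can be sketched with SciPy as below. The function name, the significance level, and the standardization of each group before the Kolmogorov-Smirnov test are assumptions; estimating the mean and standard deviation from the same sample makes that test only approximate.

```python
import numpy as np
from scipy import stats

def compare_groups(benign, malignant, alpha=0.05):
    """Check normality per group, then compare the groups with Kruskal-Wallis."""
    groups = [np.asarray(benign, dtype=float), np.asarray(malignant, dtype=float)]
    looks_normal = []
    for values in groups:
        # Standardize, then test against the standard normal distribution.
        z = (values - values.mean()) / values.std(ddof=1)
        looks_normal.append(bool(stats.kstest(z, "norm").pvalue > alpha))
    h_stat, p_value = stats.kruskal(*groups)
    # With two groups, H is referred to a chi-square with one degree of freedom.
    p_from_chi2 = stats.chi2.sf(h_stat, df=len(groups) - 1)
    return {"looks_normal": looks_normal, "H": float(h_stat),
            "p_value": float(p_value), "p_from_chi2": float(p_from_chi2)}
```

The last two entries of the returned dictionary coincide, which illustrates how the Kruskal-Wallis p-value is obtained from the chi-square distribution.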

Conclusion
Cancer is the second most threatening disease that humanity has not been able to neutralize. Other diseases that were considered pandemics in the past, costing millions of human lives, have been eradicated through vaccination. Rapidly mutating diseases such as AIDS have been downgraded from fatal to chronic. Maladies like high blood pressure, stroke, or cirrhosis, among several other chronic afflictions, have been associated with race, genetics, habits, or exposure factors, providing a way to reduce the probability of acquiring them or a path of development where scientists still have space to explore. Cancer, instead, affects all humans regardless of any factor. The only aspect that increases survival expectations, without a doubt, is early detection, and it is here where the method presented in this manuscript gains relevance. Detection from the images is possible, and automatic diagnosis not only avoids the painful and uncomfortable biopsy but also contributes to faster and more accurate verdicts.