Predictive Solution for Radiation Toxicity Based on Big Data

Suk Lee; Kwang Hyeon Kim; Choi Suk Woo; Jang Bo Shim; Yuan Jie Cao; Kyung Hwan Chang; Chul Yong Kim

doi:10.5772/67059

Abstract

Radiotherapy treats cancer with radiation according to a treatment plan prepared for each patient and each radiotherapy machine. For every patient and every fraction, the radiotherapy process records the dose, irradiated volume, machine settings, complications, tumor control probability, and related information. Accumulated over many years and many patients, these records form big data that are well suited to optimizing treatment and minimizing radiation toxicity and complications. In this chapter, we review prostate, lung, and head and neck cancer cases that apply machine learning algorithms in radiation oncology, and we introduce promising algorithms such as the support vector machine, decision tree, and neural network. In conclusion, we describe a predictive solution for radiation toxicity based on big data that serves as a treatment planning decision support system.

Keywords

big data
machine learning
radiation toxicity
predictive solution
radiation treatment planning

Author Information

Suk Lee*
- Department of Radiation Oncology, College of Medicine, Korea University, Seoul, Korea
Kwang Hyeon Kim
- Department of Radiation Oncology, College of Medicine, Korea University, Seoul, Korea
Choi Suk Woo
- Department of Radiation Oncology, College of Medicine, Korea University, Seoul, Korea
Jang Bo Shim
- Department of Radiation Oncology, College of Medicine, Korea University, Seoul, Korea
Yuan Jie Cao
- Department of Radiation Oncology, College of Medicine, Korea University, Seoul, Korea
Kyung Hwan Chang
- Department of Radiation Oncology, College of Medicine, Korea University, Seoul, Korea
Chul Yong Kim
- Department of Radiation Oncology, College of Medicine, Korea University, Seoul, Korea

*Address all correspondence to: sukmp@korea.ac.kr

1. Introduction

1.1. Definition of big data and each clinical application overview

Trifiletti et al. [1] describe big data as massive data sets presented for human analysis, on the order of 10¹²–10¹⁸ bytes, a scale they compare to the number of grains of sand on the earth.

Murdoch and Detsky [10] list four ways in which big data will inevitably be applied in health care:

Expanding the capacity to create new knowledge
Helping with knowledge dissemination
Translating personalized medicine into clinical practice with EHR data
Allowing for a transformation of health care by transferring information to patients [10]

This trend has been called a “big bang” in the adoption of and research on big data and machine learning in medicine, and machine learning in particular is widely used [4–6]. Radiotherapy treats cancer with radiation according to a treatment plan prepared for each patient and each radiotherapy machine. For every patient and every fraction, the radiotherapy process records the dose, irradiated volume, machine settings, complications, tumor control probability, and related information. Accumulated over many years and many patients, these records form big data that are well suited to optimizing treatment and minimizing radiation toxicity and complications. In this chapter, we therefore describe various clinical cases and key machine learning algorithms in radiation oncology.

First, what does big data look like for a single patient in a hospital? The data types and their sizes for each patient are summarized in Table 1. In radiation oncology, imaging and treatment planning information could be the major data available for analysis [15].

Data type	Format	Approx. size
Clinical features	Text	10 MB
Blood tests	Numbers	1 MB
Administrative	ICD-10 codes	1 MB
Imaging data	DICOM	450 MB
Radiation oncology data (planning and onboard imaging)	DICOM, RT-DICOM	500 MB
Raw genomic data	BAM: position, base, quality	6 GB
Total		7.9 GB

Table 1.

Data types and their sizes for each patient. In radiation oncology, imaging and treatment planning information could be the major data available for analysis [15].

Second, we explain radiation treatment planning and decision support systems in radiation oncology. A treatment plan and its parameters are set up on a radiation treatment planning (RTP) system. The clinical target volume (CTV) and planning target volume (PTV) should receive the maximum dose, whereas critical organs should receive the minimum dose. The plan is established from the relationship between dose and volume, known as the dose-volume histogram (DVH). The parameters considered in this process include the prescription dose (PD), dose distribution, dose fractionation, dose constraints for normal tissue, target volume, and treatment machine settings [2, 16].
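A cumulative DVH can be computed directly from the dose in each voxel of a structure. The sketch below is illustrative only: it assumes equal voxel volumes, the dose values are hypothetical, and the function name `cumulative_dvh` and its parameters are not part of any RTP system.

```python
def cumulative_dvh(voxel_doses_gy, bin_width_gy=1.0):
    """Return (dose_levels, volume_percent) for a cumulative DVH.

    volume_percent[i] is the percentage of voxels that receive at least
    dose_levels[i] Gy. Equal voxel volumes are assumed.
    """
    if not voxel_doses_gy:
        raise ValueError("at least one voxel dose is required")
    n_voxels = len(voxel_doses_gy)
    # Extend the dose axis one bin past the maximum dose so the curve reaches 0%.
    n_bins = int(max(voxel_doses_gy) // bin_width_gy) + 1
    dose_levels = [i * bin_width_gy for i in range(n_bins + 1)]
    volume_percent = [
        100.0 * sum(1 for dose in voxel_doses_gy if dose >= level) / n_voxels
        for level in dose_levels
    ]
    return dose_levels, volume_percent


# Hypothetical voxel doses (Gy) for a small target structure.
target_doses = [58.0, 60.0, 61.5, 62.0, 59.5, 60.5, 63.0, 57.0]
levels, volumes = cumulative_dvh(target_doses, bin_width_gy=10.0)
for level, volume in zip(levels, volumes):
    print(f"Volume receiving at least {level:.0f} Gy: {volume:.1f}%")
```

For a target volume the curve should stay near 100% up to the prescription dose, whereas for a critical organ it should fall off at low doses.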

Third, when treatment planning is complete, the DVH is obtained. The dose-volume distribution is the basic information for deciding whether a plan can be used. However, this limited information does not describe hot spots in the target volume, conformity, homogeneity, and so on, so the tumor control probability (TCP) and normal tissue complication probability (NTCP) have to be analyzed in parallel. Based on this knowledge-based judgment, rival plans may be generated again [32]. A decision support system is therefore needed to select the best treatment plan for personalized patient care. Decision support systems such as BIOPLAN, CERR, DRESS, and SlicerRT provide different functions for analyzing treatment efficiency, and such software has been researched and developed from the early 2000s to the present [3, 26–28].
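To make the NTCP analysis concrete, the sketch below evaluates one widely used formulation, the Lyman-Kutcher-Burman (LKB) model driven by a generalized equivalent uniform dose (gEUD). This is an assumption chosen for illustration; the text does not state which model the systems named above implement. The DVH bins and the parameters `n`, `m`, and `td50_gy` are hypothetical placeholders, not clinical values.

```python
import math


def generalized_eud(doses_gy, volume_fractions, n):
    """gEUD = (sum_i v_i * D_i^(1/n))^n, with the v_i normalized to sum to 1."""
    total = sum(volume_fractions)
    return sum(
        (v / total) * d ** (1.0 / n) for d, v in zip(doses_gy, volume_fractions)
    ) ** n


def lkb_ntcp(eud_gy, td50_gy, m):
    """LKB model: NTCP = Phi((EUD - TD50) / (m * TD50)), Phi = standard normal CDF."""
    t = (eud_gy - td50_gy) / (m * td50_gy)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Hypothetical differential DVH of an organ at risk: bin doses (Gy) and
# the fraction of the organ volume in each bin.
doses = [5.0, 15.0, 25.0, 35.0]
fractions = [0.4, 0.3, 0.2, 0.1]

eud = generalized_eud(doses, fractions, n=1.0)  # n = 1 reduces gEUD to the mean dose
ntcp = lkb_ntcp(eud, td50_gy=30.0, m=0.4)       # placeholder parameters
print(f"gEUD = {eud:.2f} Gy, NTCP = {ntcp:.3f}")
```

Two rival plans can be compared by running the same calculation on each plan's DVH for the organ at risk.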

These decision support systems now need an additional function that uses machine learning, historical treatment results, and the big data described above to predict patient toxicity or complications after radiation treatment.

2. Clinical application using big data in radiation oncology

2.1. Prostate cancer

Çınar et al. [25] describe prostate cancer as follows:

Prostate cancer occurs most frequently in men over 50.
Prostate cancer is currently the most common cancer in men after lung cancer [25].

Applying machine learning to big data is therefore clinically meaningful for this disease. Coates et al. [4] studied integrated big data research for prostate cancer in radiation oncology. The parameters include dose-volume metrics (EUD), clinical parameters [gastrointestinal (GI) toxicities or rectal bleeding and genitourinary (GU) toxicities or erectile dysfunction (ED)], spatial parameters (zDVH), and biological variables (genetic variables), and these are used for risk quantification modeling of TCP and NTCP. Various modeling methods exist, of which neural networks and kernel-based methods are widely used. Figure 1 shows toxicity prediction results obtained using principal component analysis (PCA) [4].

Figure 1.
The predicted NTCP via principal component analysis (PCA) (reproduced from Coates et al. [4]).

De Bari et al. [5] conducted a pilot study on predicting pelvic nodal status in prostate cancer using machine learning. A total of 1555 cN0 and 50 cN+ prostate cancer patients were enrolled, and decision tree and machine learning algorithms were used to study performance relative to the Roach formula and Partin tables. The study reported accuracy, specificity, and sensitivity ranging between 48–86%, 35–91%, and 17–79%, respectively (Figure 2).

Figure 2.
A decision tree example for prediction of pelvic nodal status in prostate cancer patients [5].
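Accuracy, sensitivity, and specificity are all derived from the four cells of a confusion matrix. The following minimal sketch shows the arithmetic; the counts are hypothetical and are not taken from the study, and the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all cases predicted correctly
        "sensitivity": tp / (tp + fn),   # fraction of node-positive cases detected
        "specificity": tn / (tn + fp),   # fraction of node-negative cases detected
    }


# Hypothetical counts: 10 node-positive and 90 node-negative patients.
metrics = classification_metrics(tp=8, fp=10, tn=80, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

When positive cases are rare, as with cN+ patients here, accuracy alone can look high even if sensitivity is poor, which is why the three values are reported together.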

In addition, several analyses of prostate cancer that report index results have been published; these could serve as examples to which the machine learning algorithms above are added as a next step [30, 31].

2.2. Lung cancer

Das et al. [6] describe radiation-induced pneumonitis as a serious problem in radiotherapy of the thorax, including the lung, as follows:

It is an important problem when incident radiation reaches adjacent or surrounding normal lung.
It occurs at high grade in 15–36% of cases in retrospective studies.

Das et al. [6] built a prediction model from 234 lung cancer patients by combining the Lyman normal tissue complication probability (LNTCP) with decision tree analysis. Table 2 shows the injury prediction for various settings for a male patient.

Plan name	Histological type	Chemotherapy before RT	Once/twice-daily treatment	LNTCP	Injury output (simplified model)	% Injured patients below	% Uninjured patients above
A1	Nonsquamous	No	Either	0.5	0.38	3	72
A2	Nonsquamous	No	Either	0.73	0.49	36	29
A1	Squamous	No	Either	0.5	0.5	37	28
A1	Any	Yes	Twice	0.5	0.51	43	24
A1	Any	Yes	Once	0.5	0.55	64	13
A2	Squamous	No	Either	0.73	0.61	88	4
A2	Any	Yes	Twice	0.73	0.62	91	3
A2	Any	Yes	Once	0.73	0.66	97	1

Table 2.

Comparison table of injury prediction for combinations of radiotherapy plan and various settings for a male patient [6].

RT, radiotherapy; LNTCP, Lyman normal tissue complication probability.

2.3. Head and neck cancer

Head and neck cancer patients undergo anatomical changes over the several weeks of radiotherapy. In current radiotherapy, kilovoltage cone-beam computed tomography (kV-CBCT) and megavoltage computed tomography (MVCT) combined with a linear accelerator (LINAC) make it possible to monitor the patient’s daily anatomical changes across treatment fractions [7]. Adaptive radiotherapy (ART) can compensate for these anatomical variations by adjusting the dose distribution, which makes it possible to reduce unexpected toxicity. However, ART requires time and labor for the daily setup needed to correct the variations, and replanning daily or weekly for numerous patients is laborious and time-consuming.

Guidi et al. [7] studied the prediction of replanning benefit using unsupervised machine learning on retrospective data, taking this process and patient characteristics into account. Figure 3 shows the algorithm architecture of the study: DVH input, clustering that groups the data, support vector machine (SVM) training that analyzes the parotid gland, and a clinical acceptance level with test and output steps [7]. The results suggest that replanning is needed for 77% of patients because significant morpho-dosimetric changes affect them when the fourth week of treatment starts.

Figure 3.
Algorithm architecture for prediction using clustering and support vector machine training [7].

3. Machine learning methodology

When a machine learning method is selected in radiation oncology, the input and output variables are considered so that the expected results can be predicted and their accuracy validated. Kang et al. [14] describe the core principles of modeling as shown in Figure 4.

Figure 4.
Core principles for modeling [14].

3.1. Machine learning introduction

Alpaydin [8] defines machine learning as programming computers to optimize a performance criterion using data, and Mitchell describes a computer program as learning from experience (E) with respect to a task (T) and a performance measure (P) if its performance at T, as measured by P, improves with E [9].

Machine learning algorithms can be divided into unsupervised learning and supervised learning [8, 11]. The two processes differ slightly in how training and testing are carried out, as shown in Figure 5. The distinguishing feature between Figure 5(a) and (b) is the feedback loop between training and testing.

Figure 5.
Unsupervised learning and supervised learning algorithm process and types. (a) Unsupervised learning process; (b) supervised learning process; and (c) Supervised and unsupervised learning algorithm types.

3.2. Supervised learning

Supervised learning is a machine learning method that infers a result from labeled training data. For example, suppose we know the doughnut and bagel classes beforehand and the learner is trained on examples labeled as doughnuts or bagels. We then decide whether a new item belongs to the doughnut group or the bagel group. This is an example of supervised learning.

Generally, each training example consists of an input feature vector paired with the desired result. When the desired result is a continuous value, the task is regression; when the task is to decide which of several groups an input vector belongs to, it is classification. To reach the final goal, the supervised learner must be evaluated on the data with a proper method, and the accuracy of the classification has to be measured numerically and validated to quantify its performance.
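The doughnut and bagel example can be sketched as a one-nearest-neighbor classifier, one of the simplest supervised learners. Everything in this sketch is an assumption made for illustration: the two features (sugar content in grams and density in g/cm³), the labeled examples, and the function name.

```python
def nearest_neighbor_label(training_set, query):
    """Return the label of the training example closest to the query vector."""
    def squared_distance(a, b):
        return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))

    _, label = min(training_set, key=lambda item: squared_distance(item[0], query))
    return label


# Labeled training data: (feature vector, known class). Values are hypothetical.
training = [
    ((30.0, 0.25), "doughnut"),
    ((28.0, 0.30), "doughnut"),
    ((5.0, 0.60), "bagel"),
    ((6.0, 0.55), "bagel"),
]

# Held-out labeled examples used to measure accuracy numerically.
test_set = [((27.0, 0.28), "doughnut"), ((4.0, 0.58), "bagel")]
correct = sum(
    1 for features, label in test_set
    if nearest_neighbor_label(training, features) == label
)
accuracy = correct / len(test_set)
print(f"accuracy on held-out examples: {accuracy:.2f}")
```

Keeping the test examples separate from the training examples is what makes the measured accuracy a validation of the learner rather than a check of its memory.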

3.2.1. Decision tree

A decision tree consists of nodes and branches. In a more complicated hierarchy, further branches and leaf nodes follow from each decision. Each internal node tests a condition, and the answer, “yes” or “no,” determines which branch of the tree is followed. This makes it easy to trace how a hypothesis leads to a result. Figure 6 shows a decision tree and its rules, whose conditions concern patient characteristics such as chemotherapy, cell type, treatment, and sex for radiotherapy (RT).

Figure 6.
A decision tree and its rules, whose conditions concern patient characteristics such as chemotherapy, cell type, treatment, and sex for radiotherapy (RT) (reproduced from Das et al. [6]).

A hyperplane $h(x)$ is defined as the set of points x that satisfy Eq. (1) [12]:

$$h(x):\; w^{T}x + b = 0 \tag{1}$$

where w is the weight vector and b is the offset. A decision tree uses axis-parallel hyperplanes, so the generic form of a split point for a numeric attribute $X_i$ is given in Eq. (2):

$$X_i \le v \tag{2}$$

where $v = -b$ is some value in the domain of $X_i$. The decision point $X_i \le v$ thus divides the input data space R into two regions, $R_{YY}$ and $R_{NN}$. Each split of R into $R_{YY}$ and $R_{NN}$ also induces a binary partition of the corresponding input data points D. That is, a split point of the form $X_i \le v$ induces the data partition in Eqs. (3) and (4):

$$D_{YY} = \{x \mid x \in D,\; x_i \le v\} \tag{3}$$

$$D_{NN} = \{x \mid x \in D,\; x_i > v\} \tag{4}$$

where $D_{YY}$ is the subset of data points that lie in region $R_{YY}$ and $D_{NN}$ is the subset of data points that lie in $R_{NN}$ [12].
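The partition in Eqs. (3) and (4) translates directly into code. In this minimal sketch the patient records, the chosen attribute (mean lung dose), and the threshold are hypothetical, and the function name is illustrative.

```python
def split_on_attribute(data, attribute_index, threshold):
    """Partition data into (D_YY, D_NN) using the split point x_i <= threshold."""
    d_yy = [x for x in data if x[attribute_index] <= threshold]  # "yes" branch
    d_nn = [x for x in data if x[attribute_index] > threshold]   # "no" branch
    return d_yy, d_nn


# Hypothetical records: (mean lung dose in Gy, age in years).
patients = [(12.0, 55), (18.5, 63), (22.0, 70), (9.5, 48), (20.0, 66)]
d_yy, d_nn = split_on_attribute(patients, attribute_index=0, threshold=18.5)
print("D_YY:", d_yy)
print("D_NN:", d_nn)
```

A full decision tree applies such splits recursively to each partition, choosing the attribute and threshold at every node by a purity criterion.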

3.2.2. Support vector machine

A support vector machine (SVM) is a machine learning method for pattern recognition and data analysis that is generally used for classification and regression. The SVM decides which category a given input data point belongs to. To understand the SVM, the terms data group and hyperplane first have to be defined.

A hyperplane in d dimensions is the set of all points $x \in \mathbb{R}^{d}$ that satisfy the equation $h(x) = 0$, where $h(x)$ is the hyperplane function defined in Eq. (5) [12]:

$$h(x) = w^{T}x + b \tag{5}$$

Here, w is the d-dimensional weight vector and b is a scalar called the bias. For points that lie on the hyperplane, Eq. (6) holds:

$$h(x) = w^{T}x + b = 0 \tag{6}$$

The hyperplane is thus the set of all points that satisfy $w^{T}x = -b$. If the input data are linearly separable, a separating hyperplane $h(x) = 0$ can be found such that $h(x_i) < 0$ for all points labeled $y_i = -1$ and $h(x_i) > 0$ for all points labeled $y_i = +1$. The hyperplane then predicts the class y of any point x according to Eq. (7):

$$y = \begin{cases} +1 & \text{if } h(x) > 0 \\ -1 & \text{if } h(x) < 0 \end{cases} \tag{7}$$

For any two points $a_1$ and $a_2$ that lie on the hyperplane, $h(a_1) = h(a_2) = 0$, and subtracting one equation from the other gives Eq. (8):

$$w^{T}(a_1 - a_2) = 0 \tag{8}$$

The weight vector w is therefore normal to the hyperplane and fixes its direction, whereas the bias b fixes the offset of the hyperplane in the d-dimensional space. Because both w and −w are normal to the hyperplane, the ambiguity is removed by requiring that $h(x_i) > 0$ when $y_i = +1$ and $h(x_i) < 0$ when $y_i = -1$.

Let $x_p$ be the orthogonal projection of a point x onto the hyperplane, and let $r_1 = x - x_p$. Then

$$x = x_p + r_1 \tag{9}$$

$$x = x_p + r\,\frac{w}{\|w\|} \tag{10}$$

where r is the directed distance of x from $x_p$, $r_1$ is the offset vector from $x_p$ to x, and $w/\|w\|$ is the unit weight vector. The distance r is positive when $r_1$ points in the same direction as w and negative when $r_1$ points in the opposite direction (Figure 7) [12].

Figure 7.
The support vectors and hyperplane (reproduced from Zaki and Wagner Meira [12]).
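Substituting Eq. (10) into the hyperplane function gives $h(x) = h(x_p) + r\|w\| = r\|w\|$, because $h(x_p) = 0$, so the directed distance is $r = h(x)/\|w\|$. The sketch below evaluates the hyperplane function, the class prediction, the directed distance, and the projection for a two-dimensional example. The weight vector, bias, and point are hypothetical, and the function names are illustrative.

```python
import math


def hyperplane(w, b, x):
    """Evaluate h(x) = w^T x + b."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 if h(x) > 0, otherwise -1 (points on the plane map to -1 here)."""
    return 1 if hyperplane(w, b, x) > 0 else -1


def directed_distance(w, b, x):
    """Return r = h(x) / ||w||, the signed distance of x from the hyperplane."""
    norm = math.sqrt(sum(w_i * w_i for w_i in w))
    return hyperplane(w, b, x) / norm


def project_onto_hyperplane(w, b, x):
    """Return x_p = x - r * w / ||w||, the orthogonal projection of x."""
    norm = math.sqrt(sum(w_i * w_i for w_i in w))
    r = directed_distance(w, b, x)
    return [x_i - r * w_i / norm for x_i, w_i in zip(x, w)]


w, b = [3.0, 4.0], -10.0   # hypothetical weight vector and bias
x = [4.0, 2.0]             # hypothetical data point

r = directed_distance(w, b, x)
x_p = project_onto_hyperplane(w, b, x)
print("class:", classify(w, b, x))
print("directed distance r:", r)
print("projection x_p:", x_p, "h(x_p) =", hyperplane(w, b, x_p))
```

The support vectors are the training points with the smallest absolute directed distance, and an SVM chooses w and b to maximize that margin.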

In a nonlinear SVM, the classes cannot be separated by a linear hyperplane in the input space. An example is shown in Figure 8; commonly used kernels include the polynomial and Gaussian kernels.

Figure 8.
A nonlinear SVM (reproduced from Zaki and Wagner Meira [12]).
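A kernel replaces the dot product between two points with a similarity computed in an implicit higher-dimensional space, which is how a nonlinear SVM obtains a curved decision boundary. The sketch below uses common textbook forms of the two kernels named above; the parameter names `degree`, `c`, and `sigma` and their default values are assumptions for illustration.

```python
import math


def dot(x, y):
    """Plain dot product x^T y (the linear kernel)."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


def polynomial_kernel(x, y, degree=2, c=1.0):
    """Inhomogeneous polynomial kernel K(x, y) = (c + x^T y)^degree."""
    return (c + dot(x, y)) ** degree


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((x_i - y_i) ** 2 for x_i, y_i in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


a, c_point = [1.0, 2.0], [3.0, 4.0]
print("polynomial:", polynomial_kernel(a, c_point))
print("gaussian:  ", gaussian_kernel(a, c_point))
```

The Gaussian kernel equals 1 for identical points and decays toward 0 as the points move apart, with `sigma` controlling how quickly.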

Libraries that implement the support vector machine are available for various programming languages, as listed in Table 3.

Programming language	Library name	Library diversity
MATLAB	MATLAB toolbox and open library	●
C/C++	Open library	●
JAVA	Open library	○
Python	Open library	●
LabVIEW	Machine learning toolkit	◑

Table 3.

Various programming languages to implement SVM algorithm (Good, ○; Better, ◑; Best, ●).

3.2.3. Neural network

A neural network example in radiation oncology is shown in Figure 9.

Figure 9.
Neural network for head and neck cancer of 3-class classification example [17].

A three-layer neural network approximates a function with the following model [11]:

$$f(x) = v^{T}w^{(2)} + b^{(2)} \tag{11}$$

where v is a vector whose elements are the outputs of the neurons:

$$v_i = s\left(x^{T}w_i^{(1)} + b^{(1)}\right) \tag{12}$$

Here, x is the input vector, $w^{(j)}$ and $b^{(j)}$ are the interconnect weight vector and the bias of layer j, and $s(\cdot)$ is the activation function of the neurons.
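The forward pass of Eqs. (11) and (12) can be written in a few lines. This is a sketch under stated assumptions: the activation s is taken to be the logistic sigmoid, the hidden layer uses one bias per neuron (stored in the list `b1`), and all weights and inputs are hypothetical values rather than a trained model.

```python
import math


def sigmoid(z):
    """Logistic activation, assumed here for s(.)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, b1, w2, b2):
    """Compute f(x) = v^T w2 + b2 with v_i = s(x^T w1[i] + b1[i])."""
    v = [
        sigmoid(sum(x_k * w_k for x_k, w_k in zip(x, w_i)) + b_i)
        for w_i, b_i in zip(w1, b1)
    ]
    return sum(v_i * w_i for v_i, w_i in zip(v, w2)) + b2


# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output.
x = [0.5, -1.0]
w1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.8]]  # one weight vector per hidden neuron
b1 = [0.0, 0.1, -0.2]                        # one bias per hidden neuron (assumption)
w2 = [1.0, -1.5, 0.5]
b2 = 0.25

print("f(x) =", forward(x, w1, b1, w2, b2))
```

Training adjusts `w1`, `b1`, `w2`, and `b2` so that f(x) matches the known outcomes; only the prediction step is shown here.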

3.3. Unsupervised learning

Unsupervised learning, unlike supervised learning, has no prior information about specific groups; the learning algorithm must infer the groups itself, for instance in the doughnut and bagel example. That is, there is no target value in unsupervised learning. It is related to density estimation in statistics. Unsupervised learning is useful for analyzing and explaining the characteristics of data. A typical example is clustering; another is independent component analysis.

3.3.1. Principal component analysis (PCA)

Zaki and Wagner Meira define PCA as follows:

PCA finds an r-dimensional basis that best captures the variance in the data.
The direction with the largest projected variance is called the first principal component.
The orthogonal direction that captures the next largest projected variance is the second principal component, and so forth.

Maximizing the captured data variance is also equivalent to minimizing the mean squared error [12].

Principal component analysis (PCA) is applied to the normalized X to identify a set of principal components (PCs) [11]:

$$PC = U^{T}X = \Sigma V^{T} \tag{13}$$

where $U\Sigma V^{T}$ is the singular value decomposition of X.
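The second equality in Eq. (13) follows in one step from the singular value decomposition. The short derivation below relies only on the orthonormality of the columns of U, a standard property of the SVD:

```latex
\begin{aligned}
X  &= U \Sigma V^{T}, \qquad U^{T}U = I \\
PC &= U^{T}X = U^{T}U\,\Sigma V^{T} = \Sigma V^{T}
\end{aligned}
```

Assuming that the columns of X are the samples, the columns of U are the principal directions, and row i of $\Sigma V^{T}$ holds the scores of all samples on the i-th principal component, scaled by the i-th singular value.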

3.3.2. Clustering

Clustering is an unsupervised learning method that finds clusters in data without labels. Classification requires both data and data labels, so unlabeled data need a different grouping method. There are several ways to define a cluster. One simple definition is that data points inside the same cluster are close to each other. k-Means assumes this: each cluster has one center, and the cost is defined as the distance between the center and each data point in the cluster. k-Means is thus an algorithm that minimizes this cost over the clusters.

Given a clustering $C = \{C_1, C_2, \ldots, C_k\}$, a scoring function evaluates its quality. The sum of squared errors (SSE) scoring function is defined as [12]

$$SSE(C) = \sum_{i=1}^{k} \sum_{x_j \in C_i} \|x_j - \mu_i\|^{2} \tag{14}$$

where $\mu_i$ is the mean (center) of cluster $C_i$. The goal is to find the clustering that minimizes the SSE score:

$$C^{*} = \arg\min_{C} \{SSE(C)\} \tag{15}$$

k-Means employs a greedy iterative approach to find a clustering that minimizes the SSE objective [12].
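The greedy iteration alternates two steps until the centers stop moving: assign each point to its nearest center, then move each center to the mean of its assigned points. The sketch below is a minimal version under stated assumptions: the data points are hypothetical, the initial centers are supplied by the caller instead of being chosen at random, and the function names are illustrative.

```python
def squared_distance(a, b):
    return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))


def assign(points, centers):
    """Assignment step: put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, initial_centers, max_iter=100):
    """Return (clusters, centers) after iterating assignment and update steps."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            [sum(coord) / len(cluster) for coord in zip(*cluster)] if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


def sse(clusters, centers):
    """Sum of squared errors, Eq. (14)."""
    return sum(
        squared_distance(p, center)
        for cluster, center in zip(clusters, centers)
        for p in cluster
    )


# Hypothetical two-dimensional data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centers = k_means(points, initial_centers=[(1.0, 1.0), (8.0, 8.0)])
print("centers:", centers)
print("SSE:", sse(clusters, centers))
```

Because each step can only lower the SSE, the iteration converges, but only to a local minimum; in practice the algorithm is usually run from several initial centers and the result with the lowest SSE is kept.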

The advantages and limitations of various machine learning algorithms in radiation oncology are listed in Table 4.

Algorithm	Advantages	Limitations
Decision tree	Easy to understand	Classes must be mutually exclusive
	Fast	Results depend on the order of attribute selection
		Risk of overly complex decision trees
Naïve Bayesian	Easy to understand	Variables must be statistically independent
	Fast	Numeric attributes must follow a normal distribution
	No effect of order on training	Classes must be mutually exclusive
		Less accurate
k-Nearest neighbors	Fast and simple	Variables with similar attributes will be sorted in the same class
	Tolerant of noise and missing values in data	All attributes are equally relevant
	Can be used for nonlinear classification	Requires considerable computer power as the number of variables increases
	Can be used for both regression and classification
Support vector machine	Robust model	Slow training
	Limits the risk of error	Risk of overfitting
	Can be used to model nonlinear relations	Output model is difficult to understand
Artificial neural network and deep learning	Tolerant of noise and missing values in data	Output model is difficult to understand (black box)
	Can be used for classification or regression	Risk of overfitting
	Can be easily updated with new data	Requires a lot of computer power
		Requires experimentation to find the optimal network structure

Table 4.

The advantages and limitations of various machine learning algorithms in radiation oncology [15].

4. Conclusion

We summarized clinical applications of machine learning algorithms in radiation oncology for head and neck, lung, and prostate cancer [13, 18, 19], and introduced the machine learning algorithms together with several definitions. For precision medicine in radiation oncology, radiation toxicity and complication factors are essential parameters for patients after radiotherapy. The dose-volume distribution is the basic information, but this limited information does not give the tumor control probability (TCP), normal tissue complication probability (NTCP), or grade level. A decision support system is therefore needed to select the best treatment plan for personalized patient care. Such a decision support system now needs an additional function that uses machine learning, historical treatment results, and the big data described above to predict patient toxicity or complications after radiation treatment [29].

Another current big data trend is research on medical images such as DICOM RT in radiotherapy. Images carry a great deal of information about the patient’s current status and about the future course, such as a prediction of the patient’s quality of life. Lung cancer and breast cancer are therefore good candidates for clinical big data research because simple chest X-ray or other low-cost imaging methods can be used.

Figure 10 illustrates a predictive solution for radiation toxicity based on big data that serves as a treatment planning decision support system. In the block diagram, the input part supplies treatment data (i.e., rival plans with DVHs) from a radiation treatment planning system. The program then analyzes the dosimetric and biological indices. The normal tissue complication probability (NTCP) model is adaptable; in the case of lung cancer, it can be used to consider the central lung distance (CLD) and maximal heart distance, which are measured as two-dimensional radiation therapy indicators, in three-dimensional conformal radiation therapy. The dose-volume relationship and the tolerance doses of organs at risk are analyzed by a machine learning algorithm in the decision support system. Here, “big data” from numerous patient treatments can be used to evaluate the machine learning results and to predict toxicity and normal tissue complications, in contrast to a knowledge-based approach. This supports an evidence-based decision when finalizing the treatment plan for customized patient care [20–24].

Figure 10.
An example of the big data based on patient-specific treatment prediction in radiation oncology (a), its block diagram (b), and overview (c).

Therefore, current decision support systems can be modified and further developed to predict complications and toxicity after radiotherapy by adding not only dosimetric and biological index functions but also clinical big data analysis with various machine learning algorithms. This is a fusion solution for customized patient care in the big data era of radiation oncology.

References

1. Trifiletti DM, Showalter TN. Big data and comparative effectiveness research in radiation oncology: synergy and accelerated discovery. Front Oncol. 2015; 5: 274
2. Khan FM. Treatment planning in radiation oncology. 2nd ed. Philadelphia: Lippincott Williams & Wilkins; 2007.
3. Lee S, Cao YJ and Kim CY. Physical and radiobiological evaluation of radiotherapy treatment plan, evolution of ionizing radiation research. Dr. Mitsuru N (Ed.), Croatia, InTech; 2015, DOI: 10.5772/60846.
4. Coates J, Souhami L and El Naqa I. Big data analytics for prostate radiotherapy. Front Oncol. 2016;6:149.
5. De Bari B, Vallati M, Gatta R, Simeone C, Girelli G, Ricardi U, Meattini I, Gabriele P, Bellavita R, Krengli M, Cafaro I, Cagna E, Bunkheila F, Borghesi S, Signor M, Di Marco A, Bertoni F, Stefanacci M, Pasinetti N, Buglione M, Magrini SM. Could machine learning improve the prediction of pelvic nodal status of prostate cancer patients? Preliminary results of a pilot study. Cancer Investig. 2015 Jul;33(6):232–40.
6. Das SK, Zhou S, Zhang J, Yin FF, Dewhirst MW, Marks LB. Predicting lung radiotherapy-induced pneumonitis using a model combining parametric Lyman probit with nonparametric decision trees. Int J Radiat Oncol Biol Phys. 2007 Jul 15;68(4):1212–21.
7. Guidi G, Maffei N, Vecchi C, Ciarmatori A, Mistretta GM, Gottardi G, Meduri B, Baldazzi G, Bertoni F, Costi T. A support vector machine tool for adaptive tomotherapy treatments: prediction of head and neck patients criticalities. Phys Med. 2015 Jul;31(5):442–51.
8. Alpaydin E. Introduction to machine learning. 3rd ed. Cambridge, MA: The MIT Press; 2014.
9. Mitchell TM. Machine learning. New York: McGraw-Hill; 1997.
10. Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA. 2013;309(13):1351–1352.
11. El Naqa I, Li R, Murphy MJ. Machine learning in radiation oncology: theory and applications. Switzerland, Springer; 2015.
12. Zaki MJ, Wagner Meira JR. Data mining and analysis. USA, Cambridge University Press; 2014.
13. El Naqa I, Bradley JD, PE L, Hope AJ, Deasy JO. Predicting radiotherapy outcomes using statistical learning techniques. Phys Med Biol. 2009;54(18):S9.
14. Kang J, Schwartz R, Flickinger J, Beriwal S. Machine learning approaches for predicting radiation therapy outcomes: a clinician's perspective. Int J Radiat Oncol Biol Phys. 2015 Dec 1;93(5):1127–35.
15. Bibault JE, Giraud P, Burgun A. Big Data and machine learning in radiation oncology: state of the art and future prospects. Cancer Lett. 2016 May 27. pii: S0304-3835(16)30346–9.
16. Videtic GMM, Woody N, Vassil AD. Handbook of treatment planning in radiation oncology. 2nd ed. New York: Demos Medical; 2015.
17. Kang S, Cho S. Approximating support vector machine with artificial neural network for fast prediction. Expert Syst Appl. 2014;41:4989–95
18. Dean JA, Wong KH, Welsh LC, Jones AB, Schick U, Newbold KL, Bhide SA, Harrington KJ, Nutting CM, Gulliford SL. Normal tissue complication probability (NTCP) modelling using spatial dose metrics and machine learning methods for severe acute oral mucositis resulting from head and neck radiotherapy. Radiother Oncol. 2016 Jul;120(1):21–7.
19. Chen S, Zhou S, Yin FF, Marks LB, Das SK. Investigation of the support vector machine algorithm to predict lung radiation-induced pneumonitis. Med Phys. 2007 Oct;34(10):3808–14.
20. Tiziana Rancati, et al. Factors predicting radiation pneumonitis in lung cancer patients: a retrospective study. Radiother Oncol. 2003;67:275–283
21. George Rodrigues, et al. Prediction of radiation pneumonitis by dose–volume histogram parameters in lung cancer—a systematic review. Radiother Oncol. 2004;71:127–138
22. Milano MT, et al. Normal tissue tolerance dose metrics for radiation therapy of major organs. Semin Radiat Oncol. 2007;17:131–140.
23. Weytjens R, et al. Radiation pneumonitis: occurrence, prediction, prevention and treatment. Belg J Med Oncol. 2013;7(4):105–10
24. Emami B, et al. Tolerance of normal tissue to therapeutic irradiation. Int J Radiation Oncol Biol Phys. 1991;21:109–22
25. Çınar M, Engin M, Engin EZ, Ziya Atesçi Y. Early prostate cancer diagnosis by using artificial neural networks and support vector machines. Expert Syst Appl. 2009;36:6357–6361.
26. Sanchez-Nieto B, Nahum AE. BIOPLAN: software for the biological evaluation of radiation therapy. Med Dosim. 2000;25(2):71–6.
27. Pinter C, Lasso A, Wang A, Jaffray D, Fichtinger G. SlicerRT: radiation therapy research toolkit for 3D Slicer. Med Phys. 2012;39(10):6332–8.
28. Sanchez-Nieto B, Nahum AE. BIOPLAN: software for the biological evaluation of radiotherapy treatment plans. Med Dosim. 2000;25(2):71–6.
29. Bentzen SM, Constine LS, Deasy JO, Eisbruch A, Jackson A, Marks LB, et al. Quantitative analyses of normal tissue effects in the clinic (QUANTEC): an introduction to the scientific issues. Int J Radiat Oncol Biol Phys. 2010;76(3 Suppl):S3–S9.
30. Cao YJ, Lee S, Chang KH, Shim JB, Kim KH, et al. Patient performance-based plan parameter optimization for prostate cancer in tomotherapy. Med Dosim. 2015;40(4):285–9.
31. Cao YJ, Lee S, Chang KH, Shim JB, Kim KH, et al. Optimized planning target volume margin in helical tomotherapy for prostate cancer: is there a preferred method? J Korean Phys Soc. 2015;67(1):26–32.
32. Luxton G, Keall PJ, King CR. A new formula for normal tissue complication probability (NTCP) as a function of equivalent uniform dose (EUD). Phys Med Biol. 2007;53(1):23–36
