InTech uses cookies to offer you the best online experience. By continuing to use our site, you agree to our Privacy Policy.

Computer and Information Science » Artificial Intelligence » "Reinforcement Learning", book edited by Cornelius Weber, Mark Elshaw and Norbert Michael Mayer, ISBN 978-3-902613-14-1, Published: January 1, 2008 under CC BY-NC-SA 3.0 license. © The Author(s).

Chapter 20

Application on Reinforcement Learning for Diagnosis Based on Medical Image

By Stelmo Magalhaes Barros Netto, Vanessa Rodrigues Coelho Leite, Aristofanes Correa Silva, Anselmo Cardoso de Paiva and Areolino de Almeida Neto
DOI: 10.5772/5291

Article top

Application on Reinforcement Learning for Diagnosis Based on Medical Image

Stelmo Magalhães Barros Netto1, Vanessa Rodrigues Coelho Leite, Aristófanes Corrêa Silva, Anselmo Cardoso de Paiva and Areolino de Almeida Neto

1. Introduction

The benefit of a medical imaging examination in terms of its ability to yield an accurate diagnosis depends on the quality of both the image acquisition and the image interpretation. During the past century, radiology has grown tremendously due to advances in image detector systems and computer technology.

On the other side, the image interpretation is an error prone task. The number of lawsuits filed against medical imaging professionals that are related to the miss of a diagnosis is close to 70% (Berlin, 1995). The most common errors are perceptual errors that lead to diagnoses misses, representing about 60% of the cases (Renfrew et al., 1992). In cancer diagnoses, some studies show that the risks of false negative diagnoses are up to 75%.

These number of diagnosis errors due to misinterpretation of medical images and the current development in automation methods to help the specialist in this error prone task, lead to the development of novel devices and techniques to assist the specialist in the diagnosis achievement. These systems can provide a second opinion and may be used as a first stage of radiological interpretation (Nab et al, 1992). These systems are commonly named Computer-Aided Diagnosis (CAD) systems and have been developed to assist radiologists and other specialized physicians in the diagnostic setting like early detection of lung cancer in radiographs and CT images.

The system aid may contribute to understand essential features and information hidden in the images which are not readily apparent. Also, the effects between the different image aspects are not distinguishable. The image information and the extracted parameters may be too complex to be solved with conventional techniques. Thus we may use other techniques like Machine Learning methods. Machine Learning methods use these complex sets of data, and can help to model the nonlinear relationships that exist between them, improving medical care.

Machine Learning (ML) aims at providing techniques and methods for accumulating, changing and updating knowledge in computational systems, and in particular mechanisms to help the system to induce knowledge from examples or new data. These methods are appropriated when we do not have algorithmic solutions, in the absence of formal models, or when we do not know precisely the application domain.

One of the approaches of Machine Learning is reinforcement learning, which emphasizes the individual’s learning through interactions with his environment, contrasting with classical machine learning approaches that privilege learning from a knowledgeable teacher, or on reasoning from a complete model of the environment.

In Reinforcement Learning the learner is not told which action to take, but instead must find which actions yield a better reward after trying them. The most distinguishing features of reinforcement learning are trial-and-error search and delayed reward.

In the Machine Learning community, reinforcement learning has been used to solve many complex tasks normally thought of as quite cognitive. For example, a reinforcement learning algorithm has performed the medical diagnosis (Stensmo & Sejnowski, 1996), bioinformatics (Sahba et al, 2006), speech recognition (Rabiner, 1989), spell recognition (Raedt & Bruynooghe, 2004), computational vision and even robots locomotion (Smart, 2002).

The purpose of this chapter is to investigate the adequacy of the reinforcement learning technique to classify lesions based on medical image. We will show the application of this technique with the goal of lung nodules classification between malignant from benign. We will use a set of 3D geometric measures extracted from the lung lesions Computerized Tomography (CT) images.

This work is organized as follows. Section 2 presents a brief overview about the main concepts of Reinforcement Learning theory. Following this, in Section 3 the medical imaging main modalities are described and its use for cancer detection/diagnosis is shown. Specially, we describe the lung cancer problem and some of the methods applied for its diagnose based on medical images and computer supported. Section 4 describes some works proposed in the literature that apply reinforcement learning to medical images, presenting a more detailed description of an application of reinforcement learning for lung cancer lesions classification. Finally, Section 5 presents the final remarks

2. Reinforcement Learning

Reinforcement learning (Sutton & Barto, 1998) is a formal mathematical framework in which an agent manipulates its environment through a series of actions, and in response to each action receives a reward value. An agent stores its knowledge on how to choose reward maximizing actions in a mapping from agent internal states to actions. In essence, the agent’s “task” is to maximize its reward over time. Good task performance is precisely and mathematically defined by the reward values.

Reinforcement learning is a problem formulation, not a solution technique. The siblings of reinforcement learning are supervised learning and unsupervised learning.

One can assert that Reinforcement Learning (RL) is a training which uses a tip or clue that can be positive or negative. The apprentice is not taught which action he must realize, but some signals are given to him as to allow him to decide/choose a better road.

Is at this point where the RL differentiates from the supervised learning, which necessitates a teacher to teach what is the more appropriated action for each state.

Reinforcement learning is frequently used when there is no good exemplar behavior to be followed, when the environment is unknown or when one wishes to satisfy various goals at the same time. This kind of learning is inspired in children’s learning. Children perform random actions and discover cause and effect relationships according to environmental responses (food, water, a teacher’s smile, a praising word or a very loud sound, an electric shock, the awful teacher’s face or a complaint), they learn which actions are good and which are bad.

Formally in the Reinforcement Learning problem there is an agent and an environment that interact in a sequence of discrete steps, t = 0; 1; 2; 3;... On each step, the agent perceives that the environment is in a state, st, and selects an action, at. In response, the environment makes a stochastic transition to a new state, st+1, and stochastically emits a numerical reward, rt+1 . The agent seeks to maximize the reward it receives in the long run. For example, the most common objective is to choose each action as to maximize the expected discounted return.

For each pair state/action, (s,a), there is a reinforcement signal, R(s,a)→, which is given to the agent when whenever the agent performs the action a in the state s. The agent’s relationship with the environment is illustrated in Figure 1.

media/image1.jpeg

Figure 1.

Reinforcement learning elements.

The reinforcement signal is the agent’s learning basement. The reinforcement must indicate the goal to be reached. For example, when playing draughts the reinforcement can be given to the agent just at the end of the game, being positive if the agent wins and negative when he loses or be drawn. Doing so, the reinforcement is showing to the agent that his goal is to win the game and not to lose or be drawn. The reinforcement learning problem is to choose actions policy that maximizes the totality of the rewards received by the agent. An actions policy corresponds to a function (s) → a, that states which action for each state must be realized by the agent. An agent can follow several action policies, but the learning goal is to calculate the policy that maximizes the sum of the future rewards, i.e., the total of rewards received after adopting that policy. That optimal policy is called *.

2.1. Q-Learning

Q-Learning (Watkins & Dayan, 1992) is one of the methods for solving the reinforcement learning problem. That technique iteratively estimates a function Q(s,a) →, which determines the sum of expected future rewards when the agent performs the action a in the state s, continuing from there on to act optimally. As that sum can be infinite, whether there is not a final state to be attained, it used a discount factor in the sum parcels. That discount factor also differentiates the rewards far away from the actual state, giving a higher value to the closest rewards. This way, the function Q defines the sum of the discounted future rewards.

Once it is estimated the function Q, the agent’s behavior (actions policy) can be easily determined. As highly valued rewards are linked to a good performance, it is enough for the agent to choose, at each state s, the action a, holding the highest value of Q(s,a). Nevertheless, following this approach, the agent loses a part of his learning capacity. As at each state is always executed the same action (the highest valued action of Q), only that action will have their value updated, being able of perpetuating itself as the best action. This way, during the training phase, when the Q function is being built, it becomes necessary to exchange usufruct and exploitation phases.

To usufruct means that the agent will choose the best action, in the current Q estimative; exploitation means that the agent will choose a random action a’, as to have its Q(s,a’) value updated and, possibly, may became the best action. However, the decision on which strategy must be adopted at each moment is not trivial, yielding the exploitation /usufruct dilemma.

The inter-change between usufruct and exploitation also occurs in dynamic environments. In these cases, the agent just learns the optimal politics. As the environmental variables are subject to change, it is necessary the agent be constantly updated, updating its optimal policy estimative, which changes with the time.

The value of Q(s,a) is updated along the agent’s learning, using the following rule:

Q(s,a)Q(s,a)+α[r+γ.maxaQ(s',a)Q(s,a)]
(1)

Equation 1: Updating rule in the Q-Learning method

where α is the agent’s learning rate, r is the reinforcement received by the realization of action a in the state s, and γ is the discount rate. The variable s’ indicates the opponent’s current state; i.e., the state he is in after having performed the action a in the state s. An important characteristic of that rule is that it does not use any knowledge on the environment dynamics, just the knowledge on the variables defined by the reinforcement learning problem (s, a, s’, e r) and about the learning parameters ( α and γ ). Another particularity is the rule’s computational efficiency, which just uses basic operations and comparisons.

media/image6.jpeg

Figure 2.

Learning elements in a Q-Table.

One particularity of the Q-Learning method is the way in which the Q function is approximated. The simpler approach, and also the most popular, represents the Q function by a bi-dimensional matrix, called Q-Table, with the states in one dimension and the actions in another. That representation makes possible to easily access to tables’ items (for consultation and updating), besides being easier to implement, but has the disadvantage of using a lot of space, turning its use unfeasible in applications with large actions or large states space. In addition, that representation does not generalize the learned knowledge, thus its training needs to simulate all possible situations, becoming very slow. The representation of the reinforcement elements into a Q-Table is illustrated in Figure 2.

3. Medical Image

Medical imaging has been undergoing a revolution making possible the execution of medical procedures faster, more accurate, and less invasive. The imaging techniques have the potential to broaden our observation capabilities and understand the biophysical world, leading to a dramatic increase in our ability to apply new algorithms and techniques to model physiological functions and dysfunctions in the patient’s body.

From the discovery of X-rays in 1895, images are used as a way of acquiring information on the patients’ health state. In 1917 J. Radon elaborated mathematical theories that would allow the tomography reconstruction of images. The use of images spread out from 1967 with the building of the first tomography by G. N. Hounsfield. Nowadays, there are several imaging modalities in the medical area.

In the last two decades there have been significant advances in computerized medical imaging. Such developments led to new imaging modalities in two, three and multi-dimensions, which became important clinical tools in the radiological diagnosis. The various modalities of radiological images are very significant in the medical practice and are also decisive in illnesses treatment. While in the beginning of the last century radiological images were the only way of images acquisition, several new modalities were developed up to now, and are widely used to acquire anatomical, physiological, metabolic and functional information of the human body. Currently, the most common ways of acquisition of medical images are: Computerized Tomography (CT), Magnetic Resonance Imagery (MRI), Single Photon Emission Computerized Tomography (SPECT), Positron Emission Tomography (PET) and Ultrasound.

Today, medical imaging is an essential part of medicine. The pathologies can be observed rather than inferred from symptoms. For example, a specialist can monitor the healing of damaged tissue or the growth of a tumor, and determine an adequate therapy.

Many different imaging techniques are available nowadays and are commonly used in clinical daily practice. Each imaging modality is proper to revealing a particular organ or pathology characteristics (Hendee & Ritenour), but they are complementary as they offer different views of the same tissues or functionalities.

Some imaging modalities are appropriate to image the anatomical morphology. They include radiography, ultrasound (US), computed tomography (CT), magnetic resonance imagery (MRI). On the other side we have the functional modalities that are used to study the metabolism of the tissues. In this class we have scintigraphy, single photon emission computerized tomography (SPECT), positron emission tomography (PET) and the functional magnetic resonance imagery (fMRI). As new techniques are being added every few years the list becomes outdated very quickly (Brooks, 2001). We will briefly describe the principal imaging modalities, for a more detailed description see [http://www.sprawls.org/resources/].

The Ultrasonography is based on high frequency sound waves sent by a transmitter that bounce off the different tissues and organs to produce distinctive patterns of echoes that are captured by a receiver and forwarded to a computer that translates them into an image on a screen. Ultrasound is suitable for abdomen imaging as it distinguish subtle variations among soft, fluid-filled tissues. Additionally, it does not damage tissues with ionizing radiation, but generates very noisy images.

Computerized Tomography (CT) is generated by a number of 2D radiographs transversally acquired around the patient body. The 3D image is then reconstructed by an algorithm using the Radon transform (Helgason, 1980). CT offers high contrast between bone and soft tissue and low contrast among different soft tissues. As CT uses X-Rays we must take into account the effects of ionizing radiation. CT is much proper for imaging the thoracic cage.

The Magnetic Resonance Imaging relies on the relaxation properties of magnetically-excited hydrogen nuclei of water molecules in the body. The patient is briefly exposed to a burst of radio-frequency energy, which, in the presence of a magnetic field, puts the nuclei in an elevated energy state. As the molecules undergo their normal, microscopic tumbling, they shed this energy into their surroundings, in a process referred to as relaxation. Images are created from the difference in relaxation rates in different tissues. MRI uses magnetic fields and non-ionizing radiation in the radio frequency range. Thus, according to actual medical knowledge, is harmless to patients and has much better soft tissue contrast than X-rays, being adequate for brain and spinal cord scans.

It can be noticed that those methods involve sophisticated instrumentation and equipment based on computers for data collecting, image reconstruction and visualization.

Those forms of imaging are valuable because they are not invasive, that is, instruments do not penetrate the patient’s body. Besides that, there is no doubt on the quality of the images generated by such equipments, benefiting medical practices such as diagnosis, surgical planning and therapy.

Such images have a high medical content once they store relevant information for the exercise of diverse medical specialties: oncology, gynecology, radiology, pneumology and cardiology, just to cite some. However, as to take the maximum advantage of those images content, specialist of the medical area need to use the computer.

In addition, those images can be processed and handled as to allow the visualization of characteristics initially imperceptible, turning possible better accuracy and important characteristics checking used in diagnosis elaboration. Next, those main features can be quantified and analyzed through programs and computational models to understand their behavior, thus contributing in the diagnosis or just to evaluate the evolution of therapeutic protocol.

Thus, we verify that is necessary to develop computational programs and methods for processing and handling the data obtained through the different medical images acquisition techniques, allowing the enhancement and preservation of the clinical data present in the exam.

The current degree of development reached by the computational modeling techniques together with the fast growing of the computers calculation performance, has allowed the study, development and solution of highly sophisticated models able of aiding, with a rather fair degree of accuracy, in the results of important medical procedures, such as cancer diagnosis, for example.

3.1. Cancer Diagnosis

Cancer is the name given to all malignant tumors, and when their size is small, in the form of a nodule. The word derives from the Latin cancer, which means crab. That name is due to the similitude between the crustacean legs and the tentacles of the tumor that infiltrate like roots into the healthy tissue of the body.

There is great difficulty to qualitatively define the benignant or malignant characteristics of the nodule, as well as in the tracking and following of its growth in a reliable way. Commonly, the evaluation of the nodular growth is done by measuring the nodule in the computerized tomography printed film or x-rays using a ruler passed over the image, what results in not so accurate measurements. Even though more accurate measurements could be directly taken with the digital data, many times they are available to physicians, who usually can access only the printed film.

Surgical nodule extraction is a practice applied to the majority of the patients presenting asymptomatic nodule with undetermined etiology, in a patient with etiological data compatible with higher susceptibility to cancer. Nevertheless, many of those interventions could be avoided once most of the times the nodules are benign. Hence, it is fundamental to use more precise techniques to better evaluate the nodular growing and their characteristics, to make possible a more reliably determination of the nodule’s benignity or malignancy.

Some factors difficult the nodule’s identification and diagnosis, among these are:

The organ’s structures present similar characteristics (shape, densities, etc.) which mixes up one another, turning them confuse;

In its initial phase, if the nodule is small and has no well defined shape, is hard to diagnostic it;

Measurements taken by physicians to analyze the nodule’s evolution, as for example its diameter, are done handmade, usually using a ruler sweeping over the image;

Physicians’ visual fatigue, emotional factors and experience may influence the diagnostic;

Finally, in many cases, the image’s quality is bad.

One of the most common cancers in the world is lung cancer. It is a leading cause of cancer death in men and women. Cigarette smoking causes most lung cancers.

3.2. Lung Cancer

Lung cancer is a serious problem of public health in Europe, United States and many other countries around the world because it is becoming the cancer mortality leader for men and women. The disease is also known as one of the shortest means for survival among other malignancies (Tarantino, 1997). As soon as the diagnosis is made it has been estimated that only 13% of the patients will be alive after 5 years (Lag l, 2002). In Brazil, lung cancer occupies the first place of cancer’s death in men and the second in women. It is estimated that it caused 27.170 deaths (17.850 men and 9.320 women) in 2006 (INCA, 2003).

In spite of lung cancer earns the benefit for one of the most efficacious measures of primary prevention, it has been expected that results of recent campaigns against smoking will become apparent only after two or three decades. While this, lung cancer continues to be present in advanced stages with a global outcome close to 13% in five years (Lag l, 2002).

From the biological point of view, lung cancer is an uncontrolled growth of abnormal cells in one or both lungs, which can reproduce very fast and produce regional and distant metastasis even when the patient is asymptomatic. Lumps of cancer cells (tumors) impair lung’s function when obstructs the bronchi but not when they are located in other areas. The pulmonary parenchyma does not have painful nervous terminations and unless the tumor invades precociously the parietal pleura, pain will be a late sign of lung cancer ( Silva et al, 2004 ).

Nowadays, the main chance to discover a lung cancer in its initial stage is an incidental finding of a solitary pulmonary nodule disclosed by Chest X ray or Computed Tomography (CT), indicated to explore some abnormal thoracic clinical manifestation or routine preoperative evaluation. Other possibility, that has become important in recent years, is a CT Screening Lung Cancer Program in high risk patients like heavy smokers that have smoked for more than 30 years (Henschke et al, 2003).

The main problem of the solitary pulmonary nodule is the identification of its nature. Sometimes this is possible only with radiological findings that allow diagnosis of benignity like total, central, lamellar or popcorn calcifications and high fat contents (hamartoma). Alternatively, some data are highly suggestive of malignity like specular margins and pleural tail but unfortunately around 15% of these findings also occur in benign nodules. In many other cases is not possible with simple radiological criteria to know the true nature of the nodule which is classified as undetermined. This situation is particularly frequent in nodules shorter than 1 cm of diameter where benign aetiology can respond for more than 90% of the total (Lillington & Caskey, 1993), (Henschke et al, 2003).

The top row in Figure 1 shows the texture from a slice of two benign (a and b) and two malignant (c and d) nodules. The bottom row in Figure 1 shows their respective 3D shape.

media/image7.png

Figure 3.

Benign and malignant lung nodules examples.

At radiological examination, solitary pulmonary nodules are approximately round lesions shorter than 3 cm in diameter, completely surrounded by lung parenchyma and can represent a benign or malignant disease. Any larger lesion is named pulmonary mass and should be considered as malignant until counterproof is found. In all of these situations, etiologic definition is paramount to the medical decision. In spite of the gold standard diagnosis be the histological examination - normally obtained by invasive procedures - image methods and in special CT can aid diagnostic process in analyzing nodule’s attributes (Ost et al, 2003).

Radiological characteristics of benignity are well known and based in calcifications or fat texture patterns which change the mean radiological density out of soft tissues range. Malignity does not have similar texture criteria and the diagnosis is normally suggested by an irregular shape associated to some clinical data, like tobacco’s load. Venous iodine contrast administration during CT adds some improving in texture resolution in order to discriminate between benign and malignant nodules (Swensen, 1997). Recently, there is a renewed attention to quantify wash-in and washout after contrast injection to obtain a nodule characterization (Jeong et al, 2005). Unfortunately, short diameters and allergic reactions are limiting factors of these techniques. Even the most modern metabolic image method in clinical use, that is the Positron Emission Tomography (PET) superposed to helical CT examination (PET - CT) with images acquisitions before and after 18-fluoro-deoxyglucose intravenous administration, also has important limitations represented by false positive of some inflammatory processes and false negativity of small or indolent cancers (Gould, 2003), (Pepe, 2005), (Giger, 1999).

Some authors have been hypothesizing that quantitative CT data derived from geometric and texture parameters may contribute to differential diagnosis between benign and malignant solitary pulmonary nodules, even without contrast utilization. McNitt-Gray et al. (McNitt-Gray et al, 1999a), (McNitt-Gray et al, 1999b) extracted measurements from nodule’s shape, attenuation coefficient, attenuation distribution and texture. They used a discriminant analysis technique with stepwise variable selection procedure to separate benign from malignant nodules. Kawata et al. (Kawata et al, 1997 ), (Kawata, 1998), ( have presented a method to characterize the internal structure of 3-D nodules using computerized tomography images shape index and density to locally represent each voxel. They created a histogram of characteristics based on shape index, called shape spectrum measurements, to store voxels with a given index to define the nodule. Matrices similar to those of the texture-analysis method (Co-occurrence matrix) were also created for shape index and density. The statistical technique discriminant analysis was employed to classify benign and malignant nodules. In Silva et al. (Silva et al, 2004 ) it was showed that geo-statistical functions as semivariogram, covariogram, correlogram and madogram or some indices of spatial autocorrelation as Moran’s Index and Geary’s Coefficient, supply good results to discriminate malignant from benign nodules.

4. Application of reinforcement learning on medical image

The diffusion of Reinforcement Learning in various application areas is a subject of considerable ongoing research. It is argued that the successful implementation of such method can help the integration of computer-based systems in the healthcare environment, providing opportunities to facilitate and enhance the work of medical experts and ultimately to improve the efficiency and quality of medical care.

One of the main applications area is the use of Reinforcement Learning to build systems that support and help the specialist in the diagnose task. Based upon patient’s information, Fakih & Das (2006) developed a learning based methodology and recommend test(s) that optimize a suitable measurement of diagnostic performance. A comprehensive performance measurement that accounts for the costs of testing, morbidity, and mortality associated with the tests, and time taken to reach diagnosis is developed. The performance measurement also accounts for the diagnostic ability of the tests. The methodology combines tools from the fields of data mining (rough set theory, in particular), utility theory, Markov decision processes (MDP) and reinforcement learning (RL). The rough set theory is used in extracting diagnostic information in the form of rules from the medical databases. Utility theory is used in bringing various non homogenous performance measurements into one cost based measurement. An MDP model together with an RL algorithm facilitates obtaining efficient testing strategies. The methodology is implemented on a sample problem of diagnosing solitary pulmonary nodule (SPN). The obtained results are compared with those from four alternative testing strategies.

(Stensmo & Sejnowski, 1996) used decision and probability theory to construct such systems from a database of typical cases. This simplifies the task of knowledge extraction from physicians who often do not know how they have come to a certain diagnosis. Probability models are constructed using mixture models that are efficiently learned by the Expectation-Maximization algorithm. Problems with missing data are then solved, both for missing data in the case database and during diagnosis (missing data are then not yet conducted observations). Decision theory is used to find the most informative next question to ask, observation to make, or test to do in order to minimize the diagnosis total cost. It is also used to decide when to stop requesting more information. To automatically find good utility values for the decision theoretic model, temporal difference reinforcement learning is used to increase the system accuracy. Results are presented on a case database for heart disease.

On the other hand, we also found in the literature some works using Reinforcement Learning to help the segmentation of medical images. (Shokri & Tizhoosh, 2003) introduced a reinforcement-learning concept to find the optimal threshold for digital images. The proposed approach can integrate human experts knowledge in an objective or subjective way to overcome the shortcomings of the existing methods.

(Peng & Bhanu, 1998) presented a system that achieves robust performance by using reinforcement learning to induce a mapping from input images to corresponding segmentation parameters. This is accomplished by using the confidence level of model matching as a reinforcement signal for a team of learning automata to search for segmentation parameters during training. The use of the recognition algorithm as part of the evaluation function for image segmentation gives rise to significant improvement of the system performance by automatic generation of recognition strategies. The system is verified through experiments on sequences of indoor and outdoor color images with varying external conditions.

(Sahba et al, 2006) introduced a new method for medical image segmentation using a reinforcement learning scheme. They use this novel idea as an effective way to optimally find the appropriate local threshold and structuring element values and segment the prostate in ultrasound images. Reinforcement learning agent uses an ultrasound image and its manually segmented version and takes some actions (i.e., different threshold and structuring element values) to change the environment (the quality of segmented image). The agent is provided with a scalar reinforcement signal determined objectively. The agent uses this objective reward/punishment to explore/exploit the solution space. The values obtained using this procedure can be used as valuable knowledge to fill a Q-matrix. The reinforcement learning agent can use this knowledge for similar ultrasound images as well. The results demonstrated high potential for applying reinforcement learning in the field of medical image segmentation.

4.1. Reinforcement Learning for Classifying Lesions based on Medical Imaging

4.1.1. Image Acquisition

The images herein used were provided by the Fernandes Figueira Institute and the Pedro Ernesto University Hospital - both from Rio de Janeiro city – for a project of CAD tools development. They were obtained from different real patients, providing a total of 39 nodules (29 benign and 10 malignant).

The images were acquired with a Helical GE Pro Speed tomography under the following conditions: tube voltage 120 kVp, tube current 100 mA, image size 512×512 pixels, voxel size 0.67 × 0.67 × 1.0 mm. The images were quantized in 12 bits and stored in the DICOM format (Clunie, 2000).

It is important to stand out that the CT exam was performed with no contrast injection, which may be clinically used in order to increase the diagnosis readiness but also carries some morbidity and occasional mortality by allergic complications.

It is also necessary to highlight that the nodules were previously diagnosed by physicians and that the final diagnosis of benignity or malignancy was further confirmed by histopathological exam of the surgical specimen or by radiological 3-year stability, which explains the reduced size of our sample.

4.1.2. The lung nodules segmentation

In most cases, lung nodules are easy to be visually detected by physicians, since their shape and location are different from other lung structures. However, the nodule’s voxel density is similar to that of other structures, such as blood vessels, which makes difficult any kind of automatic computer detection.

Generally speaking, the solitary pulmonary nodule is normally found in Chest X Ray or CT as an unexpected finding. The main reason for this is because the nodule (> 1cm) is easily distinguished from the surrounding structures. If this distinction is difficult the nodule’s diagnosis is difficult too. It is common that in an evaluation of an automatic process for segmentation or in a created program to contribute for lung cancer screening, the gold standard is the analysis made by one or more radiologists. For example: it can be difficult for an automatic segmentation program to distinguish between a nodule and the chest wall, but is relatively easy for an experienced observer to separate the two structures. The same occurs with a nodule close to a vascular structure.

Unless the nodule be in a central position close to a hilar vessel, the distinction is not difficult. However, another level of situation is represented for a small nodule (< 1cm). In this setting the radiological diagnosis can be difficult and separation from vascular structures either. References which express the radiologist as a reference point to evaluate computer’s analysis are for example (Takashima, 2003).

A semi-automatic segmentation process was performed using a Pulmonary Nodule Analysis System (Silva et al, 2002) called Bebúi. In this, beyond the 3D region growing algorithm with voxel aggregation, two resources help and provide greater control in the segmentation procedure: the barrier and the eraser. The barrier is a cylinder placed around the nodule by the user with the purpose of restricting the region of interest and stopping the segmentation by voxel aggregation from invading other lung structures. The eraser is a resource of the system that allows physicians to erase undesired structures, either before or after segmentation, in order to avoid and correct segmentation errors (Silva et al, 2002).

The Marching Cubes algorithm (Lorensen & Clinie, 1987) is used to build an explicit representation of volume data. The measurements described along the present paper will use this representation. In order to remove irregularities of the reconstructed surface, the Laplacian smoothing technique (Ohtake et al, 2001) is used. Figures 2 (a) and (b) show the result of applying the Marching Cubes algorithm and the Laplacian technique, respectively.

media/image8.png

Figure 4.

a) Application of Marching Cubes. b) Application of Laplacian technique.

4.1.3. 3D Geometric Measurements

The measurements to be presented in this section seek to capture information on the nodule’s 3D geometry from the CT. The measurements should ideally be invariant to changes in the image’s parameters, such as voxel size, orientation, and slice thickness.

Sphericity Index: The Sphericity Index (SI) measures the nodule’s behavior in relation to a spherical object. It is defined as

where V is the surface volume and A corresponds to the surface area. Thus, if the nodule’s shape is similar to a sphere, the value will be close to 1. In all cases, SI ≤ 1.

Convexity Index: The Convexity Index (CI) (Smith, 1999) measures the degree of convexity, defined as the area of the surface of object B (A(B)) divided by the area of the surface of its convex hull (A(HB)). That is,

The more convex the object is, the closer to 1 will be the value of CI. For all objects, CI ≥ 1.

Curvature Index: The two measurements presented below are based on the main curvatures kmin and kmax, defined by

kmin,max=HH2K
(4)

where K and H are the Gaussian and mean curvatures, respectively. The values of H and K are estimated using the methods described in (Esse & Drury, 1997).

Intrinsic curvature: The Intrinsic Curvature Index (ICI) (Smith, 1999), (Esse & Drury, 1997) captures information on the properties of the surface’s intrinsic curvatures, and is defined as

ICI=14π|kminkmax|dA
(5)

Any undulation or salience on the surface with the shape of half a sphere increments the Intrinsic Curvature Index by 0.5, regardless of its size – that is, the ICI counts the number of regions with undulations or saliencies on the surface being analyzed.

Extrinsic curvature: The Extrinsic Curvature Index (ECI ) (Smith, 1999), (Esse & Drury, 1997) captures information on the properties of the surface’s extrinsic curvatures, and is defined as

ECI=14π|kmax|(|kmax||kmin|)dA
(6)

Any crack or gap with the shape of half a cylinder increments the ECI in proportion to its length, starting at 0.5 if its length is equal to its diameter - that is, the ECI counts the number and length (with respect to the diameter) of semi cylindrical cracks or gaps on the surface.

Types of Surfaces: With the values of extrinsic (H) and intrinsic (K) curvatures, it is possible to specify eight basic types of surfaces [13], [14]: peak (K > 0 and H < 0), pit (K > 0 and H > 0), ridge (K = 0 and H < 0), flat (K = 0 and H = 0), valley (K = 0 and H > 0), saddle valley (K < 0 and H >0), minimal (K <0 and H = 0), saddle ridge (K <0 and H <0).

The measurements described below were presented in (Koenderink, 1990) for the classification of lung nodules and the results were promising. However, they have computed curvatures H and K directly from the voxel intensity values, while here we compute them with respect to the extracted surface, which is composed by triangles.

In practice, it is difficult to determine values that are exactly equal to zero, due to numerical precision. Therefore we have selected only the types peak, pit, saddle valley and saddle ridge for our analysis (Koenderink, 1990).

Amount of each Surface Type:

This measurement indicates the relative frequency of each type of surface in the nodule, where APK (Amount of peak surface), API (Amount of pit surface), ASR (Amount of saddle ridge surface) and ASV (Amount of saddle valley surface).

Area Index for each Surface Type:

For each surface type, the area occupied in the nodule divided by the total nodule area is computed, where AIPK (Area Index for peak surface), AIPI (Area Index for pit surface), AISR (Area Index for saddle ridge surface) and AISV (Area Index for saddle valley surface). =>

Mean Curvedness for each Surface Type:

Curvedness is a positive number that measures the curvature amount or intensity on the surface [13]:

c=kmin2+kmax22
(7)

The measurements are based on the curvedness and the surface types. For each surface type, the mean curvedness is determined using the curvedness of each surface type, divided by the curvedness number in each surface type. Where, CPK (mean curvedness for peak), CPI (mean curvedness for pit), CSR (mean curvedness for saddle ridge) and CSV (mean curvedness for saddle valley).

4.1.4. Tests and results

The idea of classification is to encounter a path from the pattern presented to a known target, in the present case to a malignant or to a benign pattern. Furthermore the path found should be the shortest in some sense; in such way that the presented pattern seems to be nearer from a known target and therefore it can be considered of the type of the target. Considering the diverse techniques, the Reinforcement Learning (RL) was chosen to find this path, because it can learn without a mathematical model of a path and because it can learn a correct path only using a reward when the target is encountered.

The tests described in this work were carried out using a sample of 36 nodules, 29 benign and 7 malignant. It is important to note that the nodules were diagnosed by physicians and had the diagnosis confirmed by means of surgery or based on their evolution. Such process takes about two years, which explains the reduced size of our sample. The sample included nodules with varied sizes and shapes, with homogeneous and heterogeneous characteristics, and in initial and advanced stages of development.

The stepwise analysis (Lachenbruch, 1975) selected 5 out of the 13 measurements (states), described in Section 4.1.3, to be analyzed by the reinforcement learning classifier. The selected states were ICE, QPK, QSR, QSV and CPI. Each state was discretized in ten different values. The discretization of each state is shown in Table 1.

StateICEQPKQSRQSVCPI
145 - 492.619 - 145.885 - 167.5523 - 369.830.24 - 0.27
2492.61 - 1.39e+3145.89 - 419.67167.55 - 492.67369.83 - 1.06e+30.27 - 0.32
31.39e+3 - 2.29e+3419.67 - 693.44492.67 - 817.781.06e+3 - 1.76e+30.32 - 0.37
42.28e+3 - 3.18e+3693.44 - 967.22817.77 - 1.14e+31.76e+3 - 2.45e+30.37 0.43
53.18e+3 - 4.07e+3967.22 - 1,2401.14e+3 - 1,4672.45e+3 - 3.14e+30.43 - 0.48
64.07e+3 - 4.97e+31,241 - 1.51e+31,468 - 1.79e+33.14e+3 - 3.84e+30.48 - 0.53
74.97e+3 - 5.86e+31.51e+3 - 1.78e+31.79e+3 - 2.12e+33.84e+3 - 4.53e+30.53 0.58
85.86e+3 - 6.76e+31.79e+3 - 2.06e+32.12e+3 - 2.44e+34.53e+3 - 5.22e+30.58 0.64
96.76e+3 - 7.65e+32.06e+3 - 2.33e+32.44e+3 - 2.77e+35.22e+3 - 5.92e+30.64 - 0.69
107.65e+3 - 8,1022.33e+3 - 2,4732.77e+3 - 2,9315.92e+3 - 6,2660.69 - 0.719

Table 1.

Discretization of each state.

With these states and actions, the matrix Q has the following size: 105x35, because each state has ten different values and each action three possibilities.

The classificator try to reach the state that corresponds to geometric characteristics of the benign or malignant pattern. These are determined by the above defined states, by the increment, constancy and decrement actions over the current value of the corresponding state, at each varaiable state starting from a set of states as initial point. An action is randomly taken with 50% random rate aiming at finding the bests pair state/action for each set of states. The unitary value reinforcement is given to each state change in the case a right pattern is found, conversely, the reinforcement is null.

The learning is done by selecting 19 images of benign and 4 malingnat nodules, and we choose as target the most characteristic bening and malignant nodules. To each image is assigned a learning step for classification and the set of all those images forms an episode.

After training, the knowledge should have been acquired. This is verified with a test of classification with images not used during the training phase. For this purpose nine benign and two malignant images were selected. Figure 3 shows the results obtained, where we used the remaining nine benign and two malignant nodules. In this figure we represent in the x-axis the nodules case, being cases 1 to 9 benign and cases 10 and 11 malignant. On the other hand, the y-axis represents the number of steps from the start point to the target, which means the number of actions taking to reach the case target. When a given case takes a positive number of steps to reach the target, we have a successful classification. Otherwise a negative number represents an incorrect classification and when the classification is not determined we set the number of steps as zero. The obtained data was generated from four experiments, using 20000, 30000, 40000 and 50000 episodes in the training phase.

The number of right classification grows from 45% for 20000 episodes to 81% for 50000 episodes; as shown in Table 2, which indicates a good improvement in the classification success as the number of episodes grows.

Interesting information observed in the results involves the set of 40000 episodes, when the number of successful classification decreased, which should be derived from the random choices used in the training phase, that lead to a poor learning. But, as already proved in RL theorem (Barto et al., 1983), a very high number of episodes drives to a correct learning, generating a very high successful rate in the classification.

media/image15.png

Figure 5.

Application of reinforcement learning.

Due to the relatively small size of the existing CT lung nodule databases and the various CT imaging acquisition protocols, it is difficult to compare the diagnosis performance between the developed algorithms and others proposed in the literature.

SuccessErrorNon-DeterminedNumber of Episodes
45.45%36.36%18.18%20,000
72.72%9.09%18.18%30,000
45.45%45.45%9.09%40,000
81.81%9.09%9.09%50,000

Table 2.

Application of Reinforcement Learning.

The number of studied nodules in our dataset is too small to state definitive conclusions, but the preliminary results of this work are very encouraging, demonstrating that a reinforcement learning using nodule shape characteristics, can contribute to discriminate benign from malignant lung nodules on CT images.

Based on these results, we have observed that such shape characteristics provide significant support to a more detailed clinical investigation, and the results were very encouraging when nodules were classified with reinforcement learning.

Nevertheless, there is the need to perform tests with a larger database and more complex cases in order to obtain a more precise behavior pattern.

7. Conclusion

This work presents an overview of current work applying reinforcement learning in medical image applications, presenting a detailed illustration of a particular use for lung nodules classification.

The addressed application of reinforcement learning to solve the problem of lung nodules classification used the 3D geometric nodules characteristics to guide the classification. Even though the results are preliminary we may see that the obtained results are very encouraging, demonstrating that the reinforcement learning classifier using characteristics of the nodules’ geometry can effectively classify benign from malignant lung nodules based on CT images.

On the other side, we may observe that this is a machine learning that is not commonly applied to medical images and this is an opportunity for more intensive investment in the research for this area.

But some problems are well known in this application and must be more studied. We should research how to find out a way to shorter the training phase, while maintaining the learning quality. And also must be improved the tests to generate more definitive results and to make possible the comparison with other classifiers.

References

1 - A. Almeida, B. Heimann, L. Góes, C. Nascimento, 2003 Avoidance of multiple dynamic obstacles, Proceedings of International Congress of Mechanical Engineering, 17 São Paulo
2 - A. Almeida, B. Heimann, L. Góes, C. Nascimento, 2003 Obstacle avoidance in dynamic environment: A hierarchical solution, Proceedings of Redes Neurais, 289 294 , 6 São Paulo
3 - A. Barto, R. Sutton, C. Anderson, 1983 Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man and Cybernetics, 13 834 846
4 - L. Berlin, J. Berlin, 1995 Malpractice and radiologists in Cook County, IL: trends in 20 years of litigation. AJR Am J Roentgenol 165 781 788
5 - E. Camponogara, M. R. G. Serra, 2005 Aprendizagem por Reforço: Uma Primeira Introdução, CIPEEL, Universidade Federal de Santa Catarina Departamento de Automação e Sistemas, Florianópolis.
6 - D. A. Clunie, 2000 DICOM Structered Reporting, PixelMed Publishing, Pennsylvania
7 - D. Brooks, 2001 Emerging medical imaging modalities, IEEE Signal Processing Magazine, 18, 6 6 12 13
8 - K. Doya, 2999 Reinforcement learning: Computational theory and biological mechanisms, Neural Computation Unit, Okinawa Institute of Science and Technology, Japan, 12 22
9 - R. O. Duda, P. E. Hart, 1973 Pattern Classification and Scene Analysis, Wiley-Interscience Publication, New York
10 - D. C. V. Essen, H. A. Drury, 1997 Structural and functional analyses of human cerebral cortex using a surface-based atlas, The Journal of Neuroscience, 17 7079 7102
11 - S. J. Fakih, T. K. Das, 2006 LEAD: A Methodology for Learning Efficient Approaches to Medical Diagnosis, IEEE Transactions on Information Technology in Biomedicine, 10, 2 April 2006.
12 - M. K. Gould, 2003 Cost-effectiveness of alternative management strategies for patients with solitary pulmonary nodules, Ann. Intern. Med. 9 138 724 735
13 - S. Helgason, 1980 The Radon transform, Birkhäuser, Boston, MA
14 - W. Hendee, R. Ritenour, 2002 Medical imaging physics, 4 ed., Wiley-Liss
15 - D. W. Henderson, 1998 Differental Geometry: A Geometric Introduction, Prentice-Hall, Upper Saddle River, New Jersey
16 - C. I. Henschke, et al. 2001 Early lung cancer action project: A summary of the findings on baseline screening, The Oncologist, 6 147 152
17 - L. T. Hoffman, J. D. S. da Silva, 2004 Um sistema de visão robótica baseada em aprendizagem por reforço, Instituto Nacional de Pesquisas Espaciais- INPE Programa de Pós-gradução em Computação Aplicada, São José dos Campos, SP.
18 - INCA 2003 Estimativas da incidência e mortalidade por câncer no Brasil, available in: http://www.inca.gov.br/estimativas/2003/versaofinal.pdf
19 - Y. Jeong, K. Lee, S. Jeong, M. Chung, S. Shim, H. Kim, O. Kwon, S. Kim, 2005 Solitary pulmonary nodule: characterization with combine wash-in and washout features of dynamic multidector row CT., Radiology 2 237 675 683 .
20 - Y. Kawata, N.. Niki, H. Ohmatsu, R. Kakinuma, K. Eguchi, M. Kaneko, N. Moriyama, 1997 Classification of pulmonary nodules in thin-section CT images based on shape characterization, In: International Conference on Image Processing, 3, IEEE Computer Society Press, 528 530
21 - Y. Kawata, N. Niki, H. Ohmatsu, R. Kakinuma, K. Eguchi, M. Kaneko, N. Moriyama, 1997 Classification of pulmonary nodules in thin-section CT images based on shape characterization, in: International Conference on Image Processing, 3 IEEE Computer Society Press, 528 530 .
22 - Y. Kawata, N. Niki, H. Ohmatsu, R. Kakinuma, K. Eguchi, M. Kaneko, N. Moriyama, 1988 Quantitative surface characterization of pulmonary nodules based on thin-section CT images, IEEE Transactions on Nuclear Science, 45 4 2132 2138
23 - Y. Kawata, N. Niki, H. Ohmatsu, M. Kusumoto, R. Kakinuma, K. Mori, H. Nishiyama, K. Eguchi, M. Kaneko, N. Moriyama, 1999 Computer aided differential diagnosis of pulmonary nodules using curvature based analysis, in: International Conference on Image Analysis and Processing, 2 IEEE Computer Society Press, 470 475 .
24 - J. J. Koenderink, 1990 Solid Shape, MIT Press, Cambridge, MA, USA
25 - P. A. Lachenbruch, 1975 Discriminant Analysis, Hafner Press, New York.
26 - R. Lag, Seer cancer statistics review, Tech, National Cancer Institute, Bethesda, MD, available at http://seer.cancer.gov/csr/1975 2002 accessed on July/2007
27 - E. Levin, R. Pieraccini, W. Eckert, Learning Dialogue Strategies within the Markov Decision Process Framework, AT&T Labs-Research, Florham Park, USA, 72 79 7279
28 - G. A. Lillington, C. J. Caskey, 1993 Evaluation and management of solitary and multiple nodules, Clin Chest Med, 14 1 111 119 .
29 - W. E. Lorensen, H. E. Cline, 1987 Marching cubes: A high resolution 3D surface construction algorithm, Computer Graphics, 21 163 169
30 - G. McGuiness, D. M. Libby, J. P. Smith, M. W. Pasmantier, O. S. Mietinnen, 2003 Ct screening for lung cancer: Significance of nodules at baseline according to size, Radiology, 30 1 11 15 .
31 - M. F. McNitt-Gray, E. M. Hart, N. Wyckoff, W. Sayre, J. G. Goldin, D. R. Aberle, 1999a A pattern classification approach to characterizing solitary pulmonary nodules imaged on high resolution CT: Preliminary results, Medical Physics 26 6 880 888
32 - M. F. McNitt-Gray, E. M. Hart, N. Wyckoff, J. W. Sayre, J. G. Goldin, D. R. Aberle, 1999b The effects of co-occurrence matrix based texture parameters on the classification of solitary pulmonary nodules imaged on computed tomography, Computerized Medical Imaging and Graphics, 339 348 .
33 - H. W. Nab, N. Karssenmeijer, L. J. T. H. O. Van Erning, J. H. C. L. Hendriks, Comparison of digital and conventional mammography: a ROC study of 270 mammograms. Medical Informatics, 17, 125 131
34 - N. Nikolaidis, I. Pitas, 2001 3-D Image Processing Algorithms. John Wiley, New York
35 - Y. Ohtake, A. Belyaev, A. Pasko, 2001 Dynamic meshes for accurate polygonization of implicit surfaces with shape features, In Press, I.C.S., ed.: SMI 2001 International Conference on Shape Modeling and Applications, 74 81
36 - D. Ost, A. M. Fein, S. H. Feinsilver, 2003 The solitary pulmonary nodule. N. Engl. J. Med. 25 2535 2542
37 - J. Peng, B. Bhanu, 1998 Closed-Loop Object Recognition Using Reinforcement Learning, IEEE Transations on Pattern Analysis and Machine Intelligentece, 20, 2 February 1998.
38 - G. . Pepe, C. Rosseti, S. Sironi, G. Landoni, L. Gianoli, U. Pastorino, P. Zannini, M. Mezzetti, A. Grimaldi, L. Galli, C. Messa, F. Fazio, 2005 Patients with known or suspected lung cancer: evaluation of clinical management changes due to 18 F- deoxyglucose positron emission tomography (18 F- FDG PET) study., Nucl. Med. Commun. 9 26 831 837
39 - L. R. Rabiner, 1989 A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, 77 2 257 286
40 - L. Raedt, M. Bruynooghe, 2004 Relational Reinforcement Learning, Katholieke Universiteit Leuven- Faculteit Toegepaste Wetenschappen Arenbergkasteel, Belgium
41 - A. P. Reeves, W. J. Kostis, 2000 Computer-aided diagnosis for lung cancer, Radiologic Clinics of North America, 38 497 509
42 - D. L. Renfrew, E. A. Franken, K. S. Berbaum , Jr.et al. 1992 Error in radiology: classification and lessons in 182 cases presented at a problem case conference. RenfrewD. L.FrankenE. A.Jr BerbaumK. S.et al. (1992) Error in radiology: classification and lessons in 182 cases presented at a problem case conference. Radiology, , 183 145 150
43 - S. G. A. III., M. L. Giger, C. J. Moran, J. T. Blackburn, K. Doi, H. MacMahon, 1999 Computerized detection of pulmonary nodules on CT scans, Radiographics 19 5 1303 1311
44 - F. Sahba, H. R. Tizhoosh, M. M. A. Salama, 2006 A Reinforcement Learning Framework for Medical Image Segmentation, Proceedings of International Joint Conference on Neural Networks, IJCNN’06., 511 517
45 - M. Shokri, H. R. Tizhoosh, 2003 Using reinforcement learning for image thresholding, Department of Systems Design Engineering, Faculty of Engineering, University of Waterloo, Canada.
46 - A. C. Silva, P. C. P. Carvalho, 2002 Sistema de análise de nódulo pulmonar, In: II Workshop de Informática aplicada a Saùde, Universidade de Itajai, Itajaí.
47 - A. C. Silva, P. C. P. Carvalho, M. Gattass, 2004 Analysis of spatial variability using geostatistical functions for diagnosis of lung nodule in computerized tomography images, Pattern Analysis & Applications 7, 3, 227 234
48 - A. C. Silva, P. C. P. Carvalho, M. Gattass, 2004 Diagnosis of solitary lung nodule using semivariogram and skeletonization in computerized tomography images, Proceedings of 21st Meeting of the Society for Computer Applications in Radiology (SCAR 2004)
49 - A. C. Silva, P. C. P. Carvalho, M. Gattass, 2005 Analysis of spatial variability using geostatistical functions for diagnosis of lung nodule in computerized tomography images, Pattern Analysis and Applications, 7 227 234
50 - A. C. Silva, P. C. P. Carvalho, A. Peixoto, M. Gattass, 2004 Diagnosis of lung nodule using gini coefficient and skeletonization in computerized tomography images, Proceedings of 19th ACM Symposium on AppliedComputing (SAC 2004), 243 248 , NY, USA, ACM Press New York (2004)
51 - W. D. Smart, 2002 Making Reinforcement Learning Work on Real Robots, Ph.D. thesis, Department of Computer Science, Brown University
52 - A. C. Smith, 1999 The Folding of the Human Brain, from Shape to Function, PhD thesis, University of London. Available at http://carmen.umds.ac.uk/a.d.smith/phd.html
53 - M. Stensmo, T. Sejnowski, 1996 Automated Medical Diagnosis based on Decision Theory and Learning from Cases, World Congress on Neural Networks 1227 1231
54 - R. Sutton, A. Barto, 1998 Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA
55 - S. J. Swensen, 1997 The probability of malignancy in solitary nodules. application to small radiologically indeterminate nodules, Arch. Intern. Med. 8 157 849 855
56 - S. Takashima, S. Sone, F. Li, Y. Mauyama, M. Hasegawa, M. Kadoya, 2003 Indeterminate solitary pulmonary nodules revealed at population-based ct screening of the lung: using first follow-up diagnostic ct to differentiate benign end malignant lesions., Am J Roentology 180 5 1255 1263 .
57 - A. B. Tarantino, 1997 38. In: Nódulo Solitário Do Pulmão, 4 edn. Guanabara Koogan, Rio de Janeiro 733 753
58 - C. Watkins, P. Dayan, 1952 Q-Learning, Machine Learning, 8 3 279 292
59 - Peng- Yeng. Yin , 2002 Maximum entropy-based optimal threshold selection using deterministic reinforcement learning with controlled randomization, Department of Information Management, Ming Chuan University Taiwan, 2002.