
Can Reinforcement Learning Be Applied to Surgery?

Written By

Masakazu Sato, Kaori Koga, Tomoyuki Fujii and Yutaka Osuga

Submitted: 23 February 2018 Reviewed: 02 March 2018 Published: 27 June 2018

DOI: 10.5772/intechopen.76146

From the Edited Volume

Artificial Intelligence - Emerging Trends and Applications

Edited by Marco Antonio Aceves-Fernandez


Abstract

Background: Remarkable progress has recently been made in the field of artificial intelligence (AI). Objective: We sought to investigate whether reinforcement learning could be used in surgery in the future. Methods: We created simple 2D tasks (Tasks 1–3) that mimicked surgery. We used a neural network library, Keras, for reinforcement learning. In Task 1, a Mac OS X machine with 8 GB of memory (MacBook Pro, Apple, USA) was used. In Tasks 2 and 3, an Ubuntu 14.04 LTS machine with 26 GB of memory (Google Compute Engine, Google, USA) was used. Results: In the task with a relatively small task area (Task 1), the simulated knife finally passed through all the target areas, and thus, the expected task was learned by the AI. In contrast, in the task with a large task area (Task 2), a drastically increased amount of time was required, suggesting that learning was not achieved. Some improvement was observed when the CPU memory was expanded and inhibitory task areas were added (Task 3). Conclusions: We propose the combination of reinforcement learning and surgery. Applying reinforcement learning to surgery may become possible by setting rules, such as appropriate rewards and playable (operable) areas, in simulated tasks.

Keywords

  • hysterectomy
  • robotic surgical procedures
  • deep learning
  • artificial intelligence
  • deep Q network (DQN)
  • reinforcement learning

1. Introduction

Remarkable progress has recently been made in the field of artificial intelligence (AI), and new advancements are made almost every day [1, 2, 3]. Although it is not possible to discuss all developments in AI, such as image recognition, automatic driving or the game of Go, a particularly promising AI task is game playing [4, 5, 6]. AI can be trained to play games, such as the game Breakout, and can perform as well as an experienced human player. In the context of game play, reinforcement learning is a common strategy [6, 7]. For instance, the objective of the game Breakout is to break blocks to score points. The player's action at the instant a block is broken is not the only one that matters; the actions immediately prior to successfully breaking a block, such as directing the ball toward the blocks, should also be evaluated. With this aim, the AI calculates the expected values of the scores (rewards) that could be obtained by various actions and learns how to achieve the maximum reward via reinforcement learning [6]. Thus, the AI succeeds in breaking more blocks and obtaining higher scores.

The concept of reinforcement learning has something in common with surgical procedures. A hysterectomy, which is one of the most basic gynecologic procedures [8], can be considered as an example. The aim of a hysterectomy is to remove the uterus; however, there are no specific guidelines regarding precisely how the knife should be used at the moment the uterus is removed. Rather, the steps performed in advance should be evaluated: resecting the uterine arteries beforehand, for instance, may reduce total blood loss. Similarly, the resection of the ligaments should be evaluated, as this step precedes extraction of the uterus.

We therefore simulated and investigated whether reinforcement learning could be applied to surgery and whether AI might be used to perform surgeries in the future.

In this study, we created simple 2D tasks that mimicked surgery (such as a hysterectomy simulation) and investigated whether the surgery could be performed as expected via reinforcement learning. During the task, the player (an imaginary knife) moved around the task areas. The player scored when passing target areas, such as imaginary ligaments and arteries, and lost points for other actions. The task was over when the player reached the uterus or moved outside the task area. We established rules and observed the process during which movement of the imaginary knife was learned and improved. In the task with a relatively small task area (Task 1), the knife finally passed all target areas, and the expected learning was achieved. In contrast, in the task with a large task area (Task 2), significantly more time was required to complete the task, suggesting that learning was not achieved. We addressed this problem by expanding the CPU memory and adding inhibitory task areas (Task 3), and some improvements were observed.

In this study, we applied the concept of reinforcement learning to surgical procedures and identified some commonalities between reinforcement learning and the way surgeons approach an operation. We found that several techniques for efficient learning need to be developed before reinforcement learning can be applied to surgery. These include choosing a model that closely reproduces the surgical scene, using a high-performance computer for deep learning, and tuning the neural networks. Surgeons will need to understand the rules, such as when a reward can be earned and where the agent is allowed to move, and efficient learning becomes possible when engineers integrate these rules into the AI.


2. Materials and methods

2.1. 2D tasks

To create the 2D task, a publicly available game code for the game "Snake" was referenced and tuned (https://github.com/farizrahman4u/qlearning4k). The code for the operation task is provided in the S7 Text. The Snake game was used for this investigation because it could easily be modified into the task we needed; the original code need not be the Snake game, as long as a task that mimics surgery can be created.

2.2. Task 1

We created a simple 2D task of 9 × 9 squares representing a surgery (a hysterectomy), as shown in Figure 1A. The goal of the task is to remove the object (uterus, yellow); indirect goals, such as cutting ligaments and arteries, were also established. In the context of surgery, there are preferred cutting points, chosen by considering the density of the arteries and the distance from the object, and these points were set as target points (green). Thus, the objective was for the tip of the knife (orange) to pass through (cut) all the target points. In other words, when surgeons see a mass to remove (a tumor, the uterus or whatever else), such as the yellow area in Figure 1A, they cut the ligaments (green areas) with the knife in a roughly circular path, without approaching the areas where bleeding is expected to occur (red). Rewards were defined as follows: +1 for passing green areas, −1 for passing red areas, and 0 for passing blue areas (peritoneum). In addition, −1 was given and the task was defined as over if the knife passed over the yellow area (uterus) or moved outside the task area. Finally, −1 was given for re-entering areas the knife had already passed.
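To make this reward scheme concrete, the following is a minimal Python sketch of how the per-cell rewards and terminal conditions described above could be encoded. The cell labels and the function name are illustrative assumptions; the actual implementation is the one provided in S7 Text.

# Illustrative sketch of the Task 1 reward scheme (not the code from S7 Text).
GREEN_TARGET, RED_AREA, YELLOW_UTERUS, BLUE_PERITONEUM, VISITED_PATH, OUT_OF_AREA = range(6)

def step_reward(cell):
    """Return (reward, task_over) for the cell the knife tip enters."""
    if cell == GREEN_TARGET:                    # preferred cutting point
        return +1, False
    if cell == RED_AREA:                        # ligament/artery: bleeding expected
        return -1, False
    if cell == BLUE_PERITONEUM:                 # neutral tissue
        return 0, False
    if cell == VISITED_PATH:                    # already passed: penalize re-entry
        return -1, False
    if cell in (YELLOW_UTERUS, OUT_OF_AREA):    # uterus or outside the grid: task over
        return -1, True
    raise ValueError("unknown cell type")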

Figure 1.

Creation of the task. (A) Rules of the task. The name and reward of each area were as follows: orange, tip of the knife; dark blue, path where the tip of the knife had passed; red, ligament or artery, −1; green, target area, +1; yellow, uterus, −1 (task over); blue, peritoneum, 0; brown, out of area, −1 (task over). The starting point was located in the upper middle of the area. (B) Results of learning. The results after 500 epochs of learning (upper) and 1500 epochs of learning (lower) are shown. The knife passed all the green points in the shortest amount of time after learning.

2.3. Task 2

We extended the task to 48 × 48 squares. This imposed significantly heavier burdens on the computer compared to Task 1, and we extended the CPU memory as described below.

2.4. Task 3

We added inhibitory areas (which would end the task) to Task 2 and named the result Task 3.
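As a rough illustration, the inhibitory areas can be thought of as one more cell type in the reward sketch of Section 2.2, whose only effect is to end the task. The label and the penalty value below are assumptions, since the text specifies only that entering such an area ends the task.

# Illustrative extension of the Task 1 sketch for Task 3 (not the code from S7 Text).
INHIBITORY = 6   # additional cell type on the 48 x 48 grid (dark blue areas in Movie 6)

def step_reward_task3(cell):
    if cell == INHIBITORY:
        return -1, True        # the -1 penalty is an assumption; the task simply ends
    return step_reward(cell)   # all other cells behave as in Task 1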

2.5. Deep Q network

We used a neural network library, Keras, for the reinforcement learning (https://keras.io) and TensorFlow as its backend (https://www.tensorflow.org) [9]. The codes for the agent can be found at https://github.com/farizrahman4u/qlearning4k.

The code simply obeys the rule Q(S, a) = r + gamma * max_a′ Q(S′, a′).

Q(S, a) means the maximum score the agent will get by the end of the game, if it does action a, when the game is in state S. On performing action a, the game will jump to a new state S′, giving the agent an immediate reward r.

In short, Q(S, a) = Immediate reward from this state + Max-score from the next state onwards.

Gamma is a discount factor (between 0 and 1) that gives more weight to the immediate reward than to rewards far in the future.
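As a concrete illustration of this rule, the following minimal sketch computes the training target for a single transition in the way described above. The variable and function names are assumptions and are not taken from the qlearning4k code.

import numpy as np

def q_target(reward, next_state_q_values, gamma, task_over):
    """Bellman target for one transition: r + gamma * max over a' of Q(S', a').

    next_state_q_values holds the network's Q estimates for every action in S'.
    If the task ended on this transition, there is no future reward to add.
    """
    if task_over:
        return reward
    return reward + gamma * np.max(next_state_q_values)

The network is then trained so that its prediction for Q(S, a) moves toward this target.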

We are gynecologists with only a basic understanding of deep Q-learning, and we tuned only nb_frames (the number of frames the agent should remember), batch_size (the batch size for training), gamma (the discount factor) and nb_epoch (the number of epochs to train). The original neural network contained two convolutional layers and two dense layers. We slightly tuned the network by adding convolutional layers or changing the size of the frames, but we did not see much improvement. Thus, readers might find a more efficient way of learning. However, the objective of the present study is to propose the combination of reinforcement learning and surgery, and we did not investigate the tuning any further after reaching our conclusions.
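For illustration, a network with the shape described above (two convolutional layers followed by two dense layers) could be sketched as follows. This sketch uses the current Keras layer names (Conv2D rather than the Convolution2D of Keras 1.1.0), and the filter counts, dense-layer width and number of actions are assumptions; the actual hyperparameters are those in S7 Text.

from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

grid_size = 9    # 9 x 9 board of Task 1; 48 for Tasks 2 and 3
nb_frames = 1    # number of past frames the agent remembers (tuned in the study)
nb_actions = 4   # up / down / left / right movements of the knife tip (assumed)

model = Sequential([
    # two convolutional layers over the stacked game frames
    Conv2D(16, (3, 3), activation='relu',
           input_shape=(grid_size, grid_size, nb_frames)),
    Conv2D(32, (3, 3), activation='relu'),
    Flatten(),
    # two dense layers; the last one outputs one Q value per action
    Dense(256, activation='relu'),
    Dense(nb_actions),
])
model.compile(optimizer='rmsprop', loss='mse')

The agent then chooses the action with the largest predicted Q value (with occasional random exploration) and is trained on targets of the form shown above.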

2.6. Development environment

The development environment used in this study comprised Python 2.7.12, Keras 1.1.0, TensorFlow 0.8.0, and Matplotlib 1.5.3. In Task 1, a Mac OS X machine with an Intel Core i5 processor and 8 GB of memory (MacBook Pro, Apple, USA) was used. In Tasks 2 and 3, an Ubuntu 14.04 LTS machine with an Intel Xeon E5 v2 processor and 26 GB of memory (Google Compute Engine, Google, USA) was used.


3. Results

3.1. Learning process and trials for appropriate learning

We created a simple 2D task of 9 × 9 squares representing the surgical scene (Figure 1A). We provide representative movies after 10 epochs, 500 epochs and 1500 epochs of learning (Figure 1B and Movies 1–3). After learning, the knife passed all the green points in the shortest amount of time.

We next extended the task to 48 × 48 squares (Movies 4 and 5). This imposed significantly heavier burdens on the computer than Task 1, and therefore, we expanded the CPU memory as described in Section 2. The knife remained confined to limited areas after 500 epochs of learning (Movie 4), and the area of movement was extended after 1500 epochs of learning (Movie 5). Even after expanding the CPU memory, the task required a long time, and thus, learning was not necessarily achieved. We therefore modified Task 2 into Task 3.

In Task 2, the knife moved to areas where no reward could be obtained (the upper right area), which we hypothesized prevented appropriate learning (Movie 5). Considering that some areas are not significant in an actual hysterectomy, we added inhibitory areas that would trigger the task to end (Task 3; dark blue areas in Movie 6). By adding these inhibitory areas, the knife reached the lower areas that it had never reached before after 1500 epochs of learning (Movie 6). However, even with these restrictions, a long learning time was still required within the simulated environments. These results suggest that more efficient learning can be achieved by setting appropriate rules, such as adding inhibitory areas. All video materials referenced in this section are available at: https://www.intechopen.com/download/index/process/160/authkey/f5524cabe6c92eff86119b2f0c58ba26


4. Discussion

In this study, we simulated and investigated whether reinforcement learning could be applied to surgery in the future and evaluated the types of hurdles that may exist.

Progress in AI has been remarkable, and AI is currently an essential part of the technology industry. Recent progress in AI can be attributed to the development of deep learning. Deep learning is a form of machine learning in which the user does not have to hand-select features or representations of the input data [10]. Deep learning is currently widely used in image recognition and audio recognition. Playing games with reinforcement learning has also been investigated, and reinforcement learning has recently been combined with deep learning, resulting in drastic improvements in performance [4, 5, 6].

Thus, we sought to investigate whether reinforcement learning could be used in surgery in the future and developed appropriate simulations. Our team is composed of clinicians rather than engineers; however, we performed this study after studying deep learning and reinforcement learning.

In reinforcement learning, obtaining a reward with the immediately following action is preferred over expecting rewards from actions in the distant future [6]. This can be seen in the results of Task 1, in which the knife passed all the green areas in the shortest possible time, even though no rule prohibited a more roundabout approach (Figure 1B). This behavior can be expected to shorten the surgical time. Furthermore, the choice of the shortest route was also of interest. The blue and red arrows in the circle in Figure 2 represent the two shortest routes; the route of the blue arrows was the one chosen after several trials (Figure 1B). In reinforcement learning, the next action is decided by maximizing the expected value of the rewards obtained from subsequent actions [6], so actions with greater expected values are likely to be preferred. We therefore assume that the red-arrow route was not preferred because subsequent actions would lead to point deductions (red area) or end the task (uterus). This approach is also considered superior in surgery: avoiding areas associated with point deductions corresponds to retaining appropriate margins during surgery. Therefore, the characteristics of reinforcement learning seemed to be compatible with typical surgical approaches.

Figure 2.

Choice of the shortest route. Blue and red arrows in the circle represent the shortest routes. Only the route of the blue arrows was chosen after several trials.

The other results were as expected. We concluded that reinforcement learning could solve a simple 2D task; 3D models that more closely replicate surgery should be considered next. In addition, increasing the playable area provides significantly more options for actions, so high-performance computers and tuning of the neural networks will be needed for more complex tasks. 3D operation models are currently being developed and are used in practice for endoscopic surgery simulation [11, 12, 13, 14]. The speed at which high-performance computers are being developed is astonishing, and deep learning is being thoroughly investigated by engineers in both academia and industry [15, 16, 17]. Combining the progress in both fields may provide designs allowing AI to realistically perform surgery. In such work, clinicians can contribute the essential aspects of the surgery, that is, the playable task areas and the appropriate scores. Although increasing the reward areas or limiting the playable areas would shorten the learning time, such restrictions could also prevent the AI from finding alternative paths. Thus, clinicians and engineers should work cooperatively to define the rules.

In this study, we applied the concept of reinforcement learning to surgical procedures and identified common points between reinforcement learning and the way surgeons approach an operation. Reinforcement learning is currently used and studied mainly in the context of game play; however, it could also be applied to performing surgeries, now that robotic surgery is widely available. Although there are many hurdles to overcome, AI could be applied to surgery by setting appropriate rules, such as defining rewards and playable (operable) areas. To realize this goal, it is important for clinicians to further study deep learning and reinforcement learning strategies.


Acknowledgments

We greatly appreciate the generous help of American Journal Experts in editing the manuscript.


Conflict of interest

The authors declare that there is no conflict of interest.


Supporting information

Movie 1. Result of Game 1 after 10 epochs of learning.

Movie 2. Result of Game 1 after 500 epochs of learning.

Movie 3. Result of Game 1 after 1500 epochs of learning.

Movie 4. Result of Game 2 after 500 epochs of learning.

Movie 5. Result of Game 2 after 1500 epochs of learning.

Movie 6. Result of Game 3 after 1500 epochs of learning.

https://www.intechopen.com/download/index/process/160/authkey/f5524cabe6c92eff86119b2f0c58ba26

S7 Text. The game codes for Games 1, 2 and 3 (Python scripts). The code for agent.py, memory.py and game.py can be obtained from the link in Section 2.

References

  1. Ravi D, Wong C, Deligianni F, Berthelot M, Andreu Perez J, Lo B, et al. Deep learning for health informatics. IEEE Journal of Biomedical and Health Informatics. 2017;21(1):4-21. DOI: 10.1109/JBHI.2016.2636665. PubMed PMID: 28055930
  2. Scholkopf B. Artificial intelligence: Learning to see and act. Nature. 2015;518(7540):486-487. DOI: 10.1038/518486a. PubMed PMID: 25719660
  3. Zhang YC, Kagen AC. Machine learning interface for medical image analysis. Journal of Digital Imaging. 2017;30(5):615-621. DOI: 10.1007/s10278-016-9910-0. PubMed PMID: 27730415
  4. Gibney E. DeepMind algorithm beats people at classic video games. Nature. 2015;518(7540):465-466. DOI: 10.1038/518465a. PubMed PMID: 25719643
  5. Gibney E. Google AI algorithm masters ancient game of Go. Nature. 2016;529(7587):445-446. DOI: 10.1038/529445a. PubMed PMID: 26819021
  6. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529-533. DOI: 10.1038/nature14236. PubMed PMID: 25719670
  7. Littman ML. Reinforcement learning improves behaviour from evaluative feedback. Nature. 2015;521(7553):445-451. DOI: 10.1038/nature14540. PubMed PMID: 26017443
  8. Wallace SK, Fazzari MJ, Chen H, Cliby WA, Chalas E. Outcomes and postoperative complications after hysterectomies performed for benign compared with malignant indications. Obstetrics and Gynecology. 2016;128(3):467-475. DOI: 10.1097/AOG.0000000000001591. PubMed PMID: 27500339
  9. Rampasek L, Goldenberg A. TensorFlow: Biology's gateway to deep learning? Cell Systems. 2016;2(1):12-14. DOI: 10.1016/j.cels.2016.01.009. PubMed PMID: 27136685
  10. Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks. 2015;61:85-117. DOI: 10.1016/j.neunet.2014.09.003. PubMed PMID: 25462637
  11. Beyer-Berjot L, Berdah S, Hashimoto DA, Darzi A, Aggarwal R. A virtual reality training curriculum for laparoscopic colorectal surgery. Journal of Surgical Education. 2016;73(6):932-941. DOI: 10.1016/j.jsurg.2016.05.012. PubMed PMID: 27342755
  12. Khan ZA, Kamal N, Hameed A, Mahmood A, Zainab R, Sadia B, et al. SmartSIM—A virtual reality simulator for laparoscopy training using a generic physics engine. The International Journal of Medical Robotics and Computer Assisted Surgery: MRCAS. 2016;16:437. DOI: 10.1002/rcs.1771. PubMed PMID: 27671920
  13. Li XL, Du DF, Jiang H. The learning curves of robotic and three-dimensional laparoscopic surgery in cervical cancer. Journal of Cancer. 2016;7(15):2304-2308. DOI: 10.7150/jca.16653. PubMed PMID: 27994668; PubMed Central PMCID: PMC5166541
  14. Romero-Loera S, Cárdenas-Lailson LE, de la Concha-Bermejillo F, Crisanto-Campos BA, Valenzuela-Salazar C, Moreno-Portillo M. Skills comparison using a 2D vs. 3D laparoscopic simulator. Cirugia y Cirujanos (English Edition). 2016;84(1):37-44. DOI: 10.1016/j.circen.2015.12.012
  15. Kusy M, Zajdel R. Application of reinforcement learning algorithms for the adaptive computation of the smoothing parameter for probabilistic neural network. IEEE Transactions on Neural Networks and Learning Systems. 2015;26(9):2163-2175. DOI: 10.1109/TNNLS.2014.2376703. PubMed PMID: 25532211
  16. Senda K, Hattori S, Hishinuma T, Kohda T. Acceleration of reinforcement learning by policy evaluation using nonstationary iterative method. IEEE Transactions on Cybernetics. 2014;44(12):2696-2705. DOI: 10.1109/TCYB.2014.2313655. PubMed PMID: 24733037
  17. Xu B, Yang C, Shi Z. Reinforcement learning output feedback NN control using deterministic learning technique. IEEE Transactions on Neural Networks and Learning Systems. 2014;25(3):635-641. DOI: 10.1109/TNNLS.2013.2292704. PubMed PMID: 24807456
