Background: Remarkable progress has recently been made in the field of artificial intelligence (AI). Objective: We sought to investigate whether reinforcement learning could be used in surgery in the future. Methods: We created simple 2D tasks (Tasks 1–3) that mimicked surgery. We used a neural network library, Keras, for reinforcement learning. In Task 1, a Mac OS X machine with 8 GB of memory (MacBook Pro, Apple, USA) was used. In Tasks 2 and 3, an Ubuntu 14.04 LTS instance with 26 GB of memory (Google Compute Engine, Google, USA) was used. Results: In the task with a relatively small task area (Task 1), the simulated knife finally passed through all the target areas, and thus the expected task was learned by the AI. In contrast, in the task with a large task area (Task 2), a drastically increased amount of time was required, suggesting that learning was not achieved. Some improvement was observed when the CPU memory was expanded and inhibitory task areas were added (Task 3). Conclusions: We propose the combination of reinforcement learning and surgery. Applying reinforcement learning to surgery may become possible by setting rules, such as appropriate rewards and playable (operable) areas, in simulated tasks.
- robotic surgical procedures
- deep learning
- artificial intelligence
- deep Q network (DQN)
- reinforcement learning
Remarkable progress has recently been made in the field of artificial intelligence (AI), and new advancements are made almost every day [1, 2, 3]. Although it is not possible to discuss all developments in AI, such as image recognition, autonomous driving, or the game of Go, a particularly promising AI task is game playing [4, 5, 6]. AI can be trained to play games such as Breakout and can perform as well as an experienced human player. In the context of game play, reinforcement learning is a common strategy [6, 7]. For instance, the objective of Breakout is to break blocks to score points. While the performance of the player at the instant of breaking a block should not necessarily be evaluated, the performance immediately prior to successfully breaking a block, such as the player's ability to direct the ball, should be evaluated. With this aim, the AI calculates the expected values of the scores (rewards) that could be obtained by various actions and, via reinforcement learning, learns how to achieve the maximum reward. Thus, the AI succeeds in breaking more blocks and obtaining higher scores.
The concept of reinforcement learning has a commonality with surgical procedures. A hysterectomy, one of the most basic gynecologic procedures, can be considered as an example. The aim of a hysterectomy is to enucleate the uterus; however, there are no specific guidelines regarding precisely how the knife should be used at the moment that the uterus is being removed. Rather, resecting the uterine arteries in advance, for instance, may reduce total blood loss, and this possibility should be evaluated. In addition, the resection of ligaments should also be evaluated, as this step precedes extraction of the uterus.
Altogether, we simulated and investigated whether reinforcement learning could be applied to surgery and whether AI could be possibly used to perform surgeries in the future.
In this study, we created simple 2D tasks that mimicked surgery (such as a hysterectomy simulation) and investigated whether the surgery could be performed as expected via reinforcement learning. During the task, the player (an imaginary knife) moved around the task areas. The player scored when passing target areas, such as imaginary ligaments and arteries, and lost points for other actions. The task was over when the player reached the uterus or moved outside the task area. We established rules and observed the process during which movement of the imaginary knife was learned and improved. In the task with a relatively small task area (Task 1), the knife finally passed all target areas, and the expected learning was achieved. In contrast, in the task with a large task area (Task 2), significantly more time was required to complete the task, suggesting that learning was not achieved. We addressed this problem by expanding the CPU memory and adding inhibitory task areas (Task 3), and some improvements were observed.
In this study, we applied the concept of reinforcement learning to surgical procedures and identified some commonalities between reinforcement learning and the way surgeons approach an operation. We found that various aspects of efficient learning techniques should be developed before reinforcement learning can be applied to surgery. These aspects include choosing a model that closely reproduces the surgical scene, using a high-performance computer for deep learning, and tuning the neural networks. Surgeons will need to understand the rules, such as when a reward can be earned and where the agent is allowed to move. Efficient learning would become possible once engineers integrate these rules into the AI.
2. Materials and methods
2.1. 2D tasks
To create the 2D task, a publicly available game code for the game “Snake” was referenced and tuned (
2.2. Task 1
We created a simple 2D task of 9 × 9 squares representing a surgery (a hysterectomy), as shown in Figure 1A. The goal of the task is to remove the object (uterus, yellow); indirect goals, such as cutting ligaments and arteries, were also established. In the context of surgery, there are preferred cutting points that consider the densities of the arteries and the distance from the object, and these points were set as target points (green). Thus, the objective was for the tip of the knife (orange) to pass through (cut) all the target points. In other words, if surgeons see a mass to remove (a tumor or the uterus), such as the yellow areas in Figure 1A, they would cut the ligaments (green areas) with the knife in a circular motion, without approaching the areas where bleeding is expected to occur (red). Rewards were defined as follows: +1 for passing green areas, −1 for passing red areas, and 0 for passing blue areas (peritoneum). In addition, −1 was given and the task was defined as over if the knife passed over the yellow area (uterus) or moved outside the task area. Finally, −1 was given for passing areas that had previously been passed.
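The reward rules above can be sketched as a small step function. This is a minimal illustration with hypothetical cell labels, not the actual game code (which is linked in Section 2):

```python
# Hypothetical cell labels for the Task 1 grid; the real code represents
# the board differently.
TARGET, BLEED, PERITONEUM, UTERUS, VISITED = "green", "red", "blue", "yellow", "visited"

def step_reward(cell, inside_grid):
    """Return (reward, game_over) for the knife entering a cell."""
    if not inside_grid or cell == UTERUS:
        return -1, True          # leaving the 9x9 area or cutting the uterus ends the task
    if cell == TARGET:
        return +1, False         # preferred cutting point (ligament/artery)
    if cell in (BLEED, VISITED):
        return -1, False         # expected bleeding area, or a cell already passed
    return 0, False              # peritoneum: neutral
```

With these rules, the agent maximizes its score only by cutting every green cell exactly once while staying inside the grid and away from the uterus.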
2.3. Task 2
We extended the task to 48 × 48 squares. This imposed significantly heavier burdens on the computer compared to Task 1, and we extended the CPU memory as described below.
2.4. Task 3
We added inhibitory areas (which would trigger the task to be over) to Task 2, and named it Task 3.
2.5. Deep Q network
We used a neural network library, Keras, for the reinforcement learning (
The code simply obeys the rule Q(S, a) = r + gamma * max_a′ Q(S′, a′).
Q(S, a) means the maximum score the agent will get by the end of the game, if it does action a, when the game is in state S. On performing action a, the game will jump to a new state S′, giving the agent an immediate reward r.
In short, Q(S, a) = Immediate reward from this state + Max-score from the next state onwards.
Gamma is a discounting factor to give more weight to the immediate reward.
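The update rule can be illustrated with a tabular toy example. The paper's code approximates Q with a Keras neural network; the sketch below is a hand-rolled tabular stand-in on a hypothetical 3-state chain (states 0 → 1 → 2, with +1 reward for reaching state 2), intended only to show how the rule propagates value backward:

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
Q = np.zeros((n_states, n_actions))  # tabular Q instead of a neural network

def td_target(r, s_next, done):
    """Target value per Q(S, a) = r + gamma * max_a' Q(S', a')."""
    return r if done else r + gamma * Q[s_next].max()

# Two hand-rolled updates: action 1 moves right; reaching state 2 pays +1
Q[1, 1] = td_target(1.0, 2, done=True)   # Q(1, right) = 1.0
Q[0, 1] = td_target(0.0, 1, done=False)  # Q(0, right) = 0 + 0.9 * 1.0 = 0.9
```

Note how the value of the terminal reward is discounted by gamma at each step back from the goal, which is why the agent prefers shorter routes to a reward.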
We are gynecologists with only a basic understanding of deep Q-learning, and we tuned only nb_frames (the number of frames the agent should remember), batch_size (the batch size for training), gamma (the discounting factor), and nb_epoch (the number of epochs to train). The original neural network contained two convolutional layers and two dense layers. We slightly tuned the network by adding convolutional layers or changing the number of frames, but we did not see much improvement. Thus, readers might find a more efficient way of learning. However, the objective of the present study is to propose the combination of reinforcement learning and surgery, and we did not investigate tuning any further after obtaining our conclusions.
2.6. Development environment
The development environment used in this study comprised Python 2.7.12, Keras 1.1.0, TensorFlow 0.8.0, and Matplotlib 1.5.3. In Task 1, a Mac OS X machine with an Intel Core i5 processor and 8 GB of memory (MacBook Pro, Apple, USA) was used. In Tasks 2 and 3, an Ubuntu 14.04 LTS instance with an Intel Xeon E5 v2 processor and 26 GB of memory (Google Compute Engine, Google, USA) was used.
3.1. Learning process and trials for appropriate learning
We created 9 × 9 squares of a simple 2D task representing the surgical scene (Figure 1A). We provide representative movies after 10 epochs, 500 epochs and 1500 epochs of learning (Figure 1B and Movies 1–3). The knife passed all the green points in the shortest amount of time after learning.
We next extended the task to 48 × 48 squares (Movies 4 and 5). This imposed significantly heavier burdens on the computer than Task 1, and therefore, we expanded the CPU memory as described in Section 2. The knife was located in limited areas after 500 epochs of learning (Movie 4), and the area of movement was extended after 1500 epochs of learning (Movie 5). Even after expanding the CPU memory, the task required a long time, and thus, learning was not necessarily achieved. Therefore, we modified Task 2 into Task 3.
In Task 2, the knife moved to areas where no reward could be obtained (upper right area), which we hypothesized prevented appropriate learning (Movie 5). We considered that some areas are not significant in an actual hysterectomy and added inhibitory areas (Task 3) that would trigger the task to end (dark blue areas in Movie 6). By adding these inhibitory areas, the knife reached the lower areas that it had never reached before after 1500 epochs of learning (Movie 6). However, even with these limitations, learning still required a long time within the simulated environments. These results suggest that efficient learning can be achieved by setting appropriate rules, such as adding inhibitory areas. All video materials referenced in this section are available at:
In this study, we simulated and investigated whether reinforcement learning could be applied to surgery in the future and evaluated the types of hurdles that may exist.
Progress in AI has been remarkable, and AI is currently an essential part of the technology industry. Recent progress in AI can be attributed to the development of deep learning. Deep learning is a form of machine learning characterized by the fact that the user does not have to hand-select features or representations of the input data. Deep learning is currently widely used in image recognition and audio recognition. While playing games with reinforcement learning has also been investigated, reinforcement learning has recently been combined with deep learning, resulting in drastic improvements in performance [4, 5, 6].
Thus, we sought to investigate whether reinforcement learning could be used in surgery in the future and developed appropriate simulations. Our team is composed of clinicians rather than engineers; however, we performed this study after studying deep learning and reinforcement learning.
In reinforcement learning, it is preferable to obtain a reward from the immediately following action rather than expecting rewards from actions in the distant future. This can be understood from the results of Task 1, in which the knife passed all the green areas in the shortest possible time, even though no rule inhibited a more roundabout approach (Figure 1B). This property can be exploited as a way to shorten the surgical time. Furthermore, the choice of the shortest route was also of interest. The blue and red arrows in the circle in Figure 2 represent the shortest routes; the route of the blue arrows was chosen after several trials (Figure 1B). In reinforcement learning, the next action is decided by maximizing the expected value that would be obtained from subsequent actions. Therefore, actions that have greater expected values are likely to be preferred. We thus expected that the red arrows were not preferred because points would be deducted (red area) or the task would be ended (uterus) by subsequent actions. This approach is also considered superior in surgery: avoiding areas associated with point deductions corresponds to retaining appropriate margins during surgery. Therefore, the characteristics of reinforcement learning seemed to be compatible with typical surgical approaches.
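The preference for shorter routes follows directly from discounting. The toy calculation below uses hypothetical reward sequences (not the actual Task 1 trajectories) to show that, with gamma < 1, a route that reaches the same rewards in fewer steps yields a higher discounted return:

```python
gamma = 0.9  # discounting factor, as in the DQN above

def discounted_return(rewards, gamma):
    """Sum of rewards discounted by gamma per time step."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Hypothetical reward sequences: both routes collect the same two +1 rewards,
# but the long route takes two extra detour steps first.
short_route = [0, 1, 1]
long_route = [0, 0, 0, 1, 1]

# The shorter route has the higher discounted return, so the agent prefers it.
assert discounted_return(short_route, gamma) > discounted_return(long_route, gamma)
```

Here the short route returns 0.9 + 0.81 = 1.71, while the long route returns only 0.9³ + 0.9⁴ ≈ 1.385, so maximizing expected discounted reward drives the agent toward the shortest path through the target areas.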
The other results obtained were as expected. It was concluded that reinforcement learning could solve a simple 2D task; 3D models that more closely replicate surgery should be considered next. In addition, increasing the playable area provides significantly more options for actions, and high-performance computers and tuning of neural networks will be needed for more complex tasks. 3D operation models are currently being developed and are used in practice for endoscopic surgery simulation [11, 12, 13, 14]. The speed at which high-performance computers are being developed is astonishing, and deep learning is being thoroughly investigated by engineers in both academia and business [15, 16, 17]. Combining the progress in both fields may provide designs allowing AI to realistically perform surgery. In such areas, clinicians can contribute the essential aspects of the surgery, that is, the playable task areas and the appropriate scores. Although increasing reward areas or limiting playable areas would shorten learning time, such restrictions could also prevent the AI from finding alternative paths. Thus, clinicians and engineers should work cooperatively to define rules.
In this study, we applied the concept of reinforcement learning to surgical procedures and identified common points between reinforcement learning and the way surgeons approach an operation. Reinforcement learning is currently used and studied mainly in the context of game play; however, it could also be applied to performing surgeries now that robotic surgery is widely available. Although there are many hurdles to overcome, AI could be applied to surgery by setting appropriate rules, such as defining rewards and playable (operable) areas. To realize this goal, it is important for clinicians to further study deep learning and reinforcement learning strategies.
We greatly appreciate American Journal Experts for their generous help in editing the manuscript.
Conflict of interest
The authors declare that there is no conflict of interest.
Movie 1. Result of Game 1 after 10 epochs of learning.
Movie 2. Result of Game 1 after 500 epochs of learning.
Movie 3. Result of Game 1 after 1500 epochs of learning.
Movie 4. Result of Game 2 after 500 epochs of learning.
Movie 5. Result of Game 2 after 1500 epochs of learning.
Movie 6. Result of Game 3 after 1500 epochs of learning.
S7 Text. The game codes for Games 1, 2, and 3 (Python scripts). The code for agent.py, memory.py, and game.py can be obtained from the link in Section 2.