Open access peer-reviewed chapter

Learning Robotic Ultrasound Skills from Human Demonstrations

Written By

Miao Li and Xutian Deng

Reviewed: 26 April 2022 Published: 31 May 2022

DOI: 10.5772/intechopen.105069

From the Edited Volume

Cognitive Robotics and Adaptive Behaviors

Edited by Maki K. Habib


Robotic ultrasound systems play a vital role in assisting or even replacing sonographers in some cases. However, modeling and learning ultrasound skills from professional sonographers remain challenging tasks that hinder the development of autonomous ultrasound systems. To address these problems, we propose a learning-based framework to acquire ultrasound scanning skills from human demonstrations. First, ultrasound scanning skills are encapsulated in a high-dimensional multi-modal model that takes ultrasound images, probe pose, and contact force into account. The model's parameters can be learned from clinical ultrasound data demonstrated by professional sonographers. Second, a target function for autonomous ultrasound examinations is proposed, which can be solved approximately with a sampling-based strategy. The sonographers' ultrasound skills can be represented by approximating the limit of the target function. Finally, the robustness of the proposed framework is validated with experiments on ground-truth data from sonographers.


  • robotic ultrasound
  • robotic skills learning
  • learning from demonstrations
  • compliant manipulation
  • multi-modal prediction

1. Introduction

Ultrasound imaging is widely used in clinical diagnosis because it is noninvasive, low-hazard, real-time, relatively safe, and low-cost. Nowadays, ultrasound imaging can quickly detect diseases of different anatomical structures, including the liver [1], gallbladder [2], bile duct [3], spleen [4], pancreas [5], kidney [6], adrenal gland [7], bladder [8], prostate [9], and thyroid [10]. In addition, during the global COVID-19 pandemic, ultrasound has been widely used to diagnose infected persons by detecting pleural effusion [11, 12]. However, the performance of an ultrasound examination depends heavily on the skills of the sonographer in terms of ultrasound images, probe pose, and contact force (Figure 1). In general, training a qualified sonographer requires a relatively large amount of time and a large number of cases [13, 14]. In addition, the high-intensity, repetitive scanning process places a heavy physical burden on sonographers, which further contributes to the scarcity of ultrasound practitioners.

Figure 1.

The medical ultrasound examination (left) requires dexterous manipulation of the ultrasound probe (right) because of the environmental complexity in terms of ultrasound images, probe pose, and contact force. (a) Clinical medical ultrasound examination. (b) Ultrasound probe.

To address these issues, many previous studies in robotics have attempted to use robots to help or even replace sonographers [15, 16, 17]. According to the degree of system autonomy, robotic ultrasound can be categorized into three levels: teleoperated, semi-autonomous, and full-autonomous. A teleoperated robotic ultrasound system usually contains two main parts: the teacher site and the student site [18, 19, 20]. The motion of the student robot is completely determined by the teacher, usually a trained sonographer, through different kinds of interaction devices, including a 3D space mouse [18], an inertial measurement unit (IMU) handle [20, 21], and a haptic interface [21]. In a semi-autonomous robotic ultrasound system, by contrast, the motion of the student robot is only partly determined by the teacher [22, 23, 24].

For a full-autonomous robotic ultrasound system, the student robot is supposed to perform the whole process of local ultrasound scanning by itself [25, 26, 27], and the teacher robot is used only for emergencies. To date, only a few full-autonomous robotic ultrasound systems have been reported in the literature [28, 29]. These systems usually focus on scanning certain anatomical structures, such as the abdomen [28], thyroid [26], and vertebra [29]. A brief summary of robotic ultrasound systems is given in Table 1. Despite these achievements, there are still many obstacles to the development of robotic ultrasound systems. For example, the robustness of most systems is poor, and some preparations are required before performing the examination. The key obstacle is the lack of a high-dimensional model that learns ultrasound skills (Figure 2) from the sonographer and then guides the adjustment of the ultrasound probe.

| Paper | Autonomy degree | Specific target | Modality | Guidance | Publication year |
|---|---|---|---|---|---|
| [18] | teleoperated | no | force, orientation, position | human | 2015 |
| [19] | teleoperated | no | force, orientation, position | human | 2016 |
| [20] | teleoperated | no | force, orientation, position | human | 2017 |
| [21] | teleoperated | no | force, orientation, position | human | 2020 |
| [22] | semi-autonomous | no | force, orientation, position, elastogram | elastogram, human | 2017 |
| [23] | semi-autonomous | no | force, orientation, position, vision | CNN, human | 2019 |
| [24] | semi-autonomous | yes | force, orientation, position | trajectory, human | 2019 |
| [30] | semi-autonomous | yes | force, orientation, position, image | CNN, human | 2020 |
| [25] | full-autonomous | yes | force, orientation, position, vision, image, MRI | vision, MRI, confidence map | 2016 |
| [26] | full-autonomous | yes | force, orientation, position, image | SVM | 2017 |
| [27] | full-autonomous | no | force, orientation, position, vision | vision | 2018 |
| [28] | full-autonomous | yes | force, orientation, position, vision, MRI | vision, MRI | 2016 |
| [29] | full-autonomous | yes | force, position, vision | RL | 2021 |

Table 1.

A brief summary of robotic ultrasound. Abbreviations: convolutional neural network (CNN), magnetic resonance imaging (MRI), support vector machine (SVM), reinforcement learning (RL).

Figure 2.

The feedback information from three different modalities during a free-hand ultrasound scanning process. The first row represents ultrasound images. The second row represents the contact force in the z-axis between the probe and the skin, collected using a six-dimensional force/torque sensor. The third row represents the probe pose, which is collected using an inertial measurement unit (IMU).

In this chapter, we propose a learning-based approach to represent and learn ultrasound skills from sonographers' demonstrations, and further to guide the scanning process [31]. During the learning process, the ultrasound images together with the relevant scanning variables (the probe pose and the contact force) are recorded and encapsulated into a high-dimensional model. Then, we leverage the power of deep learning to implicitly capture the relation between the quality of ultrasound images and scanning skills. During the execution stage, the learned model is used to evaluate the current quality of the ultrasound image. To obtain a high-quality ultrasound image, a sampling-based approach is used to adjust the probe motion.

The main contribution of this chapter is two-fold: 1. A multi-modal model of ultrasound scanning skills is proposed and learned from human demonstrations, which takes ultrasound images, the probe pose, and the contact force into account. 2. Based on the learned model, a sampling-based strategy is proposed to adjust the ultrasound scanning process, to obtain a high-quality ultrasound image. Note that the goal of this chapter is to offer a learning-based framework to understand and acquire ultrasound skills from human demonstrations [31]. However, it is obvious that the learned model can be ported into a robot system as well, which is our work for the next step [32].

This chapter is organized as follows. Section 2 presents related work in the field of ultrasound images and ultrasound scanning guidance. Section 3 provides the methodology of our model, including the learning process of task representation, the data acquisition process through human demonstrations, and the strategy for scanning guidance during real-time execution. Section 4 describes the detailed experimental validation, with a final discussion and conclusion in Sections 5 and 6.


2. Related work

2.1 Ultrasound images evaluation

The goal of ultrasound image evaluation is to understand images in terms of classification [33], segmentation [34], recognition [35], etc. With the rise of deep learning, many studies have attempted to process ultrasound images with the help of neural networks.

Liu et al. have summarized the extensive research results on ultrasound image processing with different network structures, including the convolutional neural network (CNN), recurrent neural network (RNN), auto-encoder network (AE), restricted Boltzmann machine (RBM), and deep belief network (DBN) [36]. From the perspective of applications, Sridar et al. have employed a CNN for main plane classification in fetal ultrasound images, considering both local and global features of the ultrasound images [37]. To assess disease severity in patients, Roy et al. have collected ultrasound images of COVID-19 patients' lesions to train a spatial transformer network [38]. Deep learning is also adopted in the task of segmenting thyroid nodules from real-time ultrasound images [39]. While deep learning provides a superior framework for understanding ultrasound images, it generally requires a large amount of expert-labeled data, which can be difficult and expensive to collect.

The confidence map provides an alternative method in ultrasound image processing [40]. The confidence map is obtained through pixel-wise confidence estimation using a random walk algorithm. Chatelain et al. have devised a control law based on the ultrasound confidence map [41, 42], with the goal of adjusting the in-plane rotation and motion of the probe. The confidence map is also employed to automatically determine the proper parameters for ultrasound scanning [25]. Furthermore, the advantages of confidence maps have been demonstrated by combining them with position control and force control to perform automatic position and pressure maintenance [43]. However, a confidence map is built on hand-coded rules, so it cannot be directly used to guide the scanning motion.

2.2 Learning of the ultrasound scanning skills

While the goal of ultrasound image processing is to understand images, learning ultrasound scanning skills aims to obtain high-quality ultrasound images through the adjustment of the scanning operation. Droste et al. have used a clamping device with an IMU to obtain the relation between the probe pose and the ultrasound images during ultrasound examination [44]. Li et al. have built a simulation environment based on 3D ultrasound data acquired by a robot arm mounted with an ultrasound probe [45]. However, they did not explicitly learn ultrasound scanning skills. Instead, a reinforcement learning framework is adopted to optimize the confidence map of ultrasound images by adapting the movement of the ultrasound probe. All of the above-mentioned works take only the pose and the position of the probe as input, while in this chapter the contact force between the probe and the human body is also encoded, as it is considered a crucial factor during the ultrasound scanning process [46].

For the learning of force-relevant skills, a great variety of previous studies in robotic manipulation focused on learning the relation between force information and other task-related variables, such as the position and velocity [47], the surface electromyography [48], the task states and constraints [49], and the desired impedance [50, 51, 52]. A multi-modal representation method for contact-rich tasks has been proposed in ref. [53] to encode the concurrent feedback information from vision and touch. The method was learned through self-supervision, which can be further exploited to improve the sampling efficiency and the task success rate. To the best of our knowledge, for a multi-modal manipulation task, including feedback information from ultrasound, force, and motion, this is the first work to learn the task representation and the corresponding manipulation skills from human demonstrations.


3. Problem statement and method

Our goal is to learn free-hand ultrasound scanning skills from human demonstrations. We want to evaluate the quality of the multi-modal task by combining multiple sources of sensory information, including ultrasound images, the probe pose, and the contact force, with the goal of extracting skills from the task representation and even transferring skills across tasks. We model the multisensory data with a neural network whose parameters are trained on data labeled by human ultrasound experts. In this section, we discuss the learning process of the task representation, the data collection procedure, and the online ultrasound scanning guidance, respectively.

3.1 Learning of ultrasound task representation

For a free-hand ultrasound scanning task, three types of sensory feedback are available: ultrasound images from the ultrasound machine, force feedback from a mounted F/T sensor, and the probe pose from a mounted IMU. To encapsulate the heterogeneous nature of this sensory data, we propose a domain-specific encoder to model the task, as shown in Figure 3. For the ultrasound imaging feedback, we use a VGG-16 network to encode the 224×224×3 RGB images and yield a 128-d feature vector. For the force and pose feedback, we encode them with a four-layer fully connected neural network to produce a 128-d feature vector. The resulting two feature vectors are concatenated into one 256-d vector and passed through a one-layer fully connected network to yield a 128-d feature vector as the task feature vector. The multi-modal task representation is a neural network model denoted by $\Omega_\theta$, where the parameters are trained as described in the following section.
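The data flow through the fusion part of the encoder can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the hidden widths of the four-layer pose/force branch, the ReLU activations, the random weights, and the names `pf_layers`, `fusion_layer`, and `task_feature` are all assumptions made for illustration, and the VGG-16 image branch is replaced by a stand-in 128-d vector.

```python
import random

def linear(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (illustrative initialization only)."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp(x, layers):
    """Apply each layer followed by ReLU (the activation is an assumption)."""
    for weights, bias in layers:
        x = relu(linear(x, weights, bias))
    return x

rng = random.Random(0)
# Four fully connected layers mapping pose (4) + force/torque (6) to 128-d.
# Hidden widths 32 and 64 are hypothetical; only the 128-d output is stated.
pf_sizes = [10, 32, 64, 128, 128]
pf_layers = [init_layer(a, b, rng) for a, b in zip(pf_sizes, pf_sizes[1:])]
# One fully connected layer mapping the concatenated 256-d vector to 128-d.
fusion_layer = [init_layer(256, 128, rng)]

def task_feature(image_feature, pose, force):
    """Concatenate the image feature with the encoded pose/force and fuse them."""
    pf_feature = mlp(list(pose) + list(force), pf_layers)
    fused = list(image_feature) + pf_feature  # 128 + 128 = 256 values
    return mlp(fused, fusion_layer)

image_feature = [rng.random() for _ in range(128)]  # stand-in for the VGG-16 output
pose = [1.0, 0.0, 0.0, 0.0]                         # quaternion (w, x, y, z)
force = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0]              # (Fx, Fy, Fz, Tx, Ty, Tz)
feature = task_feature(image_feature, pose, force)
print(len(feature))  # 128
```

In a real system the image branch would be a trained convolutional network and all weights would be learned jointly; the sketch only makes the 10 → 128, 128 + 128 → 256, and 256 → 128 dimension changes concrete.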

Figure 3.

The multi-modal task learning architecture with human annotations. The network takes data from three different sensors as input: the ultrasound images, force/torque (F/T), and the pose information. The data for the task learning is acquired through human demonstrations, where the ultrasound quality is evaluated by sonographers. With the trained network, the multi-modal task can be represented as a high-dimensional vector.

3.2 Data collection via human demonstration

The multi-modal model shown in Figure 3 has a large number of learnable parameters. To obtain the training data, we design a procedure to collect ultrasound scanning data from human demonstrations, as shown in Figure 4. A novel probe holder is designed with built-in sensors, namely an IMU and an F/T sensor. A sonographer performs the ultrasound scanning process with the probe, and the data collected during the scanning process is described as follows:

  • $D = \{(S_i, P_i, F_i)\}_{i=1}^{N}$ denotes a dataset with $N$ observations.

  • $S_i \in \mathbb{R}^{224 \times 224 \times 3}$ denotes the i-th collected ultrasound image with cropped size.

  • $P_i \in \mathbb{R}^{4}$ denotes the i-th probe pose in terms of quaternion.

  • $F_i \in \mathbb{R}^{6}$ denotes the i-th contact force/torque between the probe and the human skin.
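One way to hold a single observation in code is sketched below. The class name `Observation`, the field names, and the choice of storing the image as raw 8-bit bytes are assumptions made for illustration; only the dimensions and the 1/0 quality label come from the chapter.

```python
from dataclasses import dataclass
from typing import Tuple

# Dimensions of the cropped ultrasound image S_i.
IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS = 224, 224, 3
IMAGE_BYTES = IMAGE_HEIGHT * IMAGE_WIDTH * IMAGE_CHANNELS

@dataclass(frozen=True)
class Observation:
    """One element (S_i, P_i, F_i) of the dataset D, plus its quality label."""
    image: bytes                 # S_i, assumed to be stored as 8-bit values
    pose: Tuple[float, ...]      # P_i, quaternion (w, x, y, z)
    wrench: Tuple[float, ...]    # F_i, (Fx, Fy, Fz, Tx, Ty, Tz)
    label: int                   # 1 = good image, 0 = unacceptable image

    def __post_init__(self):
        # Reject records whose sizes do not match the stated dimensions.
        if len(self.image) != IMAGE_BYTES:
            raise ValueError("image must hold 224*224*3 values")
        if len(self.pose) != 4:
            raise ValueError("pose must be a 4-d quaternion")
        if len(self.wrench) != 6:
            raise ValueError("wrench must be a 6-d force/torque vector")
        if self.label not in (0, 1):
            raise ValueError("label must be 0 or 1")

sample = Observation(
    image=bytes(IMAGE_BYTES),
    pose=(1.0, 0.0, 0.0, 0.0),
    wrench=(0.0, 0.0, 5.0, 0.0, 0.0, 0.0),
    label=1,
)
dataset = [sample]  # D is then simply a list of N such observations
```

Validating sizes at construction time catches misaligned sensor streams early, before they reach training.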

Figure 4.

The ultrasound scanning data collected from human demonstrations. The sonographer is performing an ultrasound scanning with a specifically designed probe holder. The sensory feedback during the scanning process is recorded, including the ultrasound images from an ultrasound machine, the contact force and torque from a 6D F/T sensor, and the probe pose from an IMU sensor.

For each record in the dataset $D$, the quality of the obtained ultrasound image is evaluated by three sonographers and labeled 1 or 0, where 1 stands for a good ultrasound image and 0 for an unacceptable one. With the recorded data and the human annotations, the model $\Omega_\theta$ is trained with a cross-entropy loss function. During training, we minimize the loss function with stochastic gradient descent. Once trained, the network produces a 128-d feature vector and evaluates the quality of the task at the same time. Given the task representation model $\Omega_\theta$, an online adaptation strategy is proposed to improve the task quality by leveraging the multi-modal sensory feedback, as discussed in the next section.
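For a two-class label, the cross-entropy loss and a single gradient-descent update can be written out as in the following sketch. It operates directly on two output logits rather than on network weights, and the learning rate used here is only illustrative; the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true label (0 or 1)."""
    return -math.log(softmax(logits)[label])

def cross_entropy_grad(logits, label):
    """Gradient of the loss w.r.t. the logits: softmax minus one-hot label."""
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]

def sgd_step(params, grads, learning_rate):
    """One plain gradient-descent update."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

logits = [0.0, 0.0]   # undecided between label 0 and label 1
label = 1             # the image was annotated as good
loss_before = cross_entropy(logits, label)
logits_after = sgd_step(logits, cross_entropy_grad(logits, label), learning_rate=0.001)
loss_after = cross_entropy(logits_after, label)
print(loss_before > loss_after)  # True: the update moves toward the label
```

In actual training the same gradient is propagated back through the network weights over mini-batches; the sketch isolates only the loss and the update rule.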

3.3 Ultrasound skill learning

As discussed in related work, it is still challenging to model and plan complex force-relevant tasks, mainly due to the inaccurate state estimation and the lack of a dynamics model. In our case, it is difficult to explicitly model the relations among ultrasound images, the probe pose, and the contact force. Therefore, we formulate the policy of ultrasound skills as a model-free reinforcement learning problem, and the target function is as follows:

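$$\underset{P,\,F}{\text{maximize}} \quad Q_\theta = f(S, P, F \mid \Omega_\theta) \qquad \text{subject to} \quad P \in D_P,\; F \in D_F,\; F_z \ge 0. \tag{1}$$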

where $Q_\theta$ denotes the quality of the task, which is computed by passing the sensory feedback $S, P, F$ through the learned model $\Omega_\theta$. The constraint $F_z \ge 0$ means that the contact force along the normal direction should be positive. $D_P$ and $D_F$ denote the feasible sets of the probe pose and the contact force, respectively. In our case, these two feasible sets are determined by human demonstrations. However, it is worth mentioning that other task-specific constraints for the pose and the contact force can also be adopted here.

Choosing a model-free approach means that no prior knowledge of the dynamics model of the ultrasound scanning process is required, namely the transition probabilities from one state (the current ultrasound image) to another (the next ultrasound image). More specifically, we choose Monte Carlo policy optimization [54], where the potential actions are sampled and selected directly from previously demonstrated experience, as shown in Figure 5. For the sampling, we impose a bound between $P_t, F_t$ and $P'_t, F'_t$, which prevents the next state from moving too far away from the current state. If the new state $\langle P'_t, F'_t, S'_t \rangle$ is evaluated as good by the task quality function $Q_\theta$, the desired pose $P'_t$ and contact force $F'_t$ are used as a goal for the human ultrasound scanning guidance. Otherwise, new $P'_t$ and $F'_t$ are sampled from the previously demonstrated experience. This process repeats $N$ times, and the pair $P'_t, F'_t$ with the best task quality is chosen as the final goal for the human scanning guidance. Note that this sampling-based approach does not guarantee the global optimality of Eq. 1. However, this is sufficient for human ultrasound scanning guidance because the final goal only needs to be updated at a relatively low frequency.
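The sampling loop described above can be sketched as a best-of-N search over demonstrated pose/force pairs. This is a simplified illustration rather than the authors' code: the function names, the component-wise absolute-difference bound, and the toy quality function standing in for $Q_\theta$ (which in the chapter also depends on the ultrasound image) are all assumptions.

```python
import random

def within_bound(current, candidate, bound):
    """True if every component of candidate lies within `bound` of current."""
    return all(abs(c - n) <= bound for c, n in zip(current, candidate))

def select_goal(current_pose, current_force, demonstrations, quality,
                n_samples, pose_bound, force_bound, rng):
    """Sample demonstrated (pose, force) pairs and keep the best feasible one."""
    best, best_quality = None, float("-inf")
    for _ in range(n_samples):
        pose, force = rng.choice(demonstrations)
        if force[2] < 0.0:  # constraint F_z >= 0
            continue
        if not within_bound(current_pose, pose, pose_bound):
            continue
        if not within_bound(current_force, force, force_bound):
            continue
        score = quality(pose, force)
        if score > best_quality:
            best, best_quality = (pose, force), score
    return best, best_quality

# Toy demonstration pool of (quaternion, force/torque) pairs.
demonstrations = [
    ((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 4.0, 0.0, 0.0, 0.0)),
    ((0.99, 0.1, 0.0, 0.0), (0.0, 0.0, 6.0, 0.0, 0.0, 0.0)),
    ((0.7, 0.7, 0.0, 0.0), (0.0, 0.0, 6.0, 0.0, 0.0, 0.0)),   # too far in pose
    ((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, -1.0, 0.0, 0.0, 0.0)),  # violates F_z >= 0
]

def toy_quality(pose, force):
    """Hypothetical stand-in for Q_theta: prefers F_z close to 6 N."""
    return -abs(force[2] - 6.0)

goal, score = select_goal(
    current_pose=(1.0, 0.0, 0.0, 0.0),
    current_force=(0.0, 0.0, 5.0, 0.0, 0.0, 0.0),
    demonstrations=demonstrations,
    quality=toy_quality,
    n_samples=200,
    pose_bound=0.2,
    force_bound=2.0,
    rng=random.Random(0),
)
```

Because candidates are drawn only from the demonstration pool, the feasible sets $D_P$ and $D_F$ are respected by construction; the price is that the result is the best sampled candidate, not a guaranteed optimum of Eq. 1.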

Figure 5.

Our strategy for scanning guidance takes the current pose $P_t$, the contact force $F_t$, and the ultrasound image $S_t$ as input, and outputs the next desired pose $P'_t$ and contact force $F'_t$. For sampling, we impose a bound between $P_t, F_t$ and $P'_t, F'_t$, which prevents the next state from moving too far away from the current state. For evaluation, if the sampled pose and force are predicted as high-quality according to Eq. 1, the skill-learned model selects them as the desired output; otherwise, it repeats the sampling process. For execution, the desired pose $P'_t$ and contact force $F'_t$ are used as the goal for the human ultrasound scanning guidance.


4. Experiments: design and results

In this section, we use real experiments to examine the effectiveness of our proposed approach to multi-modal task representation learning. In particular, we design experiments to verify the following two questions:

  • Does the force modality contribute to task representation learning?

  • Is the sampling-based policy effective for real data?

4.1 Experimental setup

For the experimental setup, we used a Mindray DC-70 ultrasound machine with an imaging frame rate of 900 Hz. The ultrasound image was captured using MAGEWELL USB Capture AIO with a frame rate of 120 Hz and a resolution of 2048×2160, as shown in Figure 6.

Figure 6.

Experimental setup. (a) The ultrasound machine, Mindray DC-70. (b) The video capture device, MAGEWELL USB Capture AIO. (c) The data-acquisition probe holder. (d) The computer for data collection, with an Intel i5 CPU and an Nvidia GTX 1650 GPU, running Ubuntu 16.04 LTS.

As shown in Figure 4, the IMU mounted on the ultrasound probe was an ICM20948 and the microcontroller unit (MCU) was an STM32F411. The highest frequency of the IMU could reach 200 Hz, with an acceleration accuracy of 0.02 g and a gyroscope accuracy of 0.06°/s. The IMU could output the probe pose in the form of a quaternion. For the force feedback, we used a 6D ATI Gamma F/T sensor with a maximum frequency of 7000 Hz. The computer used for data collection had an Intel i5 CPU and an Nvidia GTX 1650 GPU, and ran Ubuntu 16.04 LTS with ROS Kinetic.

4.2 Data acquisition

To make the collected data comparable, the recording program needs to implement two functions: coordinate transformation and gravity compensation. The IMU starts to work as soon as the power is turned on. At that time, the probe pose corresponds to the initial coordinate system, so the quaternion's values are equal to (1, 0, 0, 0) and the rotation matrix is the identity matrix. However, some time passes between wiring up the whole system and starting to record data, so the quaternion's values at the beginning of recording are never equal to the initial ones. To solve this problem, a coordinate transformation is necessary so that the pose at the start of recording corresponds to the initial coordinate system. In addition, the force/torque signal contains both the contact force and the weight of the device, which means our program must also perform gravity compensation.

The real-time quaternion $Q$ output by the IMU includes four values $(w, x, y, z)$, which should be transformed into a real-time rotation matrix $R$ for calculation. The initial rotation matrix is recorded as $R_0$. As a rotation matrix is always orthogonal, the inverse and transpose of $R_0$ are equal and recorded as $R_0^{-1}$. The relative real-time rotation matrix $R_x$ is calculated as follows:
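As an illustration of this step, the sketch below converts a quaternion $(w, x, y, z)$ to a rotation matrix with the standard unit-quaternion formula and forms a relative rotation as $R_0^{-1} R = R_0^{T} R$. The multiplication order and the function names are assumptions; the chapter's own convention may differ.

```python
def quaternion_to_matrix(q):
    """Rotation matrix of a quaternion (w, x, y, z), normalized first."""
    w, x, y, z = q
    norm = (w * w + x * x + y * y + z * z) ** 0.5
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def transpose(m):
    """Transpose of a 3x3 matrix; equals the inverse for a rotation matrix."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def relative_rotation(q_initial, q_current):
    """Assumed form R_x = R_0^T R, relating the current pose to the initial one."""
    r0 = quaternion_to_matrix(q_initial)
    r = quaternion_to_matrix(q_current)
    return matmul(transpose(r0), r)

# A 90-degree rotation about the Z axis.
half = 0.5 ** 0.5
rot_z_90 = quaternion_to_matrix((half, 0.0, 0.0, half))
# The same quaternion at the start of recording and now gives the identity.
same_pose = relative_rotation((half, 0.0, 0.0, half), (half, 0.0, 0.0, half))
```

With this convention, the pose at the moment recording starts always maps to the identity matrix, which is the property the coordinate transformation is meant to provide.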


The gravity components $G_x, G_y, G_z$ in the $X, Y, Z$ directions are calculated from $R_x$ and the gravity $G$ as follows:


In this experiment, we mainly consider the influence of force, so we simply record the original torque values. The force/torque sensor's output signal contains real-time force components $F_x, F_y, F_z$ and torque components $T_x, T_y, T_z$ in three directions. The compensated (fixed) values of $F_x, F_y, F_z, T_x, T_y, T_z$ are calculated as follows:
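A possible form of the gravity compensation is sketched below. It rests on assumptions that are not stated in the chapter: gravity is taken to act along $(0, 0, -G)$ in the initial frame, the relative rotation matrix is taken to map current-frame vectors into the initial frame (so its transpose brings gravity into the current sensor frame), and compensation is a plain subtraction. The function names are hypothetical, and torque is passed through unchanged, as the text describes.

```python
def gravity_components(r_x, g):
    """Gravity in the current sensor frame, computed as R_x^T (0, 0, -g).

    Assumes gravity is (0, 0, -g) in the initial frame and that r_x maps
    current-frame vectors to the initial frame.
    """
    g0 = (0.0, 0.0, -g)
    return tuple(sum(r_x[k][i] * g0[k] for k in range(3)) for i in range(3))

def compensate(wrench, r_x, g):
    """Subtract the gravity components from the force; keep torque as recorded."""
    fx, fy, fz, tx, ty, tz = wrench
    gx, gy, gz = gravity_components(r_x, g)
    return (fx - gx, fy - gy, fz - gz, tx, ty, tz)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Probe in its initial orientation: a 2 N device weight hides part of a 5 N push.
raw = (0.0, 0.0, 3.0, 0.1, 0.2, 0.3)
fixed = compensate(raw, identity, g=2.0)

# Probe rotated 90 degrees about X: the weight now appears on the Y axis.
rot_x_90 = [[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
tilted_gravity = gravity_components(rot_x_90, 2.0)
```

The sign conventions depend on how the sensor is mounted in the probe holder, so they would need to be checked against the actual hardware before use.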


It is worth noting that the gravity $G$ can be calculated by Eq. 6, where the maximum and minimum values of the force components in three directions are denoted by $F_{x\max}, F_{x\min}, F_{y\max}, F_{y\min}, F_{z\max}, F_{z\min}$.


The recording frequency is 10 Hz and the accuracy of gravity compensation is 0.5 N. The ultrasound data were collected at the Hospital of Wuhan University. The sonographer was asked to scan the left kidneys of 5 volunteers with different physical conditions. Before the examination, the sonographer held the probe vertically above the left kidney of a volunteer. The ultrasound scanning process began when the recording program was launched. Snapshots of the scanning process are shown in Figure 7. The collected data consist of ultrasound videos, the probe pose (quaternion), the contact force (force and torque), and labels (1/0). In total, there are 5995 samples. The number of positive samples (labeled 1) is 2266, accounting for 37.8%. The number of negative samples (labeled 0) is 3729, accounting for 62.2%. Figure 8 presents trajectories of the recorded information.

Figure 7.

The snapshots of human ultrasound scanning demonstrations and samples of the obtained ultrasound images. Here the images (e) and (f) are labeled as good quality while (g) and (h) are labeled as bad quality.

Figure 8.

The trajectories of the recorded force and pose during an ultrasound examination. Force component in (a) X direction (b) Y direction (c) Z direction; rotation axis: (d) X Axis (e) Y Axis (f) Z Axis.

4.3 Experimental results

The detailed architecture of our network is shown in Figure 9. In this case, the 256-dimensional vector denotes the feature vector presented in Figure 3. We started the training process with a warm start to classify the ultrasound images. The adopted neural network was VGG-16 with cross-entropy loss. A total of 5995 sets of recorded data were split 8:2 into training and validation sets. Data for training included ultrasound images and labels. The learning rate was 0.001 and the batch size was 20. For the ultrasound skill evaluation, data for training included images S, quaternion P, force F, and labels. Given P, F, and S as input, this neural network outputs the predicted label. We set the last fully connected layer of VGG-16 to 128 channels and merged its output with the PF feature vector. Four fully connected layers were added to transform the PF vector into 128 channels, which were concatenated with the VGG-16 output vector. After obtaining the vector with 256 channels, two fully connected layers and a softmax layer were added to output the confidence of the label. Figure 10 presents the accuracy and loss during training. The neural network for classification finally reached an accuracy of 96.89% in training and 95.61% in validation. The neural network for ultrasound skill evaluation finally reached an accuracy of 84.85% in training and 88.50% in validation.
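The sizes implied by these settings can be worked out with a few lines of arithmetic. The rounding conventions below (integer floor for the training share, a final partial batch counted as one batch) are assumptions, since the chapter does not state them.

```python
import math

total_samples = 5995
positive_samples = 2266
batch_size = 20

# 8:2 split, using integer arithmetic to avoid floating-point rounding.
train_samples = total_samples * 8 // 10
validation_samples = total_samples - train_samples

# Batches per epoch if the last, smaller batch is kept (assumed convention).
batches_per_epoch = math.ceil(train_samples / batch_size)

# Share of positive labels in the whole dataset, in percent.
positive_share = round(100 * positive_samples / total_samples, 1)

print(train_samples, validation_samples, batches_per_epoch, positive_share)
# 4796 1199 240 37.8
```

The roughly 38/62 class balance means a classifier that always predicts label 0 would already reach about 62% accuracy, which is a useful baseline when reading the reported accuracies.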

Figure 9.

Framework of the neural network. The ultrasound images were encoded with VGG-16. Four fully connected layers were added to transform the PF vector into 128 channels. Vectors from S and PF were concatenated. Two fully connected layers were added to transform the concatenated vector's channels from 256 to 2. Finally, the softmax layer maps the last values to the probability of label 1 or 0.

Figure 10.

(a) Accuracy and (b) loss in training the neural network for ultrasound image classification. (c) Accuracy and (d) loss in training the neural network for ultrasound skills evaluation.

To confirm the correlation between P and F, we trained four networks with different input configurations. Net1 was trained with S and P, while Net2 was trained with S and F. Net3 was trained with S, P, and F, using two parallel four-layer fully connected neural networks for the inputs P and F. Net4 (Figure 9) was trained with S, P, and F, with concatenated PF vectors. The main difference between Net3 and Net4 was the existence of interactions between P and F during the training process. Each network was trained five times with 20 training epochs. Figure 11 presents the performance of the four networks in validation.

Figure 11.

Accuracy of four networks in validation. Net1 was trained with S and P. Net2 was trained with S and F. Net3 was trained with S, P, and F, without interaction between P and F. Net4 was trained with S, P, and F, with the interaction between P and F.

Online ultrasound scanning skill guidance: We selected some continuous data streams from the dataset for verification, which had not been used for training the neural network. The sampling process in Figure 5 was repeated 1000 times and the actions P,F with the best task quality were selected as the next desired action. The whole process took 3 to 5 seconds to output the desired action.

Figure 12 presents predicted results about components of contact force, compared with ground truth data. Figure 13 presents the predicted probe pose with corresponding ultrasound images. Figure 14 presents predicted and true probe poses with corresponding ultrasound images.

Figure 12.

Predicted force’s component in (a) X-axis direction. (b) Y-axis direction. (c) Z-axis direction.

Figure 13.

Predicted probe pose and corresponding ultrasound images. The confidence is the probability of label 1.

Figure 14.

Predicted and true probe pose, with corresponding ultrasound images. The confidence is the probability of label 1.


5. Discussion and conclusion

5.1 Discussion

This chapter provides a general approach to realizing autonomous ultrasound guidance, with the following merits: (1) The clinical ultrasound skills are represented as a multi-modal model without any system-specific factor or parameter; that is, it could be used in most robotic ultrasound systems. (2) The ultrasound skills are mapped into low-dimensional vectors, which makes our approach easier to combine with other machine learning methods, such as the support vector machine, Gaussian mixture model, and k-nearest neighbors algorithm. (3) Autonomous ultrasound examination is defined as approximately solving the proposed target function by the Monte Carlo method, which provides a new and robust way to achieve autonomous ultrasound.

There are some limitations in this chapter. First, the online guidance method is based on random sampling, which leads to a certain degree of randomness. Therefore, there is a certain difference between the predicted results and the true values in the short term. Second, to ensure the effectiveness of the sampling, a large number of samples are required, which means that a larger improvement in task quality requires more computation. As the dataset expands, it becomes difficult for this method to meet the requirement of timely guidance; this can be addressed by representing the feasible set as a probabilistic model to achieve better sampling efficiency. Finally, we believe that through detailed adjustments to the neural network, the efficiency of this model can be greatly improved without losing too much accuracy.


6. Conclusion

This chapter presents a framework for learning ultrasound scanning skills from human demonstrations. By analyzing the scanning process of sonographers, we define the entire scanning process as a multi-modal model of interactions between ultrasound images, the probe pose, and the contact force. A deep-learning-based method is proposed to learn ultrasound scanning skills, from which a skill-representing target function with a sampling-based strategy for ultrasound examination guidance is derived. Experimental results show that this framework for ultrasound scanning guidance is robust and indicate the possibility of developing a real-time learning guidance system. In future work, we will speed up the prediction process by taking advantage of self-supervision, with the goal of porting the learned guidance model to a real robot system.



This work was supported by Suzhou key industrial technology innovation project (SYG202121), and the Natural Science Foundation of Jiangsu Province (Grant No. BK20180235).


References

  1. Gerstenmaier J, Gibson R. Ultrasound in chronic liver disease. Insights Into Imaging. 2014;5(4):441-455
  2. Konstantinidis IT, Bajpai S, Kambadakone AR, Tanabe KK, Berger DL, Zheng H, et al. Gallbladder lesions identified on ultrasound. Lessons from the last 10 years. Journal of Gastrointestinal Surgery. 2012;16(3):549-553
  3. Lahham S, Becker BA, Gari A, Bunch S, Alvarado M, Anderson CL, et al. Utility of common bile duct measurement in ed point of care ultrasound: A prospective study. The American Journal of Emergency Medicine. 2018;36(6):962-966
  4. Omar A, Freeman S. Contrast-enhanced ultrasound of the spleen. Ultrasound. 2016;24(1):41-49
  5. Larson MM. Ultrasound imaging of the hepatobiliary system and pancreas. Veterinary Clinics: Small Animal Practice. 2016;46(3):453-480
  6. Correas J-M, Anglicheau D, Joly D, Gennisson J-L, Tanter M, Hélénon O. Ultrasound-based imaging methods of the kidney—Recent developments. Kidney International. 2016;90(6):1199-1210
  7. Dietrich C, Ignee A, Barreiros A, Schreiber-Dietrich D, Sienz M, Bojunga J, et al. Contrast-enhanced ultrasound for imaging of adrenal masses. Ultraschall in der Medizin-European Journal of Ultrasound. 2010;31(02):163-168
  8. Daurat A, Choquet O, Bringuier S, Charbit J, Egan M, Capdevila X. Diagnosis of postoperative urinary retention using a simplified ultrasound bladder measurement. Anesthesia & Analgesia. 2015;120(5):1033-1038
  9. Mitterberger M, Horninger W, Aigner F, Pinggera GM, Steppan I, Rehder P, et al. Ultrasound of the prostate. Cancer Imaging. 2010;10(1):40
  10. Haymart MR, Banerjee M, Reyes-Gastelum D, Caoili E, Norton EC. Thyroid ultrasound and the increase in diagnosis of low-risk thyroid cancer. The Journal of Clinical Endocrinology & Metabolism. 2019;104(3):785-792
  11. Buonsenso D, Pata D, Chiaretti A. Covid-19 outbreak: Less stethoscope, more ultrasound. The Lancet Respiratory Medicine. 2020;8(5):e27
  12. Soldati G, Smargiassi A, Inchingolo R, Buonsenso D, Perrone T, Briganti DF, et al. Proposal for international standardization of the use of lung ultrasound for covid-19 patients; a simple, quantitative, reproducible method. Journal of Ultrasound in Medicine. 2020;10:1413-1419
  13. Arger PH, Schultz SM, Sehgal CM, Cary TW, Aronchick J. Teaching medical students diagnostic sonography. Journal of Ultrasound in Medicine. 2005;24(10):1365-1369
  14. Hertzberg BS, Kliewer MA, Bowie JD, Carroll BA, DeLong DH, Gray L, et al. Physician training requirements in sonography: How many cases are needed for competence? American Journal of Roentgenology. 2000;174(5):1221-1227
  15. Boctor EM, Choti MA, Burdette EC, Webster RJ III. Three-dimensional ultrasound-guided robotic needle placement: An experimental evaluation. The International Journal of Medical Robotics and Computer Assisted Surgery. 2008;4(2):180-191
  16. Priester AM, Natarajan S, Culjat MO. Robotic ultrasound systems in medicine. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control. 2013;60(3):507-523
  17. Chatelain P, Krupa A, Navab N. 3d ultrasound-guided robotic steering of a flexible needle via visual servoing. In: 2015 IEEE International Conference on Robotics and Automation (ICRA). Washington, USA: IEEE; 2015. pp. 2250-2255
  18. Seo J, Cho J, Woo H, Lee Y. Development of prototype system for robot-assisted ultrasound diagnosis. In: 2015 15th International Conference on Control, Automation and Systems (ICCAS). Busan, Korea: IEEE; 2015. pp. 1285-1288
  19. Mathiassen K, Fjellin JE, Glette K, Hol PK, Elle OJ. An ultrasound robotic system using the commercial robot ur5. Frontiers in Robotics and AI. 2016;3:1
  20. Guan X, Wu H, Hou X, Teng Q, Wei S, Jiang T, et al. Study of a 6dof robot assisted ultrasound scanning system and its simulated control handle. In: 2017 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM). Ningbo, China: IEEE; 2017. pp. 469-474
  21. Sandoval J, Laribi MA, Zeghloul S, Arsicault M, Guilhem J-M. Cobot with prismatic compliant joint intended for doppler sonography. Robotics. 2020;9(1):14
  22. Patlan-Rosales PA, Krupa A. A robotic control framework for 3-d quantitative ultrasound elastography. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). Marina Bay, Singapore: IEEE; 2017. pp. 3805-3810
  23. Mathur B, Topiwala A, Schaffer S, Kam M, Saeidi H, Fleiter T, et al. A semi-autonomous robotic system for remote trauma assessment. In: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE). Athens, Greece: IEEE; 2019. pp. 649-656
  24. Victorova M, Navarro-Alarcon D, Zheng Y-P. 3d ultrasound imaging of scoliosis with force-sensitive robotic scanning. In: 2019 Third IEEE International Conference on Robotic Computing (IRC). Naples, Italy: IEEE; 2019. pp. 262-265
  25. Virga S, Zettinig O, Esposito M, Pfister K, Frisch B, Neff T, et al. Automatic force-compliant robotic ultrasound screening of abdominal aortic aneurysms. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Daejeon, Korea: IEEE; 2016. pp. 508-513
  26. Kim YJ, Seo JH, Kim HR, Kim KG. Development of a control algorithm for the ultrasound scanning robot (nccusr) using ultrasound image and force feedback. The International Journal of Medical Robotics and Computer Assisted Surgery. 2017;13(2):e1756
  27. Huang Q, Lan J, Li X. Robotic arm based automatic ultrasound scanning for three-dimensional imaging. IEEE Transactions on Industrial Informatics. 2018;15(2):1173-1182
  28. Hennersperger C, Fuerst B, Virga S, Zettinig O, Frisch B, Neff T, et al. Towards mri-based autonomous robotic us acquisitions: A first feasibility study. IEEE Transactions on Medical Imaging. 2016;36(2):538-548
  29. Ning G, Zhang X, Liao H. Autonomic robotic ultrasound imaging system based on reinforcement learning. IEEE Transactions on Bio-medical Engineering. 2021;68:2787-2797
  30. Kim R, Schloen J, Campbell N, Horton S, Zderic V, Efimov I, et al. Robot-assisted semi-autonomous ultrasound imaging with tactile sensing and convolutional neural-networks. IEEE Transactions on Medical Robotics and Bionics. 2020;3:96-105
  31. Deng X, Lei Z, Wang Y, Li M. Learning ultrasound scanning skills from human demonstrations. 2021. arXiv preprint arXiv:2111.09739. DOI: 10.48550/arXiv.2111.09739
  32. Deng X, Chen Y, Chen F, Li M. Learning robotic ultrasound scanning skills via human demonstrations and guided explorations. 2021. arXiv preprint arXiv:2111.01625. DOI: 10.48550/arXiv.2111.01625
  33. Hijab A, Rushdi MA, Gomaa MM, Eldeib A. Breast cancer classification in ultrasound images using transfer learning. In: 2019 Fifth International Conference on Advances in Biomedical Engineering (ICABME). Tripoli, Lebanon: IEEE; 2019. pp. 1-4
  34. Ghose S, Oliver A, Mitra J, Martí R, Lladó X, Freixenet J, et al. A supervised learning framework of statistical shape and probability priors for automatic prostate segmentation in ultrasound images. Medical Image Analysis. 2013;17(6):587-600
  35. Wang L, Yang S, Yang S, Zhao C, Tian G, Gao Y, et al. Automatic thyroid nodule recognition and diagnosis in ultrasound imaging with the yolov2 neural network. World Journal of Surgical Oncology. 2019;17(1):1-9
  36. Liu S, Wang Y, Yang X, Lei B, Liu L, Li SX, et al. Deep learning in medical ultrasound analysis: A review. Engineering. 2019;5(2):261-275
  37. Sridar P, Kumar A, Quinton A, Nanan R, Kim J, Krishnakumar R. Decision fusion-based fetal ultrasound image plane classification using convolutional neural networks. Ultrasound in Medicine & Biology. 2019;45(5):1259-1273
  38. Roy S, Menapace W, Oei S, Luijten B, Fini E, Saltori C, et al. Deep learning for classification and localization of covid-19 markers in point-of-care lung ultrasound. IEEE Transactions on Medical Imaging. 2020;39(8):2676-2687
  39. Ouahabi A, Taleb-Ahmed A. Deep learning for real-time semantic segmentation: Application in ultrasound imaging. Pattern Recognition Letters. 2021;144:27-34
  40. Karamalis A, Wein W, Klein T, Navab N. Ultrasound confidence maps using random walks. Medical Image Analysis. 2012;16(6):1101-1112
  41. Chatelain P, Krupa A, Navab N. Optimization of ultrasound image quality via visual servoing. In: 2015 IEEE International Conference on Robotics and Automation (ICRA). Washington, USA: IEEE; 2015. pp. 5997-6002
  42. Chatelain P, Krupa A, Navab N. Confidence-driven control of an ultrasound probe: Target-specific acoustic window optimization. In: 2016 IEEE International Conference on Robotics and Automation (ICRA). Stockholm, Sweden: IEEE; 2016. pp. 3441-3446
  43. Chatelain P, Krupa A, Navab N. Confidence-driven control of an ultrasound probe. IEEE Transactions on Robotics. 2017;33(6):1410-1424
  44. Droste R, Drukker L, Papageorghiou AT, Noble JA. Automatic probe movement guidance for freehand obstetric ultrasound. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Lima, Peru: Springer; 2020. pp. 583-592
  45. Li K, Wang J, Xu Y, Qin H, Liu D, Liu L, et al. Autonomous navigation of an ultrasound probe towards standard scan planes with deep reinforcement learning. Xi'an, China: IEEE; 2021:8302-8308. arXiv preprint arXiv:2103.00718
  46. Jiang Z, Grimm M, Zhou M, Hu Y, Esteban J, Navab N. Automatic force-based probe positioning for precise robotic ultrasound acquisition. IEEE Transactions on Industrial Electronics. 2020;68:11200-11211
  47. Gao X, Ling J, Xiao X, Li M. Learning force-relevant skills from human demonstration. Complexity. 2019;2019:5262859
  48. Zeng C, Yang C, Cheng H, Li Y, Dai S-L. Simultaneously encoding movement and semg-based stiffness for robotic skill learning. IEEE Transactions on Industrial Informatics. 2020;17(2):1244-1252
  49. Holladay R, Lozano-Pérez T, Rodriguez A. Planning for multi-stage forceful manipulation. Xi'an, China: IEEE; 2021:6556-6562. arXiv preprint arXiv:2101.02679
  50. Li M, Tahara K, Billard A. Learning task manifolds for constrained object manipulation. Autonomous Robots. 2018;42(1):159-174
  51. Li M, Yin H, Tahara K, Billard A. Learning object-level impedance control for robust grasping and dexterous manipulation. In: 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE; 2014. pp. 6784-6791
  52. Li M, Bekiroglu Y, Kragic D, Billard A. Learning of grasp adaptation through experience and tactile sensing. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. Chicago, USA: IEEE; 2014. pp. 3339-3346
  53. Lee MA, Zhu Y, Srinivasan K, Shah P, Savarese S, Fei-Fei L, et al. Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks. In: 2019 International Conference on Robotics and Automation (ICRA). Montreal, Canada: IEEE; 2019. pp. 8943-8950
  54. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; 2018


  • More details about our original research:;
