Humanoid robots are expected to support various human tasks, such as high-mix low-volume production and assembly. Programming technologies such as robot languages and teaching-playback  have been developed, but they are difficult to apply to multi-fingered robots because such robots require both motion and force to be specified at many points simultaneously.
Recently, several techniques have been proposed that use human motion measurements directly as robot teaching data for automatic programming: teaching by showing , assembly plan from observation [3-4], gesture-based programming , and robot learning [6-10]. Applications to dual-arm robots  have also been presented. These techniques are based on measurements of motions and forces generated in the real world. Task programming based on the observation of human operation is viable for humanoid robots because the motions and forces needed to accomplish a task do not have to be described explicitly.
Direct teaching that involves remote robot operation  presents two difficulties. The first is the communication time lag that arises when the robot is distant from the operator, which can make the remote-robot system unstable. The second is constant operator stress, because any mistake on the operator's part is immediately reflected in the robot motions and could result in a fatal accident. Robot teaching in a virtual reality (VR) environment can overcome these problems; we call this approach VR robot teaching. Several approaches to analyzing human intentions from human demonstrations have been presented [13-17]. Most of these studies, however, do not use the virtual force generated in the VR environment as robot teaching data. Moreover, VR robot teaching for humanoid both-hands robots has not yet been reported.
Our group presented a concept of VR robot teaching for multi-fingered robots  in which the virtual forces at contact points were utilized to analyze human intention. We found that humans feel comfortable handling virtual hands based on a human hand model but uneasy handling virtual hands based on a robot hand model, because the geometrical form and motional function of the robot hand are not the same as those of the human hand. To minimize the difference between human and robot fingertip positions and orientations, methods for mapping human grasps to robot grasps have been studied [19-20]; these did not, however, take the manipulability of the robot hand into consideration. Hand manipulability is a key measure for stable and robust robot grasps . Moreover, a segmentation method has been presented  for human motion data containing plural tasks; its segmentation tree is additive for any new primitive motion required to perform a new task. These studies did not, however, handle 3D forces at contact points, because the haptic interface used was a force-feedback glove, consisting of a data glove and a wire-rope force display mechanism, which could display only a one-dimensional force to each fingertip. Hence, it was difficult to teach a task that included contact with another object and motion along a surface of that object. 3D forces at contact points are key information for human intention analysis. Moreover, VR robot teaching needs to be extended to humanoid both-hands robots.
This paper presents a VR robot teaching method  for a humanoid robot hand using a multi-fingered haptic interface capable of displaying a 3D force at each fingertip of the operator. In this teaching, segmentation of motion, task recognition, and re-segmentation of motion are executed sequentially using 3D forces. That is, human motion data consisting of contact points, grasping forces, hand and object positions, and the like are segmented into plural primitive motions based on a segmentation tree; the type of task is then analyzed from the acquired sequence of primitive motions, and the motion is re-segmented. We demonstrate that the segmentation tree is additive for the new primitive motions needed to perform a new task using 3D forces. In this method, the position and orientation of the robot hand are determined so as to maximize its manipulability, on the condition that the robot grasps the object at the taught contact point. This approach makes the virtual teaching system very user-friendly. We present experimental results for a task that includes contacting another object and moving along the surface of a wall, using a humanoid robot hand named Gifu Hand III  and a multi-fingered haptic interface robot called HIRO II . Moreover, we extend VR robot teaching to humanoid both-hands robots by adding primitive motions of human bimanual coordination, such as Equal Grasp by both hands, Main Grasp by one hand, and Pass from one hand to the other, to the segmentation tree. The type of both-hands task is also analyzed from the acquired sequence of primitive motions. This shows that the segmentation tree is also additive for bimanual coordination tasks.
The proposed method will be useful for the efficient manufacturing of a wide variety of products in small quantities, and for teleoperation with a large time delay using humanoid hand robots.
2. Virtual robot teaching
2.1. Scheme of virtual robot teaching
A conceptual scheme of the VR robot teaching system is shown in Figure 1. The system consists of a VR robot teaching system and a remotely located robot system. In the former, the human carries out various tasks in a VR environment, manipulating virtual objects through a multi-fingered haptic interface. The human's motion intention is analyzed from the series of human motions and the 3D forces acting on the human's fingers. Based on this analysis, a series of robot commands, including desired trajectories, grasping forces, and contact points, is generated relative to the object coordinate frame. The robot commands are tested in a robot simulation system and then sent to the remote robot system, where the robot executes them. The robot system can absorb slight geometrical differences between the virtual space and the real space, because the robot follows commands expressed relative to the object coordinate frame.
This scheme has two advantages. The first is that communication time delay has no effect, since the robot commands are generated by off-line motion analysis. The second is that the operator is relieved of continual stress, since inadvertent human errors can be caught and corrected before the commands are sent to the robot.
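As a minimal sketch of why object-relative commands absorb small geometric differences, the Python snippet below maps a fingertip trajectory taught in the object frame into the reference frame using the object pose measured in the real workspace. The helper names (`pose_matrix`, `rot_z`, `object_to_reference`), the rotation about the z-axis, and all numeric poses are hypothetical illustrations, not the system's actual interface.

```python
import numpy as np

def pose_matrix(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def rot_z(theta):
    """Rotation matrix for a rotation of theta [rad] about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def object_to_reference(T_ref_object, points_object):
    """Express (N, 3) points given in the object frame in the reference frame."""
    pts = np.asarray(points_object, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return (T_ref_object @ homogeneous.T).T[:, :3]

# Hypothetical fingertip trajectory taught in VR, in the object frame [m].
trajectory_object = np.array([[0.06, 0.0, 0.03],
                              [0.06, 0.0, 0.05],
                              [0.06, 0.0, 0.07]])

# Hypothetical pose of the real object: slightly shifted and rotated (5 deg about z)
# relative to where the virtual object was during teaching.
T_real = pose_matrix(rot_z(np.deg2rad(5.0)), [0.502, 0.198, 0.0])
trajectory_reference = object_to_reference(T_real, trajectory_object)
```

Because the taught trajectory is stored relative to the object, only the measured object pose `T_real` changes when the real object is displaced; the trajectory itself is reused unchanged, and distances between its points are preserved.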
2.2. Robot with Gifu Hand III
We consider a multi-fingered hand robot equipped with the Gifu Hand III developed by our group . The shape and mechanism of the Gifu Hand III were designed to resemble those of the human hand: it has a thumb and 4 fingers; the thumb has 4 joints with 4 degrees of freedom (DOF), and each finger has 4 joints with 3 DOF. All servomotors are mounted in the hand frame. A 6-axis force sensor can be attached to each fingertip, and a distributed tactile sensor with 859 detecting points can be mounted on the surfaces of the palm and fingers. Because not only the shape but also the mechanism of the Gifu Hand III is very similar to that of the human hand, most of the measured human motion data can be applied directly to robot commands, as long as the object has a simple shape and a manageable size.
2.3. Haptic interface robot HIRO II
The multi-fingered haptic interface HIRO II  shown in Figure 2 can present force and tactile feeling at the 5 fingertips of the human hand. The HIRO II design is completely safe. Its mechanism consists of a 6-DOF arm and a 15-DOF hand with a thumb and 4 fingers. Each finger has 3 joints, giving 3 DOF: the first joint, counted from the base of the hand, allows abduction/adduction, and the second and third joints allow flexion/extension. The thumb is similar to the fingers except for the reduction gear ratio and the movable ranges of joints 1 and 2. To measure the finger loading, a 6-axis force sensor is installed in the second link of each finger. The user wears finger holders on his or her fingertips to manipulate the haptic interface. Each finger holder has a ball that attaches to a permanent magnet at the force sensor tip, forming a passive spherical joint. This passive spherical joint has two roles. First, it absorbs differences between the orientation of the human finger and that of the haptic finger. Second, it allows operators to remove their fingers from the haptic interface in case of a malfunction. The attractive force of the permanent magnet is 5 N. Through the HIRO II, the operator can thus feel a 3D force at each fingertip.
2.4. Motion segmentation
Assembly work consists of plural tasks, such as pick-and-place, peg-in-hole, and peg-pullout-from-hole. Hence, the motion intention analysis system should be able to recognize the type of task from human motion data. A segmentation method  has been proposed to realize this function, in which segmentation of motion, task recognition, and re-segmentation of motion are executed sequentially.
A flowchart of the segmentation of motion data, modified from the previous version, is shown in Figure 3. In this segmentation tree, the following tasks are considered: pick-and-place, peg-in-hole, peg-pullout-from-hole, turn-screw, slide, pick-and-press, pick-and-follow, trace, and press. The first 5 tasks can be performed without a 3D force interface, but the last 4 cannot, because a 3D contact force is required to represent the human motion when the target object is in contact with multiple objects. We assume that the human motion for these tasks consists of 14 primitive motions: Move, Approach, Grasp, Translate, Slide, Insert, Pullout, Release, Place, Turn, Press, Follow, Push, and Trace. The last 4 primitive motions are added to segment the contact tasks when humans use the 3D force display. This means that the segmentation tree is additive: it can accommodate 3D forces through a simple modification, indicated by the colored cells in Figure 3.
In the Move segment, only the operator's hand moves, at a point distant from any virtual object. In Approach, the operator's fingertip is coming close to a virtual object but does not touch it, and the previous motion is Move. In Grasp, the finger contacts an object located on a base or on another object, the grasp condition is satisfied, and the previous motion is not Translate. The grasp condition means that the virtual force generated by the interference between the finger and the virtual object is greater than a specified value. This grasp is the precision grasp described by Cutkosky . In the Translate segment, the object moves with the hand and fingers as one unit; that is, the virtual object departs from the base or other object while the grasp condition is satisfied. In the Place segment, the object contacts the environment and the previous motion is Translate. In human operation, it is difficult to distinguish the Translate and Place segments exactly, and indeed the operator does not perceive a distinction between them. We therefore assume that the Place segment starts at the moment the virtual object first contacts the environment. In the Release segment, a fingertip leaves the object and the previous motion is not Move; hence, the Release segment starts at the moment one of the fingertips leaves the object. In the Slide segment, the object is touched by the hand with the grasp condition satisfied and translated to a point. In Insert, a finger contacts a target object set inside another object and the target object moves toward the other object, whereas in Pullout, a finger contacts a target object inside another object and the target object moves away from the other object. In the Turn segment, the object is turned around an axis.
In the Press segment, the object, held with the grasp condition satisfied, contacts another object; in Follow, the object, held with the grasp condition satisfied, moves along the surface of another object with a contact force. In the Push segment, the fingers touch an object and move it toward another object until the two are in contact, but the grasp condition is not satisfied. In Trace, the fingers touch an object and move it along the surface of another object with a contact force, again without satisfying the grasp condition.
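To make the rule structure of these definitions concrete, here is a minimal Python sketch that labels each sample of motion data with one primitive motion from the hand-object distance, fingertip contact, the grasp condition, and the contact flag. It covers only Move, Approach, Grasp, Translate, Place, Release, Press, Follow, Push, and Trace (Slide, Insert, Pullout, and Turn are omitted). The `Sample` fields, the `APPROACH_DISTANCE` threshold, and the rule ordering are hypothetical simplifications, not the tree of Figure 3.

```python
from dataclasses import dataclass

# Hypothetical threshold separating Move from Approach [m].
APPROACH_DISTANCE = 0.05

# Primitive motions in which the fingers are in contact with the target object.
CONTACT_MOTIONS = {"Grasp", "Translate", "Place", "Press", "Follow", "Push", "Trace"}

@dataclass
class Sample:
    """Hypothetical per-sample features derived from the measured motion data."""
    hand_object_distance: float  # distance between hand and object [m]
    touching: bool               # at least one fingertip touches the target object
    grasped: bool                # grasp condition satisfied
    lifted: bool                 # target object has departed from the base
    object_contact: bool         # contact flag: target object touches another object
    object_moving: bool          # target object velocity above a small threshold

def classify(s: Sample, prev: str) -> str:
    """Simplified rule set for a subset of the primitive motions."""
    if not s.touching:
        # A fingertip has left the object after a contact motion: Release.
        if prev in CONTACT_MOTIONS or (prev == "Release" and s.hand_object_distance <= APPROACH_DISTANCE):
            return "Release"
        if s.hand_object_distance <= APPROACH_DISTANCE and prev in ("Move", "Approach"):
            return "Approach"
        return "Move"
    if s.grasped:
        if s.object_contact:
            # Grasped object touching another object: moving along it is Follow, otherwise Press.
            return "Follow" if s.object_moving else "Press"
        if s.lifted:
            return "Translate"
        if prev in ("Translate", "Place"):
            return "Place"  # object back in contact with the environment after Translate
        return "Grasp"
    # Touching without the grasp condition.
    if s.object_moving:
        return "Trace" if s.object_contact else "Push"
    # Assumption: a still, ungrasped contact keeps the previous contact label.
    return prev if prev in CONTACT_MOTIONS else "Approach"

def label_sequence(samples, start="Move"):
    """Label every sample, feeding each label back in as the previous motion."""
    labels, prev = [], start
    for s in samples:
        prev = classify(s, prev)
        labels.append(prev)
    return labels
```

Feeding back the previous label mirrors the text's use of "the previous motion" in the definitions of Approach, Grasp, Place, and Release.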
To segment the motion data, three coordinate frames are utilized: the reference coordinate frame, whose origin is fixed in the task space; the object coordinate frame, fixed in the object; and the hand coordinate frame, fixed in the hand. The following 6 parameters are measured as the motion data: the object position with respect to the reference coordinate frame, ${}^{ref}p_{object}$; the i-th fingertip position with respect to the object coordinate frame, ${}^{object}p_{i}$; the virtual force at the i-th fingertip with respect to the reference coordinate frame, ${}^{ref}f_{i}$; and the object velocity ${}^{ref}v_{object}$. In addition, the index of the grasp space of the hand, ${}^{object}P_{finger}$, the sum of the fingertip forces, ${}^{ref}F_{finger} = \sum_{i} {}^{ref}f_{i}$, and a contact state flag, which indicates whether the target object is in contact with another object, are evaluated. From these parameters, the distance between the hand and the object, the presence or absence of contact between each fingertip and the object, the contact relation between the target object and other objects, and the grasp condition are evaluated and used in the segmentation tree. As a result, the motion data are segmented into primitive motions.
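The quantities fed into the segmentation tree can be sketched in a few lines of Python. This is an illustration under stated assumptions: the force threshold, the gravity direction, the per-fingertip form of the grasp condition (at least two fingertips above the threshold), and the use of the nearest fingertip's distance to the object-frame origin as a proxy for the hand-object distance are all hypothetical choices, not the definitions used in the paper.

```python
import numpy as np

# Hypothetical values; the actual thresholds are not given in the text.
GRASP_FORCE_THRESHOLD = 1.0                     # [N] per-fingertip force for the grasp condition
GRAVITY_DIRECTION = np.array([0.0, 0.0, -1.0])  # assumed gravity direction in the reference frame

def segmentation_features(p_fingers_object, f_fingers_ref):
    """Compute illustrative segmentation quantities from fingertip data.

    p_fingers_object: (n, 3) fingertip positions in the object frame [m]
    f_fingers_ref:    (n, 3) virtual fingertip forces in the reference frame [N]
    """
    p = np.asarray(p_fingers_object, dtype=float)
    f = np.asarray(f_fingers_ref, dtype=float)
    F_finger = f.sum(axis=0)  # vector sum of the fingertip forces
    force_norms = np.linalg.norm(f, axis=1)
    return {
        "F_finger": F_finger,
        "F_norm": float(np.linalg.norm(F_finger)),
        "F_gravity": float(F_finger @ GRAVITY_DIRECTION),  # gravitational-direction element
        # Nearest fingertip distance to the object-frame origin (proxy for hand-object distance).
        "nearest_fingertip": float(np.linalg.norm(p, axis=1).min()),
        "touching": bool((force_norms > 0.0).any()),
        # Assumed grasp condition: at least two fingertips exceed the force threshold.
        "grasped": bool((force_norms > GRASP_FORCE_THRESHOLD).sum() >= 2),
    }

features = segmentation_features(
    p_fingers_object=[[0.06, 0.0, 0.0], [-0.06, 0.0, 0.0], [0.0, 0.10, 0.0]],
    f_fingers_ref=[[2.0, 0.0, 0.3], [-2.0, 0.0, 0.2], [0.0, 0.0, 0.0]],
)
```

In this pinch example the opposing horizontal forces cancel in the vector sum, leaving only the vertical component; that is why the sketch evaluates the grasp condition per fingertip rather than from the summed force.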
2.5. Task analysis
The type of task is analyzed from the sequence of obtained primitive motions as follows. A sequence of primitive motions from Move to Release constitutes one task, because the hand must first move to the target object to do something and must release the object at the end of the task. When Insert is in the sequence, the task is a peg-in-hole task; when the sequence includes Pullout, it is a peg-pullout-from-hole task; and when Turn is in the sequence, it is a turn-screw task. The inclusion of Follow indicates a pick-and-follow task, Press indicates a pick-and-press task, and Push indicates a push task. When Trace and Press are both in the sequence, the task is a trace task.
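The grouping and key-primitive rules above can be sketched in Python as follows. The input is assumed to be a list of segment labels (consecutive duplicates already merged); the order in which the rules are checked and the default of "pick-and-place" when no key primitive is present are assumptions made for illustration.

```python
# Key-primitive rules, checked in order; the ordering is an assumption so that
# the more specific rule (Trace with Press) is tested before Press alone.
TASK_RULES = [
    ({"Insert"}, "peg-in-hole"),
    ({"Pullout"}, "peg-pullout-from-hole"),
    ({"Turn"}, "turn-screw"),
    ({"Trace", "Press"}, "trace"),
    ({"Follow"}, "pick-and-follow"),
    ({"Press"}, "pick-and-press"),
    ({"Push"}, "push"),
]

def split_tasks(segments):
    """Group segment labels into tasks, each ending with Release; an unfinished tail is dropped."""
    tasks, current = [], []
    for label in segments:
        current.append(label)
        if label == "Release":
            tasks.append(current)
            current = []
    return tasks

def recognize(task):
    """Return the task type for one Move ... Release sequence."""
    present = set(task)
    for keys, name in TASK_RULES:
        if keys <= present:  # all key primitives of this rule occur in the task
            return name
    return "pick-and-place"  # assumed default when no key primitive is present

segments = ["Move", "Approach", "Grasp", "Translate", "Follow", "Release",
            "Move", "Approach", "Push", "Trace", "Release"]
recognized = [recognize(task) for task in split_tasks(segments)]
```

Using set inclusion keeps each rule independent of where the key primitive occurs in the sequence, so new rules for new primitive motions can be appended to `TASK_RULES` without touching the grouping step.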
After task recognition, the primitive motions are relabeled based on the recognized task. For example, a Slide between Place and Release is merged into Place, because such a Slide is a fluctuation by the operator. Likewise, Insert and Pullout in a turn-screw task are merged into Turn, because they arise while the target object is rotated at a low angular velocity. Such relabeling rules can be added as appropriate when a new primitive motion is added. After relabeling, the desired trajectories, contact points, and contact forces are analyzed within each segment.
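Relabeling can be viewed as rewriting the segment list with task-specific rules. The Python sketch below implements only the two rules stated above; the function names and the representation of segments as a list of label strings are assumptions for illustration.

```python
def merge_repeats(labels):
    """Collapse consecutive identical labels into one segment."""
    merged = []
    for label in labels:
        if not merged or merged[-1] != label:
            merged.append(label)
    return merged

def relabel(labels, task):
    """Apply the two relabeling rules described in the text, then merge repeats."""
    out = merge_repeats(labels)
    for i in range(1, len(out) - 1):
        # A Slide between Place and Release is treated as operator fluctuation.
        if out[i] == "Slide" and out[i - 1] == "Place" and out[i + 1] == "Release":
            out[i] = "Place"
    if task == "turn-screw":
        # Insert/Pullout during slow turning are absorbed into Turn.
        out = ["Turn" if label in ("Insert", "Pullout") else label for label in out]
    return merge_repeats(out)
```

Merging repeats at the end means that a relabeled segment fuses with its neighbor, so the output again holds one entry per segment; a rule for a new primitive motion would be one more branch in `relabel`.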
2.6. Robot command
The geometrical form of the Gifu Hand III is similar to that of the human hand, but the two are not identical. In particular, the space between the thumb and the opposing fingers is smaller in the Gifu Hand III because of a mechanical design limitation. To map teaching data based on the human-hand model to teaching data for the robot hand, a virtual teaching method for multi-fingered robots has been developed that combines scaling of the virtual hand model to the size of the robot hand with hand manipulability. In this method, the position and orientation of the robot hand are determined so as to maximize its manipulability, on the condition that the robot grasps the object at the taught contact point. More details can be found in .
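The manipulability measure is not defined in this section. As an illustration only, the sketch below uses Yoshikawa's measure w = sqrt(det(J J^T)) for a single planar two-link finger and searches hypothetical base offsets of the hand along one axis for the placement that maximizes w while the fingertip still reaches the taught contact point. The link lengths, the one-dimensional search, and the single-finger simplification are all assumptions; the actual method optimizes the full hand position and orientation.

```python
import numpy as np

LINK1, LINK2 = 0.05, 0.04  # hypothetical link lengths of a planar two-link finger [m]

def ik_two_link(x, y):
    """Elbow-down inverse kinematics; returns None if the point is out of reach."""
    c2 = (x * x + y * y - LINK1**2 - LINK2**2) / (2 * LINK1 * LINK2)
    if abs(c2) > 1.0:
        return None
    q2 = np.arccos(c2)
    q1 = np.arctan2(y, x) - np.arctan2(LINK2 * np.sin(q2), LINK1 + LINK2 * np.cos(q2))
    return q1, q2

def jacobian_two_link(q1, q2):
    """Fingertip position Jacobian of the planar two-link finger."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-LINK1 * s1 - LINK2 * s12, -LINK2 * s12],
                     [LINK1 * c1 + LINK2 * c12, LINK2 * c12]])

def manipulability(J):
    """Yoshikawa's manipulability measure sqrt(det(J J^T))."""
    return float(np.sqrt(max(np.linalg.det(J @ J.T), 0.0)))

def best_hand_offset(contact_xy, offsets):
    """Pick the hand base offset along x that maximizes manipulability at the contact point."""
    best = None
    for dx in offsets:
        q = ik_two_link(contact_xy[0] - dx, contact_xy[1])
        if q is None:
            continue  # the taught contact point cannot be reached from this placement
        w = manipulability(jacobian_two_link(*q))
        if best is None or w > best[1]:
            best = (float(dx), w)
    return best

best_offset, best_w = best_hand_offset((0.08, 0.0), np.linspace(0.0, 0.08, 81))
```

For this finger w = LINK1 * LINK2 * |sin q2|, so the search settles on the placement where the middle joint is near 90 degrees, i.e. where the contact point lies at distance sqrt(LINK1^2 + LINK2^2) from the finger base.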
3.1. Experimental system
Figure 4(a) illustrates the experimental system. An operator manipulates virtual objects in the virtual environment through the multi-fingered haptic interface robot HIRO II. There are two objects in a box; both are cubes 120 [mm] on each side, with a mass of 50 [g] and a surface friction coefficient of 0.4. Each fingertip position is indicated by a small ball in the computer graphics (CG). The static and dynamic friction coefficients between fingertip and object are 1.0 and 0.5, respectively. The robot arm is controlled by position PID control with a friction compensator. The robot hand is controlled by impedance control, which consists of position PD control and force feedback control. The control sampling period is 1 ms.
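For a single fingertip axis, a position PD term plus a force-feedback term can be sketched as below and simulated at the 1 ms sampling period. This is a generic textbook form chosen for illustration; the gains, the fingertip mass, and the spring model of the contacted surface are hypothetical, and the sketch is not the controller actually implemented on the Gifu Hand III.

```python
from dataclasses import dataclass

@dataclass
class Gains:
    kp: float  # position stiffness [N/m]
    kd: float  # damping [N s/m]
    kf: float  # force feedback gain [-]

def control_force(g, x_d, x, v_d, v, f_d, f):
    """Commanded fingertip force: position PD term plus force feedback term."""
    return g.kp * (x_d - x) + g.kd * (v_d - v) + g.kf * (f_d - f)

def simulate(g, steps=1000, dt=0.001):
    """Press a 1-DOF fingertip (hypothetical mass) against a spring-like surface."""
    mass, k_env, x_wall = 0.05, 1000.0, 0.01  # [kg], [N/m], [m]
    x_d, f_d = 0.012, 1.0                     # desired position [m] and force [N]
    x = v = 0.0
    for _ in range(steps):
        f = k_env * max(x - x_wall, 0.0)      # contact force from the surface
        u = control_force(g, x_d, x, 0.0, v, f_d, f)
        v += (u - f) / mass * dt              # semi-implicit Euler integration
        x += v * dt
    return x, k_env * max(x - x_wall, 0.0)

x_final, f_final = simulate(Gains(kp=200.0, kd=5.0, kf=0.5))
```

At steady state this law satisfies kp (x_d - x) + kf (f_d - f) = f, so with these gains the contact force settles near 900/1700 ≈ 0.53 N rather than exactly f_d; the balance between the position and force terms is what the gains trade off.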
3.2. Pick-and-follow task
We executed a task in which the right-side object was moved to the right-side corner. The operator performed VR robot teaching in a virtual environment, as shown in Figure 4(b). First, the operator executed a pick-and-follow task: he moved his hand to the target object, grasped it and picked it up from the base, translated the object along the x-axis until it contacted the wall, moved it along the wall down to the base while grasping the object and keeping it in contact with the wall, and then released the object. The operator then executed a push-and-trace task without grasping the object: he pushed the object to the wall with two fingers, traced along the wall to the corner while keeping the object in contact with the wall, and then released the object. Figure 5 shows the measured parameters, the primitive motions obtained by the proposed segmentation, and the recognized tasks, which consisted of a pick-and-follow task and a push-and-trace task. Points on the parameter curves mark the times at which the primitive motions are separated. For example, Move and Approach are segmented by the magnitude of ${}^{object}P_{finger}$; Grasp by the gravitational-direction element of ${}^{ref}F_{finger}$; Translate by the gravitational-direction element of ${}^{ref}p_{object}$; and Follow and Release by the norm of ${}^{ref}F_{finger}$ and the contact flag.
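The separation points on the parameter curves amount to threshold crossings of a scalar signal. A minimal Python sketch, assuming a single fixed threshold per signal; the actual thresholds are not given in the text, and the sample values below are invented for illustration.

```python
import numpy as np

def crossing_indices(signal, threshold):
    """Indices of the first sample after each crossing of the threshold (either direction)."""
    s = np.asarray(signal, dtype=float)
    above = s > threshold
    return (np.flatnonzero(above[1:] != above[:-1]) + 1).tolist()

# Hypothetical magnitude of P_finger over time [m]; 0.05 m is an assumed Move/Approach threshold.
p_finger_magnitude = [0.20, 0.15, 0.04, 0.02, 0.03, 0.08]
boundaries = crossing_indices(p_finger_magnitude, 0.05)
```

Each returned index is a candidate segment boundary; in the segmentation tree such crossings would be combined with the other parameters and the previous primitive motion before a label is assigned.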
After the segmentation of primitive motions, each motion sequence from Move to Release was grouped into a task, and its task type was recognized. The results showed that the segmentation of motion data and the recognition of tasks were executed appropriately. After task recognition, the primitive motions were relabeled based on the recognized task. For example, the Slide between Approach and Grasp in the pick-and-follow task was merged into Grasp because it was a fluctuation by the operator. Similarly, the Push between two Trace segments in the push-and-trace task was merged into Trace. After relabeling, smooth robot finger trajectories were generated.
The desired position and force profiles were smoothed to reduce vibration in the robot hand motion. After the demonstration, the position and orientation of the robot hand were determined to maximize the hand manipulability measure, and robot commands were generated. A computer simulation was then executed to check the robot commands, as shown in Figure 6(a), which presents the CG of the primitive motions. Finally, the tasks were executed experimentally by the 6-DOF robot arm with the Gifu Hand III, as shown in Figure 6(b). These results show that the proposed segmentation can be applied to robot teaching that involves performing plural tasks.
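The smoothing filter is not named in the text. One simple possibility, shown here only as an assumption, is a centered moving average whose edges are padded with the end samples, so that the profile keeps its length and its start and end values are not pulled toward zero.

```python
import numpy as np

def smooth(profile, window=5):
    """Centered moving average; edges are padded by repeating the end samples."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    x = np.asarray(profile, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")  # same length as the input
```

For an (N, 3) position or force profile, each column can be smoothed with `np.apply_along_axis(smooth, 0, profile, 5)`. A wider window suppresses more vibration but also delays and flattens sharp contact transitions, which is the main design trade-off.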
The force profile at the thumb fingertip is shown in Figure 7. The x, y, and z elements of the force represent, respectively, the friction that depends on the contact force between the object and the wall, the normal contact force between the object and the thumb fingertip, and the friction caused by the gravitational force. The total force acting on the object is shown in Figure 8. These profiles show the contact timings between the finger and the object and between the object and the wall. The experimental results approximately follow the desired profiles. The robot could execute the task, and the operator could feel a realistic 3D force.
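Whether a fingertip force of this kind holds the object without slipping can be checked with the Coulomb friction cone: the tangential component must not exceed the static friction coefficient times the normal component. A sketch in Python, assuming the contact normal points along +y into the object (as for the y element above) and using the static coefficient of 1.0 between fingertip and object from Section 3.1; the sample forces are invented for illustration.

```python
import numpy as np

MU_STATIC = 1.0  # static friction coefficient between fingertip and object (Section 3.1)

def inside_friction_cone(force, normal, mu=MU_STATIC):
    """True if the contact force lies inside the Coulomb friction cone about the normal."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    f = np.asarray(force, dtype=float)
    f_normal = float(f @ n)          # component pressing into the object
    f_tangent = f - f_normal * n     # friction (tangential) components
    return f_normal > 0.0 and float(np.linalg.norm(f_tangent)) <= mu * f_normal
```

For example, a thumb force of (0.3, 2.0, 0.5) N with normal (0, 1, 0) is inside the cone, whereas (2.0, 1.0, 1.0) N would slip because its tangential part exceeds the normal part.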
4. Virtual robot teaching for both-hand robots
Both-hands robots are expected to execute more varied and complicated tasks than single-hand robots, because they can perform single-hand tasks with each hand independently as well as both-hands coordinated tasks. We extended the VR robot teaching for single-hand robots described in Section 2 to VR robot teaching for both-hands robots. The basic approach was to distinguish between single-hand tasks and both-hands coordinated tasks.
4.1. Work environment
The experimental system is shown in Figure 9. An operator manipulates virtual objects in CG through the bimanual multi-fingered haptic interface robot HIRO II. A peg-in-hole task in CG is shown in Figure 10 as an example of a work environment. The ten white spheres show the human fingertip positions measured by the HIRO II. The virtual objects are a circular ring and a cylinder with a pole. The VR system checks for physical interference between fingertips and virtual objects, between virtual objects and the environment, and between one virtual object and another. The positions and orientations of the virtual objects are updated based on physical dynamics models.
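How virtual forces are computed from interference is not described here. A common choice, assumed purely for illustration, is a penalty method: when a spherical fingertip penetrates an object, it receives a force proportional to the penetration depth along the contact normal. The Python sketch below handles a sphere against an axis-aligned box (for example, the 120 mm cube of Section 3.1) with a hypothetical stiffness `K_PENALTY`; the ring and cylinder of Figure 10 would need their own distance functions.

```python
import numpy as np

K_PENALTY = 2000.0  # hypothetical penalty stiffness [N/m]

def fingertip_virtual_force(center, radius, box_min, box_max, k=K_PENALTY):
    """Penalty force on a spherical fingertip from an axis-aligned box (zero if no interference)."""
    c = np.asarray(center, dtype=float)
    lo = np.asarray(box_min, dtype=float)
    hi = np.asarray(box_max, dtype=float)
    closest = np.clip(c, lo, hi)        # closest point of the box to the sphere centre
    delta = c - closest
    dist = float(np.linalg.norm(delta))
    if dist > 0.0:
        if dist >= radius:
            return np.zeros(3)          # no interference
        return k * (radius - dist) * delta / dist
    # Sphere centre inside the box: push out through the nearest face.
    gaps = np.concatenate([c - lo, hi - c])  # faces 0-2: min faces, faces 3-5: max faces
    face = int(np.argmin(gaps))
    normal = np.zeros(3)
    normal[face % 3] = -1.0 if face < 3 else 1.0
    return k * (radius + gaps[face]) * normal
```

The returned force acts on the fingertip and points away from the object; the equal and opposite force is applied to the virtual object, and its magnitude can be compared with the threshold of the grasp condition.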
4.2. Segmentation and task analysis
The conceptual scheme of the VR robot teaching system was described in Section 2. In this section, the system is extended to both-hands tasks, that is, bimanual coordination work such as push-and-trace, pick-and-follow, peg-in-hole, and pick-and-place. For example, a case in which the left and right hands each grasp an object and bring the two objects into contact in the air is considered a both-hands task. Figure 11 is a decision flowchart for both-hands tasks. In the figure, the Left-hand task and Right-hand task branches each correspond to the flowchart in Figure 3; the remaining part is the segmentation tree appended for both-hands tasks.
When humans translate an object using both hands, the motion sequence can be divided into plural primitive motions based on the grasping state, the work flow, and so on. These primitive motions reflect the human intention, on which the robot control should be based. We assume that both-hands tasks involve the following 5 additional primitive motions: Equal Grasp, Main Grasp, Pass, Fit, and Translate. In Equal Grasp, both hands grasp an object and translate it with equal contact force. In Main Grasp, the left or right hand has the main grasp on an object and the other hand's grasp is ancillary. In Pass, one hand grasps an object and passes it to the other hand, which then translates it to the target position. In Fit, the left and right hands translate two objects individually and then bring them into contact. In Translate, the left and right hands translate two objects individually without the objects interfering with each other. The segmentation tree for these primitive motions is shown in Figure 12. When a new primitive motion is needed, it can be added to the segmentation tree by a simple modification.
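As a hedged illustration of how Equal Grasp and Main Grasp might be told apart, the sketch compares the norms of the summed fingertip force vectors of the two hands, i.e. each hand's share of the load. The `LOAD_THRESHOLD` and `EQUAL_RATIO` values, the "No Grasp" label, and the other returned label strings are hypothetical; the actual tree of Figure 12 is not reproduced here.

```python
import numpy as np

LOAD_THRESHOLD = 0.1  # [N] hypothetical: below this a hand is not carrying the object
EQUAL_RATIO = 0.8     # hypothetical: loads within this ratio count as an Equal Grasp

def bimanual_grasp_state(F_left, F_right):
    """Classify the two-hand grasp from each hand's summed fingertip force vector."""
    load_l = float(np.linalg.norm(F_left))
    load_r = float(np.linalg.norm(F_right))
    left, right = load_l > LOAD_THRESHOLD, load_r > LOAD_THRESHOLD
    if left and right:
        if min(load_l, load_r) / max(load_l, load_r) >= EQUAL_RATIO:
            return "Equal Grasp"
        return "Main Grasp (left)" if load_l > load_r else "Main Grasp (right)"
    # Assumption: a single loaded hand is also reported as that hand's Main Grasp.
    if left:
        return "Main Grasp (left)"
    if right:
        return "Main Grasp (right)"
    return "No Grasp"  # hypothetical label for neither hand carrying the object
```

Comparing load shares rather than raw grip forces makes the decision insensitive to how hard the operator squeezes, as long as both hands support the object.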
The task type is analyzed from the sequence of obtained primitive motions in the manner described in Section 2.5. A sequence of primitive motions from Move to Release is considered to be one task, because the hand must first move to the target object to accomplish something and must release the object at the end of the task. Each task is recognized by a key primitive motion. For example, if Insert is in the sequence of primitive motions, the task is a peg-in-hole task; if the sequence involves Push, the task is a push-and-trace task.
After task recognition, the primitive motions are relabeled based on the recognized task. For example, a Fit between two Translate segments is merged into Translate, because such a Fit is a fluctuation by the operator. Such relabeling rules can be added as appropriate when a new primitive motion is added. After relabeling, the desired trajectories, contact points, and contact forces are analyzed within each segment.
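The bimanual relabeling rules can be sketched the same way. The Python below merges a Fit lying between two Translate segments into Translate, as described above, and, as in the pick-and-place experiment of Section 4.3, merges Main Grasp by one hand, Equal Grasp, and Main Grasp by the other hand into Pass. The label strings, such as "Main Grasp (left)", are an assumed encoding of which hand holds the main grasp.

```python
# Hand-over patterns (either direction) that are merged into a single Pass.
PASS_PATTERNS = {
    ("Main Grasp (left)", "Equal Grasp", "Main Grasp (right)"),
    ("Main Grasp (right)", "Equal Grasp", "Main Grasp (left)"),
}

def relabel_bimanual(labels):
    """Merge Translate-Fit-Translate into Translate and hand-over patterns into Pass."""
    out = list(labels)
    i = 1
    while i < len(out) - 1:
        if out[i] == "Fit" and out[i - 1] == "Translate" and out[i + 1] == "Translate":
            out[i - 1:i + 2] = ["Translate"]  # re-check at the same index after merging
        else:
            i += 1
    i = 0
    while i + 3 <= len(out):
        if tuple(out[i:i + 3]) in PASS_PATTERNS:
            out[i:i + 3] = ["Pass"]
        i += 1
    return out
```

Each rule is a pattern rewrite over the segment list, so a rule for a new bimanual primitive motion would be one more pattern or loop of the same form.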
4.3. Experiment of Pick-and-Place Task
We executed VR robot teaching in a virtual environment through bimanual multi-fingered haptic interfaces, as shown in Figure 13. The operator executed a pick-and-place task: he moved his left hand to the target object, grasped it and picked it up with the left hand, passed it to the right hand, grasped it with the right hand, translated it to a target position, and released it. Figure 14 shows the measured parameters, the primitive motions obtained by the proposed segmentation, and the recognized task, which consisted of two pick-and-place tasks. Points on the parameter curves mark the times at which the primitive motions are separated, as in the single-hand case. For example, in bimanual operation, Main Grasp and Equal Grasp are segmented by the norm of ${}^{ref}F_{finger}$ and the contact flag.
After the segmentation of primitive motions, each motion sequence from Move to Release was grouped into a task, and its task type was recognized. The results showed that the segmentation of motion data and the recognition of tasks were executed appropriately. Once the task type was recognized, the primitive motions were relabeled based on the recognized task. For example, in bimanual operation, the sequence of Main Grasp by the left hand, Equal Grasp, and Main Grasp by the right hand was merged into Pass. After relabeling, smooth robot finger trajectories were generated.
The desired position and force profiles were smoothed to reduce vibration in the robot hand motion. After the task was demonstrated, the position and orientation of the robot hand were determined to maximize the hand manipulability measure  and robot commands were generated. A computer simulation was then executed to check the robot commands, as shown in Figure 15, which presents the CG of the primitive motions.
We presented a VR robot teaching system consisting of human demonstration and motion-intention analysis in a virtual reality environment, using a multi-fingered haptic interface, for the automatic programming of multi-fingered robots. We also extended VR robot teaching to bimanual tasks using both-hands multi-fingered haptic interfaces. By using 3D forces at the contact points between human fingers and an object, new tasks involving contact with multiple objects can be learned in a virtual reality environment. Segmentation is executed according to the proposed segmentation tree, which is additive for new primitive motions. The task type is analyzed from the obtained sequence of primitive motions, and the primitive motions are relabeled based on the recognized task. This method allows plural tasks to be demonstrated sequentially in a virtual reality environment.
This approach makes the virtual teaching system user-friendly. Our experimental results for performing an assembly task using a humanoid robot hand named Gifu Hand III and a multi-fingered haptic interface robot HIRO II demonstrate the effectiveness of the proposed method. Furthermore, we demonstrated that the VR robot teaching method can be extended to both-hands robot teaching.
This work was supported in part by SCOPE (No. 121806001) of the Ministry of Internal Affairs and Communications and by a Grant-in-Aid for Scientific Research from JSPS, Japan ((A) No. 26249063). The authors would like to thank the members of our laboratory, and in particular Mr. Syunsuke Nanmo for his cooperation with the experiments.