Open access peer-reviewed chapter

The Extraction of Symbolic Postures to Transfer Social Cues into Robot

By P. Ravindra S. De Silva, Tohru Matsumoto, Stephen G. Lambacher, Ajith P. Madurapperuma, Susantha Herath and Masatake Higashi

Published: January 1st 2010

DOI: 10.5772/7123


1. Introduction

At present, robotics researchers are inclined to develop social robots for a variety of application domains. Socially intelligent robots are capable of interacting naturally with humans by engaging in complex social functions. The challenge is to transfer these social functions into a robot. This requires computational modalities with intelligent and autonomous capabilities for reacting to a human partner in different contexts. More importantly, a robot needs to interact with a human partner through trusted human social cues, which create the interface for natural communication. To achieve these goals, robotics researchers have proposed a variety of concepts. Some are biologically inspired, and others are based on theories from psychology and cognitive science. Recent robotics research has achieved the transfer of social behaviors into a robot through imitation-based learning (Ito et al., 2007) (Takano & Nakamura, 2006), and the related learning algorithms have helped robots acquire a variety of natural social cues. These acquired social behaviors equip robots with natural and trusted human interactions, which can be used to develop a wide range of robotic applications (Tapus et al., 2007).

Transferring a variety of skills into a robot involves several small but essential processes:

- an efficient medium for capturing human motion precisely;
- the elicitation of the key characteristics of the motion;
- a generic approach to generating robot motion from those key characteristics;
- a method for evaluating the generated robot motions or skills.

The medium used to capture human motion has become a crucial factor, because it determines how much noise the recorded motion contains. Current imitation research has captured accurate human motions for robot imitation either through a motion capture system (Calinon & Billard, 2007(a)) or through image processing techniques (Riley et al., 2003). A motion capture system provides accurate data that is less noisy than data from image processing techniques (Calinon & Billard, 2007(b)). However, both approaches have practical problems. For example, a current motion capture system requires markers to be placed on the subject's body, which can make it uncomfortable to express natural motion. Image processing techniques, in turn, use more than five cameras to detect human motion, and processing information from that many cameras simultaneously is technically difficult.

Early imitation research (Hovel et al., 1996) (Ikeuchi et al., 1993) focused on action recognition and on detecting task sequences in order to teach a demonstrator's task to robots. This work mostly developed perceptual algorithms for the visual recognition and analysis of human action sequences. Perceptions were segmented into actions that defined the demonstrator's task, and these sub-tasks (sequences) were then repeated by the robot's arm. Because this work dealt only with a robot arm imitating a demonstrator's tasks, generating the motion was much easier than generating whole-body robot motion. Human body motion is complex when a person performs tasks or behaviors. The angles of the body parts change dynamically (the kinematics of body motion), and the body angles are related to one another. To transfer a demonstrator's motions into a robot, we must consider these points, including the characteristics of the motions.

In essence, an imitation approach must capture the characteristics of an agent's motion: its speed, its acceleration, its distribution, the points where its direction changes, and so on. Recent robotics research has focused on developing sound mathematical models for extracting these characteristics of human motion, and such models have made it easier to transfer human motion into a robot (Aleotti & Caselli, 2005) (Dillmann, 2004). Kuniyoshi (Kuniyoshi et al., 1994) proposed a robot imitation framework that reproduces a performer's motion by observing the characteristics of its motion patterns. In that framework, the robot reproduced a complex motion pattern through a recurrent neural network model.

Inamura (Inamura et al., 2004) proposed a robot learning framework based on motion segmentation, in which a Hidden Markov Model (HMM) is used to acquire proto-symbols that represent body motions. These motion segments, represented by proto-symbols, were then used to generate the robot's motions. A problem with these approaches is that motion patterns are identified by observing the entire motion in each time interval. Instead of identifying the characteristics of motion through observation, it is important to design a mathematical model that selects them autonomously.

Another line of work builds robot learning of complex human motions on motion primitives (Kajita et al., 2003) (Mataric, 2000). Recognizing the primitive motions in each time interval is a decisive issue, because the whole robot motion is generated by combining the extracted motion primitives. In (Shiratori et al., 2004), a robot learns to dance through motion primitives. This work makes the strong assumption that an entire dance is a combination of predetermined motion primitives, and it uses the speed of the hands and legs and the rhythm of the music to identify them. Most extracted motion primitives are not meaningful and are difficult to replicate.

Motion primitive-based techniques also face a variety of problems when the primitives are extracted. Diverse motion primitives must be defined, and the whole motion must be composed from them; this procedure can produce motion patterns that differ from the original agent's motions. In addition, a motion primitive-based technique has to rely on the start and end points of each motion primitive to generate a robot's motion accurately. Determining these points is difficult and remains contested in this field.

Calinon & Billard (Calinon & Billard, 2007(c)) proposed a robot imitation algorithm that projects motion data into a latent space. The projected data are then modeled with a Gaussian Mixture Model (GMM) to generate the robot's motion. In addition, the demonstrator refines the motion while the robot reproduces the skill. Several statistical techniques were combined with the demonstrator's motion and this motion-refinement strategy to generate the robot's motions. The approach must therefore process the demonstrator's motion together with the latest refinement information to complete the imitation task.

We believe this makes the imitation task overly complicated. A different mathematical approach should be considered: one that combines the demonstrator's motion with the motion-refinement task (the robot's motor information) to determine the robot's motions. The main emphasis of our robot imitation algorithm is to rely on less motion data (selected symbolic postures). It also accounts for the robot's limitations and environment within a simple mathematical framework for imitating human motion precisely.

In our approach, the robot does not use an agent's entire body motion to generate its own motion. Instead, it selects preferable symbolic postures, based on dissimilarity values and without any prior knowledge of the social cues, and regenerates the motion from them. Most existing imitation research attempts to transfer an agent's entire motion without considering the robot's limitations (e.g., motor information, body angles, and range of motion). Such methods are applicable only in predefined contexts and are not suitable as a general framework for robot imitation in different contexts.

In contrast, our approach extracts symbolic postures, and from these postures the robot generates the rest of the motion while taking its limitations into account. Our approach therefore attempts to generate robot motion in different contexts without changing the general framework. Reinforcement Learning (RF) (Kaelbling et al., 1996) is used to find the optimal symbolic postures between two selected consecutive dissimilar postures.

2. Human motion tracking

Our approach needs to acquire a human's motion information to transfer natural social cues into a robot. To accomplish this, we propose a single-camera image processing technique that accurately obtains an agent's upper body motion. We attach small color patches to the agent's head, right shoulder, right elbow, right wrist, body/navel, left wrist, and left elbow (see Fig. 1). Through these markers, we estimate 12 upper body angles of the agent:

- the hip front angle;
- the shoulder front/rear angle (left and right arms);
- the shoulder right/left angle (left and right arms);
- the shoulder twist angle (left and right arms);
- the elbow angle (left and right arms);
- the head front angle;
- the neck twist angle;
- the neck tilt angle.

See Fig. 1 for details.
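The chapter does not spell out how each angle is computed from the patch positions. The sketch below is a rough illustration only. It assumes each color patch has been reduced to an (x, y) image centroid and computes a planar joint angle as the angle between the two limb vectors, for example the elbow angle from the shoulder, elbow, and wrist patches. The helper name `joint_angle` is hypothetical. The sketch also ignores the body-to-camera angle that the single-camera setup estimates.

```python
import math


def joint_angle(a, b, c):
    """Angle (radians) at vertex b between the segments b->a and b->c.

    Hypothetical helper: a, b, c are (x, y) image centroids of three color
    patches, e.g. shoulder, elbow, and wrist for the elbow angle.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2(|cross|, dot) stays accurate near 0 and pi, unlike acos(dot / norms)
    return math.atan2(abs(cross), dot)


# Upper arm pointing up from the elbow, forearm pointing sideways: a right angle
print(joint_angle((0.0, 0.0), (0.0, -1.0), (1.0, -1.0)))  # about 1.5708
```

A fully extended arm (shoulder, elbow, and wrist collinear) gives pi with this definition. A flexion angle would then be pi minus this value.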

3. The extraction of symbolic postures

In this paper, we propose an approach that learns and elicits the segmentation points of motions through posture dissimilarity values, without any prior knowledge of the motions.

Figure 1.

(a): Color patches attached to the agent's upper body, (b): initial camera setup to detect each body position, (c): angle between camera and body, (d): hip front angle, (e): shoulder front/rear and right/left angle, (f): shoulder twist angle and elbow angle, (g): head front angle, (h): neck twist angle, (i): neck tilt angle.

Our approach assumes that the postures (points) with the highest dissimilarity mark changes in the direction or pattern of the motion. We further assume that the characteristics of a posture can be captured by its 12 upper body angles, together with the mean and variance of these angles in each frame. The dissimilarity value of two consecutive postures is computed from their correlation. In this phase, we explore the possible key-motion points at which the motion pattern or motion direction changes.

First, we estimated the dissimilarity of each pair of consecutive postures and used the highest dissimilarity values to elicit dissimilar postures from the entire motion. During this phase, we selected only postures with high dissimilarity, i.e., those satisfying $0.8 < \rho_{i,i+1} \le 1$. Of each such consecutive pair, only the earlier posture was kept. For example, if postures $i$ and $i+1$ have the highest dissimilarity value ($\max \rho_{i,i+1}$), only posture $i$ was considered for further estimation.

The symbols in the equation below are defined as follows:

- $\sigma_i$ and $\sigma_{i+1}$ are the standard deviations of postures $i$ and $i+1$;
- $\beta_{ij}$ is the angle of joint $j$ in posture $i$, and $\bar{\beta}_i$ is the mean value of posture $i$;
- $\beta_{i+1,j}$ is the angle of joint $j$ in posture $i+1$, and $\bar{\beta}_{i+1}$ is the mean value of posture $i+1$;
- all statistics are computed over the $n = 12$ upper body angles.

The posture dissimilarity value ($0 \le \rho_{i,i+1} \le 1$) is obtained through the following equation:

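$$\rho_{i,i+1} = \left| \frac{(n-1)\,\sigma_i\,\sigma_{i+1} - \sum_{j=1}^{12} \left(\beta_{ij} - \bar{\beta}_i\right)\left(\beta_{i+1,j} - \bar{\beta}_{i+1}\right)}{(n-1)\,\sigma_i\,\sigma_{i+1}} \right| \qquad (1)$$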
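The sum in Equation (1), divided by $(n-1)\sigma_i\sigma_{i+1}$, is the Pearson correlation $r$ of the two postures' angle vectors. The dissimilarity therefore reduces to $\rho_{i,i+1} = |1 - r|$.

The sketch below computes that value and applies the selection rule described above: keep the earlier posture $i$ of every consecutive pair whose dissimilarity falls in $0.8 < \rho_{i,i+1} \le 1$. It is a minimal illustration rather than the authors' code, and the function names and the use of NumPy are assumptions. Strictly, $|1 - r|$ ranges over $[0, 2]$ and exceeds 1 only for negatively correlated postures. The sketch applies the chapter's window literally.

```python
import numpy as np


def posture_dissimilarity(p, q):
    """Equation (1): rho = |1 - r| for two postures given as n joint angles.

    Assumes neither posture has all angles equal (zero standard deviation).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    n = p.size
    s_p = p.std(ddof=1)                      # sigma_i (sample standard deviation)
    s_q = q.std(ddof=1)                      # sigma_{i+1}
    cov_sum = np.sum((p - p.mean()) * (q - q.mean()))
    denom = (n - 1) * s_p * s_q
    return abs((denom - cov_sum) / denom)


def candidate_postures(frames, threshold=0.8):
    """Indices i of the earlier posture of each consecutive pair with
    threshold < rho_{i,i+1} <= 1, plus all consecutive dissimilarities."""
    frames = np.asarray(frames, dtype=float)
    rho = np.array([posture_dissimilarity(frames[i], frames[i + 1])
                    for i in range(len(frames) - 1)])
    keep = [i for i, r in enumerate(rho) if threshold < r <= 1.0]
    return keep, rho
```

In practice, `frames` would be a (number of frames) x 12 array of the upper body angles. Short vectors work as well, which keeps toy checks readable.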

The significance of our approach is that it estimates the possible key-motion points that are common to all 12 upper body angles.

By contrast, Calinon & Billard (2007(d)) held that each joint angle had to be considered separately when extracting key-motion points. We believe that the structure of the posture (the combination of joint angles) must be considered to elicit key-motion points, since a posture shows how the joint angles are related to each other in a particular frame. Accordingly, the selected key-motion points were treated as the segmentation points of the demonstrator's motions.

4. Elicitation of optimal symbolic postures from reinforcement learning

Calinon & Billard (2007(d)) and Inamura et al. (2004) used an HMM to capture the dynamic features of a demonstrator's motions in the HMM states and to construct a robot's motions from them. Calinon & Billard (2007(d)) used an HMM with the Viterbi algorithm to elicit key-motion points from the entire motion. In their approach, the Viterbi algorithm searches for the most significant combination of states among inflexion points, which are selected as local minima or maxima.

In general, the Viterbi algorithm finds the most likely state sequence for modeling a motion or behavior, and this approach forces it to select the best state sequence from the inflexion points. The problem is that the Viterbi algorithm is not designed to elicit the state sequence that contains the best key-motion points for constructing a robot's motion. An HMM does provide the best sequence of states for modeling a human's motion or behavior. Using it to elicit the key-motion points best suited for generating a robot's motion is nonetheless limited.
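For contrast, a textbook log-space Viterbi decoder is sketched below. It is the generic algorithm, not the implementation used by Calinon & Billard, and the two-state model in the usage lines is invented for illustration. The decoder returns the state path with the highest joint likelihood of states and observations. Nothing in that objective refers to how well the chosen states would serve as reference points for generating robot motion, which is the limitation argued above.

```python
import numpy as np


def viterbi(log_pi, log_A, log_B, obs):
    """Most likely hidden-state path of a discrete HMM (log-space Viterbi).

    log_pi: (S,) initial log-probabilities; log_A: (S, S) transition
    log-probabilities indexed [from, to]; log_B: (S, V) emission
    log-probabilities; obs: sequence of observed symbol indices.
    """
    T = len(obs)
    delta = log_pi + log_B[:, obs[0]]            # best score ending in each state
    back = np.zeros((T, log_pi.size), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # scores[from, to]
        back[t] = scores.argmax(axis=0)          # best predecessor of each state
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # backtrack through predecessors
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Invented two-state model: each state mostly emits "its own" symbol
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
path = viterbi(np.log(pi), np.log(A), np.log(B), [0, 0, 1, 1])
```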

In our approach, we used a Reinforcement Learning algorithm (Kaelbling et al., 1996) to learn and extract the most significant postures, taking into account the individual differences between postures. An RF mechanism can directly use the posture dissimilarity values to find the optimal postures (key motions) for constructing the robot's motion from a given demonstrator's motion. This is the motivation for, and the advantage of, using RF instead of an HMM: RF extracts a few postures whose individual differences from the other postures are maximal. We estimated the posture dissimilarity values ($\rho_{i,i+1}$) through Equation (1). The estimated values are treated as the states in Q-learning ($\rho_{i,i+1} \rightarrow s_i$), and an action is defined as a movement from state $s_i$ to state $s_{i+1}$. The Q-learning function is defined as:

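$$\hat{Q}(s_i, a_i) \leftarrow (1 - \alpha_i)\,\hat{Q}(s_i, a_i) + \alpha_i \left[ R(s_i, a_i) + \gamma \max_{a'} \hat{Q}(s_{i+1}, a') \right] \qquad (2)$$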

where $\alpha_i$ is the learning rate, $\gamma$ is the discount factor, and $R(s_i, a_i)$ is the reward matrix entry for each action. The action $a_i$ is a movement from one state (posture) to another state (posture). Each element of the reward matrix is the value of that state transition (action), estimated from the posture dissimilarity. The action policy is an essential part of the Q-learning function, because it determines how the optimal postures are found. These are the postures with the maximum individual difference from the other postures (motion points), and they form the optimal outcome of Q-learning (see Fig. 3). Accordingly, we defined two action policies: (1) a state transition can move from state $s_i$ to state $s_k$ only if $i<k$, and (2) a state cannot transition to itself (there is no link between $s_i$ and $s_i$).
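The sketch below shows the two action policies and a single application of Equation (2). It assumes that states are indexed in temporal order and that the action "move from $s_i$ to $s_k$" is identified by the target index $k$. The helper names are hypothetical. The defaults $\alpha = \gamma = 1$ follow the values the chapter reports. Treating a state with no allowed action as terminal is an assumption.

```python
import numpy as np


def forward_only_mask(n_states):
    """Allowed actions: s_i -> s_k only when i < k (this also forbids s_i -> s_i)."""
    return np.triu(np.ones((n_states, n_states), dtype=bool), k=1)


def q_update(Q, R, allowed, s, k, alpha=1.0, gamma=1.0):
    """Apply Equation (2) in place to the action 'move from state s to state k'.

    The next state is k, so the max runs over the actions allowed from k;
    a state with no allowed action is treated as terminal (future value 0).
    """
    future = Q[k, allowed[k]].max() if allowed[k].any() else 0.0
    Q[s, k] = (1 - alpha) * Q[s, k] + alpha * (R[s, k] + gamma * future)
    return Q[s, k]


# Usage with three states and hypothetical rewards
allowed = forward_only_mask(3)
Q = np.zeros((3, 3))
R = np.array([[0.0, 0.5, 0.9],
              [0.0, 0.0, 0.7],
              [0.0, 0.0, 0.0]])
q_update(Q, R, allowed, 1, 2)   # Q[1, 2] becomes 0.7 (state 2 is terminal)
q_update(Q, R, allowed, 0, 1)   # Q[0, 1] becomes 0.5 + max(Q[1, 2]) = 0.5 + 0.7
```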

Figure 2.

An illustration of the proposed approach for generating the robot's social cues. First, symbolic postures are extracted through dissimilarity values, and the Q-learning algorithm is then used to find the optimal symbolic postures between the postures selected in the previous step. In the final step, each angle is interpolated separately with a divisional cubic spline to generate the robot motion through the selected symbolic postures.

To run Q-learning, we must first initialize the reward matrix $R(s_i, a_i)$, whose entries are based on the individual differences between postures: $\rho_{ik} = R(s_i \rightarrow s_k, a_i)$, where $i<k$.

Consequently, if $R(s_i \rightarrow s_k, a_i) > 0$, the initial reward matrix has a connection between $s_i$ and $s_k$; otherwise, it has no connection between $s_i$ and $s_k$.
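The initialization can be sketched as follows. The sketch assumes that `-inf` marks "no connection" (a hypothetical convention) and that `dissimilarity` is any function returning $\rho_{ik}$ for postures $i$ and $k$, such as Equation (1). To keep the example small, the toy usage passes scalar stand-ins for postures and uses an absolute difference as the dissimilarity.

```python
import numpy as np

NO_LINK = -np.inf  # hypothetical sentinel for "no connection" between two states


def initial_reward_matrix(postures, dissimilarity):
    """Initial reward matrix: R[i, k] = rho_ik for i < k when rho_ik > 0.

    Every other entry (k <= i, or zero dissimilarity) is NO_LINK, matching
    the "no connection" cases of the two action policies and of Fig. 3.
    """
    n = len(postures)
    R = np.full((n, n), NO_LINK)
    for i in range(n):
        for k in range(i + 1, n):          # forward transitions only (i < k)
            rho = dissimilarity(postures[i], postures[k])
            if rho > 0:
                R[i, k] = rho
    return R


# Toy usage: scalar "postures" and |difference| as a stand-in dissimilarity
R = initial_reward_matrix([0.0, 1.0, 1.0, 3.0], lambda p, q: abs(p - q))
```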

Figure 3.

(a) An illustration of the reinforcement-learning action policy used to extract optimal symbolic postures. An action moves from one state to another ($s_i \rightarrow s_k$ with $i<k$), and an action does not remain at the same state (no connection from $s_i$ to $s_i$). For example, there are no connections from $s_2$ to $s_1$ or from $s_3$ to $s_2$, and no action remains at the same state. (b) The initial reward matrix is defined by the dissimilarity between two states, written $s_i - s_k$. If $s_i - s_k > 0$, the reward matrix creates a connection between $s_i$ and $s_k$; if $s_i - s_k = 0$, it has no connection between those states. For example, if $s_1 - s_2 > 0$ and $s_2 - s_4 > 0$, the reward matrix has a connection for each of these pairs, but if $s_2 - s_3 = 0$, it has no connection between them.

These policies are applied to the initial reward matrix. We set both the learning rate $\alpha_t$ and the discount factor $\gamma$ to 1. In the initial stage, we set up the Q-matrix $Q(s_t, a_t)$ as a zero matrix and then update $\hat{Q}(s_t, a_t)$ using the reward matrix. After updating $\hat{Q}(s_t, a_t)$, we employed the epsilon-greedy policy to find the optimal state. The corresponding key state was used as a guide to extract the optimal key-motion points (postures). RF is used here to extract the postures that have the largest individual difference values in the motion sequence. In extracting these postures (key motions), we assumed that the points where the motion direction or motion pattern changes are also significant for constructing a robot's motion.
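The pieces can be combined in a compact tabular Q-learning sketch. It rests on several assumptions:

- the states are the candidate postures between two selected dissimilar postures;
- episodes start at the first state and end at the last (a hypothetical choice of start and goal);
- `-inf` in `R` marks a missing connection;
- only forward links ($i<k$) are used.

The sketch uses ε-greedy exploration during the episodes and a purely greedy read-out at the end; how the chapter combined the two is not specified. The function name and the defaults for `episodes`, `epsilon`, and `seed` are illustrative.

```python
import numpy as np


def learn_key_path(R, episodes=500, epsilon=0.2, alpha=1.0, gamma=1.0, seed=0):
    """Tabular Q-learning over posture states 0..n-1; returns (greedy path, Q).

    R[i, k] is the reward for moving from state i to state k; -inf means
    "no connection". Only forward links (i < k) are used, so episodes end.
    """
    rng = np.random.default_rng(seed)
    n = R.shape[0]
    linked = np.isfinite(R) & np.triu(np.ones((n, n), dtype=bool), k=1)
    Q = np.zeros((n, n))               # Q starts as the zero matrix
    goal = n - 1                       # hypothetical goal: the last posture

    def future_value(k):
        nxt = np.flatnonzero(linked[k])
        return Q[k, nxt].max() if nxt.size else 0.0    # terminal or dead end: 0

    for _ in range(episodes):
        s = 0
        while s != goal:
            actions = np.flatnonzero(linked[s])
            if actions.size == 0:
                break                                   # dead end: stop episode
            if rng.random() < epsilon:
                k = int(rng.choice(actions))            # explore
            else:
                k = int(actions[np.argmax(Q[s, actions])])  # exploit
            # Equation (2), with the next state equal to the chosen target k
            Q[s, k] = (1 - alpha) * Q[s, k] + alpha * (R[s, k] + gamma * future_value(k))
            s = k

    path, s = [0], 0                   # greedy read-out of the learned policy
    while s != goal:
        actions = np.flatnonzero(linked[s])
        if actions.size == 0:
            break
        s = int(actions[np.argmax(Q[s, actions])])
        path.append(s)
    return path, Q


# Toy reward matrix: four postures, -inf where there is no connection
R = np.full((4, 4), -np.inf)
R[0, 1], R[0, 2], R[0, 3] = 0.1, 0.2, 0.9
R[1, 2], R[1, 3], R[2, 3] = 0.1, 0.1, 0.2
path, Q = learn_key_path(R)
```

With $\alpha = \gamma = 1$, the learned $Q(s, a)$ approaches the largest summed reward along any forward path starting with that action. The extracted path therefore trades off a few large jumps against many small ones. In the toy matrix, the direct jump $s_0 \rightarrow s_3$ (reward 0.9) beats every longer path (at most 0.4).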

A similar mechanism is applied to the remaining unlearned postures to extract optimal symbolic postures from the entire range of human motion. After extracting the optimal symbolic postures, our approach uses divisional cubic spline interpolation to generate the robot's motion, treating each angle separately. Fig. 2 summarizes the proposed algorithm.

3. Generating robot motions

In this phase, we consider the trajectory of each of the demonstrator's angles separately in the task space to construct the corresponding robot angle, since the body scales of the robot and the demonstrator are completely different. Joint angles do not depend on body scale, so the robot's motion and the demonstrator's motion are directly comparable when motion is captured through joint angles. To construct the robot's motion, each angle trajectory is treated separately in the task space. The selected key-motion points (common to every angle) are used as reference points for spline interpolation. We denote the selected reference points by $(\beta_0, \beta_1, \dots, \beta_n)$, where $i = 0, 1, \dots, n$ indexes the selected key-motion points, and the corresponding times by $(t_0, t_1, \dots, t_n)$. The divisional cubic spline interpolation is defined as:

$$S_j(t) = a_j (t - t_j)^3 + b_j (t - t_j)^2 + c_j (t - t_j) + d_j \qquad (3)$$

where $t_j \le t \le t_{j+1}$, $j = 0, 1, \dots, n-1$, and $a_j$, $b_j$, $c_j$, and $d_j$ are unknown parameters. Each cubic piece is generated from two consecutive points. To estimate $a_j$, $b_j$, $c_j$, and $d_j$, we define $u_j$, $h_j$, and $v_j$:

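$$
\begin{aligned}
S''(t_j) &= u_j, \qquad S''(t_0) = u_0 = S''(t_n) = u_n = 0,\\
h_j &= t_{j+1} - t_j, \qquad j = 0, 1, \dots, n-1,\\
v_j &= 6\left[\frac{\beta_{j+1} - \beta_j}{h_j} - \frac{\beta_j - \beta_{j-1}}{h_{j-1}}\right], \qquad j = 1, 2, \dots, n-1
\end{aligned}
\qquad (4)
$$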

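The unknowns $u_1, \dots, u_{n-1}$ follow from requiring the first derivative of the spline to be continuous at the interior points. This gives the tridiagonal system $h_{j-1}u_{j-1} + 2(h_{j-1} + h_j)u_j + h_j u_{j+1} = v_j$ for $j = 1, \dots, n-1$. After estimating $u_j$, we compute $a_j$, $b_j$, $c_j$, and $d_j$ as follows: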

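$$
\begin{aligned}
a_j &= \frac{u_{j+1} - u_j}{6(t_{j+1} - t_j)}, \qquad b_j = \frac{u_j}{2},\\
c_j &= \frac{\beta_{j+1} - \beta_j}{t_{j+1} - t_j} - \frac{1}{6}(t_{j+1} - t_j)(2u_j + u_{j+1}), \qquad d_j = \beta_j, \qquad j = 0, 1, \dots, n-1
\end{aligned}
\qquad (5)
$$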

With these parameters estimated for every interval $[t_j, t_{j+1}]$, $j = 0, 1, \dots, n-1$, we can generate a smooth trajectory for one of the robot's joint angles. The same procedure is applied to each of the other angles to obtain the robot's entire motion smoothly and precisely.
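Equations (4) and (5) describe a natural cubic spline, whose pieces have the form $S_j(t) = a_j(t - t_j)^3 + b_j(t - t_j)^2 + c_j(t - t_j) + d_j$. The sketch below computes the coefficients for one joint angle. The interior $u_j$ come from the standard tridiagonal system $h_{j-1}u_{j-1} + 2(h_{j-1} + h_j)u_j + h_j u_{j+1} = v_j$. This is a minimal NumPy illustration with hypothetical function names, not the authors' implementation. In practice, a library routine such as SciPy's `CubicSpline` with `bc_type="natural"` yields the same interpolant.

```python
import numpy as np


def natural_cubic_spline(t, beta):
    """Coefficients (a_j, b_j, c_j, d_j) of each piece, with u_0 = u_n = 0."""
    t = np.asarray(t, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = len(t) - 1                     # number of cubic pieces
    h = np.diff(t)                     # h_j = t_{j+1} - t_j
    u = np.zeros(n + 1)                # u_j = S''(t_j); natural ends stay 0
    if n > 1:
        # Right-hand side v_j for j = 1..n-1 (Equation 4)
        v = 6 * ((beta[2:] - beta[1:-1]) / h[1:] - (beta[1:-1] - beta[:-2]) / h[:-1])
        # Tridiagonal system h_{j-1} u_{j-1} + 2(h_{j-1} + h_j) u_j + h_j u_{j+1} = v_j
        A = np.diag(2 * (h[:-1] + h[1:])) + np.diag(h[1:-1], 1) + np.diag(h[1:-1], -1)
        u[1:-1] = np.linalg.solve(A, v)
    a = (u[1:] - u[:-1]) / (6 * h)     # Equation (5)
    b = u[:-1] / 2
    c = (beta[1:] - beta[:-1]) / h - h * (2 * u[:-1] + u[1:]) / 6
    d = beta[:-1]
    return a, b, c, d


def evaluate_spline(t_knots, coeffs, tq):
    """Evaluate the piecewise cubic S_j(t) at query times tq (scalar or array)."""
    a, b, c, d = coeffs
    t_knots = np.asarray(t_knots, dtype=float)
    tq = np.asarray(tq, dtype=float)
    # Piece j with t_j <= tq < t_{j+1}; the last knot falls in the last piece
    j = np.clip(np.searchsorted(t_knots, tq, side="right") - 1, 0, len(a) - 1)
    x = tq - t_knots[j]
    return ((a[j] * x + b[j]) * x + c[j]) * x + d[j]   # Horner form


# Hypothetical key-motion points for one joint angle (angles in radians)
t_key = [0.0, 0.5, 1.2, 2.0]
beta_key = [0.0, 1.0, -0.5, 0.3]
coeffs = natural_cubic_spline(t_key, beta_key)
trajectory = evaluate_spline(t_key, coeffs, np.linspace(0.0, 2.0, 201))
```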

Figure 4.

The human agent expressing a "pointing gesture" in (a)(b)(c), and the robot successfully imitating the "pointing gesture" in (d)(e)(f).

5. Experimental protocol

Non-verbal communication channels help to transfer information interactively and make the meaning of verbal language more explicit. Non-verbal communication is therefore an essential channel for understanding language in human communication. Among these channels, gestures play a dominant role in human-human communication.

Recent robotics research has led to the development of robots that embody social cues, with the aim of improving the interface for natural human-robot interaction. Compared with other communication channels such as facial expressions, a gesture-based channel can create natural social cues embodied in a robot more effectively and attractively. Because gestures play a major role in human-human communication, we believe that they will play a similar role in human-robot interaction.

The experiment was conducted with a Fujitsu HOAP-3 robot with 28 degrees of freedom. The robot's leg joints were fixed at constant positions. The human agent wore eight color patches and expressed three social cues naturally. Using an image processing technique, we estimated the position of each color patch in every frame. During this process, we first estimated the angle between the human body and the camera, which helped to estimate the 12 body angles.

In our experiment, we attempted to transfer three social cues:

- a "pointing" gesture (see Fig. 4);
- a gesture for "explaining something attractively" (see Fig. 5);
- a gesture for expressing "I don't know" (see Fig. 6).

A human agent demonstrated these cues, which were then transferred to the robot through the proposed imitation algorithm. These gesture-based social cues are frequently used in human-human communication, so a robot that can express them should support more natural human-robot interaction.

Figure 5.

The human expressing a "gesture for explaining something attractively," and the robot transferring the social cue precisely through the proposed imitation algorithm.

Posture dissimilarity values and the reinforcement learning method were applied to elicit symbolic key postures from the entire motions. Finally, we used divisional cubic spline interpolation to generate robot motion, treating each of the 12 angles separately. Fig. 4, Fig. 5, and Fig. 6 illustrate the agent's social cues and the corresponding robot social cues generated by the proposed imitation algorithm. The proposed algorithm transferred the social cues into the robot precisely: the robot's motion patterns were similar to those expressed by the agent.

Figure 6.

The human expressing the social cue "I don't know" in (a)(b)(c), and the robot imitating the social cue precisely in (d)(e)(f).

6. Experimental results

The novelty of the proposed method is its use of simple and accurate mathematical concepts with a few symbolic postures to generate the whole robot motion. The approach therefore requires relatively little computation to generate natural social cues precisely. The generated robot social cues are consistent with the patterns of the agent's social cues. This can be validated by comparing the robot's body angle data with the actual human body angle data (see Fig. 7 to Fig. 10).

Fig. 7 shows the left-hand front/rear angle and Fig. 8 the right elbow angle for the "pointing gesture." In these figures, the dashed line represents the original human angles and the solid line represents the generated robot angles. The x-axis represents time and the y-axis represents the angle in radians. The pointing gesture is a quite simple motion compared with the other social cues. Even so, the figures support our claim that the robot-generated social cues closely follow the pattern of the social cues expressed by the human agent.

According to the experimental results, some time intervals contained noisy data (see Fig. 7 at 0.3 < t < 0.4). Nevertheless, our approach did not use these noisy data points when generating the robot's motion. The reason is that the key symbolic postures were extracted by comparing posture dissimilarity values computed over all 12 body angles.

Figure 7.

The motion of the human social cue (dashed line) and the generated robot motion (solid line). The x-axis represents time and the y-axis represents the angle in radians. The data show the left-hand front/rear angle produced by the robot and the human for the "pointing gesture."

Figure 8.

The x-axis represents time and the y-axis represents the angle in radians. The motion data of the robot and the human (right elbow angle) for expressing the "pointing gesture."

Fig. 8 shows a similar pattern in the time range 0.3 < t < 0.4. These results support our claim that the noisy data did not significantly affect the accuracy of the generated robot motion. To further validate the proposed algorithm, the last social cue transferred was the "I don't know" gesture. A careful analysis of the right elbow angle (Fig. 9) and the left front/rear angle (Fig. 10) shows that the robot generated these motions more precisely than those of the other social cues.

Figure 9.

An illustration of the human and generated robot motions when the "I don't know" gesture is expressed. The x-axis represents time and the y-axis represents the angle in radians. The solid line represents the robot-generated motion and the dashed line represents the human motion for the right elbow angle.

Figure 10.

The x-axis represents time and the y-axis represents the angle in radians. The figure shows the left-hand front/rear angle data when the "I don't know" gesture is expressed.

The results of our experiment provide further evidence that the noisy data did not significantly affect the precise generation of the robot motion. This is shown in Fig. 11 and Fig. 12, which plot the right-hand elbow angle (Fig. 11) and the right-hand shoulder twist angle (Fig. 12). The angle data were obtained when the human demonstrator expressed the "gesture for explaining something attractively." Here, the "circle" symbol represents the key motion points selected for the cubic spline that generates the robot motions. Furthermore, Fig. 12 shows certain noisy data that were not selected as key motion points.

Figure 11.

Visualized data of selected body angles when the human expresses the gesture for "explaining something attractively." The "circle" symbol represents the key motion points selected for generating robot motion through the cubic spline. The x-axis represents time and the y-axis represents the right-hand elbow angle in radians.

Figure 12.

An illustration of selected body angles when the human expresses the gesture for "explaining something attractively." The "circle" symbol represents the key motion points selected for generating robot motion through the cubic spline. The x-axis represents time and the y-axis represents the right-hand shoulder twist angle in radians.

When the right-hand shoulder twist angle (Fig. 12) is considered on its own, the noisy point still resembles a motion changing point. However, our mechanism did not select that point as a motion changing point, because the proposed method considers and compares all body angles to determine the key motion points (symbolic postures).

This shows that our approach can ignore noisy data efficiently. Overall, our results showed that the proposed imitation algorithm generated the robot's social cues precisely, in correspondence with the agent's social cues, except during certain short time intervals.

7. Conclusion

In this paper, we presented a framework to transfer the natural gestural behaviors of a human agent to a robot through a robust imitation algorithm. The novelty of the proposed algorithm is its use of symbolic postures to generate the gestural behaviors of a robot without using any training data or trained model. The idea behind using symbolic postures is that the robot can flexibly generate its own motion.

The main challenge in robot imitation is identifying the changing points of motion direction at each time interval. In our approach, we estimated the changing points of motion direction through posture dissimilarity values and reinforcement learning at each time interval.

The image processing-based method produced some noisy estimates of the positions of the colored patches. This noise did not significantly affect the accurate generation of the robot's motion, because the imitation algorithm generated the robot's motion from only a small number of symbolic postures. Overall, the experimental results showed that the proposed imitation algorithm imitated human gestural behaviors quite accurately, except during a few time intervals.


This research has been supported by both the Grant-in-Aid for Young Scientists (B)(19700477) from the Japan Society for the Promotion of Science (JSPS) and the Grant-in-Aid for Sustainable Research Center of the Ministry of Education, Science, Sports and Culture of Japan.

How to cite and reference


P. Ravindra S. De Silva, Tohru Matsumoto, Stephen G. Lambacher, Ajith P. Madurapperuma, Susantha Herath and Masatake Higashi (January 1st 2010). The Extraction of Symbolic Postures to Transfer Social Cues into Robot, Intelligent and Biosensors, Vernon S. Somerset, IntechOpen, DOI: 10.5772/7123. Available from:
