Open access peer-reviewed chapter

Methods for Real-time Emotional Gait Data Collection Induced by Smart Glasses in a Non-straight Walking Path

Written By

Nitchan Jianwattanapaisarn, Kaoru Sumi and Akira Utsumi

Submitted: 14 May 2022 Reviewed: 26 August 2022 Published: 10 October 2022

DOI: 10.5772/intechopen.107410

From the Edited Volume

Intelligent Video Surveillance - New Perspectives

Edited by Pier Luigi Mazzeo


Abstract

Emotion recognition is an attractive research field because of its usefulness. Most methods for detecting and analyzing emotions depend on facial features, so close-up facial information is required. Unfortunately, high-resolution facial information is difficult to capture with a standard security camera. Unlike facial features, gaits and postures can be obtained noninvasively from a distance. We propose a method to collect emotional gait data with real-time emotion induction. Two gait datasets with a total of 72 participants were collected. Each participant walked in a circular pattern while watching emotion induction videos shown on Microsoft HoloLens 2 smart glasses. An OptiTrack motion capture system was used to record the participants' gaits and postures. The effectiveness of the emotion induction was evaluated with a self-reported emotion questionnaire. In our second dataset, additional information about each subject, such as dominant hand, dominant foot, and dominant brain side, was also collected; these data can be used for further analyses. To the best of our knowledge, an emotion induction method that shows videos to subjects while they walk has not been used in other studies. Our proposed method and datasets have the potential to advance research on emotion recognition and analysis, which can be used in real-world applications.

Keywords

  • emotion induction
  • emotion recognition
  • gait analysis
  • motion capturing
  • smart glasses
  • non-straight walking behavior
  • emotional movies
  • watching video while walking

1. Introduction

Intelligent video surveillance research attracts considerable public interest. This study presents an example application that shows the potential of monitoring human behavior from body movements. The authors have conducted several studies analyzing the characteristics of individuals. To support research on the recognition of human emotions, we propose a research environment and method in which human emotions can be changed in real time using video stimuli, so that emotion recognition experiments can be performed in this environment. Recognizing human emotions is useful in many circumstances, for example, improving human-robot interaction, detecting suspicious behavior to prevent crime and altercations, evaluating customer satisfaction, and evaluating student engagement. These are examples of applications that can improve the quality of life.

Affective computing [1] is a research field that emerged from the popularity of emotion analysis research and attempts to enable computers to understand and generate human-like affect. Many studies related to affective computing have been proposed in recent years. A good example is an online exercise program for students to practice their programming skills. An affective computing technique was applied to the program by analyzing the students' emotions as well as their performance on each task. An animated agent then interacts with the students during the exercises. This method can improve student experience and performance at the same time [2].

Another good example is surveillance and security, which relates directly to the intelligent video surveillance topic of this book. A survey [3] shows that gait analysis is very useful for crime prevention applications. CCTV camera systems are now standard equipment installed in almost every public place. Human gaits can be analyzed in a very short time thanks to advances in computer vision and machine learning together with on-board computation devices, so suspicious behavior can be detected promptly. According to [4], smart video surveillance benefits many applications of gait analysis, such as human identification, human re-identification, and forensic analysis, because human gaits can be captured from far away without the subjects' awareness or cooperation.

In the past, emotion recognition and prediction were performed by human observers [5]. Unfortunately, using human observers to judge the emotions of other people is time consuming, and human judges are not consistent enough for practical use. As a result, automatic emotion recognition methods have been developed. Most publicly available methods today rely on facial expressions. Facial features perform very well in some situations, but they still have limitations. When facial images or videos must be captured in a crowded and noisy environment, it is difficult to obtain high-quality facial features because a standard security camera does not perform well enough. In particular, when the subject is not facing the camera, facial features are difficult to obtain accurately. Moreover, some subjects wear eyeglasses or sunglasses, or have a beard or mustache, which also prevents effective emotion analysis from facial features. Therefore, when facial features cannot be clearly captured, other features should be used instead to make emotion recognition and analysis more practical for real-world use.

Gaits and postures are the way the human body moves and poses during walking. These expressions can be observed from a distance without the subjects' awareness and can be captured without high-resolution images or videos. Thus, gaits and postures are well suited to emotion recognition and analysis. Several applications can be performed effectively and accurately using human gaits and postures, such as human identification [6, 7], human re-identification [8], age estimation, and gender recognition [9, 10]. The many studies already proposed show that human gait and posture are appropriate features for the prediction and recognition of human emotions [5, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20].

The objective of this study is to propose a method for emotional gait data collection using a novel approach to induce the subjects' emotions while they walk along a non-straight path, an unconventional walking pattern that is not found in other studies. This study proposes a method and environment for collecting gait data under different emotions. Microsoft HoloLens 2 smart glasses were used to display the emotion induction videos to the participants while they walked, and an OptiTrack motion capture system was used to record their walking data. Although we used OptiTrack, a marker-based system, to record body movements, marker-less motion capture devices such as Microsoft Kinect or Intel RealSense could be used instead. Thanks to advances in pose-estimation software such as OpenPose or wrnch AI, any video camera can also be used to capture human gaits for gait analysis.


2. Related work

Many studies on emotion recognition have been proposed in recent years because of its usefulness. Most of them use facial features, which are sufficiently accurate in some situations; however, facial features have the limitations discussed in the previous section. Although many studies use human gaits and postures as features for emotion recognition, they are still outnumbered by studies using facial features.

A survey [21] investigated several studies on gait analysis, not only for emotion recognition but also for human identification. It found that the characteristics of human walking differ across emotions, and this information can be used to develop automatic emotion recognition. Compared with other biometrics such as speech, facial, and physiological features, gaits have many advantages: they can be observed from afar without the subject's awareness, they are very difficult to imitate, and the subject's cooperation is not required. Hence, gaits are powerful expressions for automatic emotion recognition. Here we mention only the equipment that can be used to collect gait data and the results showing the effectiveness of emotion prediction from gaits. According to this survey, several devices can be used to capture gait data: a force plate can record velocity and pressure data [11]; an infrared light barrier system records velocity data well [11, 22]; motion capture systems such as Vicon record the coordinates of body parts through markers attached to the subject's body [12, 13, 14, 15, 23, 24, 25]; wearable devices such as smart watches with accelerometers, as well as smart phones, can record body movement data for gait analysis [18, 19, 20]; and Microsoft Kinect can record the human skeleton without markers attached to the subject's body [6, 7, 8, 9, 16, 17, 26]. Some findings useful for future studies are as follows. When subjects feel happy, they step faster [5] and their strides are longer [27]; their joint angle amplitude [27] and arm movements [12] also increase. When subjects feel sad, their arm swing decreases [5], their limb and torso shapes contract [24], and their joint amplitude decreases [12].

Several gait analysis studies have been proposed, including emotion prediction [11, 21], mental illness prediction [22, 23], human identification and re-identification [6, 7, 8], and gender prediction [9, 10]. Several tools can be used for gait data collection as already mentioned, for example, light barriers, force plates, video cameras, accelerometers, and motion capture systems. Among these, we focus on equipment that captures the coordinates of body parts or silhouette images of the human body, because these gait features are sensitive to walking direction. Straight walking usually yields high-quality gait data [11, 12, 14, 15, 16, 17, 19, 22, 23, 26, 28, 29, 30], so most studies use it. Fewer works use free-style walking, in which subjects choose any path they want [6, 7, 8, 9]. Results obtained with free-style walking data are often lower than with straight walking data, but free-style data increase the opportunity to deploy the proposed methods in reality, since people walking in public spaces are generally unaware of being observed and cannot be constrained to a straight path. In other words, collecting straight-walking gait data in a real-world environment is more difficult than collecting walks in arbitrary directions.

In this study, we decided to use the latest smart glasses, Microsoft HoloLens 2, to display emotional videos to the subjects while they walk. We were therefore concerned about issues including the interference of smart glasses with human gait and negative effects such as trips and slips. Some studies on this topic are useful for our work. For example, [31] investigated gait performance when subjects use a head-worn display while walking. An experiment with 12 subjects checked whether the subjects could walk normally under different conditions. Several factors were assessed: walking speed and obstacle crossing speed, required coefficient of friction, minimum foot clearance, and foot placement around the obstacle. The study found that performing tasks on a head-worn display while walking had no effect on level walking performance compared with using a paper list or with baseline walking without any device. In the obstacle crossing experiment, subjects chose a more cautious and more conservative crossing strategy when using the head-worn display, and obstacle crossing speed decreased by 3% compared with baseline walking. However, the head-worn display did not affect foot placement around the obstacle.

Other useful studies on the negative effects of head-worn displays on human gait are [32, 33]. They asked 20 subjects (10 men and 10 women) to walk under four different conditions on a treadmill: one single-task walk (walking while doing nothing) and three dual-task walks (walking while performing attention-demanding tasks). The dual-task walks used different display types, namely paper, smart phone, and smart glasses, to present information to the subjects while they walked. The attention-demanding tasks included a Stroop test, categorizing, and arithmetic. Subjects adopted a head-down posture when performing tasks on the paper and smart phone displays, and a head-up posture during single-task walking and dual-task walking with smart glasses. A Vicon motion capture system with seven cameras was used in their experiments. Their results reveal that performing attention-demanding tasks on smart glasses while walking does affect gait performance, for example gait stability. However, an important finding is that subjects were more unstable when performing tasks on a smart phone or paper display while walking than when using smart glasses. This indicates that head-up and head-down postures affect human gait.

From this review of related work, we confirmed that Microsoft HoloLens 2 can be used to display videos, because subjects can maintain a head-up posture while walking and can still see the room environment through its transparent display while watching. Although there could be some negative impacts, such as on walking stability or obstacle crossing strategy, we address these issues by asking our subjects to take rehearsal walks so that they become familiar with wearing HoloLens 2 while walking and with the walking area before the actual recorded walks. Regarding obstacles, our walking space is clear, so there should be no problem with using HoloLens 2 while walking.


3. Data collection

The gait data collection method described in this study was proposed in our earlier work [34] and is as follows. Most studies on emotion recognition and analysis from gaits and postures ask subjects to walk straight along a pathway or on a treadmill; walking in a straight line yields cleaner gait data but is harder to realize in real-world deployments. For emotion induction, several techniques are widely used. First, subjects are asked to walk while recalling personal experiences associated with the assigned emotions. Second, the subjects are not ordinary people but professional actors. Third, subjects watch an emotional video on a conventional screen, such as a television or computer display, before they start walking.

Each of these settings has potential problems. In the first method, subjects may not recall their memories vividly enough to express the desired emotions in their gaits and body movements. In the second method, professional actors may produce gaits that are exaggerated and unnatural. In the last method, the induced emotion may not last until the end of the walk because the video stimulus ends before the subject starts walking. These issues can make the collected gait data reflect human emotions incorrectly, so the relationship between the collected gaits and emotions becomes inaccurate.

To solve these problems, our experiments were designed so that subjects watch an emotion induction video and walk at the same time, allowing us to record their real-time emotions. Since the latest consumer smart glasses, Microsoft HoloLens 2, are available, we use HoloLens 2 to display the emotional videos to the subjects while they walk. With this method, subjects watch the stimuli as they walk, so their emotions are constantly and consistently induced. To the best of our knowledge, no other researchers have used this method before. Because of the transparent display of HoloLens 2, subjects can see the walking space and the room environment while they walk. We also expected that showing videos during walking is closer to real life, where people see situations that change their emotions in real time while walking. This emotion induction method is intended to simulate the subject's real-time emotion. Because the videos are shown during walking, we can also ensure that the induced emotions are more stable and consistent and last until the end of the walk.

Regarding the walking direction, because our subjects have to watch the emotion induction videos and walk at the same time, letting them walk freely without any path guidance could be too difficult. Subjects need to concentrate on the content of the videos; if they also have to choose a walking path, they may not be able to focus on the videos well enough, and the emotion induction would become less effective. Consequently, we asked the subjects to walk in a circular pattern without a guidance line on the floor. Subjects may walk along a loose circular path, clockwise or counter-clockwise, according to their own preference; the path can resemble an oval or a rounded rectangle. Therefore, we can collect both straight and non-straight walking within one walking trial.

3.1 Equipment for data collection

Motion capture devices can be categorized into two main types. Marker-less systems are easy to set up and require nothing to be attached to the subject's body; they use image processing and machine learning to estimate the positions of body parts from depth and color images captured by built-in cameras. Marker-based systems require several markers to be attached to the subject's body and several cameras to be installed; the position of each marker is reconstructed in three-dimensional space from the infrared reflections captured by all cameras. This makes marker-based motion capture more accurate but also more difficult to set up, whereas a marker-less device such as Microsoft Kinect is much easier to set up and use in most situations.

In this study, we decided to use OptiTrack, a well-known marker-based motion capture system, to capture human gaits. Fourteen OptiTrack Flex 3 cameras were installed around the recording space, and the OptiTrack Entertainment Baseline Markerset consisting of 37 markers was used in our experiments. Table 1 lists all marker names, and Figure 1 shows the position of each marker on the human body.

Single markers (5): HeadTop, HeadFront, HeadSide, BackTop, Chest
Left/Right marker pairs (16 pairs, 32 markers): BackLeft/Right, WaistFrontLeft/Right, WaistBackLeft/Right, ShoulderBackLeft/Right, ShoulderTopLeft/Right, ElbowOutLeft/Right, UpperArmHighLeft/Right, WristOutLeft/Right, WristInLeft/Right, HandOutLeft/Right, ThighFrontLeft/Right, KneeOutLeft/Right, ShinLeft/Right, AnkleOutLeft/Right, ToeOutLeft/Right, ToeInLeft/Right

Table 1.

List of OptiTrack baseline markers.

Figure 1.

Position of each marker on the body (Human Figure Source: https://sketchfab.com/3d-models/man-5ae6bd9271ac4ee4905b96e5458f435d).

3.2 Recording environment

Black tape was used to mark a rectangular walking area on the floor, as shown in Figure 2; the inside of the rectangle is the area that OptiTrack can capture. The walking space measures 2.9 by 3.64 meters. Fourteen OptiTrack Flex 3 cameras were installed on seven camera stands placed around the walking space, as illustrated in Figure 3. In other words, each camera stand holds two OptiTrack Flex 3 cameras at different heights, one higher and one lower, as shown in Figure 4.

Figure 2.

Rectangle walking area marked with black tape on the floor.

Figure 3.

Position of each camera and dimension of the walking area.

Figure 4.

Two OptiTrack Flex 3 cameras installed on each camera stand at different height levels.

3.3 Materials for data collection

Three videos were selected as stimuli to induce the subjects' emotions. HoloLens 2 was used to display these videos to each subject while he or she walked in a circular pattern in the recording area.

  • Neutral Video: A nature landscape video from YouTube, Spectacular drone shots of Iowa corn fields, uploaded by the user The American Bazaar (https://www.youtube.com/watch?v=4R9HpESkor8)

  • Negative Video: An emotional movie from the LIRIS-ACCEDE database, Parafundit by Riccardo Melato

  • Positive Video: An emotional movie from the LIRIS-ACCEDE database, Tears of Steel by Ian Hubert and Ton Roosendaal

The neutral video was selected from nature landscape videos on YouTube and should not induce any emotion. The positive video (for inducing a happy emotion) and the negative video (for inducing a sad emotion) were selected from LIRIS-ACCEDE (https://liris-accede.ec-lyon.fr/), a public annotated movie database published by [35]. It consists of several movies with emotion annotations in the valence-arousal dimensions, all published under Creative Commons licenses. In our study, we selected two movies from the Continuous LIRIS-ACCEDE collection. Most movies contain both positive and negative valence; because we wanted each walking trial to contain only one emotion, we selected one movie annotated with only positive valence and another with only negative valence. Since each subject walks from the start of the movie until it ends, the selected movies must not be too long; in our judgment, a length under 15 minutes is acceptable. The lengths of the neutral video, negative movie, and positive movie are 5:04, 13:10, and 12:14 minutes, respectively. Sample valence plots of annotated movies are shown in Figure 5, and plots of the negative and positive movies we used are shown in Figure 6. The neutral video has no sound at all to ensure that it does not induce any emotion. The positive and negative videos contain music, sound effects, and conversation in English; subjects hear the audio through the stereo speakers built into the HoloLens 2.

Figure 5.

Valence plots of sample movies.

Figure 6.

Valence plots of the negative movie (Parafundit) and positive movie (Tears of Steel) we selected.
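To illustrate this selection step, the short Python sketch below checks the two criteria we describe: a single-signed valence annotation and an acceptable length. It assumes the per-second valence scores of a movie have already been loaded into a list of floats; the loading step, file layout, and function names are ours for illustration and are not part of the LIRIS-ACCEDE tooling.

def has_single_valence_sign(valence_scores):
    """True if all valence scores stay strictly on one side of zero,
    i.e., the movie is annotated as only positive or only negative."""
    return all(v > 0 for v in valence_scores) or all(v < 0 for v in valence_scores)

def is_acceptable_length(duration_seconds, limit_minutes=15):
    """Practical length constraint used when choosing stimuli (under 15 minutes)."""
    return duration_seconds < limit_minutes * 60

# Hypothetical example: a 760-second movie whose valence never goes negative would qualify.
print(has_single_valence_sign([0.21, 0.35, 0.12]) and is_acceptable_length(760))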

3.4 Methods for data collection

Before participating in our experiments, we asked our participants to answer a health questionnaire and sign a consent form. The questions in the health questionnaire are as follows.

  1. Do you have any neurological or mental disorders?

  2. Do you have a severe level of anxiety or depression?

  3. Do you have hearing impairment that cannot be corrected?

  4. Do you have any permanent disability or body injury that affects your walking posture?

  5. Do you feel sick now? (e.g., fever, headache, stomachache)

  6. If you have any problem with your health condition, please describe it.

According to this questionnaire, any subject with a health issue could be excluded from participation. In this study, however, all subjects confirmed that they were healthy.

For the first dataset, proposed in [34], only the health questionnaire listed above was used. For the second dataset proposed in this study, additional questions were added to check the dominant hand, dominant foot, and dominant brain side of each subject.

The dominant hand of each subject was determined using a modified version of the Flinders Handedness Survey questions published by the Left Handers Association of Japan, available online at https://lefthandedlife.net/faq003.html. The questions on this website have been translated into Japanese and adapted to make them more appropriate for Japanese culture. The dominant hand questions and their English translations are as follows.

  1. 文字を書くとき、どちらの手でペン(筆記具)を持ちますか?

    When writing, in which hand do you hold a pen (writing instrument)?

  2. 食事をするとき、どちらの手でスプーンを持ちますか?

    When eating, in which hand do you hold a spoon?

  3. 歯を磨くとき、どちらの手で歯ブラシを持ちますか?

    When brushing your teeth, in which hand do you hold your toothbrush?

  4. マッチを擦るとき、どちらの手でマッチ棒を持ちますか?

    When striking a match, in which hand do you hold the matchstick?

  5. 消しゴムで文字や図画を消すとき、どちらの手で消しゴムを持ちますか?

    When erasing letters or drawings with an eraser, in which hand do you hold the eraser?

  6. お裁縫をするとき、どちらの手で縫い針を持ちますか?

    When sewing, in which hand do you hold the sewing needle?

  7. 食卓でパンにバターを塗るとき、どちらの手でナイフを持ちますか?

    When buttering bread at the table, in which hand do you hold the knife?

  8. 釘を打つとき、どちらの手で金づち(ハンマー)を持ちますか?

    When hammering a nail, in which hand do you hold the hammer?

  9. ジャガイモやりんごの皮をむくとき、どちらの手でピーラー (皮むき器) を持ちますか?

    When peeling potatoes or apples, in which hand do you hold the peeler?

  10. 絵を描くとき、どちらの手で絵筆やペンを持ちますか?

    When drawing, in which hand do you hold a paintbrush or pen?

For each question, subjects can choose left hand, right hand, or both hands, scored −1, +1, and 0, respectively. The total score over all questions determines the dominant hand: a total from −10 to −5 is classified as left-handed, from −4 to +4 as both-handed, and from +5 to +10 as right-handed.
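As a concrete illustration of this scoring rule, the following Python sketch classifies a subject from the ten answers. The function name and the string-based answer format are ours for illustration and are not part of the original questionnaire implementation.

def classify_handedness(answers):
    """Classify the dominant hand from the 10 questionnaire answers.

    answers: list of 10 strings, each "left", "right", or "both".
    Scoring follows the rule above: left = -1, right = +1, both = 0.
    """
    score_map = {"left": -1, "right": +1, "both": 0}
    total = sum(score_map[a] for a in answers)
    if total <= -5:       # total of -10 to -5
        return "left-handed"
    if total >= 5:        # total of +5 to +10
        return "right-handed"
    return "both-handed"  # total of -4 to +4

# Example: 8 right-hand answers and 2 both-hand answers give a total of +8 (right-handed).
print(classify_handedness(["right"] * 8 + ["both"] * 2))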

Additionally, another questionnaire was used to check the dominant foot of each subject. The dominant foot was determined using Chapman et al.'s foot dominance test questions translated into Japanese, available at https://blog.goo.ne.jp/lefty-yasuo/e/37149f8d3105e9b43aa58c5925024915. The questions and their English translations are as follows.

  1. サッカーボールを蹴る

    Which foot do you use to kick a soccer ball?

  2. 缶を踏みつける

    Which foot do you use to stomp on a can?

  3. ゴルフボールを迷路に沿って転がす

    Which foot do you use to roll a golf ball along a maze?

  4. 砂に足で文字を書く

    Which foot do you use to write letters in the sand?

  5. 砂地をならす

    Which foot do you use to smooth the sand?

  6. 小石を足で並べる

    Which foot do you use to arrange pebbles?

  7. 足先に棒を立てる

    Which foot do you use to balance a stick on your toes?

  8. ゴルフボールを円に沿って転がす

    Which foot do you use to roll a golf ball along a circle?

  9. 片足跳びをできるだけ速くする

    Which foot do you use to hop on one leg as fast as possible?

  10. できるだけ高く足を蹴上げる

    Which foot do you use to kick as high as you can?

  11. 足先でこつこつリズムをとる

    Which foot do you use to tap a rhythm with your toes?

The dominant foot was judged from the total score. Each answer scores 3 points for the left foot, 1 point for the right foot, and 2 points for both feet. A subject with a total of 28 points or more was judged left-footed; a subject with a total below 28 points was classified as right-footed.
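The footedness rule can be written analogously to the handedness sketch above. Again, the function name and the answer format are illustrative assumptions, not part of the original questionnaire.

def classify_footedness(answers):
    """Classify the dominant foot from the 11 questionnaire answers.

    answers: list of 11 strings, each "left", "right", or "both".
    Scoring follows the rule above: left = 3, right = 1, both = 2;
    a total of 28 or more is judged left-footed, otherwise right-footed.
    """
    score_map = {"left": 3, "right": 1, "both": 2}
    total = sum(score_map[a] for a in answers)
    return "left-footed" if total >= 28 else "right-footed"

# Example: 9 "left" answers and 2 "both" answers give 27 + 4 = 31, i.e., left-footed.
print(classify_footedness(["left"] * 9 + ["both"] * 2))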

Another questionnaire checks the dominant brain side. Many dominant brain tests are available; in this study, we used arm and hand folding questions. There are two questions, and subjects select the picture that matches what they did for each question. The questions and pictures are from https://www.lettuceclub.net/news/article/194896/. Both questions are shown in Japanese and English as follows.

  1. 自然に腕を組んでください。どのようになりましたか?

    Please fold your arms naturally. Which picture matches what you did?

  2. 自然に手を組んでください。どのようになりましたか?

    Please clasp your hands naturally. Which picture matches what you did?

Subjects were asked to select the pictures of arm folding and hand folding that matched what they did. The hand folding test was used to determine the input brain side, and the arm folding test the output brain side. For hand folding, if the thumb of the right hand is underneath, the input brain is the right side; if the thumb of the left hand is underneath, the input brain is the left side. For arm folding, if the right arm is underneath, the output brain is the right side; if the left arm is underneath, the output brain is the left side. The pictures shown to the subjects in the questionnaire are presented in Figure 7.

Figure 7.

Arm and hand folding test questions (Source: https://www.lettuceclub.net/news/article/194896/).
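The mapping from the two folding observations to the four input/output combinations used later in Section 4 is straightforward; the sketch below is illustrative and assumes the observations are recorded as "left" or "right" strings.

def classify_brain_side(thumb_below, arm_below):
    """Map the folding observations to an input/output brain-side label.

    thumb_below: "left" or "right", i.e., which thumb ends up underneath when clasping hands.
    arm_below:   "left" or "right", i.e., which arm ends up underneath when folding arms.
    Hand folding determines the "input" side and arm folding the "output" side,
    following the rule described above.
    """
    for value in (thumb_below, arm_below):
        if value not in ("left", "right"):
            raise ValueError("observations must be 'left' or 'right'")
    return f"{thumb_below}-input/{arm_below}-output"

# Example: right thumb underneath, left arm underneath -> "right-input/left-output".
print(classify_brain_side("right", "left"))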

After finishing the health questionnaire, informed consent, and the dominant hand, dominant foot, and dominant brain side questionnaires, each subject was instructed to walk in a circular pattern inside the walking area marked by black tape on the floor. Subjects were free to choose whether to walk clockwise or counter-clockwise and could switch direction at any time during each walking trial. Each subject was asked to complete the following walking trials.

  1. Walk in the rectangle walking area for 3 minutes as a rehearsal walk

  2. Wear the HoloLens 2 with nothing displayed and walk in the walking area for 3 minutes as another rehearsal walk

  3. Watch neutral video on HoloLens 2 while walking in the rectangle walking area

  4. Watch the first emotional video (positive/negative video) on HoloLens 2 while walking in the rectangle walking area

  5. Watch the second emotional video (negative/positive video) on HoloLens 2 while walking in the rectangle walking area

The first rehearsal walk is intended to familiarize the subjects with the room environment and the walking space. For the second rehearsal walk with the HoloLens 2 showing nothing, as found in [31, 32, 33], gait performance can be unstable if subjects have never used smart glasses before; therefore, each subject took another rehearsal walk to become familiar with walking while wearing HoloLens 2. After the two rehearsal walks, we showed the neutral video on HoloLens 2 and asked each subject to start walking when the video started and stop when it ended. We then showed the first emotional video and asked the subject to walk following the same procedure. After this emotional video ended, the subject took a 10-minute break to let their emotions return to a normal state. Finally, we showed the second emotional video and asked the subject to walk while watching it. The order of the two emotional videos was counterbalanced across subjects: positive then negative, or negative then positive. The overall data collection process for the first dataset is shown in Figure 8. For the second dataset, the questionnaires on dominant hand, dominant foot, and dominant brain side were administered after the health questionnaire and before the first rehearsal walk.

Figure 8.

Data collection process.
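The session structure summarized above and in Figure 8 can be expressed as a short sketch. Alternating the positive/negative order by subject index is an illustrative choice; the text only states that the order was swapped between subjects.

def session_steps(subject_index):
    """Return the ordered session steps for one subject: two rehearsal walks,
    three recorded walks, and the break between the two emotional videos.
    """
    if subject_index % 2 == 0:
        first, second = "positive", "negative"
    else:
        first, second = "negative", "positive"
    return [
        "rehearsal walk (no HoloLens 2), 3 minutes",
        "rehearsal walk (HoloLens 2 showing nothing), 3 minutes",
        "walk while watching the neutral video",
        f"walk while watching the {first} video",
        "10-minute break",
        f"walk while watching the {second} video",
    ]

print(session_steps(0))
print(session_steps(1))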

Furthermore, subjects were asked to report their perceived emotion after finishing the neutral, positive, and negative walks. The questions are as follows.

  • Please choose your current feeling: Happy, Sad, Neither (Not Sad & Not Happy)

  • How intense is your feeling: 1 (Very Little) to 5 (Very Much)

In the first dataset, only the self-reported emotion questionnaire was used after the neutral, negative, and positive walks. In the second dataset, we added another question after the last self-reported questionnaire, that is, after the final walking trial. Because we were unsure whether the subjects could walk naturally while watching videos on HoloLens 2, we asked them whether they could walk naturally while using HoloLens 2 and to explain their answer.

Sample screenshots of a subject walking in a circular pattern while watching a video on HoloLens 2 are shown in Figure 9. A sample image of a subject wearing HoloLens 2 and the OptiTrack motion capture suit with markers is shown in Figure 10.

Figure 9.

Samples of walking in the recording area.

Figure 10.

A subject wearing HoloLens 2 and the OptiTrack motion capture suit with 37 markers.


4. Results and discussion

Two emotional gait datasets were collected. The first dataset, proposed in [34], contains 49 subjects: 41 men and 8 women. The average age is 19.69 years (standard deviation 1.40 years), the average height is 168.49 centimeters (SD 6.34 centimeters), and the average weight is 58.88 kilograms (SD 10.84 kilograms). In total, the dataset contains 147 walking trials. Because the order of the emotional videos was swapped between subjects, 24 subjects watched the negative movie before the positive movie (neutral -> negative -> positive), and 25 subjects watched the positive movie before the negative movie (neutral -> positive -> negative). According to the self-reported emotion questionnaire, there are 44 sad walking trials, 44 happy walking trials, and 59 neither walking trials. The comparison between the expected emotion, which is the annotated emotion of the videos (negative, positive, neutral), and the reported emotion, which is the emotion reported by the subjects after walking (happy, sad, neither), is shown in Table 2 and Figure 11.

Stimuli \ Reported Emotion    Happy    Sad    Neither
Positive Movie                   12     23         14
Negative Movie                   13     19         17
Neutral Movie                    19      2         28

Table 2.

Comparison of expected emotion and reported emotion (first dataset).

Figure 11.

Plots of expected emotion and reported emotion (first dataset).
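Tables 2 and 3 are simple cross-tabulations of stimulus against reported emotion. A minimal sketch of how such a table can be produced from per-trial records is shown below; the (stimulus, reported) pair format is an assumption made for illustration, not the actual format of our dataset files.

from collections import Counter

def emotion_crosstab(trials):
    """Count reported emotions per stimulus, as in Tables 2 and 3.

    trials: iterable of (stimulus, reported_emotion) pairs, e.g. ("positive", "sad").
    Returns a dict mapping each stimulus to a Counter of reported emotions.
    """
    table = {}
    for stimulus, reported in trials:
        table.setdefault(stimulus, Counter())[reported] += 1
    return table

# Hypothetical example with three trials.
print(emotion_crosstab([("positive", "happy"), ("positive", "sad"), ("neutral", "neither")]))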

In addition to the first dataset, we performed another data collection; our second dataset contains 23 subjects: 10 men and 13 women. The average age is 19.91 years (standard deviation 3.04 years), the average height is 164.93 centimeters (SD 9.58 centimeters), and the average weight is 57.32 kilograms (SD 11.32 kilograms). In total, this dataset consists of 69 walking trials. The order of the emotional videos was also swapped, as in the first dataset: 12 subjects watched the negative movie before the positive movie (neutral -> negative -> positive), and 11 subjects watched the positive movie before the negative movie (neutral -> positive -> negative). The reported emotions compared with the expected emotions are listed in Table 3 and Figure 12. In this dataset, we also collected the dominant hand, dominant foot, and dominant brain side; the results of these questionnaires are as follows.

  • Dominant Hand: 10 left-handed subjects, 8 right-handed subjects, and 5 both-handed subjects

  • Dominant Foot: 8 left-footed subjects, 15 right-footed subjects

  • Dominant Brain Side: 3 left-input/left-output subjects, 7 left-input/right-output subjects, 7 right-input/left-output subjects, and 6 right-input/right-output subjects

Stimuli \ Reported Emotion    Happy    Sad    Neither
Positive Movie                   10      7          6
Negative Movie                    0     19          4
Neutral Movie                    12      3          8

Table 3.

Comparison of expected emotion and reported emotion (second dataset).

Figure 12.

Plots of expected emotion and reported emotion (second dataset).

The dominant hand, dominant foot, and dominant brain side data will be useful in the future when this dataset is used for emotion recognition and for analysis of body movements.

According to Table 2 and Figure 11, which compare the expected and reported emotions for the first dataset, not all subjects felt the emotions we wanted them to feel. That is, the positive video could not make everyone feel happy, and the negative video could not make everyone feel sad. For the positive video, nearly twice as many subjects felt sad as felt happy: 12 subjects felt happy while 23 felt sad. For the negative video, more subjects felt sad than happy: 19 felt sad while 13 felt happy. For the neutral video, the results are fairly random, since we intended it not to induce any emotion, and most subjects reported feeling neither. The happy and sad reports for the neutral video can stem from other causes; for example, subjects who have never used HoloLens 2 before may feel happy simply from walking while watching a video on it, while subjects who feel uncomfortable walking and watching at the same time may feel sad after the neutral video.

For the second dataset, Table 3 and Figure 12 show that for the positive movie, 10 subjects felt happy and 7 felt sad, which is not a large difference. These results suggest that emotion induction with the positive movie was not very effective. In the first dataset, considerably more subjects felt sad after watching the positive movie, the opposite of the intended emotion; in the second dataset, more subjects felt happy than sad, but the numbers are still close. This again indicates that the positive video was not an effective stimulus, even though, unlike in the first dataset, happy reports outnumbered sad ones. For the negative movie, 13 subjects felt happy and 19 felt sad in the first dataset, whereas in the second dataset no one felt happy and 19 felt sad. Emotion induction with the negative video was therefore more effective in the second dataset than in the first, even though the stimulus was the same movie; one possible reason is that the subjects in the second dataset were more sensitive to the negative movie. Moreover, the neutral video produced fairly random reported emotions in both datasets, which is a good outcome showing that it did not induce any particular emotion, as intended.

The comparison between expected and reported emotions shows that the reported emotions often differ from the expected emotions, that is, from the annotated emotions of the video stimuli. There are several possible causes. Some subjects may be more prone to sadness when watching certain stories: for the negative movie, some subjects can feel very sad, while others feel slightly sad, neutral, or even happy. This is normal, since different people perceive emotions differently, and the same applies to the positive movie: although its annotated emotion is positive, some subjects felt happy while others felt sad. Another possible reason is that subjects sometimes could not fully understand the content of the movies because they watched and walked at the same time; having to concentrate on walking as well as on the movie means some subjects miss parts of the story, so their reported emotion can be opposite to the intended one. A related explanation is that some elements of the positive movie, such as intense music or scenes, can make sensitive subjects feel sad. Individual preferences also matter: if a subject does not like the sci-fi movie we used as the positive stimulus, it can make them feel sad simply because it is a genre they dislike. The soundtrack can likewise shift perceived emotion away from the expected one: subjects who like the music may feel happy even during the negative movie, and subjects who dislike it may feel sad even during the positive movie. Lastly, if subjects did not feel well while watching the movie during walking, for example because of motion sickness or boredom, the perceived emotions become inaccurate and differ from the expected ones. For this reason, we asked the subjects after walking whether they could walk naturally.

We did not have this information for the first dataset. For the second dataset, seven subjects answered that they could walk naturally, eight answered that they could not, and eight were unsure. Looking at their explanations, the subjects who answered that they could walk naturally gave positive feedback. The following are some examples.

  • 映像に集中していたから

    Because I was concentrating on the video

  • 歩きにくさを感じなかったから

    Because I did not find it difficult to walk

  • 飽きなかったから!

Because I did not get bored!

  • 視界が完全に隠されてたわけではなかったため。あしもとは見えてたので比較的歩きやすかった。

    Because my view was not completely blocked. I could see my feet, so it was relatively easy to walk.

In contrast, the subjects who answered that they could not walk naturally gave negative feedback about walking while watching videos on HoloLens 2. These are some examples.

  • 音や映像に気を取られたから

    Because I was distracted by the sound and image

  • 途中でフラついたり、まっすぐ歩けなかったりしたからです。

Because I sometimes staggered along the way and could not walk straight

  • 映像に集中していて、たまに枠線を超えそうになったから。

    Because I was concentrating on the video and sometimes I almost crossed the border.

  • 歩く範囲が小さいため

    Because the walking range is small

Even among the subjects who answered that they were unsure, some feedback was negative.

  • 時々眠たかったから

Because I was sleepy at times

  • 映像を見ながら歩くのが少し難しかった

    It was a little difficult to walk while watching the video

  • 枠外には出なかったが、円状を単調に歩いていたので、目がくらんで、不自然に歩いていたかもしれないから。

I did not go outside the marked area, but I was walking monotonously in a circle, so I might have gotten dizzy and walked unnaturally.

There was also some positive feedback from the unsure subjects. These answers suggest that they walked unconsciously, so we could collect very natural walking styles.

  • 自分がどう歩いていたか気にしていなかった

    I did not care how I was walking

  • 何も考えていなかったため、自然だったかはわからないです

    I did not think about anything, so I do not know if it was natural

From these answers, we can see that some subjects had difficulty watching videos on HoloLens 2 while walking at the same time. These are very reasonable explanations for why the emotion induction was not always effective and why the reported emotions differ considerably from the expected emotions. We did not collect these data in the first dataset, so we cannot know whether those subjects could walk naturally or how they felt after watching videos on HoloLens 2 while walking; for the second dataset, we have these data, and they are very useful.

From using Microsoft HoloLens 2 to display emotional videos, we found several points worth considering.

  • This kind of emotion induction method can be used to simulate real-time emotion. However, using entire movies is not very effective.

    • The subjects need to pay a great deal of attention to the movie, so some subjects cannot walk naturally, some experience motion sickness, and some feel bored because the movies are too long.

    • If subjects give priority to walking over watching, some cannot fully understand the movie content.

  • HoloLens 2 can be used to show content to the subjects to simulate real-time emotion, but showing an entire movie is not a good idea, given the many negative comments from the subjects.

    • Showing short clips instead of a full movie, or using animated mixed reality (VR/AR) agents, should be a better way to induce real-time emotions.

  • Asking subjects to report their perceived emotion with a self-reported questionnaire is very important, since the emotion we expect can differ from the emotion they report.

  • Displaying videos during walking can induce natural emotions, because several subjects were unaware of whether their walking postures changed.

  • For now, trusting what the subjects report feeling in the self-reported questionnaire is better than using the emotion tag of the stimuli.


5. Conclusion

To summarize, this study extends our previously proposed emotion induction and data collection method [34]. In conventional emotion induction, emotional videos are shown to the subjects on a computer screen or television before walking. In our method, emotional videos are shown on Microsoft HoloLens 2, the latest smart glasses. We found that displaying emotional videos on HoloLens 2 during walking can make the subjects express emotions in their gaits unconsciously, while they can still see the room environment and the stimulus content at the same time. Some subjects found it easy to walk while watching videos on HoloLens 2, while others found it difficult to focus on walking and on the video content simultaneously. Our goal is to simulate real-time emotion while walking; however, using full-length movies may not be a good idea, given the negative feedback from some participants. Regarding the walking direction, using a non-straight walking path makes the data collection more realistic, since capturing only straight, clean walking data is difficult in real environments. Therefore, an emotion recognition system developed and tested on non-straight walking gait data has a greater chance of being deployed in real-world scenarios. Additionally, the expected emotions, that is, the annotated emotions of the stimuli, should not be used to label a walking trial, because they can be opposite to the emotions the subjects actually perceived; asking the subjects to report their actual feelings after walking is the best approach available for now. In this study, an OptiTrack motion capture system was used to capture gait data, but a marker-based system such as OptiTrack or Vicon is not mandatory: a marker-less device such as Microsoft Kinect, or even a standard video camera or mobile phone camera combined with pose-estimation software such as OpenPose, can also capture body movement data for emotion recognition through gait analysis. In summary, this study investigates the possibility of performing emotion recognition and analysis by using smart glasses to induce the subjects' emotions. The results suggest that emotion recognition from human gaits is feasible and useful in many circumstances. Since emotion recognition is a tangible application of intelligent video surveillance, methods for inducing human emotions should also be considered, and obtaining a high-quality dataset is an important factor in developing an effective emotion recognition system as part of intelligent video surveillance.


6. Acknowledgment

The authors would like to thank all participants who joined both of our experiments. We also appreciate the help and support of all members of the Kaoru Sumi Laboratory at Future University Hakodate, who assisted with both experiments, including venue setup, equipment setup, experimental design, translation of all documents, and Japanese-English interpretation throughout the experiments.


Funding statement

This work was supported by JST Moonshot R&D Grant Number JPMJMS2011.

References

  1. 1. Picard RW. Affective Computing. MIT press; 2000
  2. 2. Tiam-Lee TJ and Sumi K. Analysis and prediction of student emotions while doing programming exercises. In: International Conference on Intelligent Tutoring Systems. Springer. 2019. pp. 24–33
  3. 3. Bouchrika I. A survey of using biometrics for smart visual surveillance: Gait recognition. In Surveillance in Action. Cham: Springer; 2018. pp. 3-23. DOI: 10.1007/978-3-319-68533-5_1
  4. 4. Anderez DO, Kanjo E, Anwar A, Johnson S, Lucy D. The rise of technology in crime prevention: Opportunities, challenges and practitioners perspectives. 2021. arXiv preprint arXiv:2102.04204
  5. 5. Montepare JM, Goldstein SB, Clausen A. The identification of emotions from gait information. Journal of Nonverbal Behavior. 1987;11(1):33-42
  6. 6. Khamsemanan N, Nattee C, Jianwattanapaisarn N. Human identification from freestyle walks using posture-based gait feature. IEEE Transactions on Information Forensics and Security. 2017;13(1):119-128
  7. 7. Limcharoen P, Khamsemanan N, Nattee C. View-independent gait recognition using joint replacement coordinates (jrcs) and convolutional neural network. IEEE Transactions on Information Forensics and Security. 2020;15:3430-3442
  8. 8. Limcharoen P, Khamsemanan N, Nattee C. Gait recognition and re-identification based on regional lstm for 2-second walks. IEEE Access. 2021;9:112057-112068
  9. 9. Kitchat K, Khamsemanan N, Nattee C. Gender classification from gait silhouette using observation angle-based geis. In: 2019 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM). IEEE. 2019. pp. 485–490
  10. 10. Isaac ER, Elias S, Rajagopalan S, Easwarakumar K. Multiview gait-based gender classification through pose-based voting. Pattern Recognition Letters. 2019;126:41-50
  11. 11. Janssen D, Schöllhorn WI, Lubienetzki J, Fölling K, Kokenge H, Davids K. Recognition of emotions in gait patterns by means of artificial neural nets. Journal of Nonverbal Behavior. 2008;32(2):79-92
  12. 12. Roether CL, Omlor L, Christensen A, Giese MA. Critical features for the perception of emotion from gait. Journal of Vision. 2009;9(6):15-15
  13. 13. Karg M, Kühnlenz K, Buss M. Recognition of affect based on gait patterns. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 2010;40(4):1050-1061
  14. 14. Barliya A, Omlor L, Giese MA, Berthoz A, Flash T. Expression of emotion in the kinematics of locomotion. Experimental Brain Research. 2013;225(2):159-176
  15. 15. Venture G, Kadone H, Zhang T, Grèzes J, Berthoz A, Hicheur H. Recognizing emotions conveyed by human gait. International Journal of Social Robotics. 2014;6(4):621-632
  16. 16. Li B, Zhu C, Li S, Zhu T. Identifying emotions from non-contact gaits information based on microsoft kinects. IEEE Transactions on Affective Computing. 2016;9(4):585-591
  17. 17. Li S, Cui L, Zhu C, Li B, Zhao N, Zhu T. Emotion recognition using kinect motion capture data of human gaits. PeerJ. 2016;4:e2364
  18. 18. Zhang Z, Song Y, Cui L, Liu X, Zhu T. Emotion recognition based on customized smart bracelet with built-in accelerometer. PeerJ. 2016;4:e2258
  19. 19. Chiu M, Shu J, Hui P. Emotion recognition through gait on mobile devices. In: 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). IEEE. 2018. pp. 800–805
  20. 20. Quiroz JC, Geangu E, Yong MH. Emotion recognition using smart watch sensor data: Mixed-design study. JMIR Mental Health. 2018;5(3):e10153
  21. 21. Xu S, Fang J, Hu X, Ngai E, Guo Y, Leung V, et al. Emotion recognition from gait analyses: Current research and future directions. arXiv preprint arXiv:2003.11461. 2020.
  22. 22. Lemke MR, Wendorff T, Mieth B, Buhl K, Linnemann M. Spatiotemporal gait patterns during over ground locomotion in major depression compared with healthy controls. Journal of Psychiatric Research. 2000;34(4–5):277-283
  23. 23. Michalak J, Troje NF, Fischer J, Vollmar P, Heidenreich T, Schulte D. Embodiment of sadness and depression—gait patterns associated with dysphoric mood. Psychosomatic Medicine. 2009;71(5):580-587
  24. 24. Gross MM, Crane EA, Fredrickson BL. Effort-shape and kinematic assessment of bodily expression of emotion during gait. Human Movement Science. 2012;31(1):202-221
  25. 25. Destephe M, Maruyama T, Zecca M, Hashimoto K, Takanishi A. The influences of emotional intensity for happiness and sadness on walking. In: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE. 2013. pp. 7452-7455
  26. 26. Sun B, Zhang Z, Liu X, Hu B, Zhu T. Self-esteem recognition based on gait pattern using kinect. Gait & Posture. 2017;58:428-432
  27. 27. Halovic S, Kroos C. Not all is noticed: Kinematic cues of emotion-specific gait. Human Movement Science. 2018;57:478-488
  28. 28. Sadeghi H, Allard P, Duhaime M. Functional gait asymmetry in able-bodied subjects. Human Movement Science. 1997;16(2-3):243-258
  29. 29. Kang GE, Gross MM. Emotional influences on sit-to-walk in healthy young adults. Human Movement Science. 2015;40:341-351
  30. 30. Kang GE, Gross MM. The effect of emotion on movement smoothness during gait in healthy young adults. Journal of Biomechanics. 2016;49(16):4022-4027
  31. 31. Kim S, Nussbaum MA, Ulman S. Impacts of using a head-worn display on gait performance during level walking and obstacle crossing. Journal of Electromyography and Kinesiology. 2018;39:142-148
  32. 32. Sedighi A, Ulman SM, Nussbaum MA. Information presentation through a head-worn display (“smart glasses”) has a smaller influence on the temporal structure of gait variability during dual-task gait compared to handheld displays (paper-based system and smartphone). PLoS One. 2018;13(4):e0195106
  33. 33. Sedighi A, Rashedi E, Nussbaum MA. A head-worn display (“smart glasses”) has adverse impacts on the dynamics of lateral position control during gait. Gait & Posture. 2020;81:126-130
  34. 34. Jianwattanapaisarn N, Sumi K. Investigation of real-time emotional data collection of human gaits using smart glasses. Journal of Robotics, Networking and Artificial Life. 2022;9(2):159-170. DOI: 10.57417/jrnal.9.2_159
  35. 35. Baveye Y, Dellandréa E, Chamaret C, Chen L. Deep learning vs. kernel methods: Performance for emotion prediction in videos. In: 2015 International Conference on Affective Computing and Intelligent Interaction (acii). IEEE. 2015. pp. 77–83
