Utterance Emotion Estimation by Using Feature of Syntactic Pattern

Various researchers have defined sets of basic emotions; however, few studies describe the relation between emotion and language patterns in detail on the basis of statistical information. There are many languages in the world, and even within a single language, writing styles and expressions differ depending on the medium and on who the writer or speaker is, which is thought to make the relation between emotion and language patterns difficult to analyze. The author has been engaged in constructing and analyzing emotion corpora in several domains based on different sources. The analysis results have gradually made it clear that emotion expressions show differences and tendencies according to the attributes of writers and speakers. This chapter focuses on the differences that the attributes of the writer or speaker produce in language patterns, namely usage tendencies and combinations of words, unknown expressions (slang), sentence patterns, and non-verbal expressions (emoji, emoticons, etc.) associated with particular emotions, and then introduces the outcome of an analytical survey on a large-scale corpus obtained from a social networking service.


Introduction
In psychology and cognitive linguistics, the relation between emotion and language has been analyzed and studied [1,2]. With regard to the relation between basic emotions and language, Fischer [3] performed a cluster analysis and created a systematic chart based on emotion categories (emotion words) that can be expressed in language.
In natural language processing, especially in sentiment analysis, many researchers have studied the relationship between language patterns and emotion [4][5][6][7][8]. However, there are many languages in the world, and language patterns vary depending on the medium and the writer. For this reason, no dictionary describes language patterns and emotions comprehensively.
Matsumoto [5] and Tokuhisa [9] associated the entries of language pattern dictionaries with the emotions they evoke. Mera et al. [10] proposed a framework that calculates degrees of positiveness and negativeness by using an emotion calculation formula for each case frame pattern. Because most of the methods proposed in these studies assume "ideal," grammatical sentences, they might not be effective for sentences on the Internet.
Matsumoto et al. [11] proposed a method to estimate emotion in utterances that include grammatically incorrect expressions such as Internet slang. For such casual expressions, machine learning based on a large-scale natural language corpus is thought to be more effective than registering knowledge in a dictionary. However, a large-scale labeled corpus is difficult to obtain and costly to build. Matsumoto et al. therefore proposed extracting features from distributed word representations as an approach that is robust to unknown expressions. Their method converts words into distributed representation vectors and quantizes them with unsupervised clustering. They demonstrated that the method is more robust to unknown expressions than existing methods.
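The quantization idea can be illustrated with a minimal sketch. It is not the implementation of Matsumoto et al. [11]: the clustering algorithm (plain k-means with a farthest-point initialization), the two-dimensional toy vectors, and all function names are assumptions made for illustration. Each word vector is mapped to its nearest cluster, and a sentence is represented by a histogram of cluster IDs, so a word that never appeared in the labeled data still contributes to a known feature as long as it has an embedding.

```python
import math

def nearest(vector, centroids):
    """Index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))

def init_centroids(vectors, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(vectors, k, iters=20):
    """Plain k-means over word vectors; returns the k centroids."""
    centroids = init_centroids(vectors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def cluster_histogram(words, embeddings, centroids):
    """Count how many words of a sentence fall into each cluster."""
    hist = [0] * len(centroids)
    for w in words:
        if w in embeddings:  # words without any embedding are skipped
            hist[nearest(embeddings[w], centroids)] += 1
    return hist

# Toy 2-dimensional "embeddings" (hypothetical values).
embeddings = {
    "happy": [1.0, 0.9], "glad": [0.9, 1.0],
    "sad": [-1.0, -0.9], "angry": [-0.9, -1.0],
    "yay": [1.0, 1.0],  # a slang word not used to fit the clusters
}
training_words = ["happy", "glad", "sad", "angry"]
centroids = kmeans([embeddings[w] for w in training_words], k=2)
features = cluster_histogram(["yay", "sad", "zzz"], embeddings, centroids)
```

In this toy setting the slang word "yay" falls into the same cluster as "happy" and "glad", which is the sense in which a cluster-level feature can absorb unknown expressions.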
After describing emotion estimation methods based on dictionaries, patterns, and corpora, we introduce two important elements in corpus-based emotion estimation: gender differences and the use of emoji expressions. We then propose a deep learning-based method that uses syntactic patterns as features, combining the corpus-based method and the pattern-based method.
Section 2 introduces the emotion expression dictionary used in our previous research, Section 3 describes emotion estimation by sentence patterns, and Section 4 explains the corpus-based emotion estimation method. Section 5 analyzes emotion estimation with respect to gender and emoji. Section 6 proposes a method based on syntactic patterns, and Section 7 summarizes this chapter.

Emotion expression dictionary
Dictionaries that collect emotion expressions or evaluation expressions already exist [12][13][14]. These dictionaries define the kinds of emotion that words or phrases can express as classification categories and register the words or phrases under them. WordNet-Affect is a database created by extending the WordNet thesaurus (a conceptual database). A part of the information registered in WordNet-Affect is shown in Table 1.
One study converted WordNet-Affect into Japanese [15]. The evaluation polarity dictionary and the Japanese appraisal evaluation expression dictionary are language resources available for reputation analysis or opinion analysis, and they include words annotated with emotion polarity (positive/negative). To analyze the emotion of a sentence written in Japanese, an emotion expression dictionary that includes Japanese emotion expressions is necessary. Likewise, emotion analysis of text written in another language requires linguistic resources for that language. Because the framework of linguistic resources can differ from language to language, it is difficult to make a unified dictionary. For Japanese, the "Emotion Expression Dictionary" by Nakamura [16] is frequently referred to and used in the field of natural language processing. However, many of the expressions in this dictionary are written words that appeared in novels, so some of them are rarely used colloquially. The Emotion synonym dictionary [17] also includes only a few colloquial expressions; it lists expressions that are thought to be useful for writing novels, scenarios, and dramatic dialogs. Because no dictionary currently covers practical language such as colloquial expressions, such expressions or patterns are usually extracted from linguistic corpora.
Representative databases that register sentence patterns related to emotion expressions include the EDR electronic dictionary [18], GoiTaikei: A Japanese Lexicon [19], and the Kyoto University Case Frame [20]. However, because these linguistic resources focus on semantic relations, they are not annotated with emotion information.
Dictionaries make it possible to use knowledge defined by humans effectively; however, they are often insufficient for matters that are closely tied to human sensibilities, such as emotions. While some words or expressions always convey the same meanings or impressions, others change their meanings or impressions over time. For example, notions of fairness and common sense regarding attributes such as race, religion, and gender have changed significantly over the past decades, and this issue has often been pointed out as one of the problems of artificial intelligence in recent years. Also, because language itself changes, dictionaries need to be updated constantly. In a Wikipedia-style dictionary, errors and outdated information are corrected or updated through exposure to many people on the Web. However, because descriptions in such a dictionary reflect the sensibility of the majority, they may not make it possible to estimate the emotions of people with different sensibilities appropriately, so there is a limit to emotion estimation with dictionaries alone.

Relation of sentence patterns and emotion
This section explains the relation between sentence patterns and emotion from the viewpoint of natural language processing by introducing the studies by Matsumoto [5] and Tokuhisa [9]. Matsumoto et al. [5] focused on the emotion occurrence condition of each sentence pattern to estimate emotion in dialog. They also constructed a dictionary of emotion expressions in order to take the emotion values of each word into account. An emotion value is the strength with which a word expresses each emotion.
Their study used a sentence pattern database that extended the emotion calculation formula proposed by Mera et al. [10]. However, because they targeted basic sentence patterns, the method shares the problems of the existing methods, such as a lack of versatility and weakness on spoken expressions. The "Japanese Lexicon" [19] provides sentence patterns for each word. In the example of "Crying," the sentence patterns are:
• N1 ga N2 wo Warau (N1 laughs at N2)
N1 and N2 are nouns. The emotion expressed by the sentence can differ depending on the nouns that fill N1 and N2. In the example sentence "Jiro cries over his debt," "debt" generally has a negative image. However, the emotion generated by this sentence can also be affected by the speaker's attitude toward "Jiro." Rules for these patterns had to be annotated manually. Figure 1 shows the case frame pattern of "N1 ga N2 de/ni Naku." Table 2 shows some examples of sentence patterns and emotion occurrence rules. This information is saved in XML format for readability. Figure 2 shows the emotion occurrence event sentence pattern database in XML format.
Matsumoto et al. [21] also extracted emotion occurrence event sentence patterns from a corpus. The flow of their automatic extraction is described below with an example.
Step 1. The input sentence is analyzed by a dependency parser. "CaboCha" [22] was used as the dependency parser.
First, according to the result of dependency parsing, the last segment of the sentence is judged to be the predicate. A segment that depends on the predicate and ends with one of the case or binding particles "ga," "ha," "wo," "ni," "he," "de," "to," "kara," "made," or "yori" is extracted as a surface case.
Step 2. The noun included in each obtained surface case element is annotated with semantic attributes based on "A Japanese Lexicon." If the semantic attributes of the noun cannot be obtained, the basic form of the noun is set in the surface case slot without semantic attributes. A segment that is independent of the predicate segment is not judged to be a case element. Because such a segment might be an important element for deciding emotion attributes, it is extracted as a modifier element. The obtained sentence pattern is denoted as 'EPT.'

Figure 1. Case frame pattern of "N1 ga N2 de/ni Naku".

Step 3. The set of emotion attributes 'E' annotated to the input sentence is taken as the emotion attributes of 'EPT.' The combinations of 'EPT' and 'E' obtained through Steps 1 to 3 are registered in the emotion occurrence sentence pattern DB. Figure 3 shows an example of the extraction process when "Watashi wa odoroki no amari me wo shirokuro saseta." is input.
This study automatically extracted sentence patterns from the emotion-labeled corpus, and then created and evaluated the sentence pattern database. In cross-validation experiments on estimating eight kinds of emotion from sentences expressing emotions with the corpus-derived sentence pattern database, an emotion estimation accuracy of approximately 42% was obtained.
Tokuhisa et al. [23] statistically analyzed the valency pattern of each sentence pattern and proposed a method for emotion inference. Tokuhisa et al. [24] constructed and evaluated a dialog corpus by annotating emotion tags with a focus on the facial expressions of characters in manga comics.
Their studies mainly target utterances in dialogs, but the target data are utterances by fictional characters rather than by actual persons. Although these data simulate real dialog, they are likely to contain some bias introduced by the authors and might lack generality.
It is difficult to register all colloquial expressions in a dictionary by strictly classifying their sentence patterns; moreover, few studies have taken on the challenge of annotating sentence patterns with emotion, which is subjective and sensitive. However, I thought that it would not be impossible to extract a relation between emotion and language patterns by thoroughly studying the recent corpus-based methods.
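Steps 1 to 3 can be sketched as follows, assuming the dependency parse is already available. The sketch does not call CaboCha; the segment structure (a list of romanized tokens plus the index of the head segment), the tiny semantic attribute table standing in for "A Japanese Lexicon," the treatment of romanized "wa" as the particle "ha," the choice to treat every non-case segment as a modifier, and the "Surprise" label in the example are all assumptions made for illustration.

```python
CASE_PARTICLES = {"ga", "ha", "wo", "ni", "he", "de", "to", "kara", "made", "yori"}
PARTICLE_ALIASES = {"wa": "ha"}  # assumption: romanized topic marker maps to "ha"

# Hypothetical stand-in for the semantic attributes of "A Japanese Lexicon".
TOY_SEMANTIC_ATTRIBUTES = {"watashi": "PERSON", "me": "BODY_PART"}

def extract_pattern(segments):
    """Steps 1 and 2: build a sentence pattern (EPT) from parsed segments."""
    predicate_index = len(segments) - 1  # the last segment is the predicate
    pattern = {
        "predicate": " ".join(segments[-1]["tokens"]),
        "cases": {},
        "modifiers": [],
    }
    for segment in segments[:-1]:
        last = segment["tokens"][-1].lower()
        particle = PARTICLE_ALIASES.get(last, last)
        if segment["head"] == predicate_index and particle in CASE_PARTICLES:
            noun = " ".join(segment["tokens"][:-1]).lower()
            # Fall back to the noun itself when no semantic attribute is known.
            pattern["cases"][particle] = TOY_SEMANTIC_ATTRIBUTES.get(noun, noun)
        else:
            pattern["modifiers"].append(" ".join(segment["tokens"]))
    return pattern

def register(pattern_db, segments, emotions):
    """Step 3: store the pair (EPT, E) in the pattern database."""
    ept = extract_pattern(segments)
    pattern_db.append((ept, set(emotions)))
    return ept

# Hand-made parse of "Watashi wa odoroki no amari me wo shirokuro saseta."
segments = [
    {"tokens": ["Watashi", "wa"], "head": 4},
    {"tokens": ["odoroki", "no"], "head": 2},
    {"tokens": ["amari"], "head": 4},
    {"tokens": ["me", "wo"], "head": 4},
    {"tokens": ["shirokuro", "saseta"], "head": -1},
]
pattern_db = []
ept = register(pattern_db, segments, {"Surprise"})  # label assumed for illustration
```

With a real parser, the `segments` list would be built from the parser output rather than written by hand.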

Corpus-based emotion analysis method
This section describes corpus-based emotion analysis methods with reference to the related literature. A corpus annotated with emotion tags is called an emotion corpus. We introduce existing studies that created and evaluated emotion analysis models based on statistical information and machine learning using emotion corpora.

Japanese-English parallel corpus [Minato et al.]
Minato et al. [25,26] annotated emotion tags on the words and sentences of a Japanese-English parallel corpus. The completed corpus included 1,190 Japanese-English sentence pairs. Based on the statistical results for the tagged words and sentences, they proposed and evaluated an emotion estimation method. They further considered the relevance between the two languages. An overview of their corpus is shown in Table 3.
The corpus was annotated by its author alone, and no evaluation by other examinees was conducted. Matsumoto et al. [27] gave several examinees a questionnaire on this corpus and analyzed the precision and recall between the tags annotated by the author and the tags annotated by the examinees. Because all examinees were Japanese, they evaluated only the Japanese sentences (1,190 sentences). They calculated the reliability of the emotion tag annotation by multiple annotators, based on the match frequency among the three annotators (the initial tag annotator and two examinees). In their study, they proposed a method to reconstruct an emotion corpus by annotating reliability values. The reliability of a tag is calculated with Eq. (1).
$\sum W(\mathrm{Tag}_x)$ denotes the sum of the weights of the tags annotated by the corpus creator and the two examinees. They calculated the importance of each emotion category from the reliability of the tag annotation: instead of simple word frequency, they used the reliability-based weights of the emotion tags as the weights for each emotion category. The calculation is based on the TF-IDF method. Eq. (2) shows the weight of an emotion category for the word $w_m$. $N$ is the total number of emotion categories, and $l$ is the total number of unique words. $\alpha_i$ is a normalization coefficient calculated with Eq. (3).

Corpus-based method using N-gram [Mishina et al.]
Mishina et al. [28] extracted word n-gram features from emotion corpora and proposed an emotion estimation method using the similarity score RECARE, an improved version of BLEU, which is often used for translation evaluation. The target emotion categories were four kinds: "anger," "joy," "hate," and "hope." The method has two problems: i) similarity must be calculated against all sentences in the corpus, and ii) the estimation accuracy is affected by the corpus quality because the method is a simple example-based method.
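The example-based idea can be sketched as follows. The score below is not RECARE, whose definition is not given here; it is a simple BLEU-like stand-in (the mean of clipped unigram and bigram precision, without a brevity penalty), and the toy corpus and function names are assumptions for illustration. The input is compared with every labeled sentence, and the label of the most similar sentence is returned, which also makes both problems noted above visible: the loop visits the whole corpus, and the result depends entirely on the labeled examples.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of word n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_score(candidate, reference, max_n=2):
    """Mean clipped n-gram precision for n = 1..max_n (BLEU-like stand-in)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        if total == 0:  # candidate is shorter than n words
            continue
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        precisions.append(matched / total)
    return sum(precisions) / len(precisions) if precisions else 0.0

def estimate_emotion(tokens, corpus):
    """Return the label of the most similar labeled sentence in the corpus."""
    best_tokens, best_label = max(corpus, key=lambda ex: overlap_score(tokens, ex[0]))
    return best_label

# Toy emotion corpus (hypothetical sentences) with the four target categories.
corpus = [
    ("i am so happy today".split(), "joy"),
    ("i really hate this noise".split(), "hate"),
    ("i hope it works".split(), "hope"),
    ("this makes me so angry".split(), "anger"),
]
label = estimate_emotion("i am happy".split(), corpus)
```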

Corpus creation and analysis [Quan et al.]
Quan et al. [29] constructed a large-scale Chinese weblog emotion corpus, "Ren-CECps," and analyzed it. In Ren-CECps, emotion tags were annotated at the sentence, word, paragraph, and article levels by several test subjects, and the corpus was analyzed from various viewpoints. The annotation required manual work, and the annotation cost rises as the corpus becomes larger and richer in information. Another drawback is that, because the target is weblog articles, any bias among the writers affects the quality of the corpus.

The emotion labeled corpus divided according to the users' attributes
Matsumoto et al. [30] targeted tweets posted on Twitter and estimated the emotion of each tweet. They therefore needed to annotate emotion tags on each tweet. The emotion estimation model was generated with the following steps:
1. An attribute-labeled user account list is created from the accounts of popular users whose user attributes are known.
2. The tweets of each user are collected by using the attribute-labeled user account list.
3. Four annotators manually annotate emotion tags on the collected tweets.
4. The emotion estimation model is created by extracting features from the tweets and applying a machine learning method.
In Step 4, the features are extracted as follows. First, each tweet is split into words by morphological analysis. Then, the words are converted into distributed representations. A separate corpus was used to train the distributed representations: for about one year, they collected tweets at random and constructed a tweet corpus from them. They converted this corpus into a word-segmented format and used the resulting text to train the distributed representations.
Then, they annotated emotion tags on the tweets. The emotion tags are as follows:
• Positive emotions: "Joy," "Hope," "Love," "Relief," "Reception"
• Negative emotions: "Anger," "Hate," "Sorrow," "Fear," "Surprise," "Anxiety"
• Other emotions: "No emotion" and "Unclassified"
The total number of emotion categories is 13. Some examples of the labeled tweets and their user attributes are shown in Table 4. The numbers of tweets for each emotion tag are shown in Table 5. As shown in Table 5, the numbers of tweets are biased across emotions.
In their study, they reported that emotion estimation accuracy increases when the model is trained on an emotion corpus prepared for each attribute.
However, one thing to keep in mind when estimating emotion from a corpus is who annotated the corpus. If the annotators' attributes and sensibilities are biased, a model trained on the resulting biased corpus will also be biased. Such a model cannot infer emotions appropriately according to the attributes of the authors or speakers of the sentences to be estimated. To clarify how attributes affect emotion estimation, the next subsection analyzes, based on the corpus, which emotional expressions are used depending on gender.

Analysis of emotion expressions for each gender
In this section, I analyze emotion expressions in an emotion-labeled corpus that is divided by gender. I investigate the appearance frequency of the emotion expressions included in the emotion expression dictionary and the kinds of emotion labels annotated to the tweets that include each expression, and then analyze the appearance tendency of each expression according to gender by TF-ICF.
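As a rough illustration of TF-ICF, the sketch below scores each term in each category as its relative frequency within the category multiplied by the logarithm of the number of categories divided by the number of categories in which the term occurs. The exact variant used for the analysis is not specified in this chapter, so this formula, the toy tokenized tweets, and the function name are assumptions.

```python
import math
from collections import Counter

def tf_icf(docs_by_category):
    """Return {category: {term: score}} with score = TF * log(N / CF).

    TF is the relative frequency of the term within the category, N is the
    number of categories, and CF is the number of categories containing the term.
    """
    term_counts = {
        category: Counter(term for doc in docs for term in doc)
        for category, docs in docs_by_category.items()
    }
    n_categories = len(term_counts)
    category_freq = Counter(
        term for counts in term_counts.values() for term in counts
    )
    scores = {}
    for category, counts in term_counts.items():
        total = sum(counts.values())
        scores[category] = {
            term: (count / total) * math.log(n_categories / category_freq[term])
            for term, count in counts.items()
        }
    return scores

# Toy tokenized tweets grouped by emotion label (hypothetical data).
docs_by_category = {
    "Joy": [["i", "happy", "today"], ["happy", "!"]],
    "Anger": [["i", "angry", "today"]],
    "Sorrow": [["i", "sad", "!"]],
}
scores = tf_icf(docs_by_category)
top_joy = max(scores["Joy"], key=scores["Joy"].get)
```

A term that appears in every category, such as "i" here, receives a score of zero, which is why general expressions shared by all categories do not separate them.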

Table 4. Examples of labeled tweets and user attributes.
Emotion | Tweet | User attribute
Joy | At that time, we had a good match! | Male Athlete
Hope | We did good! We should take a good rest and get together in good condition in the next live. |
Sorrow | Coordinate plans for clothes are ruined by rain. |
Anxiety | All of the roads are in heavy traffic jam... I'm afraid if I could make it to the soccer game... | Female Athlete
The TF-ICF calculation results for each gender are shown in Table 6. Only the top 10 expressions and their TF-ICF scores are displayed in this table.
The analysis shows no significant difference between males and females. This is because only the general expressions registered in the emotion expression dictionary were extracted. In addition, the results shown in Table 7 were obtained by calculating TF-ICF without limiting the terms to emotional expressions.
These results reveal gender differences in expression. For example, both genders often use symbols for the emotion "Joy," and females in particular often use emoji. On the other hand, for the emotion "Anger," females use comparatively mild expressions, whereas males often use radical expressions. It is considered that each gender has its own specific emotional expressions, and that these reflect the gender difference in emotional expression.
Differences in emotional expressions by gender might decrease the estimation accuracy of a learned emotion estimation model because of gender bias. To avoid this, it would be useful to prepare an emotion estimation model for each gender or attribute, or to replace the expressions related to attributes with common expressions. In any case, it is clear that some sort of breakthrough is needed to maintain the fairness of machine learning.

Table 7. A part of TF-ICF calculation without limitation of emotional expression.

DOI: http://dx.doi.org/10.5772/intechopen.96597

Analysis of emoji
In this subsection, I analyze the appearance tendency of non-verbal expressions such as emoji according to gender. I analyzed the usage trend of emoji in a total of 59,009 tweets, which were collected for each gender separately from the emotion corpus.
The results are shown in Figures 4 and 5. In these figures, the horizontal axis shows the emoji type and the vertical axis shows the frequency of use. The graph for males shows emojis with a frequency over 20, and the graph for females shows emojis with a frequency over 100. Emojis were grouped into four classes: expression, emotion, exclamation, and other. Table 8 shows the emoji types and frequencies, counting emojis that appeared over 10 times. As seen from this result, females tended to use more emojis than males, and females often used emojis expressing facial expressions or emotions. This usage trend confirms the expectation that females use richer emotion expressions in their tweets. On the other hand, males used more exclamation marks than other types of emoji.
This result indicates that not only emotional expressions but also nonverbal expressions such as emojis have sufficient influence on emotion estimation. In addition to emojis, Japanese language has emoticons and ASCII art to convey various emotions. Globally, nonverbal expressions play important roles in

Emotion estimation from feature of syntactic pattern by deep learning

Creation of emotion estimation by deep neural networks
We train a deep learning model on language patterns that express emotions. As features for learning, we use syntactic patterns obtained from the parsing results of a Japanese dependency and case structure analyzer. We use KNP [31] as this analyzer. KNP is a syntactic, case, and reference analyzer developed by Kyoto University. It uses a noun case frame dictionary constructed from 7 billion Web texts.
As preprocessing for KNP, morphological features must be annotated on each word by a morphological analyzer. In this study, I perform this annotation with the morphological analyzer Juman [32]. As seen in Figure 6, sentences are then analyzed by KNP.
As the result of the analysis, features are annotated at the morpheme level and the chunk level. The analysis result consists of a "Clause layer," a "Tag layer," and a "Morpheme layer." In this study, the features are extracted from the "Tag layer." For training, I use the features annotated at the chunk level to associate syntactic patterns with emotions. Examples of the features annotated at the chunk level are shown in Table 9.
The training data are utterances manually annotated with emotion tags. These utterances were used in the study by Matsumoto et al. [33], and the source sentences are bilingual (Japanese-English). Because these sentences were used as examples for English composition, it is easy to extract syntactic patterns from them. As a preliminary experiment, I confirm the emotion estimation accuracy by cross-validation. The breakdown of the five experimental corpora is shown in Table 10.

Table 8. Emoji types and frequencies (over 10 times).
For training, I use bi-directional LSTM (bi-LSTM) [34], an extension of LSTM (Long Short-Term Memory) [35], which is a kind of recurrent neural network. LSTM is suited to learning sequences; it learns efficiently by retaining and forgetting past inputs. Figure 7 shows the neural network structure using bi-LSTM. I use two LSTM layers.
Table 10. Statistics of the emotion-tagged corpora.

In this study, I create a feature vector for each chunk and input the feature vectors in order from the beginning of the sentence for training. The maximum number of chunks was set to 30, based on the maximum number of chunks in the corpora.
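The chunk-by-chunk input with a maximum of 30 chunks can be sketched as follows. The sketch covers only the input encoding, not the bi-LSTM itself. The multi-hot encoding, the zero padding, and the feature names (which are placeholders, not actual KNP feature labels) are assumptions for illustration.

```python
MAX_CHUNKS = 30  # maximum number of chunks per sentence, as stated in the text

def build_vocabulary(sentences):
    """Map every chunk-level feature seen in the data to a vector index."""
    features = sorted({f for sentence in sentences for chunk in sentence for f in chunk})
    return {feature: index for index, feature in enumerate(features)}

def encode_sentence(chunks, vocabulary, max_chunks=MAX_CHUNKS):
    """Encode a sentence as max_chunks multi-hot rows, zero-padded at the end."""
    rows = []
    for chunk in chunks[:max_chunks]:  # longer sentences are truncated
        row = [0] * len(vocabulary)
        for feature in chunk:
            if feature in vocabulary:  # unseen features are ignored
                row[vocabulary[feature]] = 1
        rows.append(row)
    rows.extend([0] * len(vocabulary) for _ in range(max_chunks - len(rows)))
    return rows

# Each sentence is a list of chunks; each chunk is a set of feature names.
# The names below are hypothetical placeholders for chunk-level features.
sentences = [
    [{"NOUN_PHRASE", "CASE:ga"}, {"PREDICATE:verb"}],
]
vocabulary = build_vocabulary(sentences)
matrix = encode_sentence(sentences[0], vocabulary)
```

The resulting 30-row matrix is the kind of fixed-length sequence that a recurrent model can read from the first chunk to the last.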

Figure 7. Neural networks using bidirectional LSTM.

Table 11 shows the result of the preliminary experiment. The averaged F-measure ranged from 32% to 49%. The bias in the emotion tags is thought to be the cause of these low values.

Experiment
I apply the emotion estimator, trained on syntactic features with bi-LSTM, to the tweets of each gender and evaluate it by calculating its accuracy. The architecture of the neural network using bi-LSTM is shown in Figure 8. The tweet corpus shown in Table 12 was used for the experiment.
We compare the result of the proposed method with an emotion estimation result based on emoji. A dictionary that registers emojis and the emotions they express is constructed as the Emoji Emotion Dictionary. The emoji emotion vectors of emojis that are not in the dictionary are estimated. The emoji emotion vector of such an emoji is obtained by calculating its similarity to the seed emojis included in the Emoji Emotion Dictionary and taking the emotion categories and similarities of the top 5 most similar seed emojis. The cosine similarity between the distributed representations of emojis is used as the similarity. Eqs. (6), (7), and (8) show the calculation of an emoji emotion vector.

$$\mathrm{emotion} = \arg\max_{x} \, ew^{x}_{avg} \quad (8)$$

Eq. (6) gives the emotion vector $EV_{e_i}$ of emoji $em_{e_i}$. The emoji emotion vector is a weighted mean of the emotion vectors of the top N similar seed emojis, weighted by the similarity $sim_{e_i}$ to each seed emoji. $ew^{j}_{e_i}$ is the weight of emotion category $j$. Eq. (7) calculates the mean emoji emotion vector from the set $EM_{topN}$ of the top N similar emojis. By Eq. (8), the estimated emotion is the emotion category $x$ with the maximum weight $ew^{x}_{avg}$ in the mean vector. The emotion estimation result for a sentence is obtained from the averaged emoji emotion vector of the emojis included in the sentence. In this study, N is set to 5.

Table 11. F-measures of the preliminary experimental results.
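The emoji-based estimation can be sketched as follows, under assumptions: the exact forms of Eqs. (6) and (7) are not reproduced here, so the similarity-weighted mean, the clipping of negative similarities to zero, the toy two-dimensional embeddings, and all names are illustrative choices rather than the method as implemented. The sketch estimates the emotion vector of an unknown emoji from its most similar seed emojis, averages the vectors of the emojis in a sentence, and outputs the category with the maximum weight.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def emoji_emotion_vector(embedding, seeds, top_n=5):
    """Similarity-weighted mean of the emotion vectors of the top-N seed emojis.

    `seeds` is a list of (embedding, emotion_vector) pairs.
    """
    ranked = sorted(
        ((max(cosine(embedding, e), 0.0), ev) for e, ev in seeds),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_n]
    dim = len(seeds[0][1])
    total = sum(sim for sim, _ in ranked)
    if total == 0:
        return [0.0] * dim
    return [sum(sim * ev[j] for sim, ev in ranked) / total for j in range(dim)]

def estimate_emotion(emoji_vectors, categories):
    """Average the emoji emotion vectors and return the arg-max category."""
    dim = len(categories)
    avg = [sum(v[j] for v in emoji_vectors) / len(emoji_vectors) for j in range(dim)]
    return categories[max(range(dim), key=lambda j: avg[j])]

categories = ["Joy", "Anger", "Sorrow", "Surprise"]
# Hypothetical seed emojis: (toy embedding, one-hot emotion vector).
seeds = [
    ([1.0, 0.0], [1, 0, 0, 0]),
    ([0.9, 0.1], [1, 0, 0, 0]),
    ([0.0, 1.0], [0, 1, 0, 0]),
    ([-1.0, 0.0], [0, 0, 1, 0]),
]
unknown = emoji_emotion_vector([0.8, 0.2], seeds, top_n=2)
emotion = estimate_emotion([unknown], categories)
```

The example uses `top_n=2` only because the toy seed set is small; the text above sets N to 5.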

Experimental results
Because neutral tags were not annotated in the target tweet corpus, the accuracies were calculated for four emotion categories: "Joy," "Anger," "Sorrow," and "Surprise." The experimental result is shown in Table 13. The highest accuracy was obtained for "Sorrow," and the second highest for "Joy." The lowest accuracy, 24.3%, was obtained for "Anger."
On the other hand, the overall accuracy of the emoji-based method was 43.7%, which was better than that of the proposed bi-LSTM-based method. However, its accuracy for "Anger" was as low as 4%, although its accuracy for "Surprise" was 100%. The primary reason is that the variety of "Surprise" seed emojis was smaller than that of the other emotions. Another reason is that tweets expressing "Surprise" with emoji were relatively scarce.
This result shows that using syntactic patterns enables effective emotion estimation with deep learning even with a small amount of training data. It is thought that a more accurate model can be realized by flexibly changing the dictionary knowledge depending on the domain or the speaker of the target sentence.

Conclusions
This chapter introduced our study on emotion analysis of the Japanese language in the context of existing natural language processing research and linguistic resources. Most of the existing approaches tried to associate emotions with language patterns; however, if a language pattern expresses different emotions depending on the words that make up the sentence, rules must be described for millions of combinations.
It will be effective to analyze emotions based on corpora by annotating emotions on the corpora. In this chapter, various features were annotated on sentences by using a syntactic parser, and feature vectors were generated by clause unit. The emotions of tweets were estimated by training bi-LSTM neural networks on these features. The possibility of deriving emotions from language patterns by using emoji as non-verbal expressions was also shown. The experimental results showed that the emoji-based method is effective for tweets that include emoji. Because the amount of emotion-labeled data is limited, and because the existing dictionary-based and corpus-based methods cannot cover emotion expressions that are colloquial and depend on users' attributes, the improvement in estimation accuracy is limited. Because emojis are non-verbal emotion expressions that all users can use and that do not depend on the language, they are a promising key to emotion analysis in the future.

Table 13. Comparison between the accuracies of the proposed method and the emoji-based method.
In addition, syntactic patterns might not be extracted correctly from the casual sentences that are often seen in dialogs on SNS. In such cases, general-purpose neural language models such as BERT [36] and GPT-3 [37] will be useful. Future developments in language models might eliminate the need for human-defined linguistic knowledge such as syntactic patterns; however, methods such as fine-tuning remain effective for building, from large data, emotion estimation models that satisfy the needs of all people. In that case, dictionary knowledge and syntactic patterns will play effective roles in improving accuracy and in presenting the basis for a judgment.