Average assessment of 9 discrete emotions from CAP-D in Valence-Arousal-Approach/Avoidance dimensions as assessed in NAPS.
Defining “emotion” and measuring it accurately is a notorious problem in psychology. It is usually addressed with subjective self-assessment forms filled in manually by participants. Machine learning methods and EEG correlates of emotions enable the construction of automatic systems for objective emotion recognition. Such systems could help to assess emotional states and could be used to improve emotional perception. In this chapter, we present a computer system that can automatically recognize the emotional state of a human, based on EEG signals induced by a standardized affective picture database. Based on the EEG signal, trained deep neural networks are then used together with mappings between emotion models to predict the emotions perceived by the participant. This, in turn, can be used, for example, to validate the standardization of affective picture databases.
- emotion recognition
- emotion perception
- machine learning
- deep neural networks
In psychological research, the most common method of measuring perceived emotions or emotional states is through self-assessment forms filled in manually by participants. The information they give is useful but very subjective and dependent on many extraneous factors, e.g. the construction of the form, the instructions, and the level of emotional intelligence of the participant. Also, the forms cannot be used when working with children or mentally disabled people. Physiological signals can give a more objective view of the emotional reactions of the body. Among measurement techniques such as galvanic skin response (GSR), facial electromyography (EMG), electrocardiography (ECG), breathing rate, or temperature, electroencephalography (EEG) is one of the most common in emotion recognition applications. It is non-invasive and offers high-resolution, high-dimensional data about the source of the emotions itself: the brain activity. In EEG, highly conductive electrodes placed on the scalp collect the electrical charge induced by the activity of the brain.
The correlation between emotional state and EEG is widely used in cognitive psychology, psychophysiology, and medicine for the examination of mental disorders like depression, autism spectrum disorder (ASD), attention-deficit hyperactivity disorder (ADHD), or schizophrenia. From a psychological point of view, EEG gives insights into the mechanisms by which emotions arise. Emotion recognition systems, like the one presented in this chapter, can be used to assess the emotional perception of humans. However, the analysis of complex and high-dimensional EEG patterns and correlations would be virtually impossible without computers and computational methods like machine learning. Emotion recognition algorithms are part of a special branch of computer science called affective computing. It is also a part of the artificial intelligence field, as it relates to the understanding and displaying of emotions by machines. Automatic emotion recognition systems based on EEG have already shown outstanding accuracy in many different applications and on well-established benchmarks like DEAP (database for emotion analysis using physiological signals). Machine learning algorithms are used in the vast majority of these systems and are considered state-of-the-art in the domain. Among them, deep neural networks are the most promising emerging approach, as they do not require additional feature extraction steps.
The chapter presents the idea and design of a system for validation of affective picture databases by comparing their normative labels with the predictions of EEG-based deep artificial neural networks. Consecutive sections are a step-by-step guide for creating such a system. In Section 2, different psychological models of emotion are described, the problem of mapping between emotion models is introduced, and our new mapping is proposed. In Section 3, the instructions for designing a complete EEG experiment for machine learning emotion recognition are given, together with a list of affective picture sets and state-of-the-art algorithms. In Section 4, the system for validation of affective databases is presented. The chapter ends with a summary and future work section.
2. Psychological models of emotion
Recognition of emotions must start from the definition of the model in which they are measured. This is the main dividing line in the field of emotion analysis. The theory of emotions is still an open topic despite plenty of publications and research. The reason is that human emotions are mental states generated by the central nervous system, and as such, they are hard to assess, nondeterministic, and subjective phenomena. Individuals with different levels of emotional intelligence may not be able to assess their emotional state accurately [ref]. Moreover, similar stimuli may induce very different states in two similar people, and the same person may respond differently to seemingly similar stimuli. Age, time of day, mood, experience, and fatigue may all affect the perception of emotions.
However, there is some evidence for neural circuits that are responsible for particular basic emotional events, so some assumptions and simplifications were made to extract several different emotion models. In general, they divide into discrete (or categorical) and dimensional (or continuous) models.
The discrete emotion models describe different numbers of independent emotion categories. One of the most popular models, by Paul Ekman, describes six universal basic emotions: anger, disgust, fear, happiness, sadness, and surprise. The model is derived from the observation of universal facial expressions. The paper describing the model has been cited and discussed by thousands of researchers, but the existence of basic emotions is still an unsettled issue in psychology and is rejected by many researchers [13, 14, 15]. Another model, by Plutchik, describes 8 primary bipolar emotions: joy and sadness; anger and fear; surprise and anticipation; and trust and disgust. Unlike in Ekman’s model, Plutchik’s wheel of emotions relates these pairs in the circumplex model. Recently, a model consisting of as many as 27 classes bridged by continuous gradients was proposed.
The continuous models are usually represented in a numerical dimensional space. The most popular dimensions were defined by Mehrabian and Russell as pleasure, arousal, and dominance (PAD model). The first dimension is frequently called valence in the literature; it describes how pleasant (or unpleasant) the stimulus is for the participant. The arousal dimension defines the intensity of emotion. Dominance is described as a level of control and influence over one’s surroundings and others. Usually, less attention is paid to this third dimension in the literature. However, only the dominance dimension makes it possible to distinguish angry from anxious, alert from surprised, or relaxed from protected. The model that includes only valence and arousal levels is called the circumplex model of affect and is one of the most commonly used to describe the emotions elicited by stimuli. Currently, this model is facing some criticism, because complex emotions in particular are hard to define within only these two general dimensions [22, 23]. The effort to present scientific results in a simple and structured form may lead to a critical reduction of the phenomena. Recent research findings on the global meaning structure of the emotion domain indicate that more than two dimensions are needed to describe the nature of the human emotional experience sufficiently [23, 24].
2.1 Mappings between models
Discrete and dimensional models are not defined as contradictory. Instead, both can give unique value that can assist in understanding the functions of emotions. There are multiple works on mappings between different emotion models, both discrete and continuous [22, 26, 27]. They are usually based on self-assessment questionnaires in which a group of participants assesses discrete emotions (induced or represented by words, images, videos, short stories, or facial expressions) in a few continuous dimensions of the circumplex, PAD, or similar models (e.g. Valence-Arousal-Control-Utility or Valence-Arousal-Approach/Avoidance). Formerly, the questionnaires were based on Self-Assessment Manikins (SAMs) or a several-point (usually 5, 7, or 9 points) Likert scale (like in the IAPS or OASIS datasets). The new trend is to use more fine-grained continuous scales, like selecting a point on a 10 cm line or the Affective Slider.
Two popular mappings based on emotion words are presented in Figure 1. The three-dimensional visualizations are adapted from . The emotion words are placed in the position representing their average PAD assessment by 300 and 70 subjects, respectively. The length of the dashed lines is proportional to the pleasure/valence value. In both models, the most pleasure-inducing words are Love, Happiness, Hope, and Gratefulness. On the other end, we have the highly arousing Anger and Fear, which can be differentiated only by the dominance dimension. The least arousing and least pleasant word is Sadness, which is also of low dominance. The main difference between the mappings is the location of Hate, which is relatively less arousing in the Hoffmann mapping. Some of the emotion words, like Contempt, Disgust, or Compassion, have equivalents only in the model presented in Figure 2.
Figure 2 is based on data from . It presents the average assessment of 16 common emotion words by 187 subjects in Valence-Arousal-Control dimensions (the dimension of Utility was also assessed, but is omitted in the figure) before and after multi-dimensional scaling (MDS) into 3 dimensions, which results in a far more “honest” Euclidean space between emotion instances. As can be observed, the locations of emotion words after MDS are much more scattered across the space, but they keep some basic relationships, e.g. Love, Happiness, Gratefulness, and Compassion still have larger values in Dimension 1 (similar to Valence), and pairs of similar emotions like Sadness and Disappointment, or Happiness and Love, are still relatively close to each other. Thus, this MDS mapping may be a good basis for machine learning algorithms based on dimensional proximities.
2.2 Own mapping between NAPS and CAP-D affective picture sets
In our example, we will use the set of 266 affective pictures from NAPS (Nencki Affective Picture System)  and NAPS BE (a subset of NAPS with 6 basic emotion labels added)  that were included in CAP-D (Categorized Affective Pictures Database) . Subsets of images from this set were assessed in several emotion models by different groups of participants:
valence, arousal, and approach-avoidance dimensions (266 images assessed by 119 female and 85 male subjects in NAPS)
valence and arousal dimensions (144 images assessed by 67 female and 57 male subjects in NAPS-BE)
arousal dimension (266 images assessed by 73 female and 60 male subjects in CAP-D)
intensities of 6 basic Ekman emotions and dominant emotion (or emotions) per picture (144 images assessed by 67 female and 57 male subjects in NAPS-BE)
categorization and intensity in 10 emotion categories including 6 basic Ekman emotions: anger, compassion, disgust, fear, happiness, love, peacefulness, pride, sadness, surprise (266 images assessed by 73 female and 60 male subjects, and 15 clinical psychologists in CAP-D)
Several mappings between dimensional and discrete emotion models can be built on this diverse set of responses. The diagram of possible mappings is presented in Figure 3.
Among the options presented in Figure 3, we selected three mappings from the 10 emotions of CAP-D: onto Valence-Arousal-Approach/Avoidance from NAPS (Table 1 and Figure 4), onto Valence-Arousal from NAPS, and onto Valence-Arousal from NAPS-BE (Figure 5). To establish each mapping, the dimensional assessments for all images representing a specific discrete class in CAP-D (as the 1st emotion) were normalized to the [−1, 1] range and averaged, and the class was placed at the resulting coordinates in the dimensional space (Figures 4 and 5). In practice, only 9 discrete emotions could be mapped for NAPS as there were no images representing
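The mapping procedure described above (normalize the dimensional ratings, then average them per discrete class) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the input layout, and the toy ratings are hypothetical, and the default 1 to 9 rating range is an assumption that should be replaced with the scale actually used by the dataset.

```python
from collections import defaultdict
from statistics import mean


def normalize(value, low=1.0, high=9.0):
    """Linearly rescale a rating from [low, high] to [-1, 1]."""
    return 2.0 * (value - low) / (high - low) - 1.0


def build_mapping(pictures, low=1.0, high=9.0):
    """Average the normalized ratings of all pictures sharing a discrete label.

    pictures: iterable of (label, ratings), where ratings is a tuple of
    dimensional scores, e.g. (valence, arousal, approach_avoidance).
    Returns {label: tuple of mean normalized coordinates}.
    """
    grouped = defaultdict(list)
    for label, ratings in pictures:
        grouped[label].append(tuple(normalize(r, low, high) for r in ratings))
    return {
        label: tuple(mean(dim) for dim in zip(*points))
        for label, points in grouped.items()
    }


# Hypothetical ratings, invented purely to exercise the code (not real data).
toy_pictures = [
    ("happiness", (9.0, 5.0, 9.0)),
    ("happiness", (7.0, 5.0, 7.0)),
    ("fear", (1.0, 9.0, 1.0)),
]
mapping = build_mapping(toy_pictures)
```

Each discrete class ends up as a single point in the dimensional space, which is the form needed by the discretization step used later in the system.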
The presented mapping will be used in Section 4 as a part of the EEG system for validation of affective databases standardization. When using this mapping in the system, we also need a specific method for the discretization of the coordinates. One option is to check whether the coordinates predicted by the algorithm in the dimensional space lie closer than one standard deviation to the position of a discrete emotion in the mapping. Alternatively, we can simply select the nearest discrete emotion in the dimensional space. A detailed discussion about discretization and precision metrics in emotion recognition can be found in .
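The two discretization rules described above can be illustrated with a short Python sketch. It assumes Euclidean distance for the nearest-emotion rule and interprets the standard-deviation rule per dimension (every coordinate must lie within one standard deviation of the class position); both are one possible reading, not the only one. All names, coordinates, and standard deviations below are hypothetical.

```python
import math


def nearest_emotion(point, mapping):
    """Return the label whose mapped position is closest (Euclidean) to point."""
    return min(mapping, key=lambda label: math.dist(point, mapping[label]))


def within_std(point, center, std):
    """True if every coordinate of point is closer to center than one std.

    std holds one standard deviation per dimension (an assumption about
    how the rule is applied).
    """
    return all(abs(p - c) < s for p, c, s in zip(point, center, std))


# Hypothetical class positions and spreads, for illustration only.
toy_mapping = {
    "happiness": (0.75, 0.0, 0.75),
    "fear": (-1.0, 1.0, -1.0),
}
toy_std = {
    "happiness": (0.2, 0.2, 0.3),
    "fear": (0.2, 0.2, 0.2),
}

predicted = (0.6, 0.1, 0.5)  # e.g. coordinates output by a regression model
label = nearest_emotion(predicted, toy_mapping)
accepted = within_std(predicted, toy_mapping[label], toy_std[label])
```

The nearest-emotion rule always returns a label, whereas the standard-deviation rule can reject a prediction that falls far from every class, which may be preferable when a wrong label is more costly than no label.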
3. Machine learning for EEG-based emotion recognition
The emotion recognition from EEG is an example of a problem that wouldn’t have a solution without the use of modern machine learning methods. Physiological signals like EEG have very high dimensionality, high level of noise, and physiological artifacts. It is very hard to define simple hand-crafted algorithms to deal with this kind of data. This section is a short introduction to the design of machine learning classifiers, and a summary of current trends and applications of computer-aided emotion recognition.
Machine learning (ML) describes methods of automatic knowledge extraction and drawing conclusions from a provided database. It is a part of the broader domain of artificial intelligence (AI), which is connected with automatic reasoning and higher cognitive functions in machines. The simplest ML algorithms, like k-nearest neighbors (kNN) or k-means, just compare the test samples with the existing database (or with cluster centers derived from it) and assign them based on similarity. More complex ML algorithms, such as decision trees, induce general rules present in the database and use these rules to predict test samples. Algorithms like support vector machines (SVMs) transform the data and divide it using multi-dimensional hyperplanes that separate samples of different categories.
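As an illustration of the similarity-based approach mentioned above, a minimal k-nearest neighbors classifier can be written as follows. This is a didactic sketch with hypothetical two-dimensional feature vectors and labels; a practical system would normally rely on an established library and on real EEG features.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors (e.g. two band-power features per trial).
toy_train = [
    ((0.0, 0.0), "calm"),
    ((0.1, 0.2), "calm"),
    ((0.2, 0.1), "calm"),
    ((1.0, 1.0), "excited"),
    ((0.9, 1.1), "excited"),
    ((1.1, 0.9), "excited"),
]

prediction = knn_predict(toy_train, (0.15, 0.1))
```

No model is fitted here: the training set itself is the "knowledge", which is why such methods scale poorly to large, high-dimensional recordings.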
All these traditional algorithms have one common disadvantage: they do not work well with massive amounts of high-dimensional data like EEG. Thus, it is usually necessary to extract some lower-dimensional features, like the power or frequencies of brain waves. This is not the case for deep learning methods, which can operate on raw data. Deep learning is inherently connected with artificial neural networks, which are inspired by the biological model of neural networks in the brain. Such deep artificial neural networks can be seen as very complex non-linear functions translating input data into output data of any kind. They encode all the features and knowledge about the data in the connections between neurons in the network. Deep neural networks have shown outstanding accuracy in different EEG applications. Thus, we use them as the “core algorithm” in our examples. However, it is possible to replace them with any traditional machine learning method based on features like brain waves, event-related potentials (ERPs) and synchronization, frontal EEG asymmetry, or steady-state visually evoked potentials (SSVEPs) [7, 36].
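To make the notion of hand-crafted features more concrete, the sketch below computes a simple band-power estimate with a naive discrete Fourier transform and derives a frontal asymmetry index from it. Several choices here are assumptions rather than part of the chapter: the 8 to 13 Hz alpha band, the asymmetry convention ln(right) − ln(left), the 256 Hz sampling rate, and the synthetic sine-wave "channels". Real pipelines typically use optimized FFT routines and artifact-cleaned recordings.

```python
import cmath
import math


def band_power(signal, fs, f_low, f_high):
    """Mean squared DFT magnitude over bins with frequency in [f_low, f_high]."""
    n = len(signal)
    powers = []
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_low <= freq <= f_high:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            powers.append(abs(coeff) ** 2 / n ** 2)
    return sum(powers) / len(powers) if powers else 0.0


def frontal_asymmetry(left, right, fs, band=(8.0, 13.0)):
    """Log band power of the right channel minus that of the left channel."""
    return math.log(band_power(right, fs, *band)) - math.log(
        band_power(left, fs, *band)
    )


# Synthetic 2-second signals: a 10 Hz sine, with doubled amplitude on the right.
fs = 256
time_axis = [i / fs for i in range(2 * fs)]
left_channel = [math.sin(2 * math.pi * 10 * t) for t in time_axis]
right_channel = [2.0 * x for x in left_channel]

alpha_left = band_power(left_channel, fs, 8.0, 13.0)
beta_left = band_power(left_channel, fs, 14.0, 30.0)
asymmetry = frontal_asymmetry(left_channel, right_channel, fs)
```

Features of this kind reduce each trial to a handful of numbers, which is what makes traditional classifiers applicable to EEG in the first place.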
The main part of the system is an emotion recognition machine learning algorithm. The algorithm learns to translate EEG signals into values (discrete, dimensional, or both) defined by each emotion model (or combination of models). The core of the algorithm (architecture, hyperparameters, initialization) is the same for each model; only the definitions of the outputs and loss functions change. For discrete models, the traditional classification approach is applied. For dimensional models, emotion recognition becomes a regression problem. There is also the possibility of designing a multi-output algorithm based on both discrete and dimensional models. If this multi-target optimization increases the generalizability of the algorithm, it may support the importance of both dimensional and discrete models of emotions. In our example in Figure 6, we present an intra-subject learning approach where the neural network is trained on a representative sample of affective images. The distribution of picture features (e.g. picture categories, emotions induced, colors, brightness) used during training should be similar to that of the affective database validated in the final system. We keep the same set of participants in training and in the final system to ensure comparability of the physiological responses.
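The difference between the discrete, dimensional, and multi-output variants can be illustrated through their loss functions. The sketch below assumes cross-entropy for classification and mean squared error for regression, combined with a fixed weight; these are common choices used here for illustration, not the losses prescribed by the chapter, and all names and numbers are hypothetical.

```python
import math


def cross_entropy(class_probs, target_index):
    """Classification loss: negative log-probability of the true class."""
    return -math.log(class_probs[target_index])


def mean_squared_error(predicted, target):
    """Regression loss: mean squared difference over the emotion dimensions."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def multi_output_loss(class_probs, target_index, predicted, target, weight=0.5):
    """Weighted sum of the discrete and dimensional losses (weight is assumed)."""
    return weight * cross_entropy(class_probs, target_index) + (
        1.0 - weight
    ) * mean_squared_error(predicted, target)


# Hypothetical network outputs for a single EEG trial.
probs = [0.25, 0.5, 0.25]          # probabilities over three discrete emotions
true_class = 1
pred_dims = (0.5, 0.0)             # predicted (valence, arousal)
true_dims = (1.0, 0.0)

loss_discrete = cross_entropy(probs, true_class)
loss_dimensional = mean_squared_error(pred_dims, true_dims)
loss_combined = multi_output_loss(probs, true_class, pred_dims, true_dims)
```

With such a setup the network body stays unchanged across emotion models, and only the output layer and the loss passed to the optimizer differ.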
3.1 Designing an EEG experiment for emotion recognition
Perhaps the hardest, but essential, part of creating an EEG-based classifier is the design of proper experimental procedures for data acquisition. It is a crucial part that requires specialist knowledge in psychology, hardware, and signal processing. One mistake in this phase may cause a failure of the whole study. The best way to start is to check the literature for similar experiments and learn from their ideas and mistakes. To train and then test EEG-based classifiers correctly, it is important to follow the same procedures and maintain the same experimental conditions. Our knowledge about cognitive brain functions is incomplete, so seemingly irrelevant confounding variables may have a strong impact on the brain response. The list of confounding variables typically includes: the observer-expectancy effect (the way instructions are provided, the presence of the researcher during the experiment), age and gender of participants (many differences between women and men are confirmed in the literature), the time of day, the mood, fatigue, and motivation of the participant (usually increased by some reward), left- or right-handedness (if the participant responds to stimuli), and the impact of drugs and stimulants.
The dependent variable in emotion recognition EEG experiments is usually defined in the time or frequency domain, and the independent variable is usually a class of emotion or a value in the dimensional model that is intended to be induced using a specific stimulus. According to a thorough survey from , the most frequently used types of stimuli are affective images (in over 35% of articles), ahead of videos, music, and other modalities like games or imagination techniques. This is partly because of the high availability of affective picture sets described in the next section.
3.2 Affective picture databases
There are several publicly accessible affective picture sets for emotion recognition (Table 2). Arguably, the most popular one in the literature is IAPS (International Affective Picture System, pronounced “eye-apps”). It contains color photographs of objects, landscapes, and animals, but also dead bodies and erotic content, in order to induce a wide range of emotional states. It uses three dimensional scales: valence, arousal, and dominance/control. However, there are newer sets like NAPS (Nencki Affective Picture System) and OASIS (Open Affective Standardized Image Set) that contain many more pictures and/or assessments. The largest set, NAPS, also has scales in three similar dimensions of valence, arousal, and approach/avoidance, and may be easily extended with discrete emotion labels from NAPS-BE, erotic pictures from NAPS-ERO, or fear-inducing pictures from SFIP (Set of Fear Inducing Pictures). The pictures in NAPS are of high quality and represent 5 main categories (people, faces, animals, objects, and landscapes). The newest CAP-D dataset aggregates subsets of pictures from IAPS, NAPS, and GAPED, and extends them with discrete emotional categories.
|Dataset name [ref] (Year)||Number of pictures and assessments||Assessment method||Emotion models used|
|IAPS (2005)||956 pictures, 100 subjects (50 women)||5-point Self-Assessment Manikin (SAM)||Dimensional model: valence, arousal, dominance/control|
|NAPS (2014)||1356 pictures, 204 subjects (119 women)||9-point sliding scale||Dimensional model: valence, arousal, approach/avoidance; 6 basic emotions (only for a subset of 510 images)|
|OASIS (2017)||900 pictures, 822 subjects (420 women)||7-point Likert scale||Dimensional model: valence, arousal|
|GAPED (2011)||730 pictures, 60 subjects (no gender given)||100-point rating scale||Dimensional model: valence, arousal, congruence with moral and legal norms|
|CAP-D (2018)||513 pictures, 133 subjects (73 women), 15 clinical psychologists||Describing the picture with 1 of 10 emotion words||10 discrete emotions, arousal and intensity dimensions|
|SFIP (2017)||288 pictures, 1671 subjects||5-point Likert scale for fear, 9-point Self-Assessment Manikin for valence||Intensity of fear, valence|
3.3 EEG devices
The selection of an EEG device depends on the purpose and goal of the study. For sophisticated psychological or medical research in emotion recognition, it is crucial to use more expensive research-grade or medical-grade hardware. Examples of EEG caps for such devices are presented in Figure 7. However, the heart of the system is not the cap but the amplifier. It should provide at least 32 electrode channels, a sampling rate of at least 256 Hz to record all relevant frequencies, and a voltage resolution of less than a few nanovolts to capture small differences in the signal between conditions. Additional channels for electrooculogram (EOG) and accelerometers are necessary for artifact filtering algorithms.
There is an emerging interest in low-cost solutions, especially for applications in brain-computer interfaces. One of the examples is Emotiv EPOC+ that was validated to work well with emotion recognition [41, 42].
3.4 State-of-the-art emotion recognition algorithms
There are several thorough reviews of EEG-based emotion recognition systems in the literature [1, 7, 36, 43]. The vast majority of top-performing algorithms are based on machine learning approaches. The methods from the literature achieve accuracies of up to 94% for 2-class discrete problems (such as arousal vs. neutral or happiness vs. sadness) and up to 82% for 4-class classification (such as joy, anger, sadness, and pleasure). Using DEAP (database for emotion analysis using physiological signals) as an example, the paper  shows a comparison of different classifiers for the 4 quadrants of the circumplex model: 63% for kNN, 67% for the SVM, 70% for a deep convolutional neural network, and 75% for a deep hybrid neural network. On the eNTERFACE06_EMOBRAIN database, the best classification accuracy among calm, exciting positive, and exciting negative emotional states reached around 77%. On the SEED dataset, emotion classification into positive, neutral, and negative classes has achieved an accuracy of up to 83%. The presented accuracies are virtually unreachable for humans.
4. EEG-based system for validation of affective picture databases standardization
In this section, we present the idea of the system for EEG-based validation of affective picture databases (Figure 8). The system consists of:
a computer displaying affective pictures, collecting self-assessment responses, and providing feedback to the participant
EEG device placed on the participant’s head
a set of trained deep neural networks (DNNs) for emotion recognition from EEG
a set of mappings between emotion models
In our example, stimuli from the CAP-D picture set are displayed on the screen. The participant assesses each picture following the emotion categorization procedure from CAP-D. For each stimulus display period, the EEG signal is collected and passed to the input of the trained DNNs for emotion recognition. The process of training such DNNs is described in Section 3. Based on the input EEG signal, each DNN outputs coordinates in a specific dimensional emotion model. These coordinates need to be mapped onto the discrete emotions used in the emotion categorization of CAP-D. An example of such a mapping is presented in Section 2.2. The mappings are crucial when operating on datasets described using different emotion models.
In the results validation phase, the information about the discrete emotion class labels from the emotion categorization, output of the selected mapping, and the ground truth label of the image are compared. There are several possible outcomes from such a comparison:
All the labels are the same – the normative label from the database is in agreement with the participant’s categorization and physiological response.
The normative label alone is different – the emotion induced in the participant consistently differs from the normative label.
The participant’s categorization alone is different – the physiological response is in agreement with the normative label but was assessed differently by the participant.
The output of the mapping alone is different – the participant’s categorization is in agreement with the normative label, but the physiological response suggests a different label.
All the labels are different – there is no agreement between ground truth, self-assessment, and mappings.
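The five outcomes listed above cover every way in which three labels can agree or disagree, so the comparison can be written as a small decision function. The sketch below is one possible encoding; the function and argument names are hypothetical.

```python
def validation_outcome(normative, self_report, physiological):
    """Return the outcome number (1-5) for one picture and one participant.

    normative:     ground-truth label of the picture in the database
    self_report:   label chosen by the participant in the categorization task
    physiological: label obtained from the DNN output via the selected mapping
    """
    if normative == self_report == physiological:
        return 1  # all labels agree
    if self_report == physiological:
        return 2  # the normative label alone differs
    if normative == physiological:
        return 3  # the participant's categorization alone differs
    if normative == self_report:
        return 4  # the output of the mapping alone differs
    return 5      # all three labels differ


example = validation_outcome("happiness", "fear", "fear")
```

Because the checks are ordered from full agreement to full disagreement, exactly one outcome is returned for any combination of labels.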
Based on these outcomes, several conclusions can be drawn and translated into feedback about the database standardization. For outcome 1, the feedback should report a positive validation of the normative label. This is the desired outcome of the system. On the other hand, outcome 2 suggests a serious problem with the normative label for the particular participant, as both subjective and physiological responses agree on a different label. This situation by itself does not mean the validation is negative. Only if the problem persists among the majority of participants should the label be reconsidered. A supporting example is a picture of a happy dog that should induce happiness according to the normative label but induces fear in individuals with cynophobia (the fear of dogs). Outcome 3, if consistent among the population, may suggest problems with naming the proper emotion for the picture. The physiological response is as expected for the normative label, but participants do not select the expected label. A supporting example is a picture with the normative label “fear” presenting a wolf eating its prey, which induces fear in the physiological response, while participants may focus on the prey’s appearance in the subjective response and select the label “disgust”. In this example, we may face the problem of ambiguous labeling of the image. If outcome 3 is present only in individual participants, it may rather suggest their problems with emotion perception. Outcome 4 should be a suggestion for the authors of the database that the normative label of the picture may be biased by subjective responses of the participants (e.g. because of some cultural or ethical reasons), so their physiological responses disagree with conscious categorization. For example, they may not answer differently because it would put them in a bad light. Outcome 5 is the only one resulting in a clearly negative validation, where all of the participant’s reactions are different. It may suggest that the normative label is too ambiguous or too weak to be perceived correctly.
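The population-level reasoning above (a conclusion about a picture is drawn only when an outcome is consistent across participants) can be sketched as a simple aggregation step. The majority threshold and the wording of the feedback messages below are assumptions made for illustration; the input is a list of outcome numbers (1 to 5), one per participant, for a single picture.

```python
from collections import Counter

# Feedback texts paraphrase the interpretation given in the text above.
FEEDBACK = {
    1: "normative label validated",
    2: "normative label should be reconsidered",
    3: "emotion on the picture may be ambiguous to name",
    4: "normative label may be biased by subjective responses",
    5: "negative validation: label too ambiguous or too weak",
}


def picture_feedback(outcomes, majority=0.5):
    """Summarize per-participant outcomes (1-5) for one picture.

    A message is issued only if a single outcome is shared by more than
    `majority` of the participants (the threshold is an assumption).
    """
    dominant, count = Counter(outcomes).most_common(1)[0]
    if count / len(outcomes) <= majority:
        return "no consistent outcome across participants"
    return FEEDBACK[dominant]


# Hypothetical outcomes for one picture seen by four participants.
summary = picture_feedback([2, 2, 2, 1])
```

Isolated deviations, such as a single participant with cynophobia viewing a picture of a dog, then do not change the feedback for the picture, while they can still be reported to that participant individually.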
The system was designed to be generic. The described validation may be performed for any discrete and dimensional models with little to no modification of the flow. The only requirement is the existence of at least one algorithm trained to recognize the assessed emotions, or at least one mapping which translates recognition results (in a different emotion model) into the target model. The more algorithms and mappings are available, the more detailed the validation results. Also, the system can be easily adapted to video, sound, or text stimuli. Additionally, the system may select the most feasible emotion model for the participant and can be calibrated for them by fine-tuning the networks using their consecutive responses.
This system may be further adapted as a tool for training emotion perception - one of the branches of emotional intelligence that is measured in the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) . The feedback from the system provides suggestions of improvements in the emotional perception and points to the differences between self-assessment and normative benchmark that should be considered by the participant.
5. Summary and future work
The chapter presents a conceptual design of a computer system that uses EEG signals and deep neural networks to assess the standardization of affective picture databases. According to the presented state of the art in psychology and machine learning, such a system is feasible to build. All elements of the system are ready to use. The only challenge is the selection of a representative population and the collection of a significant amount of EEG data to train the deep neural networks.
As there are many models for describing emotions, we focused here on the mappings between emotion models. Such mappings allow using machine learning methods trained on one model for emotion recognition in a different model. There is a lack of emotion mappings for affective picture sets, so our new mappings between dimensions of valence, arousal, approach/avoidance, and discrete emotions are the value added by the chapter. There is also a possibility that one consistent and dominant model of emotion will be established in the future. Then, the mappings may be deprecated, and one “general” model may be used to train the deep neural network.
The genericity of the system opens many possibilities for future work and adaptations to different applications. Besides emotion self-assessment validation, the system can be adapted for validation of emotion mappings, or emotional intelligence tests, e.g. emotion perception task from MSCEIT. It may be used in the future for the rehabilitation of people with emotion perception disorders like ASD. Also, the new machine learning methods can be inserted into the system and compared with existing deep neural networks. Even the EEG device may be replaced or extended with other physiological measurements without big changes in the system architecture.
The exploration of top-performing deep neural networks and emotion mappings may help to understand the underlying biological model of emotion, e.g. by using feature visualization approaches.