Abstract
Over the past few decades, human-computer interaction has become increasingly important in our daily lives, and research has branched out in many directions: memory research, depression detection, behavioural deficiency detection, lie detection, (hidden) emotion recognition, etc. As a result, the number of face and emotion databases, both generic and tailored to specific needs, has grown immensely. Thus, a comprehensive yet compact guide is needed to help researchers find the most suitable database and understand which types of databases already exist. In this paper, different elicitation methods are discussed and the databases are organized into informative tables based primarily on their format.
Keywords
- emotion
- computer vision
- databases
1. Introduction
With facial recognition and human-computer interaction becoming more prominent with each passing year, the number of databases associated with both face detection and facial expressions has grown immensely [1, 2]. A key part of creating, training and even evaluating supervised emotion recognition models is a well-labelled database of visual and/or audio information fit for the desired application. For example, emotion recognition has many different applications, ranging from simple human-robot and human-computer interaction [3, 4, 5] to automated depression detection [6].
There are several papers, blogs and books [7, 8, 9, 10] dedicated solely to describing some of the more prominent databases for face recognition. Emotion databases, however, form a more disparate collection, as they are often tailored to a specific purpose, so no complete and thorough overview of the ones that currently exist is available.
Even though many databases that fit specific criteria already exist [11, 12], it is important to recognize that several different aspects affect the content of a database. The selection of participants, the method used to collect the data and what was in fact collected all have a great impact on the performance of the final model [13]. The cultural and social background of the participants, as well as their mood during the recordings, can sway the database towards a particular group of people. This can happen even with larger sample pools, as in the case of the Bosphorus database [14], which suffers from a lack of ethnic diversity compared to databases of similar or even smaller size [15, 16, 17].
Since most algorithms take an aligned and cropped face as input, the most basic form of dataset is a collection of portrait images or already cropped faces with uniform lighting and backgrounds. Among those is the NIST mugshot database [18], which has clear gray-scale mugshots and portraits of 1573 individuals on a uniform background. However, real-life scenarios are more complicated, requiring database authors to experiment with different lighting, head poses and occlusions [19]. One example is the M2VTS database [20], which contains the faces of 37 subjects in different rotated positions and lighting angles.
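Since these databases differ in how tightly the face is framed, a detect-and-crop pass is usually the first pre-processing step regardless of the source. Below is a minimal sketch using OpenCV's bundled Haar cascade detector; the output size and the file name are illustrative assumptions, not properties of any particular database.

```python
import cv2

# Haar cascade face detector shipped with OpenCV (path resolved via cv2.data).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(image_path, size=(128, 128)):
    """Detect the largest face in an image and return it cropped and resized.

    Returns None when no face is found (common with occlusions or extreme poses).
    """
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest detection, assuming it is the subject of interest.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return cv2.resize(img[y:y + h, x:x + w], size)

# Hypothetical usage on a portrait-style image:
# face = crop_face("subject_001_happy.png")
```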
Some databases have focused on gathering samples from even less controlled environments with obstructed facial data, like the SCface database [21], which contains surveillance data gathered from real-world scenarios. Emotion recognition is not based solely on a person's facial expression, but can also be assisted by body language [22] or vocal context. Unfortunately, not many databases include body language, preferring to focus entirely on the face, but there are some multi-modal video and audio databases that incorporate vocal context [11, 23].
2. Elicitation methods
An important choice to make in gathering data for emotion recognition databases is how to bring out different emotions in the participants. This is the reason why facial emotion databases are divided into three main categories [24]:
- posed
- induced
- spontaneous
Expressions can be elicited in several different ways, and unfortunately these methods yield wildly different results.
2.1. Posed
Expressions acted out based on conjecture or with guidance from actors or professionals are called posed expressions [25]. Most facial emotion databases, especially the early ones, e.g. Banse-Scherer [26], CK [27] and Chen-Huang [28], consist purely of posed facial expressions, as they are the easiest to gather. However, they are also the least representative of authentic real-world emotions, as forced emotions are often over-exaggerated or lack subtle details, as in Figure 1. Due to this, human expression analysis models created with posed databases often perform very poorly on real-world data [13, 30]. To overcome the problems related to authenticity, professional theatre actors have been employed, e.g. for the GEMEP [31] database.

Figure 1.
Posed expressions over different age groups from the FACES database [29].
2.2. Induced
This method of elicitation yields more genuine emotions, as the participants usually interact with other individuals or are exposed to audiovisual media in order to invoke real emotions. Induced emotion databases have become more common in recent years due to the limitations of posed expressions. Models trained on them perform considerably better in real life, since they are not hindered by overemphasised and fake expressions; the resulting expressions are more natural, as seen in Figure 2. Several databases deal with audiovisual emotion elicitation, like SD [32], UT DALLAS [33] and SMIC [34], and some deal with human-to-human interaction, like the ISL meeting corpus [35], AAI [36] and CSC corpus [37].

Figure 2.
Induced facial expressions from the SD database [32].
Databases produced by observing human-computer interaction, on the other hand, are a lot less common. The best representatives are the AIBO database [23], where children try to give commands to a Sony AIBO robot, and SAL [11], in which adults interact with an artificial chat-bot.
Even though induced databases are much better than the posed ones, they still have some problems with truthfulness. Since the emotions are often invoked in a lab setting with the supervision of authoritative figures, the subjects might subconsciously keep their expressions in check [25, 30].
2.3. Spontaneous
Spontaneous emotion datasets are considered to be the closest to actual real-life scenarios. However, since true emotion can only be observed when the person is not aware of being recorded [30], they are difficult to collect and label. The acquisition of data is usually in conflict with privacy or ethics, and the labelling has to be done manually, with the true emotion guessed by the analyser [25]. This arduous task is both time-consuming and error-prone [13, 38], in sharp contrast with posed and induced datasets, where labels are either predefined or can be derived from the elicitation content.
With that being said, there still exist a few databases consisting of data extracted from movies [39, 40], YouTube videos [41], or even television series [42], but these databases inherently contain fewer samples than their posed and induced counterparts. Example images from these databases are shown in Figures 3–5, respectively.

Figure 3.
Images of movie clips taken from the AFEW database [39, 40].

Figure 4.
Spanish YouTube video clips taken from the Spanish Multimodal Opinion database [41].

Figure 5.
TV show stills taken from the VAM database [43].
3. Categories of emotion
The purpose of a database is defined by the emotions represented in it. Several databases, like CK [27, 44], MMI [45], eNTERFACE [46] and NVIE [47], opt to capture the six basic emotion types proposed by Ekman [48, 49, 50]: anger, disgust, fear, happiness, sadness and surprise. In the tables, they are denoted as primary 6. Authors often add contempt to these, forming seven primary emotions, and a neutral expression is frequently included as well. However, these categories cover only a small subset of all possible emotions, so there have been attempts to combine them [51, 52].
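For reference, the label sets described above can be written out explicitly, which is how the table annotations in this paper map to concrete label lists. The constant names below are simply a convenient shorthand, not identifiers taken from any of the cited databases.

```python
# Ekman's six basic emotions, denoted "primary 6" in the tables below.
PRIMARY_6 = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Frequently used extensions: adding contempt yields seven primary emotions,
# and a neutral expression is often included as well.
PRIMARY_7 = PRIMARY_6 + ["contempt"]
PRIMARY_7_NEUTRAL = PRIMARY_7 + ["neutral"]
```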
Several databases try to just categorise the general positive and negative emotions or incorporate them along with others, e.g. the SMO [41], AAI [36], and ISL meeting corpus [35] databases. Some even try to rank deception and honesty like the CSC corpus database [37].
Apart from anger and disgust within the six primary emotions, scientists have tried to capture other negative expressions, such as boredom, disinterest, pain, embarrassment and depression. Unfortunately, these categories are harder to elicit than other types of emotions.
The TUM AVIC [53] and AVDLC [12] databases are amongst those that try to label levels of interest and depression, while GEMEP [31] and VAM [43] attempt to divide emotions into four quadrants and three dimensions, respectively. The main reason why most databases have a very small number of categories (mainly neutral and smile/no-smile) is that the more emotions are added, the more difficult they are to label and the more data is required to properly train a model.
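For dimensionally annotated databases such as GEMEP and VAM, a common convenience step is to bin continuous ratings into the four quadrants mentioned above before training a categorical model. A minimal sketch follows, assuming valence and arousal ratings already scaled to [-1, 1]; the thresholds and quadrant names are illustrative choices, not taken from either database.

```python
def quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) pair in [-1, 1] to one of four coarse quadrants."""
    if valence >= 0.0:
        return "positive, high arousal" if arousal >= 0.0 else "positive, low arousal"
    return "negative, high arousal" if arousal >= 0.0 else "negative, low arousal"

# e.g. quadrant(0.6, -0.3) -> "positive, low arousal"
```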
More recent databases have begun recording subtler emotions hidden behind other forced or dominant emotions. Among these are the MAHNOB [51] database, which focuses on emotional laughter and different types of laughter, and others that try to record emotions hidden behind a neutral or straight face, like the SMIC [34], RML [54] and Polikovsky's [55] databases.
One of the more recent databases, the iCV-MEFED [52, 56] database, takes a different approach by posing varying combinations of emotions simultaneously, where one emotion takes the dominant role and the other is complementary. Sample images can be seen in Figure 6.

Figure 6.
Combinations of emotions from the iCV-MEFED [52] database.
3.1. Action units
The Facial Affect Scoring Technique (FAST) was developed to measure facial movement relative to emotion. It describes the six basic emotions through facial behaviour: happiness, surprise and disgust have three intensities, and anger is reported as controlled and uncontrolled [57]. Building on the work of Darwin [58], Duchenne [59] and Hjortsjö [60], Ekman and Friesen [61] developed the Facial Action Coding System (FACS), a comprehensive system that catalogues all visually distinguishable facial movements.
FACS describes facial expressions in terms of 44 anatomically based Action Units (AUs). They have been used to study facial punctuators in conversation, facial deficits indicative of brain lesions, emotion detection, and more. FACS only deals with visible changes, which are often produced by a combination of muscle contractions; because of that, the measurement units are called action units rather than muscle units [61]. A small sample of such expressions can be seen in Figure 7. A selection of databases based on AUs instead of regular facial expressions is listed in Table 1, and a small illustrative sketch of mapping AUs to emotion labels follows the table.

Figure 7.
Induced facial action units from the DISFA database [62].
Database | Participants | Elicitation | Format | Action units | Additional information |
---|---|---|---|---|---|
CMU-Pittsburgh AU-Coded Face Expression Database [27] 2000 | 210 | Posed | Videos | 44 | Varying ethnic backgrounds, FACS coding |
MMI Facial Expression Database [63, 64] 2002 | 19 | Posed and audiovisual media | Videos, images | 79 | Continuously updated, contains different parts |
Face Video Database of the MPI [65, 66] 2003 | 1 | Posed | Six viewpoint videos | 55 | Created using the MPI VideoLab |
D3DFACS [67] 2011 | 10 | Posed | 3D videos | 19–97 | Supervised by FACS specialists |
DISFA [62] 2013 | 27 | Audiovisual media | Videos | 12 | |
Table 1.
Action unit databases.
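As a concrete illustration of how AU annotations relate to emotion labels, the sketch below encodes a few commonly cited prototypical AU combinations (e.g. happiness as AU6 + AU12) and checks whether they are present among a frame's active AUs. The exact prototypes vary across the FACS/EMFACS literature, so these sets should be treated as assumptions to be checked against the coding manual in use, not as the definitive mapping.

```python
# Commonly cited prototypical AU combinations for the six basic emotions.
# These differ between sources; verify against the coding manual you use.
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},            # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},         # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},      # brow raisers + upper lid raiser + jaw drop
    "anger":     {4, 5, 7, 23},      # brow lowerer + lid tighteners + lip tightener
    "disgust":   {9, 15},            # nose wrinkler + lip corner depressor
    "fear":      {1, 2, 4, 5, 20, 26},
}

def match_emotion(active_aus):
    """Return the first emotion whose prototype AUs are all active, or None."""
    for emotion, prototype in EMOTION_PROTOTYPES.items():
        if prototype <= active_aus:
            return emotion
    return None

# e.g. match_emotion({6, 12, 25}) -> "happiness"
```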
In 2002, the FACS system was revised: the number of facial contraction AUs was reduced to 33, and 25 head pose AUs were added [68, 69, 70]. In addition, there is a separate FACS version intended for children [71].
4. Database types
Emotion recognition databases come in many different forms, depending on how the data was collected. We review existing databases for different types of emotion recognition and, in order to better compare similar types of databases, split them into three broad categories based on format. The first two categories separate still images from video sequences, while the last comprises databases with more unique capturing methods.
4.1. Static databases
Most early facial expression databases, like the CK [27], only consist of frontal portrait images taken with simple RGB cameras. Newer databases try to design collection methods that incorporate data, which is closer to real life scenarios by using different angles and occlusion (hats, glasses, etc.). Great examples are the MMI [45] and Multi-PIE [72] databases, which were some of the first well-known ones using multiple view angles. In order to increase the accuracy of the human expression analysis models, databases like the FABO [22] have expanded the frame from a portrait to the entire upper body.
Static databases are the oldest and most common type. It is therefore understandable that they were created with the most diverse goals, varying from expression perception [29] to neuropsychological research [73], and with a wide range of data gathering styles, including self-photography through a semi-reflective mirror [74] and occlusion and light angle variation [75]. Static databases usually have the largest number of participants and a bigger sample size. While it is relatively easy to find a database suited for the task at hand, the categories of emotions are quite limited, as static databases mostly focus on the six primary emotions or smile/neutral detection. In the future, it would be convenient to have static databases with more emotions, especially spontaneous or induced ones, because, as Table 2 shows, almost all static databases to date are posed.
Database | Participants | Primary 6 | Neutral | Contempt | Embarrassment | Pain | Smile | Positive | Negative | Other | Additional information |
---|---|---|---|---|---|---|---|---|---|---|---|
JACFEE [76] 1988 | 4 | X | X | Eight images of each emotion | |||||||
POFA (or PFA) [73] 1993 | 14 | X | Cross-cultural studies and neuropsychological research | ||||||||
AT-T Database for Faces (formerly ORL) [77, 78] 1994 | 40 | X | X | Dark homogeneous background, frontal face | |||||||
Yale [75] 1997 | 15 | X | Frontal face, different light angles, occlusions | ||||||||
FERET [79] 1998 | 1199 | X | X | Standard for face recognition algorithms | |||||||
KDEF [80] 1998 | 70 | X | X | Psychological and medical research (perception, attention, emotion, memory and backward masking) | |||||||
The AR Face Database [81] 1998 | 126 | ✓1 | X | X | X | Frontal face, different light angles, occlusions | |||||
The Japanese Female Facial Expression Database [74] 1998 | 10 | X | X | Subjects photographed themselves through a semi-reflective mirror | |||||||
MSFDE [82] 2000 | 12 | X | X | FACS coding, ethnical diversity | |||||||
CAFE Database [83] 2001 | 24 | X | X | FACS coding, ethnical diversity | |||||||
CMU PIE [84] 2002 | 68 | X | X | X | Illumination variation, varying poses | ||||||
Indian Face Database [85] 2002 | 40 | ✓ | X | Indian participants from seven view angles | |||||||
NimStim Face Stimulus Set [86] 2002 | 70 | X | X | X | Facial expressions were supervised | ||||||
KFDB [87] 2003 | 1920 | X | X | Includes ground truth for facial landmarks | |||||||
PAL Face Database [88] 2004 | 576 | ✓ | X | Wide age range | |||||||
UT DALLAS [33] 2005 | 284 | ✓ | X | Head and face detection, emotions induced using audiovisual media | |||||||
TFEID [89] 2007 | 40 | X | X | Taiwanese actors, two simultaneous angles | |||||||
CAS-PEAL [90] 2008 | 1040 | X | X | X | Chinese face detection |
Multi-PIE [72] 2008 | 337 | X | X | Multiple view angles, illumination variation |
PUT [91] 2008 | 100 | X | X | High-resolution head-pose database | |||||||
Radboud Faces Database [92] 2008 | 67 | X | X | X | Supervised by FACS specialists | ||||||
FACES database [29] 2010 | 154 | X | Expression perception, wide age range, evaluated by participants | ||||||||
iCV-MEFED [52] 2017 | 115 | X | X | Psychologists picked best from 5 |
Table 2.
Posed static databases.
A selection of the six primary emotions has been used in databases marked with this symbol (✓).
4.2. Video databases
The most convenient format for capturing induced and spontaneous emotions is video, as non-posed emotions lack clear start and end points [93]. In the case of RGB video, subtle emotional changes known as micro-expressions have also been recorded with the hope of detecting concealed emotions, as in the USF-HD [94], YorkDDT [95], SMIC [34], CASME [96] and Polikovsky's [55] databases, the newest and most extensive among these being CASME.
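Because non-posed emotions lack clear onset and offset points, video databases are typically consumed as short frame windows rather than single images. The sketch below reads a fixed-length window from a clip with OpenCV; the window length and stride are arbitrary assumptions and would need to be much denser (higher frame rates, shorter windows) for micro-expression work.

```python
import cv2

def frame_window(video_path, start=0, length=16, stride=1):
    """Read `length` frames from `video_path`, starting at frame `start`,
    keeping every `stride`-th frame."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    frames = []
    while len(frames) < length:
        frame = None
        for _ in range(stride):
            ok, frame = cap.read()
            if not ok:
                cap.release()
                return frames  # clip ended before the window was filled
        frames.append(frame)
    cap.release()
    return frames

# e.g. frames = frame_window("clip_0042.avi", start=30, length=16, stride=2)
```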
The posed video databases in Table 3 tend to have quite a small number of participants, usually around 10, and often employ professional actors. Unlike with still images, scientists have tried to benefit from voice, speech or other types of utterances for emotion recognition. Many of these databases have also tried to gather micro-expressions, as they do not show up in still images or are harder to catch. Posed video databases have mainly focused on the six primary emotions and a neutral expression.
Database | Participants | Primary 6 | Neutral | Contempt | Embarrassment | Pain | Smile | Positive | Negative | Other | Additional information |
---|---|---|---|---|---|---|---|---|---|---|---|
University of Maryland DB [97] 1997 | 40 | X | 1–3 expressions per clip | ||||||||
CK [27] 2000 | 97 | X | One of the first FE databases made public | ||||||||
Chen-Huang [28] 2000 | 100 | X | Facial expressions and speech | ||||||||
DaFEx [98] 2004 | 8 | X | X | Italian actors mimicked emotions while uttering different sentences | |||||||
Mind Reading [99] 2004 | 6 | X | X | Teaching tool for children with behavioural disabilities | |||||||
GEMEP [31] 2006 | 10 | ✓ | X | Professional actors, supervised | |||||||
AONE [100] 2007 | 75 | Asian adults | |||||||||
FABO [22] 2007 | 4 | ✓ | X | Face and upper-body | |||||||
IEMOCAP [101] 2008 | 10 | ✓ | X | X | Markers on face, head, hands | ||||||
RML [54] 2008 | 8 | X | Suppressed emotions | ||||||||
Polikovsky’s database [55] 2009 | 10 | X | X | Low intensity micro-expressions | |||||||
SAVEE [102] 2009 | 4 | X | X | Blue markers, three images per emotion | |||||||
STOIC [103] 2009 | 10 | X | X | X | Face recognition, discerning gender, contains still images | ||||||
YorkDDT [95] 2009 | 9 | X | X | Micro-expressions | |||||||
ADFES [104] 2011 | 22 | X | X | X | X | Frontal and turned facial expressions | |||||
USF-HD [94] 2011 | 16 | ✓ | X | Micro-expressions, mimicked shown expressions | |||||||
CASME [96] 2013 | 35 | ✓ | X | X | Micro expressions, suppressed emotions |
Table 3.
Posed video databases.
Media-induced databases, listed in Table 4, have a larger number of participants, and the emotions are usually induced by audiovisual media, such as Superbowl ads [107]. Because the emotions in these databases are induced by external means, this format is well suited for gathering fake [108] or hidden [34] emotions.
Database | Participants | Elicitation | Primary 6 | Neutral | Contempt | Embarrassment | Pain | Smile | Positive | Negative | Other | Additional information |
---|---|---|---|---|---|---|---|---|---|---|---|---|
IAPS [105] 1997 | 497–1483 | Visual media | X | Pleasure and arousal reaction images, subset for children | ||||||||
SD [32] 2004 | 28 | AVM1 | ✓ | X | X | One of the first international induced emotion data-sets | ||||||
eNTERFACE’05 [46] 2006 | 42 | Auditory media | X | Standard for face recognition algorithms | ||||||||
CK+ [44] 2010 | 220 | Posed and AVM | X | Updated version of CK | ||||||||
SMIC [34] 2011 | 6 | AVM | ✓ | Suppressed emotions |
Face Place [106] 2012 | 235 | AVM | X | X | X | Different ethnicities | ||||||
AM-FED [107] 2013 | 81–240 | AVM | X | X | Reactions to Superbowl ads | |||||||
MAHNOB [51] 2013 | 22 | Posed and AVM | ✓ | X | Laughter recognition research | |||||||
SASE-FE [108] 2017 | 54 | AVM | ✓ | X | Fake emotions |
Table 4.
Media induced video databases.
AVM: audiovisual media.
Interaction-induced video databases, listed in Table 5, have more unique ways of gathering data, like child-robot interaction [23] or reviewing past memories [36]. This type of database takes significantly longer to create [113], but this does not seem to affect the sample size. Almost all spontaneous databases are in video format sourced from other media, purely because of how difficult spontaneous data is to collect. Spontaneous databases are also the rarest compared to the other elicitation methods. This is reflected in Table 6, which contains the fewest entries of any elicitation method.
Database | Participants | Elicitation | Primary 6 | Neutral | Contempt | Embarrassment | Pain | Smile | Positive | Negative | Other | Additional information |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ISL meeting corpus [35] 2002 | 90 | Human-human interaction | X | X | X | Collected in a meeting fashion | ||||||
AAI [36] 2004 | 60 | Human-human interaction | X | X | X | X | Induced via past memories | |||||
AIBO database [23] 2004 | 30 | Child-robot interaction | ✓ | X | X | Robot instructed by children | ||||||
CSC corpus [37] 2005 | 32 | Human-human interaction | X | Honesty research | ||||||||
RU-FACS [109] 2005 | 90 | Human-human interaction | X | X | Subjects were all university students |
SAL [11] 2005 | 24 | Human-computer interaction | ✓ | X | Conversations held with a simulated "chat-bot" system |
MMI [45] 2006 | 61/29 | Posed/child-comedian interaction, adult-audiovisual media | X | Profile views along with portrait images | ||||||||
TUM AVIC [53] 2007 | 21 | Human-human interaction | X | Commercial presentation | ||||||||
SEMAINE [110, 111] 2010/2012 | 150 | Human-human interaction | X | X | X | Operator was thoroughly familiar with the SAL script |
AVDLC [12] 2013 | 292 | Human-computer interaction | X | Mood disorder and unipolar depression research |
RECOLA [112] 2013 | 46 | Human-human interaction | X | Collaborative tasks. Audio-video, ECG and EDA were recorded |
Table 5.
Interaction induced video databases.
Database | Participants | Primary 6 | Neutral | Contempt | Embarrassment | Pain | Smile | Positive | Negative | Other | Additional information |
---|---|---|---|---|---|---|---|---|---|---|---|
Belfast natural database [42] 2003 | 125 | X | X | X | X | Video clips from television and interviews | |||||
Belfast Naturalistic Emotional Database [114] 2003 | 125 | X | X | Studio recordings and television program clips | |||||||
VAM [43] 2008 | 47 | X | Video clips from a talk-show | ||||||||
AFEW [39, 40] 2011/2012 | 330 | X | X | Video clips from movies | |||||||
Spanish Multimodal Opinion [41] 2013 | 105 | X | X | Spanish video clips from YouTube |
Table 6.
Spontaneous video databases.
4.3. Miscellaneous databases
Apart from the formats mentioned above, 3D scanned and even thermal databases of different emotions have also been constructed. The most well-known 3D datasets are BU-3DFE [15], BU-4DFE [16], Bosphorus [14] and BP4D [17]. BU-3DFE and BU-4DFE both contain posed data with six expressions, the latter at a higher resolution. Bosphorus addresses the need for a wider selection of facial expressions, and BP4D is the only one of the four using induced expressions instead of posed ones. A sample of models from a 3D database can be seen in Figure 8.

Figure 8.
3D facial expression samples from the BU-3DFE database [15].
With RGB-D databases, however, it is important to note that the data is unique to each sensor, with outputs having varying density and error, so algorithms trained on databases like the IIIT-D RGB-D [115], VAP RGB-D [116] and KinectFaceDB [117] would be very susceptible to hardware changes. For comparison with the 3D databases, an RGB-D sample is provided in Figure 9. One of the newer databases, the iCV SASE [118] database, is an RGB-D dataset dedicated solely to head pose, with free facial expressions.

Figure 9.
RGB-D facial expression samples from the KinectFaceDB database [117].
Even though depth-based databases, like the ones in Table 7, are relatively new compared to other types and there are very few of them, they still manage to cover a wide range of different emotions. With the release of commercially available depth cameras such as the Microsoft Kinect [120], they will likely only become more popular in the future.
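One way to soften the sensor dependence noted above is to convert each device's raw depth output to metric units and normalize it over a fixed working range before training. A minimal sketch follows, assuming a Kinect-style sensor that reports depth in millimetres with zeros for missing readings; the near/far bounds are illustrative assumptions, not values from any of the cited databases.

```python
import numpy as np

def normalize_depth(depth_mm: np.ndarray, near: float = 0.4, far: float = 1.5) -> np.ndarray:
    """Convert a millimetre depth map to metres and rescale the working range to [0, 1].

    Pixels with no depth reading (value 0) are mapped to 1.0, i.e. treated as far away.
    """
    depth_m = depth_mm.astype(np.float32) / 1000.0
    depth_m[depth_mm == 0] = far                      # fill sensor dropouts
    clipped = np.clip(depth_m, near, far)
    return (clipped - near) / (far - near)

# e.g. face_depth = normalize_depth(raw_kinect_frame)[y0:y1, x0:x1]
```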
As their applications are more specific, thermal facial expression datasets are very scarce. Some of the first and better-known ones are IRIS [123] and Equinox [121, 122], which consist of RGB and thermal image pairs labelled with three emotions [124], as can be seen in Figure 10. Thermal databases are usually posed or induced by audiovisual media. The ones in Table 8 mostly focus on positive, negative, neutral and the six primary emotions. The average number of participants is quite high relative to other types of databases.

Figure 10.
Thermal images taken from the Equinox database [121, 122].
4.3.1. Audio databases
There are mainly two types of emotion databases that contain audio content: stand-alone audio databases and video databases that include spoken words or utterances. The information extracted from audio is referred to as context; the three subdivisions of context most important for emotion recognition databases are semantic, structural and temporal context.
In the case of multimodal data, the audio component can provide a semantic context, which can have a larger bearing on the emotion than the facial expressions themselves [11, 23]. However, in the case of purely audio data, like the Bank and Stock Service [126] and ACC [127] databases, the context of the speech plays a quintessential role in emotion recognition [128, 129].
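On the acoustic side, emotion cues are usually taken from low-level features of the utterance rather than the raw waveform. A minimal sketch using librosa to summarize an utterance as a fixed-size MFCC statistics vector; the sampling rate, the number of coefficients, and mean/std pooling over time are illustrative choices, not prescribed by any of the databases discussed here.

```python
import librosa
import numpy as np

def utterance_features(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Load an utterance and summarize it as the mean and std of its MFCCs over time."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# e.g. features = utterance_features("call_0007.wav")  # -> 26-dimensional vector
```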
The audio-only databases in Table 9 are scarce and tailored to specific needs; for example, the Banse-Scherer database [26] has only four participants and was gathered to test whether judges can deduce emotions from vocal cues. The easiest way to gather a larger amount of audio data is from call centres, where the emotions are elicited either by another person or by a computer program.
Database | Participants | Format | Primary 6 | Neutral | Contempt | Embarrassment | Pain | Smile | Positive | Negative | Other | Additional information |
---|---|---|---|---|---|---|---|---|---|---|---|---|
BU-3DFE [15] 2006 | 100 | 3D images | X | Ethnically diverse, two angled views | ||||||||
Bosphorus [14] 2008 | 105 | 3D images | X | Occlusions, less ethnic diversity than BU-3DF | ||||||||
BU-4DFE [16] 2008 | 101 | 3D videos | Newer version of BU-3DFE, has 3D videos | |||||||||
VAP RGB-D [116] 2012 | 31 | RGB-D videos | X | X | 17 different recorded states repeated 3 times for each person | |||||||
PICS [119] 2013 | — | Images, videos, 3D images | Includes several different datasets and is still ongoing | |||||||||
BP4D [17] 2014 | 41 | 3D videos | X | X | X | Human-human interaction | ||||||
IIIT-D RGB-D [115] 2014 | 106 | RGB-D images | X | X | Captured with Kinect | |||||||
KinectFaceDB [117] 2014 | 52 | RGB-D images, videos | X | X | Captured with Kinect, varying occlusions |
Table 7.
3D and RGB-D databases.
Database | Participants | Elicitation | Primary 6 | Neutral | Contempt | Embarrassment | Pain | Smile | Positive | Negative | Other | Additional information |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Equinox [121, 122] 2002 | 340 | Posed | X | X | X | Captured in SWIR, MWIR and LWIR | ||||||
IRIS [123] 2007 | 4228 | Posed | X | X | X | Some of the first thermal FE data-sets | ||||||
NVIE [47] 2010 | 215 | Posed and AVM1 | X | Spontaneous expressions are not present for every subject | ||||||||
KTFE [125] 2014 | 26 | Posed and AVM | X | X |
Table 8.
Thermal databases.
AVM: audiovisual media.
Database | Participants | Elicitation | Primary 6 | Neutral | Contempt | Embarrassment | Pain | Smile | Positive | Negative | Other | Additional information |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Banse-Scherer [26] 1996 | 4 | Posed | X | X | X | X | Vocally expressed emotions | |||||
Bank and Stock Service [126] 2004 | 350 | Human-human interaction | ✓ | X | X | Collected from a call center and Capital Bank Service Center | ||||||
ACC [127] 2005 | 1187 | Human-computer interaction | X | X | Collected from automated call center applications |
Table 9.
Audio databases.
Even with all of the readily available databases out there, there is still a need for self-collected emotion recognition databases, as the existing ones do not always fulfil all of the required criteria [130, 131, 132, 133].
5. Conclusion
With the rapid increase in computing power and the amount of available data, it has become more and more feasible to distinguish emotions, identify people, and verify honesty based on video, audio or image input, taking a large step forward not only in human-computer interaction, but also in mental illness detection, medical research, security and so forth. In this paper, an overview of existing face and emotion databases in varying categories has been given. They have been organised into tables to give the reader an easy way to find the necessary data. This paper should be a good starting point for anyone who is considering training a model for emotion recognition.
Acknowledgments
This work has been partially supported by Estonian Research Council Grant PUT638, The Scientific and Technological Research Council of Turkey 1001 Project (116E097), The Spanish project TIN2016-74946-P (MINECO/FEDER, UE), CERCA Programme/Generalitat de Catalunya, the COST Action IC1307 iV&L Net (European Network on Integrating Vision and Language) supported by COST (European Cooperation in Science and Technology), and the Estonian Centre of Excellence in IT (EXCITE) funded by the European Regional Development Fund. We also gratefully acknowledge the support of the NVIDIA Corporation with the donation of the Titan X Pascal GPU.