Deep, Dimensional and Multimodal Emotion Recognition Using Attention Mechanisms

Jan Lucas, Esam Ghaleb, Stelios Asteriadis

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review


Emotion recognition is an increasingly important sub-field of artificial intelligence (AI). Advances in this field could drastically change the way people interact with computers and allow for automation of tasks that currently require a lot of manual work, for example registering the emotion a subject expresses for a potential advert. Previous work has shown that using multiple modalities, although challenging, is very beneficial. Affective cues in audio and video may not occur simultaneously, and the modalities do not always contribute equally to emotion. This work seeks to apply attention mechanisms to aid in the fusion of audio and video for the purpose of emotion recognition, using state-of-the-art techniques from artificial intelligence and, more specifically, deep neural networks. To achieve this, two forms of attention are used. Embedding attention applies attention to the input of a modality-specific model, allowing recurrent networks to consider multiple input time steps. Bimodal attention fusion applies attention to fuse the outputs of the modality-specific networks. Combining both attention mechanisms yielded concordance correlation coefficients (CCCs) of 0.62 and 0.72 for arousal and valence, respectively, on the RECOLA dataset used in AVEC 2016. These results are competitive with the state of the art, underlining the potential of attention mechanisms in multimodal fusion for behavioral signals.
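To make the reported metric and the fusion idea concrete, the following is a minimal sketch, not the authors' implementation. `concordance_cc` computes the standard concordance correlation coefficient (CCC) used to score continuous arousal and valence predictions. `attention_fuse` illustrates one generic way attention could weight audio and video features per time step. The function names, the scoring vector `w`, and the tanh scoring form are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()  # population variances
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    # CCC penalises both low correlation and shifts in mean or scale.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def attention_fuse(audio, video, w):
    """Fuse two modalities with per-time-step softmax attention weights.

    audio, video: arrays of shape (T, D) from modality-specific models.
    w: hypothetical scoring vector of shape (D,); in a trained model this
       would be learned, here it is simply passed in.
    Returns the fused features (T, D) and the attention weights (T, 2).
    """
    stacked = np.stack([audio, video], axis=1)          # (T, 2, D)
    scores = np.tanh(stacked) @ w                       # (T, 2)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    fused = (weights[..., None] * stacked).sum(axis=1)  # (T, D)
    return fused, weights
```

In this sketch, a prediction that tracks the gold-standard trace perfectly gets a CCC of 1, while a constant offset lowers the score even when the correlation is perfect. The attention weights sum to 1 across the two modalities at every time step, so the fused vector is a convex combination of the audio and video features.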

Original language: English
Title of host publication: The annual Benelux Conference on Artificial Intelligence and Machine Learning (BNAIC 2020)
Subtitle of host publication: BNAIC/BeneLearn 2020
Number of pages: 10
Publication status: Published - 2020
Event: Benelux Conference on Artificial Intelligence and Machine Learning - Online, Leiden University, Leiden, Netherlands
Duration: 19 Nov 2020 to 20 Nov 2020


Conference: Benelux Conference on Artificial Intelligence and Machine Learning
Abbreviated title: BNAIC/BeneLearn 2020


