Multimodal Attention-Mechanism For Temporal Emotion Recognition

Esam Ghaleb, Jan Niehues, Stelios Asteriadis

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Exploiting the multimodal and temporal interaction between audio-visual channels is essential for automatic audio-video emotion recognition (AVER). The strength of each modality in conveying emotions, and of each time window in a video clip, can be further exploited through a weighting scheme such as an attention mechanism to capture their complementary information. Attention is a powerful approach for sequence modeling and can be employed to fuse audio-video cues over time. We propose a novel framework consisting of bimodal audio-visual time windows that span short video clips labeled with discrete emotions. Attention is used to weight these time windows for multimodal learning and fusion. Experimental results on two datasets show that the proposed methodology achieves enhanced multimodal emotion recognition.
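To illustrate the idea described in the abstract, the following is a minimal sketch of attention-weighted fusion over audio-visual time windows. All names, shapes, and the dot-product scoring function are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_fuse(audio, visual, w_score):
    """Fuse per-window audio and visual features with attention weights.

    audio, visual : (T, D) feature arrays for T time windows (hypothetical).
    w_score       : (2*D,) scoring vector mapping a concatenated
                    audio-visual window feature to a scalar relevance score.
    Returns a single (2*D,) clip-level representation.
    """
    windows = np.concatenate([audio, visual], axis=1)  # (T, 2*D) bimodal windows
    scores = windows @ w_score                         # (T,) one score per window
    alpha = softmax(scores)                            # attention weights, sum to 1
    return alpha @ windows                             # weighted sum over windows

# Toy example: 4 time windows, 8-dim features per modality.
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 8))
visual = rng.normal(size=(4, 8))
w = rng.normal(size=16)
clip_repr = attention_fuse(audio, visual, w)  # (16,) fused clip representation
```

In practice the scoring function would be learned jointly with the emotion classifier, so that windows and modalities carrying stronger emotional cues receive larger weights.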

Original language: English
Title of host publication: 2020 IEEE International Conference on Image Processing (ICIP)
Publisher: IEEE
Pages: 251-255
Number of pages: 5
ISBN (Print): 9781728163956
DOIs
Publication status: Published - 25 Oct 2020
Event: 2020 IEEE International Conference on Image Processing (ICIP) - Abu Dhabi, United Arab Emirates
Duration: 25 Oct 2020 - 28 Oct 2020
https://2020.ieeeicip.org/

Publication series

Series: IEEE International Conference on Image Processing (ICIP)
ISSN: 1522-4880

Conference

Conference: 2020 IEEE International Conference on Image Processing (ICIP)
Abbreviated title: ICIP
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 25/10/20 - 28/10/20

Keywords

  • attention
  • multimodal learning
  • audiovisual emotion recognition
