Multimodal Attention-Mechanism For Temporal Emotion Recognition

Esam Ghaleb; Jan Niehues; Stelios Asteriadis

doi:10.1109/icip40778.2020.9191019

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Exploiting the multimodal and temporal interaction between audio-visual channels is essential for automatic audio-video emotion recognition (AVER). Modalities' strength in emotions and time-window of a video-clip could be further utilized through a weighting scheme such as an attention mechanism to capture their complementary information. The attention mechanism is a powerful approach for sequence modeling, which can be employed to fuse audio-video cues over time. We propose a novel framework which consists of bi-audio-visual time-windows that span short video-clips labeled with discrete emotions. Attention is used to weigh these time windows for multimodal learning and fusion. Experimental results on two datasets show that the proposed methodology can achieve enhanced multimodal emotion recognition.
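
The idea of weighing time windows with attention can be illustrated with a minimal sketch. This is not the authors' architecture: the feature vectors, the `query` vector, and the function names are hypothetical, and plain scaled dot-product scoring is assumed purely for illustration. Each time window gets a score, a softmax turns the scores into weights, and the clip-level representation is the weighted sum of the window vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(windows, query):
    """Fuse per-window feature vectors into one clip-level vector.

    windows: list of equal-length feature vectors, one per time window
             (hypothetical audio-visual embeddings, an assumption).
    query:   hypothetical vector used to score each window; in a trained
             model it would be learned.
    """
    dim = len(query)
    # Scaled dot-product score for each time window.
    scores = [
        sum(w_i * q_i for w_i, q_i in zip(window, query)) / math.sqrt(dim)
        for window in windows
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the window vectors.
    fused = [
        sum(a * window[j] for a, window in zip(weights, windows))
        for j in range(dim)
    ]
    return weights, fused


# Toy input: three time windows with two-dimensional features.
windows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, fused = attention_fuse(windows, query)
print(weights, fused)
```

In this toy input the third window aligns best with the query, so it receives the largest weight and contributes most to the fused vector.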

Original language	English
Title of host publication	2020 IEEE International Conference on Image Processing (ICIP)
Publisher	IEEE
Pages	251-255
Number of pages	5
ISBN (Print)	9781728163956
DOIs	https://doi.org/10.1109/icip40778.2020.9191019
Publication status	Published - 25 Oct 2020
Event	2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates; 25 Oct 2020 → 28 Oct 2020; https://2020.ieeeicip.org/

Publication series

Series	IEEE International Conference on Image Processing ICIP
ISSN	1522-4880

Conference

Conference	2020 IEEE International Conference on Image Processing (ICIP)
Abbreviated title	ICIP
Country/Territory	United Arab Emirates
City	Abu Dhabi
Period	25/10/20 → 28/10/20
Internet address	https://2020.ieeeicip.org/

Keywords

attention
multimodal learning
audiovisual emotion recognition

Access to Document

10.1109/icip40778.2020.9191019

Cite this

@inproceedings{75a95ea8736a4bda9f4f834090a5faa9,
title = "Multimodal Attention-Mechanism For Temporal Emotion Recognition",
abstract = "Exploiting the multimodal and temporal interaction between audio-visual channels is essential for automatic audio-video emotion recognition (AVER). Modalities' strength in emotions and time-window of a video-clip could be further utilized through a weighting scheme such as an attention mechanism to capture their complementary information. The attention mechanism is a powerful approach for sequence modeling, which can be employed to fuse audio-video cues over time. We propose a novel framework which consists of bi-audio-visual time-windows that span short video-clips labeled with discrete emotions. Attention is used to weigh these time windows for multimodal learning and fusion. Experimental results on two datasets show that the proposed methodology can achieve enhanced multimodal emotion recognition.",
keywords = "attention, multimodal learning, audiovisual emotion recognition",
author = "Esam Ghaleb and Jan Niehues and Stelios Asteriadis",
year = "2020",
month = oct,
day = "25",
doi = "10.1109/icip40778.2020.9191019",
language = "English",
isbn = "9781728163956",
series = "IEEE International Conference on Image Processing ICIP",
publisher = "IEEE",
pages = "251--255",
booktitle = "2020 IEEE International Conference on Image Processing (ICIP)",
address = "United States",
note = "2020 IEEE International Conference on Image Processing (ICIP), ICIP ; Conference date: 25-10-2020 Through 28-10-2020",
url = "https://2020.ieeeicip.org/",
}

Ghaleb, E, Niehues, J & Asteriadis, S 2020, Multimodal Attention-Mechanism For Temporal Emotion Recognition. in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, IEEE International Conference on Image Processing ICIP, pp. 251-255, 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25/10/20. https://doi.org/10.1109/icip40778.2020.9191019

Multimodal Attention-Mechanism For Temporal Emotion Recognition. / Ghaleb, Esam; Niehues, Jan; Asteriadis, Stelios.
2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020. p. 251-255 (IEEE International Conference on Image Processing ICIP).

TY - GEN
T1 - Multimodal Attention-Mechanism For Temporal Emotion Recognition
AU - Ghaleb, Esam
AU - Niehues, Jan
AU - Asteriadis, Stelios
PY - 2020/10/25
Y1 - 2020/10/25
N2 - Exploiting the multimodal and temporal interaction between audio-visual channels is essential for automatic audio-video emotion recognition (AVER). Modalities' strength in emotions and time-window of a video-clip could be further utilized through a weighting scheme such as an attention mechanism to capture their complementary information. The attention mechanism is a powerful approach for sequence modeling, which can be employed to fuse audio-video cues over time. We propose a novel framework which consists of bi-audio-visual time-windows that span short video-clips labeled with discrete emotions. Attention is used to weigh these time windows for multimodal learning and fusion. Experimental results on two datasets show that the proposed methodology can achieve enhanced multimodal emotion recognition.
AB - Exploiting the multimodal and temporal interaction between audio-visual channels is essential for automatic audio-video emotion recognition (AVER). Modalities' strength in emotions and time-window of a video-clip could be further utilized through a weighting scheme such as an attention mechanism to capture their complementary information. The attention mechanism is a powerful approach for sequence modeling, which can be employed to fuse audio-video cues over time. We propose a novel framework which consists of bi-audio-visual time-windows that span short video-clips labeled with discrete emotions. Attention is used to weigh these time windows for multimodal learning and fusion. Experimental results on two datasets show that the proposed methodology can achieve enhanced multimodal emotion recognition.
KW - attention
KW - multimodal learning
KW - audiovisual emotion recognition
U2 - 10.1109/icip40778.2020.9191019
DO - 10.1109/icip40778.2020.9191019
M3 - Conference article in proceeding
SN - 9781728163956
T3 - IEEE International Conference on Image Processing ICIP
SP - 251
EP - 255
BT - 2020 IEEE International Conference on Image Processing (ICIP)
PB - IEEE
T2 - 2020 IEEE International Conference on Image Processing (ICIP)
Y2 - 25 October 2020 through 28 October 2020
ER -