Bimodal emotion recognition through audio-visual cues

Esam A.H. Ghaleb

doi:10.26481/dis.20210708eg

Research output: Doctoral thesis (internal)

Abstract

Emotions play a crucial role in human-human communication and have a complex socio-psychological nature, which makes emotion recognition a challenging task. In this dissertation, we study emotion recognition from audio and visual cues in video clips, using facial expressions and speech signals, which are among the most prominent channels of emotional expression. We propose novel computational methods that capture the complementary information provided by audio-visual cues to enhance emotion recognition. The research in this dissertation shows how emotion recognition depends on emotion annotation, the perceived modalities, robust data representations of each modality, and computational modeling. It presents progressive fusion techniques for audio-visual representations that are essential to improving recognition performance. Furthermore, the methods aim to exploit the temporal dynamics of audio-visual cues and to detect the informative time segments in both modalities. The dissertation presents meta-analysis studies and extensive evaluations of multimodal and temporal emotion recognition.

Original language	English
Awarding Institution	Maastricht University
Supervisors/Advisors	Asteriadis, Stelios (Supervisor); Weiss, Gerhard (Supervisor)
Award date	8 Jul 2021
Place of Publication	Maastricht
Publisher	ProefschriftMaken
Print ISBNs	9789464233070
DOIs	https://doi.org/10.26481/dis.20210708eg
Publication status	Published - 2021

Keywords

Affective Computing
Machine Learning
Audio-Visual Emotion Recognition
Shallow and Deep Metric Learning
Attention Mechanisms

Access to Document

10.26481/dis.20210708eg

Full text: final published version, 15.2 MB
Summary: final published version, 85.3 KB
Propositions: final published version, 39.5 KB
Cover: final published version, 53.4 KB
Impact: final published version, 114 KB

Cite this

@phdthesis{38b93e193c6848b7905862bb4be01d59,

title = "Bimodal emotion recognition through audio-visual cues",

abstract = "Emotions play a crucial role in human-human communication and have a complex socio-psychological nature, which makes emotion recognition a challenging task. In this dissertation, we study emotion recognition from audio and visual cues in video clips, using facial expressions and speech signals, which are among the most prominent channels of emotional expression. We propose novel computational methods that capture the complementary information provided by audio-visual cues to enhance emotion recognition. The research in this dissertation shows how emotion recognition depends on emotion annotation, the perceived modalities, robust data representations of each modality, and computational modeling. It presents progressive fusion techniques for audio-visual representations that are essential to improving recognition performance. Furthermore, the methods aim to exploit the temporal dynamics of audio-visual cues and to detect the informative time segments in both modalities. The dissertation presents meta-analysis studies and extensive evaluations of multimodal and temporal emotion recognition.",

keywords = "Affective Computing, Machine Learning, Audio-Visual Emotion Recognition, Shallow and Deep Metric Learning, Attention Mechanisms",

author = "Ghaleb, {Esam A.H.}",

year = "2021",

doi = "10.26481/dis.20210708eg",

language = "English",

isbn = "9789464233070",

series = "SIKS Dissertation Series",

number = "2021-16",

publisher = "ProefschriftMaken",

address = "Maastricht, Netherlands",

school = "Maastricht University",

}

TY  - THES
T1  - Bimodal emotion recognition through audio-visual cues
AU  - Ghaleb, Esam A.H.
PY  - 2021
Y1  - 2021
N2  - Emotions play a crucial role in human-human communication and have a complex socio-psychological nature, which makes emotion recognition a challenging task. In this dissertation, we study emotion recognition from audio and visual cues in video clips, using facial expressions and speech signals, which are among the most prominent channels of emotional expression. We propose novel computational methods that capture the complementary information provided by audio-visual cues to enhance emotion recognition. The research in this dissertation shows how emotion recognition depends on emotion annotation, the perceived modalities, robust data representations of each modality, and computational modeling. It presents progressive fusion techniques for audio-visual representations that are essential to improving recognition performance. Furthermore, the methods aim to exploit the temporal dynamics of audio-visual cues and to detect the informative time segments in both modalities. The dissertation presents meta-analysis studies and extensive evaluations of multimodal and temporal emotion recognition.
AB  - Emotions play a crucial role in human-human communication and have a complex socio-psychological nature, which makes emotion recognition a challenging task. In this dissertation, we study emotion recognition from audio and visual cues in video clips, using facial expressions and speech signals, which are among the most prominent channels of emotional expression. We propose novel computational methods that capture the complementary information provided by audio-visual cues to enhance emotion recognition. The research in this dissertation shows how emotion recognition depends on emotion annotation, the perceived modalities, robust data representations of each modality, and computational modeling. It presents progressive fusion techniques for audio-visual representations that are essential to improving recognition performance. Furthermore, the methods aim to exploit the temporal dynamics of audio-visual cues and to detect the informative time segments in both modalities. The dissertation presents meta-analysis studies and extensive evaluations of multimodal and temporal emotion recognition.
KW  - Affective Computing
KW  - Machine Learning
KW  - Audio-Visual Emotion Recognition
KW  - Shallow and Deep Metric Learning
KW  - Attention Mechanisms
U2  - 10.26481/dis.20210708eg
DO  - 10.26481/dis.20210708eg
M3  - Doctoral Thesis
SN  - 9789464233070
T3  - SIKS Dissertation Series
PB  - ProefschriftMaken
CY  - Maastricht
ER  - 