Temporal conditional Wasserstein GANs for audio-visual affect-related ties

Christos Athanasiadis*, Enrique Hortal, Stelios Asteriadis

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review


Abstract

Emotion recognition through audio is a challenging task that entails proper feature extraction and classification. State-of-the-art classification strategies are usually based on deep learning architectures, and training complex deep learning networks normally requires very large audiovisual corpora with emotion annotations. However, such corpora are not always available, since harvesting and annotating them is time-consuming. In this work, temporal conditional Wasserstein Generative Adversarial Networks (tc-wGANs) are introduced to generate robust audio data by leveraging information from the face modality. Taking as input temporal facial features extracted using a dynamic deep learning architecture (based on 3D CNN, LSTM and Transformer networks) and, additionally, conditional information related to annotations, our system generates realistic spectrograms that represent audio clips corresponding to a specific emotional context. To validate the generated samples, apart from three quality metrics (Fréchet Inception Distance, Inception Score and Structural Similarity Index), we evaluated them with an audio-based emotion recognition scheme. Fusing the generated samples with the original real ones improved audio emotion recognition performance by 3.5 to 5.5% on two state-of-the-art datasets.
Original language: English
Title of host publication: 2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)
Pages: 1-8
Number of pages: 8
DOIs
Publication status: Published - 2021
Event: 2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos - Nara, Japan
Duration: 28 Sept 2021 to 1 Oct 2021
Conference number: 29

Conference

Conference: 2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos
Abbreviated title: ACIIW 2021
Country/Territory: Japan
City: Nara
Period: 28/09/21 to 1/10/21

Keywords

  • Domain Adaptation
  • Audio Emotion Recognition
  • Generative Adversarial Networks
  • Attention Mechanisms
