Audio-Based Emotion Recognition Enhancement Through Progressive GANS

Christos Athanasiadis*, Enrique Hortal, Stelios Asteriadis

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review



Training large-scale architectures such as Generative Adversarial Networks (GANs) to investigate audio-visual relations in emotion-enriched interactions is a challenging task, hindered by high model complexity and the mode-collapse phenomenon. Sufficiently training these architectures requires massive amounts of data, yet creating extensive audio-visual datasets for specific tasks such as emotion recognition is complicated by annotation cost and labelling ambiguities. On the other hand, unlabeled audio-visual datasets are far more straightforward to obtain, mainly due to the easy access to online multimedia content. In this work, a progressive process for training GANs was conducted. The first step leverages large unlabeled audio-visual datasets to expose concealed cross-modal relationships, while the second step calibrates the network weights using a limited amount of emotion-annotated data. Through experimentation, it was shown that this progressive GAN schema leads to a more efficient optimization of the whole network, and that the generated samples from the target domain, when fused with the authentic ones, provide enhanced emotion recognition results.
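The two-step schema described in the abstract — unsupervised pretraining on abundant unlabeled audio-visual data, followed by calibration of the same weights with a small annotated set — can be sketched as follows. This is a minimal illustrative sketch: toy linear maps stand in for the actual GAN generator, and all names, data sizes, and learning rates are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden cross-modal relation (unknown to the model): audio -> visual features.
# In the paper this relation is learned adversarially; here a least-squares
# regression stands in for it so the two-phase idea stays visible.
W_true = rng.normal(size=(4, 4))

W_gen = np.zeros((4, 4))  # generator weights, learned progressively

def generator(audio):
    """Toy 'generator': linear map from audio to visual feature space."""
    return audio @ W_gen

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def grad_step(audio, visual, lr):
    """One gradient step pulling generated visual features toward real ones."""
    global W_gen
    pred = generator(audio)
    W_gen -= lr * audio.T @ (pred - visual) / len(audio)

# Step 1: pretraining on a large *unlabeled* audio-visual pool to expose
# the cross-modal relationship.
unlabeled_audio = rng.normal(size=(256, 4))
unlabeled_visual = unlabeled_audio @ W_true
err_initial = mse(generator(unlabeled_audio), unlabeled_visual)
for _ in range(300):
    grad_step(unlabeled_audio, unlabeled_visual, lr=0.05)
err_pretrained = mse(generator(unlabeled_audio), unlabeled_visual)

# Step 2: calibration on a *small* annotated set (16 samples here),
# fine-tuning the already-initialized weights at a lower learning rate.
labeled_audio = rng.normal(size=(16, 4))
labeled_visual = labeled_audio @ W_true
err_before_calib = mse(generator(labeled_audio), labeled_visual)
for _ in range(50):
    grad_step(labeled_audio, labeled_visual, lr=0.01)
err_after_calib = mse(generator(labeled_audio), labeled_visual)
```

The design point the sketch preserves is that step 2 starts from the pretrained weights rather than from scratch, so only a small annotated set is needed to adapt the generator to the target task.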

Original language: English
Title of host publication: 2020 IEEE International Conference on Image Processing (ICIP)
Number of pages: 5
Publication status: Published - 2020
Event: 2020 IEEE International Conference on Image Processing (ICIP) - Abu Dhabi, United Arab Emirates
Duration: 25 Oct 2020 - 28 Oct 2020


Conference: 2020 IEEE International Conference on Image Processing (ICIP)
Abbreviated title: ICIP
Country/Territory: United Arab Emirates
City: Abu Dhabi
