Abstract
Training large-scale architectures such as Generative Adversarial Networks (GANs) to investigate audio-visual relations in emotion-enriched interactions is a challenging task, hindered by high model complexity as well as the mode collapse phenomenon. Sufficiently training these architectures requires a massive amount of data. Furthermore, creating extensive audio-visual datasets for specific tasks, such as emotion recognition, is complicated by annotation cost and labelling ambiguities. On the other hand, it is much more straightforward to access unlabeled audio-visual datasets, mainly owing to the abundance of online multimedia content. In this work, a progressive process for training GANs is proposed. The first step leverages enormous unlabeled audio-visual datasets to expose concealed cross-modal relationships. In the second step, the weights are calibrated using a limited amount of emotion-annotated data. Through experimentation, it is shown that our progressive GAN schema leads to a more efficient optimization of the whole network, and that the generated samples from the target domain, when fused with the authentic ones, provide enhanced emotion recognition results.
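The two-step schema described above (unsupervised pre-training on a large unlabeled corpus, then calibration on few annotated samples, then fusing generated with authentic data) can be sketched as follows. This is a minimal toy illustration only: the paper's actual GAN architecture and losses are not given here, so `pretrain_gan`, `calibrate`, and `fuse` are hypothetical stand-ins that mimic the training schedule, not the real adversarial objectives.

```python
import numpy as np

rng = np.random.default_rng(0)

def pretrain_gan(unlabeled, epochs=3, lr=0.05):
    """Step 1 (toy stand-in): expose structure in unlabeled audio-visual
    data. Here the 'generator weights' simply track the data distribution."""
    w = np.zeros(unlabeled.shape[1])
    for _ in range(epochs):
        for x in unlabeled:
            w += lr * (x - w)  # crude running-average update, not a real GAN loss
    return w

def calibrate(w, labeled, lr=0.01):
    """Step 2 (toy stand-in): calibrate the pretrained weights with a
    limited amount of emotion-annotated samples."""
    w = w.copy()
    for x, _emotion in labeled:
        w += lr * (x - w)
    return w

def fuse(authentic, generated):
    """Fuse generated target-domain samples with authentic annotated ones
    before training the downstream emotion recognizer."""
    return np.vstack([authentic, generated])

# Large unlabeled corpus vs. small annotated set (both synthetic here).
unlabeled = rng.normal(size=(100, 8))
labeled = [(rng.normal(size=8), "happy") for _ in range(5)]

w = pretrain_gan(unlabeled)          # step 1: unsupervised pre-training
w = calibrate(w, labeled)            # step 2: supervised calibration
generated = w[None, :] + rng.normal(scale=0.1, size=(5, 8))
augmented = fuse(np.stack([x for x, _ in labeled]), generated)
```

Here `augmented` would feed the emotion classifier; the point of the sketch is only the ordering of the phases and the data fusion, not the adversarial training itself.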
Original language | English |
---|---|
Title of host publication | 2020 IEEE International Conference on Image Processing, ICIP 2020 - Proceedings |
Publisher | IEEE |
Pages | 236-240 |
Number of pages | 5 |
ISBN (Electronic) | 9781728163956 |
DOIs | |
Publication status | Published - Oct 2020 |
Event | 2020 IEEE International Conference on Image Processing (ICIP) - Abu Dhabi, United Arab Emirates; Duration: 25 Oct 2020 → 28 Oct 2020; https://2020.ieeeicip.org/ |
Conference
Conference | 2020 IEEE International Conference on Image Processing (ICIP) |
---|---|
Abbreviated title | ICIP |
Country/Territory | United Arab Emirates |
City | Abu Dhabi |
Period | 25/10/20 → 28/10/20 |
Internet address | https://2020.ieeeicip.org/ |