Abstract
Training large-scale architectures such as Generative Adversarial Networks (GANs) to investigate audio-visual relations in emotion-enriched interactions is a challenging task, hindered by high model complexity and the mode-collapse phenomenon. Sufficiently training these architectures requires a massive amount of data. Moreover, creating extensive audio-visual datasets for specific tasks, such as emotion recognition, is complicated by annotation cost and labelling ambiguities. On the other hand, unlabeled audio-visual datasets are far more straightforward to obtain, mainly owing to easily accessible online multimedia content. In this work, a progressive process for training GANs was conducted. The first step leverages large unlabeled audio-visual datasets to expose concealed cross-modal relationships, while the second step calibrates the network weights using a limited amount of emotion-annotated data. Through experimentation, it was shown that our progressive GAN schema leads to a more efficient optimization of the whole network, and that generated samples from the target domain, when fused with authentic ones, provide enhanced emotion recognition results.
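The two-step schedule described above — unsupervised pre-training on a large unlabeled corpus followed by calibration on a small labeled set — can be sketched in a few lines. The snippet below is a deliberately toy illustration under stated assumptions, not the authors' implementation: the function names, learning rates, and the mean-matching "updates" standing in for real adversarial and supervised losses are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two training stages described in the abstract.
# The update rules below merely nudge a weight vector toward batch
# statistics; a real GAN would use adversarial generator/discriminator
# losses instead.

def adversarial_step(weights, batch, lr=0.1):
    """Stage-1 update: learn from an unlabeled audio-visual batch."""
    grad = weights - batch.mean(axis=0)
    return weights - lr * grad

def supervised_step(weights, batch, labels, lr=0.01):
    """Stage-2 update: calibrate with a small emotion-annotated batch.
    The smaller learning rate fine-tunes rather than retrains."""
    grad = weights - batch[labels == 1].mean(axis=0)
    return weights - lr * grad

# Stage 1: leverage a large unlabeled corpus.
weights = rng.normal(size=4)
unlabeled = rng.normal(loc=1.0, size=(1000, 4))
for _ in range(100):
    batch = unlabeled[rng.integers(0, 1000, size=32)]
    weights = adversarial_step(weights, batch)

# Stage 2: calibrate with a limited amount of labeled data.
labeled = rng.normal(loc=1.2, size=(50, 4))
labels = rng.integers(0, 2, size=50)
for _ in range(20):
    weights = supervised_step(weights, labeled, labels)
```

The design point the sketch captures is the progression itself: the bulk of the optimization budget is spent where data is plentiful, and the scarce annotated data only adjusts an already well-initialized model.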
| Original language | English |
|---|---|
| Title of host publication | 2020 IEEE International Conference on Image Processing, ICIP 2020 - Proceedings |
| Publisher | IEEE |
| Pages | 236-240 |
| Number of pages | 5 |
| ISBN (Electronic) | 9781728163956 |
| DOIs | |
| Publication status | Published - Oct 2020 |
| Event | 2020 IEEE International Conference on Image Processing (ICIP) - Abu Dhabi, United Arab Emirates. Duration: 25 Oct 2020 → 28 Oct 2020. https://2020.ieeeicip.org/ |
Conference
| Conference | 2020 IEEE International Conference on Image Processing (ICIP) |
|---|---|
| Abbreviated title | ICIP |
| Country/Territory | United Arab Emirates |
| City | Abu Dhabi |
| Period | 25/10/20 → 28/10/20 |
| Internet address | https://2020.ieeeicip.org/ |