Abstract
Exploiting the multimodal and temporal interaction between audio-visual channels is essential for automatic audio-video emotion recognition (AVER). The strength of each modality in conveying an emotion, and of each time window of a video clip, could be further utilized through a weighting scheme such as an attention mechanism to capture their complementary information. The attention mechanism is a powerful approach for sequence modeling, which can be employed to fuse audio-video cues over time. We propose a novel framework which consists of biaudio-visual time windows that span short video clips labeled with discrete emotions. Attention is used to weigh these time windows for multimodal learning and fusion. Experimental results on two datasets show that the proposed methodology can achieve enhanced multimodal emotion recognition.
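The idea of weighing time windows with attention before fusing them can be illustrated with a minimal sketch. This is not the architecture from the paper: the feature values, the `query` vector (which would normally be learned), and the function name `attention_fuse` are all hypothetical, and scaled dot-product scoring is simply one common choice of attention score.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(windows, query):
    """Fuse per-window feature vectors into one clip-level vector.

    windows: list of equal-length feature vectors, one per time window
             (here each is a concatenation of audio and visual features).
    query:   hypothetical relevance vector; in a trained model this
             would be a learned parameter, not a hand-set constant.
    """
    dim = len(query)
    # Scaled dot-product score for each time window.
    scores = [
        sum(w_d * q_d for w_d, q_d in zip(window, query)) / math.sqrt(dim)
        for window in windows
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the window features.
    fused = [
        sum(a * window[d] for a, window in zip(weights, windows))
        for d in range(dim)
    ]
    return fused, weights


# Toy features (assumed values): three time windows, two audio and
# two visual dimensions per window.
audio = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.3]]
visual = [[0.5, 0.0], [0.7, 0.8], [0.1, 0.2]]
windows = [a + v for a, v in zip(audio, visual)]
query = [1.0, 0.5, 1.0, 0.5]

fused, weights = attention_fuse(windows, query)
print("attention weights per window:", weights)
print("fused clip-level features:", fused)
```

In this toy setting the second window receives the largest weight because its features align best with the query, so it dominates the fused representation that a classifier over discrete emotions would then consume.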
Original language | English |
---|---|
Title of host publication | 2020 IEEE International Conference on Image Processing (ICIP) |
Publisher | IEEE |
Pages | 251-255 |
Number of pages | 5 |
ISBN (Print) | 9781728163956 |
DOIs | |
Publication status | Published - 25 Oct 2020 |
Event | 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, Abu Dhabi, United Arab Emirates. Duration: 25 Oct 2020 → 28 Oct 2020. https://2020.ieeeicip.org/ |
Publication series
Series | IEEE International Conference on Image Processing ICIP |
---|---|
ISSN | 1522-4880 |
Conference
Conference | 2020 IEEE International Conference on Image Processing (ICIP) |
---|---|
Abbreviated title | ICIP |
Country/Territory | United Arab Emirates |
City | Abu Dhabi |
Period | 25/10/20 → 28/10/20 |
Keywords
- attention
- multimodal learning
- audiovisual emotion recognition