Abstract
Affective computing is a subset of the larger field of human-computer interaction, with important connections to cognitive processes, influencing learning, decision-making and perception. Among the many channels of communication, facial expressions are one of the most widely accepted for conveying emotion, and they have received increased attention in recent years. An important aspect contributing to their recognition success is the modeling of the temporal dimension. This paper therefore investigates the applicability of current state-of-the-art action recognition techniques to the human emotion recognition task. In particular, two architectures were investigated: a CNN-based model, the Temporal Shift Module (TSM), which can learn spatio-temporal features in 3D data at the computational cost of a 2D CNN, and a video-based vision transformer employing spatio-temporal self-attention. The models were trained and tested on the CREMA-D dataset, demonstrating state-of-the-art performance with mean class accuracies of 82% and 77%, respectively, and outperforming the best previous approaches by at least 3.5%.
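The key idea behind TSM is that shifting a small fraction of feature channels along the temporal axis exchanges information between neighboring frames at no FLOP cost, so an ordinary 2D CNN applied afterwards can capture temporal structure. A minimal NumPy sketch of that shift operation (function name and shift fraction are illustrative, not taken from the paper's implementation):

```python
import numpy as np

def temporal_shift(x, shift_fraction=8):
    """Shift a fraction of channels along the temporal axis (zero-padded).

    x: array of shape (batch, time, channels, height, width).
    One 1/shift_fraction slice of channels moves forward in time,
    another slice moves backward, and the rest stay in place, mixing
    information across frames before a 2D convolution is applied.
    """
    b, t, c, h, w = x.shape
    fold = c // shift_fraction
    out = np.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                  # frame t sees frame t-1
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # frame t sees frame t+1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels unshifted
    return out
```

In the actual architecture this shift is inserted inside residual blocks, so the shifted features are fused with the unshifted path by the following 2D convolution.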
Original language | English |
---|---|
Title of host publication | 2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) |
Publisher | IEEE |
Pages | 01-08 |
Number of pages | 8 |
Publication status | Published - 1 Sept 2021 |
Event | 2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos - Nara, Japan |
Duration | 28 Sept 2021 → 1 Oct 2021 |
Conference number | 29 |
Conference
Conference | 2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos |
---|---|
Abbreviated title | ACIIW 2021 |
Country/Territory | Japan |
City | Nara |
Period | 28/09/21 → 1/10/21 |
Keywords
- Temporal shift module (TSM)
- Vision transformers
- Emotion recognition
- Action recognition