Abstract
In the field of machine learning, it is common practice to use benchmark datasets to demonstrate the effectiveness of a method. The domain of action recognition in videos often uses datasets such as Kinetics, Something-Something, UCF-101 and HMDB-51 to report results. Considering the properties of these datasets, none focuses solely on very short clips (2 to 3 seconds) and on highly similar, fine-grained actions within one specific domain. This paper investigates how current state-of-the-art action recognition methods perform on a dataset consisting of highly similar, fine-grained actions. To this end, a dataset of skateboarding tricks was created. The analysis highlights both the benefits and the limitations of state-of-the-art methods and proposes future research directions in the activity recognition domain. The results show that the best performance is obtained by fusing RGB data with OpenPose data in the Temporal Shift Module.
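The abstract reports that fusing RGB data with OpenPose data yields the best results, but does not specify the fusion scheme. A common and simple choice for combining two recognition streams is late (score-level) fusion, sketched below; all function names, weights, and class labels here are illustrative assumptions, not details taken from the paper.

```python
import math

def softmax(logits):
    """Convert raw per-class logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def late_fusion(rgb_logits, pose_logits, w_rgb=0.5):
    """Hypothetical late fusion: weighted average of the per-class
    probabilities produced by an RGB stream and a pose (OpenPose) stream."""
    p_rgb = softmax(rgb_logits)
    p_pose = softmax(pose_logits)
    return [w_rgb * a + (1 - w_rgb) * b for a, b in zip(p_rgb, p_pose)]

# Illustrative example with three trick classes (e.g. ollie, kickflip, heelflip)
rgb = [2.0, 0.5, 0.1]    # RGB-stream logits (made-up values)
pose = [1.5, 1.8, 0.2]   # OpenPose-stream logits (made-up values)
fused = late_fusion(rgb, pose)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

An equal-weight average is only one option; in practice the stream weight `w_rgb` would be tuned on a validation split.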
Original language | English |
---|---|
Publication status | Published - 25 Aug 2021 |
Event | 15. International Conference on Computer Vision and Image Processing - Paris, World Academy of Science, Engineering and Technology - Paris, France. Duration: 30 Dec 2021 → 31 Dec 2021. https://app.qwoted.com/opportunities/event-iccvip-2021-15-international-conference-on-computer-vision-and-image-processing-paris |
Conference
Conference | 15. International Conference on Computer Vision and Image Processing - Paris, World Academy of Science, Engineering and Technology |
---|---|
Abbreviated title | ICCVIP 2021 |
Country/Territory | France |
City | Paris |
Period | 30/12/21 → 31/12/21 |
Internet address | https://app.qwoted.com/opportunities/event-iccvip-2021-15-international-conference-on-computer-vision-and-image-processing-paris |