Abstract
Moving object detection and classification are fundamental tasks in computer vision. However, current solutions detect all objects first and then rely on a separate algorithm to determine which of them are in motion. Furthermore, many solutions employ complex networks that require substantial computational resources, whereas lightweight solutions could enable more widespread use. We introduce TRG-Net, a unified model that can run on computationally limited devices to detect and classify only moving objects. The proposal builds on the Faster R-CNN architecture, with MobileNetV3 as the feature extractor and a Gaussian mixture model for a fast, motion-based search of regions of interest. TRG-Net reduces inference time by unifying the moving object detection and image classification tasks and by limiting the regions of interest to the number of moving objects. Experiments on surveillance videos and the KITTI dataset for 2D object detection show that our approach improves on the inference time of Faster R-CNN (from 0.221 s to 0.138 s) while using fewer parameters (from 18.91 M to 18.30 M) and maintaining average precision (AP = 0.423). TRG-Net therefore achieves a balance between precision and speed and could be applied in various real-world scenarios.
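
The motion-based search for regions of interest mentioned in the abstract can be illustrated with a minimal sketch. This is not the TRG-Net implementation: it swaps the Gaussian mixture model for a single running Gaussian per pixel (a one-component simplification), works on small grayscale frames stored as nested lists, and uses hypothetical names (`RunningGaussianBackground`, `motion_boxes`) and assumed parameter values (`alpha`, `k`, `init_var`, `min_var`). It only shows how a per-pixel background model could yield one candidate box per moving blob, so that the number of regions tracks the number of moving objects.

```python
from collections import deque


class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model.

    Assumption: a one-component simplification of a Gaussian mixture model,
    used for illustration only.
    """

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=25.0, min_var=4.0):
        self.alpha = alpha      # learning rate (assumed value)
        self.k = k              # foreground threshold in standard deviations
        self.min_var = min_var  # variance floor so the model never collapses
        self.mean = [[float(v) for v in row] for row in first_frame]
        self.var = [[init_var for _ in row] for row in first_frame]

    def apply(self, frame):
        """Return a binary foreground mask; update the model on background pixels."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                diff = value - self.mean[y][x]
                is_foreground = diff * diff > (self.k ** 2) * self.var[y][x]
                if not is_foreground:
                    self.mean[y][x] += self.alpha * diff
                    new_var = (1 - self.alpha) * self.var[y][x] + self.alpha * diff * diff
                    self.var[y][x] = max(new_var, self.min_var)
                mask_row.append(1 if is_foreground else 0)
            mask.append(mask_row)
        return mask


def motion_boxes(mask, min_area=2):
    """Return (x_min, y_min, x_max, y_max) for each 4-connected foreground blob."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over the blob containing (x0, y0).
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            area = 0
            while queue:
                x, y = queue.popleft()
                area += 1
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:  # drop isolated noisy pixels
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy example: a static 8x6 background, then a bright 3x2 object appears.
background = [[10] * 8 for _ in range(6)]
model = RunningGaussianBackground(background)
frame = [row[:] for row in background]
for y in range(2, 4):
    for x in range(3, 6):
        frame[y][x] = 200
boxes = motion_boxes(model.apply(frame))
static_boxes = motion_boxes(model.apply(background))
```

In a detector of the kind the abstract describes, boxes such as these would presumably stand in for a dense set of region proposals, so a static scene produces no regions to classify at all. A practical version would more likely use a true mixture model over color frames with an array library instead of nested lists.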
Original language | English |
---|---|
Pages (from-to) | 173-184 |
Number of pages | 12 |
Journal | VISIGRAPP. Proceedings |
Volume | 5 |
Early online date | 1 Jan 2023 |
Publication status | Published - 2023 |
Event | 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Lisbon, Portugal; Duration: 19 Feb 2023 → 21 Feb 2023; Conference number: 18; https://visapp.scitevents.org/?y=2023 |
Keywords
- Classification
- Detection
- Gaussian Mixture
- Lightweight Model
- Moving Objects