Analysing dynamic scenes in video is valuable for many applications, particularly the detection and monitoring of natural hazards such as floods and fires. In this work, we address the challenging problem of real-world dynamic scene understanding, where videos contain dynamic textures recorded "in the wild". Such videos exhibit large illumination variations, complex motion, occlusions, and camera motion, as well as significant intra-class variability, since the motion patterns of dynamic textures of the same category can vary widely across real-world recordings. We address these issues by introducing a novel dynamic texture descriptor, "Local Binary Pattern-flow" (LBP-flow), which accurately classifies dynamic scenes whose complex motion patterns are difficult to separate with existing local descriptors or to model with statistical techniques. LBP-flow builds upon existing Local Binary Pattern (LBP) descriptors by providing a low-cost representation of both appearance and optical-flow textures, increasing their representational capacity. The descriptor statistics are encoded with the Fisher vector, an informative mid-level representation, and a neural network then reduces the dimensionality and increases the discriminability of the encoded descriptor. The result is a highly accurate spatio-temporal descriptor with very low computational cost, enabling deployment in real-world surveillance and security applications. Experiments on challenging benchmark datasets demonstrate that it surpasses state-of-the-art dynamic texture descriptors in recognition accuracy.
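The abstract describes a descriptor that applies LBP coding to both the appearance frame and the optical-flow field before Fisher-vector encoding. As a rough illustration of that first stage only, the sketch below computes a simplified 8-neighbour LBP histogram on a grey-level frame and on a flow-magnitude field and concatenates them; the function names, the use of flow magnitude as the flow texture, and all parameter choices are assumptions for illustration, not the authors' actual formulation.

```python
import numpy as np

def lbp8(img):
    """Simplified 8-neighbour Local Binary Pattern codes for a 2-D array.
    Each interior pixel is compared with its 8 neighbours; the resulting
    sign pattern is packed into an 8-bit code."""
    c = img[1:-1, 1:-1]
    # neighbour offsets (dy, dx), clockwise from the top-left neighbour
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    h, w = img.shape
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit
    return codes

def lbp_flow_histogram(frame, flow_mag):
    """Hypothetical per-frame LBP-flow sketch: LBP histograms over the
    appearance frame and the optical-flow magnitude field, concatenated
    into one L1-normalised 512-bin vector (256 appearance + 256 flow)."""
    h_app = np.bincount(lbp8(frame).ravel(), minlength=256)
    h_flow = np.bincount(lbp8(flow_mag).ravel(), minlength=256)
    h = np.concatenate([h_app, h_flow]).astype(float)
    return h / h.sum()

# Example on synthetic data (a real pipeline would use video frames and
# an optical-flow estimator, then Fisher-vector encode these histograms):
rng = np.random.default_rng(0)
frame = rng.random((32, 32))
flow_mag = rng.random((32, 32))
desc = lbp_flow_histogram(frame, flow_mag)
```

In the full method these per-frame statistics would feed the Fisher-vector encoding and the neural network described above; those stages are omitted here.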
Title of host publication: MSF 2017 IEEE/ISPRS 4th Joint Workshop on Multi-Sensor Fusion for Dynamic Scene Understanding, in conjunction with ICCV 2017
Published: 2017