Affiliation:
1. Sri Venkateshwara University College of Engineering, Tirupati, Andhra Pradesh, India.
Abstract
Video processing has become a vital area of computer vision and deep learning, with diverse applications including crowd analysis, anomaly detection, and activity tracking. Although numerous surveys have examined individual aspects of these tasks, a comprehensive review that integrates their findings into a coherent perspective is still lacking. This survey provides a detailed analysis of several model architectures, emphasising their advantages, shortcomings, and constraints. We also emphasise the profound influence of these technologies in fields such as surveillance, healthcare, and autonomous systems, focusing specifically on the applications of deep learning in video processing. Our review not only analyses the latest advances but also explores the processes and strategies that deep learning models use to extract valuable insights from video data. Furthermore, we examine the importance of accessible datasets and their crucial role in driving research progress in this field. By outlining the obstacles and concerns that researchers face when adopting these systems, we offer a clear plan for future research directions. We aim to stimulate ongoing innovation and advancement in video processing using deep learning techniques.