Audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification
Published: 2023
Volume: 20
Issue: 7
Pages: 12529-12561
ISSN: 1551-0018
Container-title: Mathematical Biosciences and Engineering
Short-container-title: MBE
Language: English
Author:
Swathi H. Y. (1), Shivakumar G. (2)
Affiliation:
1. Department of Electronics and Communication Engineering, Malnad College of Engineering, Visvesvaraya Technological University, Belagavi, India
2. Department of Electronics and Communication Engineering, AMC Engineering College, Visvesvaraya Technological University, Belagavi, India
Abstract
The rapid emergence of advanced software systems, low-cost hardware and decentralized cloud computing technologies has broadened the horizon for vision-based surveillance, monitoring and control. However, complex and inferior feature learning over visual artefacts or video streams, especially under extreme conditions, constrains the majority of existing vision-based crowd analysis and classification systems. Retrieving event-sensitive or crowd-type-sensitive spatio-temporal features for different crowd types under extreme conditions is a highly complex task; the result is lower accuracy and hence low reliability, which limits existing methods for real-time crowd analysis. Despite numerous efforts in vision-based approaches, the lack of acoustic cues often creates ambiguity in crowd classification. On the other hand, the strategic amalgamation of audio-visual features can enable accurate and reliable crowd analysis and classification. Motivated by this, a novel audio-visual multi-modality driven hybrid feature learning model is developed in this research for crowd analysis and classification. A hybrid feature extraction model was applied to extract deep spatio-temporal features using the Gray-Level Co-occurrence Matrix (GLCM) and the AlexNet transfer learning model. After extracting the different GLCM features and AlexNet deep features, horizontal concatenation was performed to fuse the two feature sets. Similarly, for acoustic feature extraction, the audio samples (from the input video) were processed with static (fixed-size) sampling, pre-emphasis, block framing and Hann windowing, followed by extraction of acoustic features such as GTCC, GTCC-Delta, GTCC-Delta-Delta, MFCC, spectral entropy, spectral flux, spectral slope and harmonics-to-noise ratio (HNR). Finally, the extracted audio-visual features were fused to yield a composite multi-modal feature set, which was processed for classification using a random forest ensemble classifier. The multi-class classification yields a crowd-classification accuracy of 98.26%, precision of 98.89%, sensitivity of 94.82%, specificity of 95.57% and an F-measure of 98.84%. The robustness of the proposed multi-modality-based crowd analysis model confirms its suitability for real-world crowd detection and classification tasks.
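The visual branch pairs handcrafted GLCM texture statistics with deep AlexNet features and fuses them by horizontal concatenation. Below is a minimal Python sketch of that idea, assuming scikit-image and torchvision; the GLCM distances/angles, the choice of the 4096-d penultimate AlexNet activations, and all function names are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of the visual branch: GLCM texture statistics fused with AlexNet
# deep features by horizontal concatenation. Parameters are assumptions.
import numpy as np
import torch
from torchvision import models, transforms
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_u8):
    """Texture statistics from a grey-level co-occurrence matrix (8-bit frame)."""
    glcm = graycomatrix(gray_u8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "correlation", "energy", "homogeneity"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# AlexNet pretrained on ImageNet; keep the 4096-d penultimate activations
# by dropping the final classification layer.
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
feature_head = torch.nn.Sequential(*list(alexnet.classifier.children())[:-1])
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def alexnet_features(rgb_frame):
    """Deep features for one RGB video frame (HWC uint8 ndarray)."""
    x = preprocess(rgb_frame).unsqueeze(0)
    with torch.no_grad():
        z = alexnet.avgpool(alexnet.features(x)).flatten(1)
        return feature_head(z).squeeze(0).numpy()

def visual_features(rgb_frame, gray_u8):
    # Horizontal concatenation fuses the handcrafted and deep feature sets.
    return np.hstack([glcm_features(gray_u8), alexnet_features(rgb_frame)])
```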
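On the acoustic side, the abstract lists pre-emphasis, block framing and Hann windowing followed by descriptors such as MFCC, spectral entropy and spectral flux. The sketch below, assuming librosa, covers only that subset: GTCC (and its deltas), spectral slope and HNR are omitted since they need a gammatone filterbank, a regression over the spectrum and a harmonicity estimator, and the frame/hop sizes and pre-emphasis coefficient are assumed values.

```python
# Sketch of the acoustic branch: pre-emphasis, Hann-windowed framing, and a
# subset of the listed descriptors (MFCC, spectral entropy, spectral flux).
import numpy as np
import librosa

def acoustic_features(y, sr, frame_len=1024, hop=512, alpha=0.97):
    y = np.append(y[0], y[1:] - alpha * y[:-1])        # pre-emphasis filter
    S = np.abs(librosa.stft(y, n_fft=frame_len, hop_length=hop,
                            window="hann"))            # Hann-windowed frames
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=frame_len, hop_length=hop)
    p = S / (S.sum(axis=0, keepdims=True) + 1e-10)     # per-frame spectrum as pmf
    entropy = -(p * np.log2(p + 1e-10)).sum(axis=0)    # spectral entropy
    flux = np.sqrt((np.diff(S, axis=1) ** 2).sum(axis=0))  # spectral flux
    # One clip-level vector: mean of each frame-wise descriptor.
    return np.hstack([mfcc.mean(axis=1), entropy.mean(), flux.mean()])
```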
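Finally, the audio and visual vectors are fused into one composite multi-modal feature set and classified with a random forest ensemble. A toy end-to-end sketch with scikit-learn, using randomly generated placeholder features and labels purely to show the fusion and classification step:

```python
# Fusion + random forest classification on placeholder data; the feature
# dimensions and four crowd classes are hypothetical, not the paper's setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_visual = rng.normal(size=(200, 4112))  # e.g. 16 GLCM + 4096 AlexNet dims
X_audio = rng.normal(size=(200, 15))     # e.g. 13 MFCC means + entropy + flux
y = rng.integers(0, 4, size=200)         # hypothetical crowd-type labels

X = np.hstack([X_visual, X_audio])       # composite multi-modal feature set
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```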
Publisher
American Institute of Mathematical Sciences (AIMS)
Subject
Applied Mathematics, Computational Mathematics, General Agricultural and Biological Sciences, Modeling and Simulation, General Medicine
Cited by: 1 article