Abstract
To make full use of the effective information in video, this paper proposes a multi-model interactive video behavior recognition method. To address incomplete human target detection and redundant feature extraction, YOLO_V4 is first used to detect the human body and remove redundant background information. The channel attention module SE-Net is then introduced into the Inception_V3 network to strengthen the extraction of key features and make the network attend more closely to key feature details. Finally, the feature information is fed into an LSTM network, whose memory capability is used for action recognition and classification. The proposed multi-model fusion algorithm is tested and verified on the publicly available UT-Interaction dataset. Experimental results show that the accuracy of interactive behavior recognition improves to 85.1%, indicating that the multi-model fusion method achieves higher accuracy.
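The core of the channel attention step described above is the standard squeeze-and-excitation (SE) operation: global-average-pool each channel, pass the resulting descriptor through two fully connected layers with ReLU and sigmoid, then rescale the input channels. The following is a minimal NumPy sketch of that operation only; the function name `se_block` and the weight matrices are illustrative, not the paper's actual implementation, which embeds SE-Net inside Inception_V3.

```python
import numpy as np

def se_block(feature_map, w1, w2):
    """Illustrative squeeze-and-excitation over a (C, H, W) feature map.

    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights,
    where r is the channel reduction ratio.
    """
    # Squeeze: global average pooling over spatial dims -> per-channel descriptor
    z = feature_map.mean(axis=(1, 2))              # shape (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving one weight per channel
    s = np.maximum(0.0, w1 @ z)                    # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))            # shape (C,), values in (0, 1)
    # Scale: reweight each input channel by its learned attention weight
    return feature_map * s[:, None, None]

# Example with hypothetical sizes: 8 channels, reduction ratio r = 2
rng = np.random.default_rng(0)
fm = rng.standard_normal((8, 5, 5))
w1 = rng.standard_normal((4, 8))
w2 = rng.standard_normal((8, 4))
out = se_block(fm, w1, w2)
```

Because the sigmoid weights lie in (0, 1), the block can only attenuate channels, which is how it suppresses less informative features while preserving the key ones.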
Publisher
Darcy & Roy Press Co. Ltd.