Motion Vector-Based Self-Attention for Real-Time Human Activity Recognition in Compressed Videos: The MVViT Approach

Authors:

Praveenkumar S. M.1, Patil Prakashgoud1, Hiremath P. S.1

Affiliation:

1. Department of Computer Applications, KLE Technological University, Vidyanagar, Hubballi, Karnataka 580031, India

Abstract

This paper proposes a novel method for real-time human activity recognition in the compressed domain of videos, based on motion vectors and the self-attention mechanism of vision transformers; it is termed motion vectors and vision transformers (MVViT). Videos in the MPEG-4 and H.264 compression formats are considered in this study. Any video source can be used without prior setup by adapting the proposed method to the corresponding video codec and camera settings. Existing algorithms for human action recognition in compressed video have limitations in this regard: they (i) require keyframes at a fixed interval, (ii) use P frames only, and (iii) normally support only a single codec. The proposed method overcomes these limitations by using arbitrary keyframe intervals, using both P and B frames, and supporting both the MPEG-4 and H.264 codecs. Experiments are carried out on the benchmark datasets UCF101, HMDB51, and THUMOS14, and the recognition accuracy in the compressed domain is found to be comparable to that observed on raw video data, but at a reduced computational cost. The proposed MVViT method outperforms other recent methods, with 61.0% fewer parameters and 63.7% fewer Giga Floating Point Operations Per Second (GFLOPS), while improving accuracy by 0.8%, 5.9%, and 16.6% on UCF101, HMDB51, and THUMOS14, respectively. In addition, speed increases by 8% on UCF101 compared with the highest speed reported in the literature for the same dataset. An ablation study is conducted using MVViT variants for different codecs, and the performance is analyzed in comparison with state-of-the-art network models.

Publisher

World Scientific Pub Co Pte Ltd
