Variable Temporal Length Training for Action Recognition CNNs

Author:

Li Tan-Kun1ORCID,Chan Kwok-Leung1ORCID,Tjahjadi Tardi2ORCID

Affiliation:

1. Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China

2. School of Engineering, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK

Abstract

Most current deep learning models are suboptimal in terms of the flexibility of their input shape. Usually, computer vision models only work on one fixed shape used during training, otherwise their performance degrades significantly. For video-related tasks, the length of each video (i.e., number of video frames) can vary widely; therefore, sampling of video frames is employed to ensure that every video has the same temporal length. This training method brings about drawbacks in both the training and testing phases. For instance, a universal temporal length can damage the features in longer videos, preventing the model from flexibly adapting to variable lengths for the purposes of on-demand inference. To address this, we propose a simple yet effective training paradigm for 3D convolutional neural networks (3D-CNN) which enables them to process videos with inputs having variable temporal length, i.e., variable length training (VLT). Compared with the standard video training paradigm, our method introduces three extra operations during training: sampling twice, temporal packing, and subvideo-independent 3D convolution. These operations are efficient and can be integrated into any 3D-CNN. In addition, we introduce a consistency loss to regularize the representation space. After training, the model can successfully process video with varying temporal length without any modification in the inference phase. Our experiments on various popular action recognition datasets demonstrate the superior performance of the proposed method compared to conventional training paradigm and other state-of-the-art training paradigms.

Funder

Research Grants Council of the Hong Kong Special Administrative Region, China

Publisher

MDPI AG

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3