P 2 ANet : A Large-Scale Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos

Author:

Bian Jiang1ORCID,Li Xuhong1ORCID,Wang Tao1ORCID,Wang Qingzhong1ORCID,Huang Jun1ORCID,Liu Chen1ORCID,Zhao Jun1ORCID,Lu Feixiang1ORCID,Dou Dejing1ORCID,Xiong Haoyi1ORCID

Affiliation:

1. Baidu Inc., China

Abstract

While deep learning has been widely used for video analytics, such as video classification and action detection, dense action detection with fast-moving subjects from sports videos is still challenging. In this work, we release yet another sports video benchmark P 2 ANet for P ing P ong- A ction detection, which consists of 2,721 video clips collected from the broadcasting videos of professional table tennis matches in World Table Tennis Championships and Olympiads. We work with a crew of table tennis professionals and referees on a specially designed annotation toolbox to obtain fine-grained action labels (in 14 classes) for every ping-pong action that appeared in the dataset, and formulate two sets of action detection problems— action localization and action recognition . We evaluate a number of commonly seen action recognition (e.g., TSM, TSN, Video SwinTransformer, and Slowfast) and action localization models (e.g., BSN, BSN++, BMN, TCANet), using P 2 ANet for both problems, under various settings. These models can only achieve 48% area under the AR-AN curve for localization and 82% top-one accuracy for recognition since the ping-pong actions are dense with fast-moving subjects but broadcasting videos are with only 25 FPS. The results confirm that P 2 ANet is still a challenging task and can be used as a special benchmark for dense action detection from videos. We invite readers to examine our dataset by visiting the following link: https://github.com/Fred1991/P2ANET .

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications,Hardware and Architecture

Reference74 articles.

1. Sami Abu-El-Haija Nisarg Kothari Joonseok Lee Paul Natsev George Toderici Balakrishnan Varadarajan and Sudheendra Vijayanarasimhan. 2016. Youtube-8M: A large-scale video classification benchmark. arXiv:1609.08675. Retrieved from https://arxiv.org/abs/1609.08675

2. ViViT: A Video Vision Transformer

3. Dynamic 3D image simulation of basketball movement based on embedded system and computer vision

4. Am I a Baller? Basketball Performance Assessment from First-Person Videos

5. Gedas Bertasius, Heng Wang, and Lorenzo Torresani. 2021. Is space-time attention all you need for video understanding?. In Proceedings of the International Conference on Machine Learning. PMLR, 813–824.

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3