Egocentric Early Action Prediction via Adversarial Knowledge Distillation

Published:2023-02-06 Issue:2 Volume:19 Page:1-21
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Zheng Na¹, Song Xuemeng¹, Su Tianyu¹, Liu Weifeng², Yan Yan³, Nie Liqiang¹

Affiliation:

1. Shandong University, Binhai Highway, Qingdao, China

2. China University of Petroleum (East China), West Changjiang Road, Qingdao, China

3. Illinois Institute of Technology

Abstract

Egocentric early action prediction aims to recognize actions from the first-person view by observing only a partial video segment, which is challenging due to the limited context information of the partial video. In this article, to tackle the egocentric early action prediction problem, we propose a novel multi-modal adversarial knowledge distillation framework. In particular, our approach involves a teacher network that learns an enhanced representation of the partial video by considering the future unobserved video segment, and a student network that mimics the teacher network to produce a powerful representation of the partial video and, based on that, predicts the action label. To promote knowledge distillation between the teacher and student networks, we seamlessly integrate adversarial learning with latent and discriminative knowledge regularizations, encouraging the learned representations of the partial video to be more informative and discriminative for action prediction. Finally, we devise a multi-modal fusion module to comprehensively predict the action label. Extensive experiments on two public egocentric datasets validate the superiority of our method over state-of-the-art methods. We have released the code and the parameters involved to benefit other researchers.
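The abstract names three ingredients for training the student: adversarial learning, a latent knowledge regularization, and a discriminative knowledge regularization. The abstract does not give the loss formulas, so the following is only a minimal sketch of how such terms are commonly combined, not the authors' implementation. It assumes a mean-squared error on features for the latent term, a temperature-softened KL divergence on logits for the discriminative term, and a non-saturating generator loss for the adversarial term. All function names, weights, and the temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def latent_loss(student_feat, teacher_feat):
    """Assumed latent term: mean-squared error between representations."""
    diffs = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(diffs) / len(diffs)


def discriminative_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed discriminative term: KL(teacher || student) on softened outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def adversarial_loss(disc_score_on_student):
    """Assumed adversarial term (student side): the student is rewarded when a
    discriminator scores its representation as teacher-like (score near 1)."""
    return -math.log(disc_score_on_student + 1e-12)


def cross_entropy(logits, label):
    """Standard classification loss on the student's action prediction."""
    return -math.log(softmax(logits)[label])


def student_objective(student_feat, teacher_feat, student_logits,
                      teacher_logits, disc_score, label,
                      w_adv=1.0, w_lat=1.0, w_dis=1.0):
    """Weighted sum of the four terms; the weights are hypothetical."""
    return (cross_entropy(student_logits, label)
            + w_adv * adversarial_loss(disc_score)
            + w_lat * latent_loss(student_feat, teacher_feat)
            + w_dis * discriminative_loss(student_logits, teacher_logits))


# Usage with toy values: a student that matches the teacher closely
# receives a lower objective than one that does not.
teacher_feat, teacher_logits = [0.5, -1.0, 2.0], [3.0, 0.5, -1.0]
close = student_objective([0.5, -1.0, 2.0], teacher_feat,
                          [3.0, 0.5, -1.0], teacher_logits, 0.9, label=0)
far = student_objective([2.0, 1.0, -2.0], teacher_feat,
                        [-1.0, 0.5, 3.0], teacher_logits, 0.1, label=0)
print(close < far)
```

In a full training loop the discriminator would be updated in alternation with the student, and the teacher would see the future video segment that the student does not; both are omitted here to keep the sketch focused on the student's objective.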

Funder

National Key Research and Development Project of New Generation Artificial Intelligence

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3544493

Reference: 70 articles.

1. Yijun Cai, Haoxin Li, Jian-Fang Hu, and Wei-Shi Zheng. 2019. Action knowledge transfer for action prediction with partial videos. In Proceedings of the AAAI Conference on Artificial Intelligence. 8118–8125.

2. Yue Cao, Bin Liu, Mingsheng Long, and Jianmin Wang. 2018. HashGAN: Deep learning to hash with pair conditional Wasserstein GAN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 1287–1296.

3. Xiaolin Chen, Xuemeng Song, Guozhen Peng, Shanshan Feng, and Liqiang Nie. 2021. Adversarial-enhanced hybrid graph network for user identity linkage. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 1084–1093.

4. Xinyuan Chen, Chang Xu, Xiaokang Yang, and Dacheng Tao. 2018. Attention-GAN for object transfiguration in wild images. In Proceedings of the European Conference on Computer Vision. 164–180.

5. Ali Cheraghian, Shafin Rahman, Pengfei Fang, Soumava Kumar Roy, Lars Petersson, and Mehrtash Harandi. 2021. Semantic-aware knowledge distillation for few-shot class-incremental learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 2534–2543.

Cited by 14 articles.

1. Context-aware relational reasoning for video chunks and frames overlapping in language-based moment localization;Neurocomputing;2024-10

2. SgLFT: Semantic-guided Late Fusion Transformer for video corpus moment retrieval;Neurocomputing;2024-09

3. From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-08-28

4. Multimodal Score Fusion with Sparse Low-rank Bilinear Pooling for Egocentric Hand Action Recognition;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-05-16

5. Intention action anticipation model with guide-feedback loop mechanism;Knowledge-Based Systems;2024-05