Characters Link Shots: Character Attention Network for Movie Scene Segmentation

Authors:

Jiawei Tan¹, Hongxing Wang¹, Junsong Yuan²

Affiliation:

1. Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Ministry of Education, China, and School of Big Data and Software Engineering, Chongqing University, China

2. Department of Computer Science and Engineering, State University of New York at Buffalo, USA

Abstract

Movie scene segmentation aims to automatically segment a movie into multiple story units, i.e., scenes, each of which is a series of semantically coherent and temporally continuous shots. Previous methods have devoted continued effort to semantic association between shots, but few take into account how foreground characters and background scenery contribute differently to shot semantics. In particular, the background scenery in a shot can adversely affect scene boundary classification. Motivated by the fact that it is the characters who drive the plot of a movie scene, we build a Character Attention Network (CANet) to detect movie scene boundaries in a character-centric fashion. To eliminate background clutter, we extract multi-view character semantics for each shot in terms of human bodies and faces. Furthermore, we equip our CANet with two stages of character attention. The first is Masked Shot Attention (MSA), which applies selective self-attention over similar temporal contexts of the multi-view character semantics to yield an enhanced omni-view shot representation, allowing CANet to better handle variations of characters in pose and appearance. The second is Key Character Attention (KCA), which applies temporal-aware attention to character reappearances during Bidirectional Long Short-Term Memory (Bi-LSTM) feature association, so that shot linking focuses on shots with recurring key characters. We further encourage the proposed CANet to learn boundary-discriminative shot features. Specifically, we formulate a Boundary-Aware circle Loss (BAL) to push far apart CANet features of adjacent scenes, and couple it with the cross-entropy loss to make CANet features sensitive to scene boundaries. Experimental results on the MovieNet-SSeg and OVSD datasets show that our method achieves superior performance in temporal scene segmentation compared with state-of-the-art methods.
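The abstract outlines a concrete pipeline: fuse body and face views per shot, apply masked self-attention within a temporal neighborhood (MSA), associate shots with a Bi-LSTM reweighted by key-character attention (KCA), and train with a boundary-aware circle-style loss coupled with cross-entropy. The sketch below is a minimal, hypothetical PyTorch rendering of that pipeline under stated assumptions; the names (CANetSketch, bal_loss), the simple temporal-window mask, the feature dimensions, and the reduced per-pair form of the loss are illustrative, not the authors' released implementation.

    # Minimal sketch of a CANet-style pipeline (assumptions throughout):
    # shot-level body/face features are precomputed, the MSA mask is a plain
    # temporal window, and BAL is reduced to a per-pair circle-style margin loss.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CANetSketch(nn.Module):
        def __init__(self, dim=512, heads=8, window=8):
            super().__init__()
            self.window = window
            self.fuse = nn.Linear(2 * dim, dim)          # merge body + face views
            self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.lstm = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
            self.kca = nn.Linear(dim, 1)                 # per-shot attention score
            self.head = nn.Linear(dim, 2)                # boundary / non-boundary

        def forward(self, body, face):
            # body, face: (B, T, dim) character features for T shots, two views
            x = self.fuse(torch.cat([body, face], dim=-1))
            # Masked Shot Attention: block attention outside a local temporal window
            T = x.size(1)
            idx = torch.arange(T, device=x.device)
            blocked = (idx[None, :] - idx[:, None]).abs() > self.window
            x, _ = self.msa(x, x, x, attn_mask=blocked)
            # Bi-LSTM association, reweighted by Key Character Attention
            h, _ = self.lstm(x)
            w = torch.sigmoid(self.kca(h))               # emphasize shots with recurring characters
            h = w * h + x                                # residual keeps per-shot identity
            return h, self.head(h)                       # features and per-shot boundary logits

    def bal_loss(feats, boundary, margin=0.25, gamma=32.0):
        # Simplified boundary-aware circle-style loss: pull adjacent same-scene
        # shots together, push shot pairs straddling a scene boundary apart.
        # feats: (B, T, dim); boundary: (B, T-1) with 1 marking a scene change.
        f = F.normalize(feats, dim=-1)
        sim = (f[:, :-1] * f[:, 1:]).sum(-1)             # cosine similarity of adjacent shots
        same = (1 - boundary) * F.softplus(gamma * (1 - margin - sim))
        diff = boundary * F.softplus(gamma * (sim - margin))
        return (same + diff).mean()

    # Usage (shapes only): couple BAL with per-shot cross-entropy as in the abstract.
    feats, logits = CANetSketch()(torch.randn(2, 32, 512), torch.randn(2, 32, 512))
    labels = torch.randint(0, 2, (2, 32))                # per-shot boundary labels
    loss = F.cross_entropy(logits.reshape(-1, 2), labels.reshape(-1)) \
           + bal_loss(feats, labels[:, 1:].float())

The window mask is one plausible reading of "selective self-attention over similar temporal contexts"; the paper may instead select shots by feature similarity, and the per-pair margin loss stands in for the full circle-loss formulation.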

Funder

National Natural Science Foundation of China

Key Project of Chongqing Technology Innovation and Application Development

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture


Cited by 1 article.

京公网安备11010802033243号  京ICP备18003416号-3