Knowledge-integrated Multi-modal Movie Turning Point Identification

Authors:

Wang Depei¹, Xu Ruifeng², Cheng Lianglun³, Wang Zhuowei³

Affiliations:

1. Guangdong Southern Planning & Designing Institute of Telecom Co., Ltd., China and Harbin Institute of Technology (Shenzhen), China

2. Harbin Institute of Technology (Shenzhen), China and Peng Cheng Laboratory, China

3. Guangdong University of Technology, China

Abstract

The rapid development of artificial intelligence provides rich technologies and tools for the automated understanding of literary works. As comprehensive carriers of storylines, movies are natural multimodal data sources that supply an ample data foundation, and how to fully exploit such data remains an active research topic. In addition, efficiently representing multi-source data poses new challenges for information fusion. We therefore propose a knowledge-enhanced turning point identification (KTPi) method for multimodal scene recognition. First, BiLSTM is used to encode scene text, integrating contextual information into scene representations to complete text sequence modeling. Then, all scenes are modeled as a graph, which strengthens long-range semantic dependencies between scenes, and scene representations are enhanced with a graph convolutional network. Next, a self-supervised method is used to select the optimal number of neighboring nodes in the sparse graph. Actor and verb knowledge extracted from the scene text is then added to the multimodal data to diversify scene feature expression. Finally, a teacher-student network strategy is used to train the KTPi model. Experimental results show that KTPi outperforms baseline methods on the scene role recognition task, and ablation studies show that incorporating knowledge into the multimodal model improves its performance.
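The scene-graph and graph-convolution steps described in the abstract can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the cosine-similarity k-NN graph, the symmetric normalization, and all shapes and variable names are chosen for demonstration (the paper selects the neighbor count via a self-supervised criterion rather than a fixed k).

```python
import numpy as np

def knn_scene_graph(scene_emb: np.ndarray, k: int) -> np.ndarray:
    """Build a sparse scene adjacency: each scene keeps its k most
    cosine-similar neighbors (fixed k here for illustration only)."""
    x = scene_emb / np.linalg.norm(scene_emb, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)         # exclude self-similarity
    adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        nbrs = np.argsort(sim[i])[-k:]     # top-k neighbors of scene i
        adj[i, nbrs] = 1.0
    return np.maximum(adj, adj.T)          # symmetrize

def gcn_layer(adj: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^{-1/2}(A + I)D^{-1/2} X W)."""
    a_hat = adj + np.eye(adj.shape[0])     # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)

rng = np.random.default_rng(0)
scenes = rng.normal(size=(12, 16))         # 12 scenes, 16-dim text encodings
w = rng.normal(size=(16, 8))               # projection weights (random here)
adj = knn_scene_graph(scenes, k=3)
h = gcn_layer(adj, scenes, w)              # enhanced scene representations
print(h.shape)                             # (12, 8)
```

Propagating over this graph lets each scene representation absorb information from semantically similar but temporally distant scenes, which is the long-range dependency effect the abstract refers to.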

Funder

Guangdong Provincial Key Laboratory of Cyber-Physical Systems

National Natural Science Foundation of China

Shenzhen Foundational Research Funding

Major Key Project of PCL

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture
