Socializing the Videos: A Multimodal Approach for Social Relation Recognition

Published: 2021-04-16 Issue: 1 Volume: 17 Page: 1-23
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Xu Tong¹, Zhou Peilun¹, Hu Linkang¹, He Xiangnan¹, Hu Yao², Chen Enhong¹

Affiliation:

1. University of Science and Technology of China, Hefei, China

2. Alibaba Youku Cognitive and Intelligent Lab, Beijing, China

Abstract

As a crucial task for video analysis, social relation recognition for characters not only provides semantically rich description of video content but also supports intelligent applications, e.g., video retrieval and visual question answering. Unfortunately, due to the semantic gap between visual and semantic features, traditional solutions may fail to reveal the accurate relations among characters. At the same time, the development of social media platforms has now promoted the emergence of crowdsourced comments, which may enhance the recognition task with semantic and descriptive cues. To that end, in this article, we propose a novel multimodal-based solution to deal with the character relation recognition task. Specifically, we capture the target character pairs via a search module and then design a multistream architecture for jointly embedding the visual and textual information, in which feature fusion and attention mechanism are adapted for better integrating the multimodal inputs. Finally, supervised learning is applied to classify character relations. Experiments on real-world data sets validate that our solution outperforms several competitive baselines.

Funder

National Key Research and Development Program of China

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3416493

Cited by 12 articles.

1. InteractNet: Social Interaction Recognition for Semantic-rich Videos; ACM Transactions on Multimedia Computing, Communications, and Applications; 2024-06-12

2. An Effective Anti-Object-Detection Image Privacy Protection Scheme Based on Robust Chaos; IEEE Transactions on Industrial Informatics; 2024-05

3. Energy Shuttle Graph Convolution for Multimodal Relation Recognition in Videos; IEEE Access; 2024

4. Shifted GCN-GAT and Cumulative-Transformer based Social Relation Recognition for Long Videos; Proceedings of the 31st ACM International Conference on Multimedia; 2023-10-26

5. Cross-modality Multiple Relations Learning for Knowledge-based Visual Question Answering; ACM Transactions on Multimedia Computing, Communications, and Applications; 2023-10-23