Affiliation:
1. City University of Hong Kong, Hong Kong
Abstract
This article considers multimedia question answering beyond factoid and how-to questions. We are interested in searching videos to answer opinion-oriented questions that are controversial and hotly debated. Example questions include “Should Edward Snowden be pardoned?” and “Obamacare—unconstitutional or not?”. Such questions often evoke emotional responses, either positive or negative, and hence are likely to be better answered by videos than by text, because videos vividly convey emotional signals through facial expression and speaking tone. Nevertheless, a potential answer lasting 60s may be embedded in a 10min video, which degrades the user experience compared to reading the answer in text. Furthermore, a text-based opinion question may be short and vague, while video answers tend to be spoken, grammatically less structured, and noisy because of speech transcription errors. Direct word matching or syntactic analysis of sentence structure, as adopted in factoid and how-to question answering, is therefore unlikely to find video answers. The first problem, answer localization, is addressed by audiovisual analysis of emotional signals in videos to locate segments likely to express opinions. The second problem, question-answer matching, is tackled by a deep architecture that nonlinearly matches the text words in questions with the speech in videos. Experiments are conducted on eight controversial topics, using questions crawled from Yahoo! Answers and Internet videos from YouTube.
Funder
Research Grants Council of the Hong Kong Special Administrative Region, China
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture
Cited by
4 articles.
1. SentiMedQAer: A Transfer Learning-Based Sentiment-Aware Model for Biomedical Question Answering;Frontiers in Neurorobotics;2022-03-10
2. BTDP;ACM Transactions on Multimedia Computing, Communications, and Applications;2019-08-12
3. Visual Content Recognition by Exploiting Semantic Feature Map with Attention and Multi-task Learning;ACM Transactions on Multimedia Computing, Communications, and Applications;2019-02-23
4. Toward Personalized Activity Level Prediction in Community Question Answering Websites;ACM Transactions on Multimedia Computing, Communications, and Applications;2018-04-30