Utilisation of Metadata Fields and Query Expansion in Cross-Lingual Search of User-Generated Internet Video

Published: 2016-01-27 Volume: 55 Pages: 249-281
ISSN: 1076-9757
Container-title: Journal of Artificial Intelligence Research
Short-container-title: jair

Author:

Ahmad Khwileh, Debasis Ganguly, Gareth J. F. Jones

Abstract

Recent years have seen significant efforts in Cross-Language Information Retrieval (CLIR) for text. This work initially focused on formally published content, but more recently research has begun to concentrate on CLIR for informal social media content. However, despite the current expansion in online multimedia archives, there has been little work on CLIR for this content. While there has been some limited work on Cross-Language Video Retrieval (CLVR) for professional videos, such as documentaries or TV news broadcasts, there has to date been no significant investigation of CLVR for the rapidly growing archives of informal user-generated content (UGC). Key differences between UGC and professionally produced content are the nature and structure of the textual metadata associated with it, as well as the form and quality of the content itself. In this setting, retrieval effectiveness may suffer not only from the translation errors common to all CLIR tasks, but also from recognition errors introduced by the automatic speech recognition (ASR) systems used to transcribe the spoken content of the video, and from the informality and inconsistency of the user-created metadata associated with each video. This work proposes and evaluates techniques to improve CLIR effectiveness for such noisy UGC content. Our experimental investigation shows that different sources of evidence, e.g. the content from different fields of the structured metadata, significantly affect CLIR effectiveness. Results from our experiments also show that the metadata fields vary in their robustness to query expansion (QE), so that QE can have a negative impact on CLIR effectiveness. Our work proposes a novel adaptive QE technique that predicts the most reliable source for expansion, and shows how this technique can be effective in improving CLIR effectiveness for UGC content.

Publisher

AI Access Foundation

Subject

Artificial Intelligence

Cited by 3 articles.

1. Towards Update-Efficient and Parallel-Friendly Content-Based Indexing Scheme in Cloud Computing;International Journal of Semantic Computing;2018-06

2. A Content-Based Indexing Scheme for Large-Scale Unstructured Data;2017 IEEE Third International Conference on Multimedia Big Data (BigMM);2017-04

3. Improving the Reliability of Query Expansion for User-Generated Speech Retrieval Using Query Performance Prediction;Lecture Notes in Computer Science;2017