Authors:
Gan Chuang, Lin Ming, Yang Yi, Melo Gerard, G. Hauptmann Alexander
Abstract
Vast quantities of videos are now being captured at astonishing rates, but the majority of these are not labelled. To cope with such data, we consider the task of content-based activity recognition in videos without any manually labelled examples, also known as zero-shot video recognition. To achieve this, videos are represented in terms of detected visual concepts, which are then scored as relevant or irrelevant according to their similarity with a given textual query. In this paper, we propose a more robust approach for scoring concepts in order to alleviate the brittleness and low precision of previous work. In scoring concepts, we jointly consider semantic relatedness, visual reliability, and discriminative power. To handle noise and non-linearities in the ranking scores of the selected concepts, we propose a novel pairwise order matrix approach for score aggregation. Extensive experiments on the large-scale TRECVID Multimedia Event Detection data show the superiority of our approach.
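The abstract does not spell out how the pairwise order matrix is built or aggregated. Purely as an illustration, the sketch below assumes one simple reading: each selected concept's detector scores over a set of videos induce a pairwise order matrix (entry (i, j) is 1 when video i outranks video j, 0.5 on ties), the per-concept matrices are summed with per-concept weights, and videos are ranked by the row sums of the combined matrix. The function names, the tie handling, and the weights (standing in for a relevance score that would combine semantic relatedness, visual reliability, and discriminative power) are hypothetical; this is not the authors' implementation.

```python
from typing import List, Sequence


def pairwise_order_matrix(scores: Sequence[float]) -> List[List[float]]:
    """Hypothetical helper: P[i][j] = 1 if video i outranks video j,
    0.5 on a tie, 0 otherwise; the diagonal is 0."""
    n = len(scores)

    def compare(i: int, j: int) -> float:
        if i == j:
            return 0.0
        if scores[i] > scores[j]:
            return 1.0
        return 0.5 if scores[i] == scores[j] else 0.0

    return [[compare(i, j) for j in range(n)] for i in range(n)]


def aggregate(concept_scores: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Hypothetical aggregation: weighted sum of per-concept order matrices,
    then row sums (weighted count of videos each video beats)."""
    if len(concept_scores) != len(weights):
        raise ValueError("need one weight per concept")
    n = len(concept_scores[0])
    total = [[0.0] * n for _ in range(n)]
    for scores, w in zip(concept_scores, weights):
        p = pairwise_order_matrix(scores)
        for i in range(n):
            for j in range(n):
                total[i][j] += w * p[i][j]
    return [sum(row) for row in total]


if __name__ == "__main__":
    # Two concepts scored on three videos; concept B uses a much larger scale.
    concept_a = [0.9, 0.2, 0.1]
    concept_b = [50.0, 60.0, 1.0]
    print(aggregate([concept_a, concept_b], [2.0, 1.0]))  # prints [5.0, 4.0, 0.0]
```

With these toy numbers, a raw weighted sum of scores (2 × 0.9 + 50 = 51.8 versus 2 × 0.2 + 60 = 60.4) would rank video 1 first simply because concept B's scores live on a larger scale, whereas the order-based aggregate ranks video 0 first. Using only orderings makes the combination insensitive to each detector's score scale and to monotone non-linearities, which is one plausible motivation for aggregating via pairwise orders.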
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
18 articles.