Fine‐granularity semantic video annotation

Published:2013-08-30 Issue:3 Volume:9 Page:243-269
ISSN:1742-7371
Container-title:International Journal of Pervasive Computing and Communications
language:en

Author:

El‐Khoury Vanessa, Jergler Martin, Abebe Bayou Getnet, Coquil David, Kosch Harald

Abstract

Purpose: Fine‐grained video content indexing, retrieval, and adaptation require accurate metadata describing the video structure and semantics at the lowest granularity, i.e. at the object level. The authors address these requirements by proposing the semantic video content annotation tool (SVCAT) for structural and high‐level semantic video annotation. SVCAT is a semi‐automatic, MPEG‐7 standard‐compliant annotation tool that produces metadata according to a new object‐based video content model introduced in this work. Videos are temporally segmented into shots, and shot‐level concepts are detected automatically using ImageNet as background knowledge. These concepts serve as a guide for locating and selecting objects of interest, which are then tracked automatically to generate object‐level metadata. Integrating shot‐based concept detection with object localization and tracking substantially reduces the annotator's workload. The paper aims to discuss these issues.

Design/methodology/approach: A systematic classification of keyframes into ImageNet categories is used as the basis for automatic concept detection in temporal units. An object tracking algorithm is then applied to obtain exact spatial information about objects.

Findings: Experimental results showed that SVCAT is able to provide accurate object‐level video metadata.

Originality/value: The new contribution of this paper is an approach that uses ImageNet to obtain shot‐level annotations automatically. This approach assists video annotators significantly by minimizing the effort required to locate salient objects in the video.
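The first stage described in the abstract, temporal segmentation into shots followed by keyframe selection, can be illustrated with a minimal sketch. The abstract does not say how SVCAT detects shot boundaries or picks keyframes, so everything below is an assumption made for illustration: boundaries are placed where the L1 distance between consecutive frame histograms exceeds a threshold, and the middle frame of each shot is taken as its keyframe. The names `Shot`, `histogram_distance`, and `segment_shots` are hypothetical. The ImageNet classification and object tracking stages are not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Assumed input: one normalized color histogram per video frame.
Histogram = Sequence[float]


@dataclass
class Shot:
    start: int     # index of the first frame in the shot (inclusive)
    end: int       # index of the last frame in the shot (inclusive)
    keyframe: int  # frame index chosen to represent the shot


def histogram_distance(a: Histogram, b: Histogram) -> float:
    """L1 distance between two histograms of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def segment_shots(histograms: List[Histogram], threshold: float = 0.5) -> List[Shot]:
    """Split a frame sequence into shots at large histogram jumps.

    Illustrative heuristic only, not the method used by SVCAT.
    """
    if not histograms:
        return []
    shots: List[Shot] = []
    start = 0
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            # Close the current shot; use its middle frame as the keyframe.
            shots.append(Shot(start, i - 1, (start + i - 1) // 2))
            start = i
    last = len(histograms) - 1
    shots.append(Shot(start, last, (start + last) // 2))
    return shots


# Synthetic example: three frames of one scene, then four of another.
frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 4
shots = segment_shots(frames)
```

In a pipeline of the kind the abstract outlines, each shot's keyframe would then be passed to an image classifier to obtain shot‐level concept labels, and the annotator's selected objects would be handed to a tracker.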

Publisher

Emerald

Subject

General Computer Science, Theoretical Computer Science

References: 36 articles.

1. Ahmed, R., Karmakar, G.C. and Dooley, L.S. (2006), “Region‐based shape incorporation for probabilistic spatio‐temporal video object segmentation”, 2006 IEEE International Conference on Image Processing, pp. 2445‐2448.

2. Bruyne, S. et al. (2011), “Annotation based personalized adaptation and presentation of videos for mobile applications”, Multimedia Tools and Applications, Vol. 55 No. 2, pp. 307‐331.

3. Caruana, R., Karampatziakis, N. and Yessenalina, A. (2008), “An empirical evaluation of supervised learning in high dimensions”, International Conference on Machine Learning, pp. 96‐103.

4. Chan, T.F. and Vese, L.A. (2001), “An active contour model without edges”, IEEE Transactions on Image Processing, Vol. 10 No. 2, pp. 266‐277.

5. Chatfield, K., Lempitsky, V., Vedaldi, A. and Zisserman, A. (2011), “The devil is in the details: an evaluation of recent feature encoding methods”, paper presented at British Machine Vision Conference.

Cited by 2 articles.

1. Segmented Translation Algorithm of Complex Long Sentences Based on Semantic Features;Journal of Physics: Conference Series;2021-04-01

2. Towards a Scene-Based Video Annotation Framework;2015 11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS);2015-11