A semantic-based video scene segmentation using a deep neural network

Published: 2018-12-19 | Volume: 45 | Issue: 6 | Pages: 833-844
ISSN: 0165-5515
Journal: Journal of Information Science
Language: English

Authors:

Ji Hyesung¹, Hooshyar Danial², Kim Kuekyeng³, Lim Heuiseok³

Affiliations:

1. Department of Language AI Lab, NCSOFT, Korea

2. Institute of Education, University of Tartu, Estonia

3. Department of Computer Science and Engineering, Korea University, Korea

Abstract

Video scene segmentation is an important research topic in computer vision because it supports efficient storage, indexing and retrieval of videos. Such segmentation cannot be achieved by computing the similarity of low-level video features alone. High-level features must also be considered to achieve better performance. Although much research has been conducted on video scene segmentation, most of these studies do not segment videos into scenes semantically. In this study, we therefore propose a Deep-learning Semantic-based Scene-segmentation model (DeepSSS) that uses image captioning to segment a video into scenes semantically. First, DeepSSS detects shot boundaries by comparing colour histograms and then extracts keyframes using a maximum-entropy criterion. Second, for semantic analysis, a deep-learning-based image-captioning model generates a textual description of each keyframe. Finally, DeepSSS compares and analyses the generated texts and groups keyframes that share a semantic narrative into scenes. In this way, DeepSSS considers both low- and high-level video features to achieve a more meaningful scene segmentation. We used MS COCO data sets for caption generation and evaluated the semantic scene-segmentation results on TRECVid 2016 data sets. These experiments demonstrate quantitatively that DeepSSS outperforms existing scene-segmentation methods based on shot boundary detection and keyframes. In addition, we compared the scenes segmented by DeepSSS with those segmented by humans. The results show that DeepSSS's segmentation resembles human segmentation. This kind of result is enabled by semantic analysis and cannot be obtained from low-level video features alone.
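The three-stage pipeline in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and several assumptions apply. Frames are flat lists of grayscale pixel values rather than colour images. Shot boundaries use an L1 histogram distance with a hypothetical threshold. "Maximum-entropy keyframe" is read here as the frame with the highest histogram entropy in each shot. Captions are supplied by hand in place of a deep captioning model. Adjacent captions are compared with word-level Jaccard similarity, which is a simple stand-in for the paper's text comparison. All function names and thresholds are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical representation: one grayscale frame as a flat list of 0-255 pixel values.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalised intensity histogram of one frame (stand-in for a colour histogram)."""
    counts = [0] * bins
    for p in frame:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def detect_shots(frames: List[Frame], threshold: float = 1.0) -> List[List[int]]:
    """Start a new shot when the L1 distance between consecutive histograms exceeds threshold."""
    if not frames:
        return []
    shots, current = [], [0]
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        h = histogram(frames[i])
        if sum(abs(a - b) for a, b in zip(h, prev)) > threshold:
            shots.append(current)
            current = []
        current.append(i)
        prev = h
    shots.append(current)
    return shots


def entropy(frame: Frame) -> float:
    """Shannon entropy (bits) of the frame's intensity histogram."""
    return -sum(p * math.log2(p) for p in histogram(frame) if p > 0)


def keyframes(frames: List[Frame], shots: List[List[int]]) -> List[int]:
    """Assumed reading of maximum-entropy extraction: the highest-entropy frame per shot."""
    return [max(shot, key=lambda i: entropy(frames[i])) for shot in shots]


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two captions (a simple stand-in for semantic comparison)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def group_scenes(captions: List[str], min_sim: float = 0.3) -> List[List[int]]:
    """Merge consecutive keyframes into one scene while adjacent captions stay similar."""
    if not captions:
        return []
    scenes = [[0]]
    for i in range(1, len(captions)):
        if jaccard(captions[i - 1], captions[i]) >= min_sim:
            scenes[-1].append(i)
        else:
            scenes.append([i])
    return scenes


if __name__ == "__main__":
    frames = [
        [10] * 100, [10] * 75 + [20] * 25,    # dark shot
        [200] * 100, [200] * 90 + [230] * 10,  # bright shot
        [120] * 100,                           # mid-grey shot
    ]
    shots = detect_shots(frames)
    keys = keyframes(frames, shots)
    # Made-up captions standing in for an image-captioning model's output per keyframe.
    captions = ["a man walks in a park", "a man runs in a park", "a car drives on a road"]
    scenes = [[keys[j] for j in group] for group in group_scenes(captions)]
    print(shots, keys, scenes)  # [[0, 1], [2, 3], [4]] [1, 3, 4] [[1, 3], [4]]
```

Keyframe selection and caption grouping are kept as separate functions. This way the Jaccard stand-in could be replaced by, for example, sentence-embedding similarity without touching the low-level stages.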

Funder

This research was supported by the Ministry of Culture, Sport and Tourism (MCST) and the Korea Creative Content Agency (KOCCA) under the Culture Technology (CT) Research & Development Program 2018.

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/0165551518819964

Cited by 16 articles

1. A Diffusion Neural Network-Enhanced Object Tracking Approach Under Sports Scenarios. Journal of Circuits, Systems and Computers, 2024-07-31.

2. Multimodal High-order Relation Transformer for Scene Boundary Detection. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023-10-01.

3. User Adaptive Video Summarization. 2023 6th International Conference on Information Systems and Computer Networks (ISCON), 2023-03-03.

4. Fuzzy Rule-Based Model to Train Videos in Video Surveillance System. Intelligent Automation & Soft Computing, 2023.

5. Automatic Scene Segmentation Algorithm for Image Color Restoration. Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, 2022-10-21.