Video text rediscovery: Predicting and tracking text across complex scenes

Published:2024-06 Issue:3 Volume:40 Page:
ISSN:0824-7935
Container-title:Computational Intelligence
language:en
Short-container-title:Computational Intelligence

Author:

Naosekpam Veronica¹², Sahu Nilkanta²

Affiliation:

1. Artificial Intelligence Lab, Indian Institute of Information Technology Guwahati, Assam, India

2. Department of Computer Science and Engineering, Indian Institute of Information Technology Guwahati, Assam, India

Abstract

Dynamic text in scene videos provides valuable insights and semantic cues that are crucial for video applications. However, the movement of this text presents unique challenges, such as blur, shifts, and occlusion. Although state‐of‐the‐art systems track text efficiently, they often struggle when text becomes obscured or when scenes become complicated. This study introduces a novel method for detecting and tracking video text, specifically designed to predict the location of obscured or occluded text in subsequent frames using a tracking‐by‐detection paradigm. The approach begins with a primary detector that identifies text within individual frames, which improves tracking accuracy. Connections between text instances across frames are then established using the Kalman filter, the Munkres algorithm, and deep visual features. The technique rests on the idea that when text goes missing in a frame because of an obstruction, its previous speed and location can be used to predict its next position. Experiments conducted on the ICDAR2013 Video and ICDAR2015 Video datasets confirm the method's efficacy, matching or surpassing established methods in performance.
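The prediction-and-association idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: text instances are reduced to box centers, motion is modeled as constant velocity (only the state-prediction part of a Kalman filter, with no covariance update), the deep visual features are omitted, and the optimal assignment is found by brute force over permutations, which for small inputs gives the same minimum-cost matching the Munkres algorithm would. The names `predict_center`, `assign_tracks`, and `max_sq_dist` are hypothetical.

```python
from itertools import permutations


def predict_center(prev_center, velocity, dt=1.0):
    """Constant-velocity prediction: next position = previous + velocity * dt."""
    return (prev_center[0] + velocity[0] * dt,
            prev_center[1] + velocity[1] * dt)


def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def assign_tracks(predicted, detections, max_sq_dist=50.0 ** 2):
    """Match each predicted track position to at most one detection.

    Returns {track_index: detection_index or None}. A track mapped to None
    is treated as occluded in this frame. The search is brute force and is
    only suitable for a handful of tracks (assumption: illustration only).
    """
    n_tracks = len(predicted)
    # Pad with None so every track may remain unmatched.
    candidates = list(range(len(detections))) + [None] * n_tracks
    best_cost, best = float("inf"), None
    for perm in permutations(candidates, n_tracks):
        cost = 0.0
        for track_idx, det_idx in enumerate(perm):
            if det_idx is None:
                cost += max_sq_dist  # fixed penalty for leaving a track unmatched
            else:
                d = squared_distance(predicted[track_idx], detections[det_idx])
                # Pairs farther apart than the gate are forbidden.
                cost += d if d <= max_sq_dist else float("inf")
        if cost < best_cost:
            best_cost, best = cost, dict(enumerate(perm))
    return best


# Two tracked text instances: (last center, velocity in pixels per frame).
tracks = [((100.0, 50.0), (5.0, 0.0)), ((200.0, 80.0), (0.0, -2.0))]
predicted = [predict_center(c, v) for c, v in tracks]

# Only one detection arrives in the new frame; the first text is occluded.
detections = [(201.0, 77.0)]
matches = assign_tracks(predicted, detections)

# Occluded tracks keep their predicted position instead of being dropped.
new_centers = [
    detections[matches[i]] if matches[i] is not None else predicted[i]
    for i in range(len(tracks))
]
```

In this example the first text instance has no nearby detection, so it is carried forward at its predicted position, while the second is matched to the single detection. A full tracker would also update the velocity estimate and an uncertainty term after each match.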

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/coin.12686
