Korean Lip-Reading: Data Construction and Sentence-Level Lip-Reading
-
Published:2024-04-05
Issue:2
Volume:27
Page:167-176
-
ISSN:2636-0640
-
Container-title:Journal of the Korea Institute of Military Science and Technology
-
language:en
-
Short-container-title:J. KIMS Technol
Author:
Cho Sunyoung, Yoon Soosung
Abstract
Lip-reading is the task of inferring a speaker’s utterance from silent video by learning lip movements. It is very challenging because of the inherent ambiguities in lip movements, such as different characters that produce the same lip appearance. Recent advances in deep learning models such as the Transformer and the Temporal Convolutional Network have improved lip-reading performance. However, most previous work addresses English lip-reading, which is of limited direct applicability to Korean lip-reading; moreover, there is no large-scale Korean lip-reading dataset. In this paper, we introduce the first large-scale Korean lip-reading dataset, with more than 120k utterances collected from TV broadcasts including news, documentaries, and dramas. We also present a preprocessing method that uniformly extracts a facial region of interest, and we propose a Transformer-based model that uses grapheme units for sentence-level Korean lip-reading. We demonstrate that our dataset and model are appropriate for Korean lip-reading through dataset statistics and experimental results.
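To illustrate what a grapheme unit can mean for Korean text, the sketch below splits precomposed Hangul syllables into their constituent jamo (initial consonant, vowel, optional final consonant) using the standard Unicode arithmetic for the Hangul Syllables block. This is only an assumption about how grapheme-level targets might be produced: the abstract does not specify the authors' grapheme inventory or tokenization, and the function name `to_graphemes` is hypothetical.

```python
# Illustrative sketch only: the abstract does not describe the actual
# grapheme inventory or tokenizer used in the paper.

HANGUL_BASE = 0xAC00        # first precomposed syllable, "가"
NUM_SYLLABLES = 11172       # 19 initials * 21 vowels * 28 finals
NUM_VOWELS = 21
NUM_FINALS = 28             # includes the "no final consonant" case

# Conjoining jamo code points from the Unicode Hangul Jamo block.
INITIALS = [chr(0x1100 + i) for i in range(19)]
VOWELS = [chr(0x1161 + i) for i in range(NUM_VOWELS)]
FINALS = [""] + [chr(0x11A8 + i) for i in range(NUM_FINALS - 1)]


def to_graphemes(text):
    """Split each Hangul syllable in `text` into jamo; keep other characters as-is."""
    graphemes = []
    for ch in text:
        offset = ord(ch) - HANGUL_BASE
        if 0 <= offset < NUM_SYLLABLES:
            initial, rest = divmod(offset, NUM_VOWELS * NUM_FINALS)
            vowel, final = divmod(rest, NUM_FINALS)
            graphemes.append(INITIALS[initial])
            graphemes.append(VOWELS[vowel])
            if final:  # index 0 means the syllable has no final consonant
                graphemes.append(FINALS[final])
        else:
            graphemes.append(ch)
    return graphemes


if __name__ == "__main__":
    # Print the code points of the jamo for a two-syllable word.
    print([hex(ord(g)) for g in to_graphemes("한국")])
```

Working at this level keeps the output vocabulary to a few dozen symbols rather than the thousands of possible syllable blocks, which is one plausible motivation for a grapheme-based model.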
Funder
The Government of the Republic of Korea
Publisher
The Korea Institute of Military Science and Technology