Korean Lip-Reading: Data Construction and Sentence-Level Lip-Reading

Published:2024-04-05 Issue:2 Volume:27 Page:167-176
ISSN:2636-0640
Container-title:Journal of the Korea Institute of Military Science and Technology
language:en
Short-container-title:J. KIMS Technol

Author:

Cho Sunyoung,Yoon Soosung

Abstract

Lip-reading is the task of inferring a speaker’s utterance from silent video by learning lip movements. It is very challenging because of the inherent ambiguities in lip movement, such as different characters that produce the same lip appearance. Recent advances in deep learning models such as the Transformer and the Temporal Convolutional Network have improved lip-reading performance. However, most previous work addresses English lip-reading, which limits its direct applicability to Korean, and moreover there is no large-scale Korean lip-reading dataset. In this paper, we introduce the first large-scale Korean lip-reading dataset, with more than 120k utterances collected from TV broadcasts including news, documentaries, and dramas. We also present a preprocessing method that uniformly extracts a facial region of interest, and propose a Transformer-based model operating on grapheme units for sentence-level Korean lip-reading. We demonstrate through dataset statistics and experimental results that our dataset and model are appropriate for Korean lip-reading.
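To illustrate what a grapheme unit means for Korean text, the sketch below splits precomposed Hangul syllables into their constituent letters (initial consonant, vowel, optional final consonant) using the standard Unicode syllable arithmetic. The abstract does not describe the authors' actual tokenization, so the function name `to_graphemes`, the use of compatibility jamo as output symbols, and the handling of non-Hangul characters are assumptions made here for illustration only.

```python
# Minimal sketch of grapheme-level decomposition for Korean text.
# Assumption: this is NOT the paper's tokenizer; it only shows the idea
# of turning syllable blocks into grapheme (jamo) targets.

HANGUL_BASE = 0xAC00   # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3   # last precomposed syllable, "힣"
NUM_VOWELS = 21
NUM_TAILS = 28         # 27 final consonants plus "no final consonant"

LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def to_graphemes(text: str) -> list[str]:
    """Return the grapheme sequence for `text`.

    Precomposed Hangul syllables are split into lead, vowel, and
    (if present) tail; every other character is passed through as-is.
    """
    out: list[str] = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            offset = code - HANGUL_BASE
            lead = offset // (NUM_VOWELS * NUM_TAILS)
            vowel = (offset % (NUM_VOWELS * NUM_TAILS)) // NUM_TAILS
            tail = offset % NUM_TAILS
            out.append(LEADS[lead])
            out.append(VOWELS[vowel])
            if tail:  # index 0 means the syllable has no final consonant
                out.append(TAILS[tail])
        else:
            out.append(ch)
    return out


if __name__ == "__main__":
    print(to_graphemes("한글"))
```

Under this scheme the output vocabulary is on the order of tens of symbols, whereas the precomposed-syllable range covers 11,172 code points (19 × 21 × 28), which is one plausible motivation for predicting graphemes rather than whole syllables.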

Funder

The Government of the Republic of Korea

Publisher

The Korea Institute of Military Science and Technology

Link

https://jkimst.org/upload/pdf/KIMST-2024-27-2-167.pdf
