Inter-Sentence Segmentation of YouTube Subtitles Using Long-Short Term Memory (LSTM)

Published:2019-04-11 Issue:7 Volume:9 Page:1504
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Song Hye-Jeong, Kim Hong-Ki, Kim Jong-Dae, Park Chan-Young, Kim Yu-Seop

Abstract

Recent advances in speech-to-text, which converts voice to text, and in machine translation have led to technologies that simultaneously translate video captions into other languages. Using these technologies, YouTube, a video-sharing site, provides captions in many languages. Currently, its automatic caption system extracts the voice data when a video is uploaded and provides a subtitle file converted into text. This method creates subtitles aligned with the running time. However, subtitles extracted from video by speech-to-text contain no periods, so the sentences cannot be translated accurately. Because the generated subtitles are split by time units rather than sentence units before being translated, the translation result as a whole is very difficult to understand. In this paper, we propose a method that divides the text into sentences and generates period marks to improve the accuracy of automatic translation of English subtitles. As data, we use 27,826 subtitle sentences provided by Stanford University’s courses. Since these lecture videos provide complete-sentence caption data, the subtitles can be used as training data after being transformed into general YouTube-like caption data. We build a model on the training data using an LSTM-RNN (Long-Short Term Memory Recurrent Neural Network) and predict the position of the period mark, achieving a prediction accuracy of 70.84%. Our research will provide people with more accurate translations of subtitles. In addition, we expect that more accurate translations of the many video lectures available in English will help break down language barriers in online education.
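The data-preparation step described in the abstract (turning complete-sentence subtitles into period-free, YouTube-like captions with a target marking where each period belongs) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names `to_caption_stream` and `restore_periods`, the lowercasing, and the 0/1 label scheme are all assumptions made for the example, since the abstract does not specify how the text was tokenized or labeled.

```python
import re


def to_caption_stream(sentences):
    """Flatten punctuated sentences into lowercase words plus 0/1 labels.

    Assumed labeling scheme: 1 means "a period follows this word",
    0 means "no period here".
    """
    words, labels = [], []
    for sentence in sentences:
        # Keep letters, digits and apostrophes; drop all other punctuation.
        tokens = re.findall(r"[a-z0-9']+", sentence.lower())
        if not tokens:
            continue
        words.extend(tokens)
        # Only the last word of each sentence is followed by a period.
        labels.extend([0] * (len(tokens) - 1) + [1])
    return words, labels


def restore_periods(words, labels):
    """Rebuild sentence-segmented text from words and per-word labels."""
    out = []
    start_of_sentence = True
    for word, label in zip(words, labels):
        token = word.capitalize() if start_of_sentence else word
        if label:
            token += "."
        out.append(token)
        start_of_sentence = bool(label)
    return " ".join(out)


if __name__ == "__main__":
    words, labels = to_caption_stream(
        ["Welcome to the course.", "Today we cover LSTMs."]
    )
    print(words)
    print(labels)
    print(restore_periods(words, labels))
```

Under this assumed framing, a sequence model such as an LSTM would take the word stream as input and predict the label for each position; `restore_periods` then applies the predicted labels to produce sentence-segmented text for translation.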

Funder

The Research foundation of Korea Grant

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/9/7/1504/pdf

Reference: 27 articles.

1. Foundations of Statistical Natural Language Processing;Manning,1999

2. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes

Cited by 10 articles.

1. Enhancing Spoken Text With Punctuation Prediction Using N-Gram Language Model in Intelligent Technical Text Processing Software;Advances in Systems Analysis, Software Engineering, and High Performance Computing;2024-06-21

2. Comparison of the Keras-LSTM Algorithms for Classifying Autism Spectrum Disorder Using Facial Images;2023 International Conference on Data Science and Its Applications (ICoDSA);2023-08-09

3. Open-source dubbing system with synthetic voice for MOOCs;2022 IEEE Learning with MOOCS (LWMOOCS);2022-09-29

4. LSTM-A: A Cable Partial Discharge Detection Model with Circuit Big Data;Lecture Notes in Electrical Engineering;2022

5. A prediction and imputation method for marine animal movement data;PeerJ Computer Science;2021-08-03