Lip Reading Using Various Deep Learning Models with Visual Turkish Data

Author:

Berkol Ali1ORCID,Tümer Sivri Talya1ORCID,Erdem Hamit2ORCID

Affiliation:

1. Aselsan-Bites

2. BAŞKENT ÜNİVERSİTESİ

Abstract

In Human-Computer Interaction, lip reading is essential and still an open research problem. In the last decades, there have been many studies in the field of Automatic Lip-Reading (ALR) in different languages, which is important for societies where the essential applications developed. Similarly to other machine learning and artificial intelligence applications, Deep Learning (DL) based classification algorithms have been applied for ALR in order to improve the performance of ALR. In the field of ALR, few studies have been done on the Turkish language. In this study, we undertook a multifaceted approach to address the challenges inherent to Turkish lip reading research. To begin, we established a foundation by creating an original dataset meticulously curated for the purpose of this investigation. Recognizing the significance of data quality and diversity, we implemented three robust image data augmentation techniques: sigmoidal transform, horizontal flip, and inverse transform. These augmentation methods not only elevated the quality of our dataset but also introduced a rich spectrum of variations, thereby bolstering the dataset's utility. Building upon this augmented dataset, we delved into the application of cutting-edge DL models. Our choice of models encompassed Convolutional Neural Networks (CNN), known for their prowess in extracting intricate visual features, Long-Short Term Memory (LSTM), adept at capturing sequential dependencies, and Bidirectional Gated Recurrent Unit (BGRU), renowned for their effectiveness in handling complex temporal data. These advanced models were selected to leverage the potential of the visual Turkish lip reading dataset, ensuring that our research stands at the forefront of this rapidly evolving field. The dataset utilized in this study was gathered with the primary objective of augmenting the extant corpus of Turkish language datasets, thereby substantively enriching the landscape of Turkish language research while concurrently serving as a benchmark reference. The performance of the applied method has been compared regarding precision, recall, and F1 metrics. According to experiment results, BGRU and LSTM models gave the same results up to the fifth decimal, and BGRU had the fastest training time.

Funder

Aselsan-Bites

Publisher

Gazi University Journal of Science

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Script Generation for Silent Speech in E-Learning;Advances in Educational Technologies and Instructional Design;2024-06-03

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3