Abstract
Lip-reading has become an important research challenge in recent years. The goal is to recognise speech from lip movements. Most lip-reading technologies developed so far are camera-based and require video recording of the target. However, these technologies have well-known limitations: they are sensitive to occlusion and ambient lighting, and they raise serious privacy concerns. Furthermore, vision-based technologies are not useful for multi-modal hearing aids in the coronavirus (COVID-19) environment, where face masks have become the norm. This paper aims to overcome the fundamental limitations of camera-based systems by proposing a radio frequency (RF) based lip-reading framework that can read lips under face masks. The framework employs Wi-Fi and radar technologies as enablers of RF sensing based lip-reading. A dataset comprising the vowels A, E, I, O, U and an empty class (static/closed lips) is collected using both technologies, with the subject wearing a face mask. The collected data is used to train machine learning (ML) and deep learning (DL) models. A high classification accuracy of 95% is achieved on the Wi-Fi data using neural network (NN) models. Moreover, similar accuracy is achieved by the VGG16 deep learning model on the collected radar-based dataset.
Funder
RCUK | Engineering and Physical Sciences Research Council
Publisher
Springer Science and Business Media LLC
Subject
General Physics and Astronomy; General Biochemistry, Genetics and Molecular Biology; General Chemistry; Multidisciplinary
Cited by
20 articles.