Chinese Lip-Reading Research Based on ShuffleNet and CBAM

Published: 2023-01-13 Issue: 2 Volume: 13 Page: 1106
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Fu Yixian, Lu Yuanyao, Ni Ran

Abstract

Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets, and the study of Chinese lip-reading technology is still in its initial stage. Firstly, in this paper, we expand the naturally distributed word-level Chinese dataset called ‘Databox’ previously built by our laboratory. Secondly, the current state-of-the-art model consists of a residual network and a temporal convolutional network. The residual network incurs excessive computational cost and is not suitable for on-device applications. In the new model, the residual network is replaced with ShuffleNet, an extremely computation-efficient Convolutional Neural Network (CNN) architecture. Thirdly, to help the network focus on the most useful information, we insert a simple but effective attention module, the Convolutional Block Attention Module (CBAM), into ShuffleNet. In our experiments, we compare several model architectures and find that our model achieves accuracy comparable to that of the residual network (3.5 GFLOPs) under a computational budget of 1.01 GFLOPs.
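As a rough illustration of two points in the abstract, the sketch below shows the channel-shuffle reordering that gives ShuffleNet its name, and the arithmetic behind the 3.5 GFLOPs versus 1.01 GFLOPs comparison. It is a minimal pure-Python sketch, not the authors' implementation: the function names channel_shuffle and flops_reduction are hypothetical, and the group count of 2 is an assumption chosen for illustration, since the abstract does not state the configuration used.

```python
def channel_shuffle(channels, groups):
    """Reorder channels the way a ShuffleNet-style channel shuffle does.

    The channels are viewed as a (groups, per_group) grid, the grid is
    transposed, and the result is flattened, so that each output group
    mixes channels drawn from every input group.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def flops_reduction(baseline_gflops, model_gflops):
    """Return (speed-up ratio, fractional saving) between two FLOP budgets."""
    ratio = baseline_gflops / model_gflops
    saving = 1.0 - model_gflops / baseline_gflops
    return ratio, saving


# Six channels in two groups (group count is an illustrative assumption).
shuffled = channel_shuffle(list(range(6)), groups=2)
print(shuffled)  # [0, 3, 1, 4, 2, 5]

# FLOP budgets quoted in the abstract: residual network vs. proposed model.
ratio, saving = flops_reduction(3.5, 1.01)
print(f"{ratio:.2f}x fewer FLOPs, {saving:.0%} saving")  # 3.47x fewer FLOPs, 71% saving
```

Under these figures, the proposed model needs roughly 3.5 times fewer floating-point operations than the residual-network baseline, which is the basis for the on-device argument.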

Funder

the National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/2/1106/pdf

References: 31 articles.

1. Palecek, K. (2017, January 12–16). Utilizing lipreading in large vocabulary continuous speech recognition. Proceedings of the International Conference on Speech and Computer, Hatfield, UK.

2. McGurk, H. (1976). Hearing lips and seeing voices. Nature.

3. Assael, Y.M., Shillingford, B., and Whiteson, S. (2016). Lipnet: End-to-end sentence-level lipreading. arXiv.

4. Burton, J., Frank, D., Saleh, M., Navab, N., and Bear, H.L. (2018, January 12–14). The speaker-independent lipreading play-off; a survey of lipreading machines. Proceedings of the 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), Sophia Antipolis, France.

5. Lu, H., Liu, X., Yin, Y., and Chen, Z. (2019, January 19–20). A Patent Text Classification Model Based on Multivariate Neural Network Fusion. Proceedings of the 2019 6th International Conference on Soft Computing & Machine Intelligence (ISCMI), Johannesburg, South Africa.

Cited by 9 articles.

1. Automatic lip-reading classification using deep learning approaches and optimized quaternion meixner moments by GWO algorithm;Knowledge-Based Systems;2024-11

2. YOLO-WDNet: A lightweight and accurate model for weeds detection in cotton field;Computers and Electronics in Agriculture;2024-10

3. AI-based visual speech recognition towards realistic avatars and lip-reading applications in the metaverse;Applied Soft Computing;2024-10

4. Automated multi-class skin cancer classification using white shark optimizer with ensemble learning classifier on dermoscopy images;Multimedia Tools and Applications;2024-03-26

5. Deep learning in food category recognition;Information Fusion;2023-10