An Interference-Resistant and Low-Consumption Lip Recognition Method

Authors

Jia Junwei, Wang Zhilu, Xu Lianghui, Dai Jiajia, Gu Mingyi, Huang Jing

Abstract

Lip movements carry essential linguistic information and are an important medium for studying the content of a dialogue. Many studies address how to improve the accuracy of lip recognition models, but few examine the robustness and generalization of these models under various disturbances. Our experiments show that the accuracy of the current state-of-the-art lip recognition model drops significantly when its input is disturbed, and that the model is particularly sensitive to adversarial examples. This paper substantially alleviates the problem by using Mixup training. Taking adversarial attacks generated by FGSM as an example, the model in this paper achieves 85.0% and 40.2% accuracy on the English dataset LRW and the Mandarin dataset LRW-1000, respectively. Compared with current advanced lip recognition models, these recognition rates are improvements of 9.8% and 8.3%, which demonstrates the positive effect of Mixup training on the robustness and generalization of lip recognition models. In addition, the performance of a lip recognition classification model depends heavily on the number of parameters to be trained, which increases the computational cost. The InvNet-18 network in this paper reduces GPU resource consumption and training time while improving model accuracy. Compared with the standard ResNet-18 network used in mainstream lip recognition models, InvNet-18 reduces GPU consumption by more than a factor of three and has 32% fewer parameters. Detailed analysis and comparison across several aspects show that the proposed model effectively improves resistance to interference and reduces training resource consumption, while its accuracy remains comparable with current state-of-the-art results.
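
The two techniques named in the abstract, Mixup training and FGSM, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mixup_pair` and `fgsm_linear`, the default `alpha=0.2`, and the toy logistic-regression model standing in for a lip recognition network are all assumptions made for illustration. Mixup blends two training samples and their labels with a weight drawn from a Beta distribution; FGSM shifts an input by a small step `eps` in the direction of the sign of the loss gradient.

```python
import math
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels with one shared weight.

    The weight lam is drawn from Beta(alpha, alpha); alpha=0.2 is an
    assumed default, not a value taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_linear(x, y, w, b, eps):
    """FGSM on a toy logistic-regression model p = sigmoid(w.x + b).

    This model is a hypothetical stand-in for a real network. For binary
    cross-entropy, the gradient of the loss with respect to x is (p - y) * w.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    grad = [(p - y) * wi for wi in w]
    # Step each input component by eps in the direction that increases the loss.
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

In a real training loop the same blend would be applied to whole mini-batches of video frames, and the FGSM gradient would come from automatic differentiation rather than a closed-form expression.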

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

