Is Lip Region-of-Interest Sufficient for Lipreading?

Published: 2022-11-07
Container-title: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION

Author:

Zhang Jing-Xuan¹, Wan Genshun², Pan Jia³

Affiliation:

1. iFLYTEK Research, iFLYTEK Co., Ltd., China and University of Science and Technology of China, China

2. iFLYTEK Research, iFLYTEK Co., Ltd., China

3. iFLYTEK Research, iFLYTEK Co., Ltd., China

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3536221.3556571

References: 30 articles.

1. Triantafyllos Afouras, Joon Son Chung, Andrew Senior, Oriol Vinyals, and Andrew Zisserman. 2018. Deep audio-visual speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018).

2. T. Afouras, J. S. Chung, and A. Zisserman. 2018. LRS3-TED: a large-scale dataset for visual speech recognition. arXiv preprint arXiv:1809.00496.

3. ASR is All You Need: Cross-Modal Distillation for Lip Reading

4. Wav2vec 2.0: A framework for self-supervised learning of speech representations;Baevski Alexei;Advances in Neural Information Processing Systems,2020

5. Hangbo Bao, Li Dong, Songhao Piao, and Furu Wei. 2022. BEiT: BERT Pre-Training of Image Transformers. In ICLR 2022.

Cited by 4 articles.

1. Multi-Modal Knowledge Transfer for Target Speaker Lipreading with Improved Audio-Visual Pretraining and Cross-Lingual Fine-Tuning;2024 IEEE International Conference on Multimedia and Expo Workshops (ICMEW);2024-07-15

2. Speaker independent VSR: A systematic review and futuristic applications;Image and Vision Computing;2023-10

3. Self-Supervised Audio-Visual Speech Representations Learning by Multimodal Self-Distillation;ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2023-06-04

4. An Investigation into Audio–Visual Speech Recognition under a Realistic Home–TV Scenario;Applied Sciences;2023-03-23