1. Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder;2024 IEEE International Conference on Multimedia and Expo Workshops (ICMEW);2024-07-15
2. Sla-former: conformer using shifted linear attention for audio-visual speech recognition;Complex & Intelligent Systems;2024-05-18
3. Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper;2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14
4. Do VSR Models Generalize Beyond LRS3?;2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV);2024-01-03