1. Triantafyllos Afouras, Joon Son Chung, and Andrew Zisserman. 2018. LRS3-TED: A large-scale dataset for visual speech recognition. arXiv preprint arXiv:1809.00496.
2. Víctor Arroyo, Jose J. Valero-Mas, Jorge Calvo-Zaragoza, and Antonio Pertusa. 2022. Neural audio-to-score music transcription for unconstrained polyphony using compact output representations. In 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’22). IEEE, 4603–4607.
3. Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. In Advances in Neural Information Processing Systems.
4. Sakya Basak, Shrutina Agarwal, Sriram Ganapathy, and Naoya Takahashi. 2021. End-to-end lyrics recognition with voice to singing style transfer. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 266–270.
5. Ke Chen, Shuai Yu, Cheng-i Wang, Wei Li, Taylor Berg-Kirkpatrick, and Shlomo Dubnov. 2022. TONet: Tone-octave network for singing melody extraction from polyphonic music. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 621–625.