1. Triantafyllos Afouras, Joon Son Chung, and Andrew Zisserman. 2018. LRS3-TED: a large-scale dataset for visual speech recognition. CoRR abs/1809.00496 (2018). arXiv:1809.00496 http://arxiv.org/abs/1809.00496
2. Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model
3. Stefano Arduini and Robert Hodgson. 2007. Similarity and Difference in Translation. Ed. di Storia e Letteratura.
4. R. Barsam and D. Mohanan. 2010. Looking at Movies: An Introduction to Film. 3rd ed. New York: W. W. Norton & Company.
5. Julie N. Buchan, Martin Paré, and Kevin G. Munhall. 2008. The effect of varying talker identity and listening conditions on gaze behavior during audiovisual speech perception. Brain Research 1242 (Nov. 2008), 162–171. https://doi.org/10.1016/j.brainres.2008.06.083