A Deep Dive Into Neural Synchrony Evaluation for Audio-visual Translation

Published: 2022-11-07
Container-title: International Conference on Multimodal Interaction

Author:

Nayak Shravan¹, Schuler Christian², Saha Debjoy³, Baumann Timo⁴

Affiliation:

1. Language Technology Group, Universität Hamburg, Germany

2. Computer Science, Universität Hamburg, Germany

3. Indian Institute of Technology Kharagpur, India

4. Computer Science and Mathematics, Ostbayerische Technische Hochschule Regensburg, Germany

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3536221.3556621

References: 42 articles.

1. Triantafyllos Afouras, Joon Son Chung, and Andrew Zisserman. 2018. LRS3-TED: a large-scale dataset for visual speech recognition. CoRR abs/1809.00496 (2018). arXiv:1809.00496. http://arxiv.org/abs/1809.00496

2. Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model

3. Stefano Arduini and Robert Hodgson. 2007. Similarity and Difference in Translation. Ed. di Storia e Letteratura.

4. R. Barsam and D. Mohanan. 2010. Looking at Movies: An Introduction to Film. 3rd ed. New York: W. W. Norton & Company.

5. Julie N. Buchan, Martin Paré, and Kevin G. Munhall. 2008. The effect of varying talker identity and listening conditions on gaze behavior during audiovisual speech perception. Brain Research 1242 (Nov. 2008), 162–171. https://doi.org/10.1016/j.brainres.2008.06.083