AVbook, a high-frame-rate corpus of narrative audiovisual speech for investigating multimodal speech perception

Published:2023-05-01 Issue:5 Volume:153 Page:3130
ISSN:0001-4966
Container-title:The Journal of the Acoustical Society of America
language:en

Author:

Varano Enrico¹, Guilleminot Pierre¹, Reichenbach Tobias²

Affiliation:

1. Department of Bioengineering and Centre for Neurotechnology, Imperial College London, London, United Kingdom

2. Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany

Abstract

Seeing a speaker's face can help substantially with understanding their speech, particularly in challenging listening conditions. Research into the neurobiological mechanisms behind audiovisual integration has recently begun to employ continuous natural speech. However, these efforts are impeded by a lack of high-quality audiovisual recordings of a speaker narrating a longer text. Here, we seek to close this gap by developing AVbook, an audiovisual speech corpus designed for cognitive neuroscience studies and audiovisual speech recognition. The corpus consists of 3.6 h of audiovisual recordings of two speakers, one male and one female, each reading 59 passages from a narrative English text. The recordings were acquired at a high frame rate of 119.88 frames/s. The corpus includes phone-level alignment files and a set of multiple-choice questions to test attention to the different passages. We verified the efficacy of these questions in a pilot study. A short written summary is also provided for each recording. To enable audiovisual synchronization when presenting the stimuli, four videos of an electronic clapperboard were recorded with the corpus. The corpus is publicly available to support research into the neurobiology of audiovisual speech processing as well as the development of computer algorithms for audiovisual speech recognition.

Funder

Engineering and Physical Sciences Research Council

Publisher

Acoustical Society of America (ASA)

Subject

Acoustics and Ultrasonics, Arts and Humanities (miscellaneous)

Link

https://pubs.aip.org/asa/jasa/article-pdf/153/5/3130/17860496/3130_1_10.0019460.pdf

Cited by 1 article.

1. Research on Corpus-based Graphic Analysis of Ceramic Science and Technology Texts and Its Application in Multimodal English Teaching;Applied Mathematics and Nonlinear Sciences;2024-01-01