Abstract
It is well established that speech perception is improved when we are able to see the speaker talking along with hearing their voice, especially when the speech signal is noisy. While we have a good understanding of where audio-visual speech integration occurs in the brain, it is unclear how visual and auditory cues are combined to improve speech perception. One suggestion is that integration can occur because both visual and auditory cues arise from a common generator: the vocal tract. Here, we investigate whether facial and vocal tract movements are linked during speech production by comparing videos of the face and fast magnetic resonance (MR) image sequences of the vocal tract. The joint variation in the face and vocal tract was extracted using an application of principal components analysis (PCA), and we demonstrate that MR image sequences can be reconstructed with high fidelity using only the facial video and PCA. Reconstruction fidelity was significantly higher when images from the two sequences corresponded in time, and including implicit temporal information by combining contiguous frames also led to a significant increase in fidelity. A "Bubbles" technique was used to identify which areas of the face were important for recovering information about the vocal tract, and vice versa, on a frame-by-frame basis. Our data reveal that there is sufficient information in the face to recover vocal tract shape during speech. In addition, the facial and vocal tract regions that are important for reconstruction are those that are used to generate the acoustic speech signal.
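The abstract describes learning the joint variation of temporally aligned face video and vocal tract MR frames with PCA, then reconstructing the MR sequence from the facial video alone. The following is a minimal sketch of one way such a joint-PCA reconstruction could be set up; the array names (face_frames, mri_frames), dimensions, and the helper mri_from_face are illustrative assumptions, not the authors' published pipeline.

    # Sketch: joint PCA over concatenated face/MR frames, then recovery of the
    # MR block from the face block alone. Random data stands in for real frames.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    n_frames, face_dim, mri_dim = 200, 32 * 32, 24 * 24   # toy dimensions
    face_frames = rng.normal(size=(n_frames, face_dim))   # flattened face video frames
    mri_frames = rng.normal(size=(n_frames, mri_dim))     # flattened vocal-tract MR images

    # 1. Learn the joint variation: PCA on temporally aligned, concatenated frames.
    joint = np.hstack([face_frames, mri_frames])
    pca = PCA(n_components=20).fit(joint)
    W_face = pca.components_[:, :face_dim]   # loadings for the face block
    W_mri = pca.components_[:, face_dim:]    # loadings for the MR block
    mean_face, mean_mri = pca.mean_[:face_dim], pca.mean_[face_dim:]

    # 2. Reconstruct vocal-tract images from face video alone: estimate component
    #    scores from the face block (least squares), then map them through the MR block.
    def mri_from_face(face):
        scores, *_ = np.linalg.lstsq(W_face.T, (face - mean_face).T, rcond=None)
        return (W_mri.T @ scores).T + mean_mri

    mri_hat = mri_from_face(face_frames)
    # Reconstruction fidelity, e.g. mean per-frame correlation with the true MR frames.
    fidelity = [np.corrcoef(a, b)[0, 1] for a, b in zip(mri_hat, mri_frames)]
    print(f"mean per-frame correlation: {np.mean(fidelity):.2f}")

With real, time-aligned recordings, fidelity computed this way would be expected to drop when the face and MR sequences are temporally shuffled, mirroring the timing effect reported in the abstract.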
Funder
RCUK | Engineering and Physical Sciences Research Council
Publisher
Proceedings of the National Academy of Sciences
Cited by
6 articles.