The Time Course of Audio-Visual Phoneme Identification: a High Temporal Resolution Study

Published:2018 Issue:1-2 Volume:31 Page:57-78
ISSN:2213-4794
Container-title:Multisensory Research
Short-container-title:Multisens Res

Author:

Sánchez-García Carolina¹, Kandel Sonia², Savariaux Christophe², Soto-Faraco Salvador¹˒³

Affiliation:

1. Departament de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Barcelona, Spain

2. Université Grenoble Alpes, GIPSA-lab (CNRS UMR 5216), Grenoble, France

3. Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

Abstract

Speech unfolds in time and, as a consequence, its perception requires temporal integration. Yet, studies addressing audio-visual speech processing have often overlooked this temporal aspect. Here, we address the temporal course of audio-visual speech processing in a phoneme identification task using a Gating paradigm. We created disyllabic Spanish word-like utterances (e.g., /pafa/, /paθa/, …) from high-speed camera recordings. The stimuli differed only in the middle consonant (/f/, /θ/, /s/, /r/, /g/), which varied in visual and auditory saliency. As in classical Gating tasks, the utterances were presented in fragments of increasing length (gates), here in 10 ms steps, for identification and confidence ratings. We measured correct identification as a function of time (at each gate) for each critical consonant in audio, visual and audio-visual conditions, and computed the Identification Point and Recognition Point scores. The results revealed that audio-visual identification is a time-varying process that depends on the relative strength of each modality (i.e., saliency). In some cases, audio-visual identification followed the pattern of one dominant modality (either A or V), when that modality was very salient. In other cases, both modalities contributed to identification, hence resulting in audio-visual advantage or interference with respect to unimodal conditions. Both unimodal dominance and audio-visual interaction patterns may arise within the course of identification of the same utterance, at different times. The outcome of this study suggests that audio-visual speech integration models should take into account the time-varying nature of visual and auditory saliency.

Publisher

Brill

Subject

Cognitive Neuroscience,Computer Vision and Pattern Recognition,Sensory Systems,Ophthalmology,Experimental and Cognitive Psychology

Link

https://data.brill.com/files/journals/22134808_031_01-02_s005_text.pdf

Cited by 11 articles.

1. Construction and Reform of the Russian Audiovisual Speaking Course in Higher Vocational Institutions in the Context of Deep Learning;Applied Mathematics and Nonlinear Sciences;2024-01-01

2. The Role of the Root in Spoken Word Recognition in Hebrew: An Auditory Gating Paradigm;Brain Sciences;2022-06-07

3. Weak observer–level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation;Cortex;2020-12

4. Causal inference explains the stimulus-level relationship between the McGurk Effect and auditory speech perception;2020-05-10

5. Time-resolved discrimination of audio-visual emotion expressions;Cortex;2019-10