Abstract
Audiovisual integration of speech can benefit the listener by not only improving comprehension of what a talker is saying but also helping a listener pick a particular talker’s voice out of a mix of sounds. Binding, an early integration of auditory and visual streams that helps an observer allocate attention to a combined audiovisual object, is likely involved in audiovisual speech processing. Although temporal coherence of stimulus features across sensory modalities has been implicated as an important cue for non-speech stimuli (Maddox et al., 2015), the specific cues that drive binding in speech are not fully understood due to the challenges of studying binding in natural stimuli. Here we used speech-like artificial stimuli that allowed us to isolate three potential contributors to binding: temporal coherence (are the face and the voice changing synchronously?), articulatory correspondence (do visual faces represent the correct phones?), and talker congruence (do the face and voice come from the same person?). In a trio of experiments, we examined the relative contributions of each of these cues. Normal-hearing listeners performed a dual detection task in which they were instructed to respond to events in a target auditory stream and a visual stream while ignoring events in a distractor auditory stream. We found that viewing the face of a talker who matched the attended voice (i.e., talker congruence) offered a performance benefit. Importantly, we found no effect of temporal coherence on performance in this task, a result that prompts an important recontextualization of previous findings.
Publisher
Cold Spring Harbor Laboratory