Abstract
In face-to-face communication, audio-visual (AV) stimuli can be fused, combined, or perceived as mismatching. While the left superior temporal sulcus (STS) is widely considered the locus of AV integration, the process leading to combination is unknown. Analysing behaviour and time-/source-resolved human MEG data, we show that while fusion and combination both involve early detection of a discrepancy between AV physical features in the STS, combination is further associated with activity in AV asynchrony-sensitive regions (auditory and inferior frontal cortices). Based on dynamic causal modelling and neural signal decoding, we further show that the outcome of AV speech integration primarily depends on whether the STS can quickly converge onto an existing multimodal syllable representation, and that combination results from subsequent temporal processing, presumably re-ordering, of the discrepant AV stimuli.
Publisher
Cold Spring Harbor Laboratory