Abstract
Our brain seamlessly integrates distinct sensory information to form a coherent percept. However, the specific brain regions and time courses involved in processing different levels of information during the perception of real-world audiovisual events remain under-investigated. To address this, we curated naturalistic videos and recorded fMRI and EEG data while participants viewed the videos with accompanying sounds. Our findings reveal an early asymmetrical cross-modal interaction, with acoustic information represented in both early visual and auditory regions, whereas visual information was identified only in visual cortices. Visual and auditory features were processed with similar onsets but different temporal dynamics. High-level categorical and semantic information emerged in multi-modal association areas later in time, indicating late cross-modal integration and its distinct role in the convergence of conceptual information. Comparing neural representations to a two-branch deep neural network model highlighted the necessity of early fusion for building a biologically plausible model of audiovisual perception. Using EEG-fMRI fusion, we provide a spatiotemporally resolved account of neural activity during the processing of naturalistic audiovisual stimuli.
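The abstract compares neural representations to a two-branch deep neural network with and without early fusion. The following is a minimal, hypothetical sketch of what such a two-branch audiovisual model with an optional early-fusion stage could look like; the layer sizes, fusion mechanism, and class count are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TwoBranchAVNet(nn.Module):
    """Hypothetical two-branch audiovisual network with optional early fusion."""

    def __init__(self, early_fusion: bool = True, n_classes: int = 10):
        super().__init__()
        self.early_fusion = early_fusion
        # Stage-1 encoders, one per modality (sizes are illustrative).
        self.v1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())  # RGB frames
        self.a1 = nn.Sequential(nn.Conv2d(1, 32, 3, 2, 1), nn.ReLU())  # spectrograms
        # Early cross-modal interaction: each branch receives a pooled
        # summary of the other branch's stage-1 features.
        self.v_from_a = nn.Linear(32, 32)
        self.a_from_v = nn.Linear(32, 32)
        # Stage-2 encoders and a shared classification head.
        self.v2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
                                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.a2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
                                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(128, n_classes)

    def forward(self, frames: torch.Tensor, spec: torch.Tensor) -> torch.Tensor:
        v = self.v1(frames)  # (B, 32, H, W)
        a = self.a1(spec)    # (B, 32, H', W')
        if self.early_fusion:
            # Inject each modality's pooled summary into the other branch as
            # a channel-wise bias: a simple stand-in for early fusion.
            v_sum = v.mean(dim=(2, 3))
            a_sum = a.mean(dim=(2, 3))
            v = v + self.v_from_a(a_sum)[:, :, None, None]
            a = a + self.a_from_v(v_sum)[:, :, None, None]
        return self.head(torch.cat([self.v2(v), self.a2(a)], dim=1))


# Usage with dummy inputs: a batch of video frames and log-mel spectrograms.
model = TwoBranchAVNet(early_fusion=True)
frames = torch.randn(2, 3, 64, 64)
spec = torch.randn(2, 1, 64, 64)
print(model(frames, spec).shape)  # torch.Size([2, 10])
```

Setting `early_fusion=False` keeps the two branches independent until the final concatenation, giving a late-fusion baseline against which the early-fusion variant can be compared.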
Publisher: Cold Spring Harbor Laboratory