Authors:
Chen Lihan, Zhou Xiaolin, Müller Hermann J., Shi Zhuanghua
Abstract
In our multisensory world, we often rely more on auditory than on visual input for temporal processing. A typical demonstration is that the perceived rate of visual flicker is drawn toward the rate of concurrent auditory flutter. To date, however, this auditory dominance effect has largely been studied using regular auditory rhythms. It therefore remains unclear whether irregular rhythms have a similar impact on visual temporal processing, what information is extracted from the auditory sequence to influence visual timing, and how auditory and visual temporal rates are integrated in quantitative terms. We investigated these questions by assessing, and modeling, the influence of a task-irrelevant auditory sequence on the type of Ternus apparent motion perceived: group motion versus element motion. Which type of motion is seen critically depends on the time interval between the two Ternus display frames. We found that an irrelevant auditory sequence preceding the Ternus display modulates the perceived visual interval, making observers report either more group motion or more element motion. This biasing effect occurs whether the auditory sequence is regular or irregular, and it is based on a summary statistic extracted from the sequential intervals: their geometric mean. However, the audiovisual interaction depends on the discrepancy between the mean auditory and visual intervals: if the discrepancy becomes too large, no interaction occurs, a pattern that can be described quantitatively by a partial Bayesian integration model. Overall, our findings reveal a crossmodal perceptual averaging principle that may underlie complex audiovisual interactions in many everyday dynamic situations.
Public Significance Statement
The present study shows that auditory rhythms, regardless of their regularity, can influence how the visual system times subsequently presented events, thereby altering dynamic visual (motion) perception. This audiovisual temporal interaction is based on a summary statistic derived from the auditory sequence, the geometric mean interval, which is then combined with the visual interval through partial Bayesian integration (integration is unlikely to occur if the discrepancy between the auditory and visual intervals is too large). We propose that this crossmodal perceptual averaging principle underlies complex audiovisual interactions in many everyday dynamic perception scenarios.
Author Note
This study was supported by grants from the Natural Science Foundation of China (31200760, 61621136008, 61527804), German DFG project SH166 3/1, and “projektbezogener Wissenschaftleraustausch” (project-related scientist exchange; proWA). The data and the source code for the statistical analyses and modeling are available at https://github.com/msenselab/temporal_averaging. Part of the study was presented as a talk at the 17th International Multisensory Research Forum (IMRF, June 2016, Suzhou, China).
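The two computations named in the abstract, a geometric-mean summary of the auditory intervals and partial Bayesian integration of that summary with the visual interval, can be illustrated with a short Python sketch. This is not the authors' model (their analysis and modeling code is in the repository listed above). It is a hypothetical stand-in that assumes Gaussian noise on log-scaled intervals and a simplified causal-inference weighting based only on the auditory-visual discrepancy. The function names and all parameter values (`sd_a`, `sd_v`, `sd_sep`, `p_common`) are illustrative assumptions, not fitted values.

```python
import math


def geometric_mean(intervals_ms):
    """Geometric mean of positive intervals: exp of the mean log interval."""
    if not intervals_ms or any(t <= 0 for t in intervals_ms):
        raise ValueError("intervals must be a non-empty sequence of positive values")
    return math.exp(sum(math.log(t) for t in intervals_ms) / len(intervals_ms))


def gaussian_pdf(x, sd):
    """Density of a zero-mean normal distribution with standard deviation sd at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def partial_integration(auditory_ms, visual_ms, sd_a=0.2, sd_v=0.3,
                        sd_sep=0.5, p_common=0.5):
    """Hypothetical partial-integration estimate of the visual interval (ms).

    Works on log intervals (assumed Weber-like noise, in log units).
    Returns (estimated visual interval in ms, posterior probability of integration).
    All defaults are illustrative assumptions, not fitted parameters.
    """
    log_a = math.log(geometric_mean(auditory_ms))  # summary of the auditory sequence
    log_v = math.log(visual_ms)

    # Reliability-weighted (maximum-likelihood) fusion of the two log intervals.
    w_a = sd_v ** 2 / (sd_a ** 2 + sd_v ** 2)
    fused = w_a * log_a + (1 - w_a) * log_v

    # Posterior probability of a common cause, judged only from the discrepancy:
    # under a common cause the discrepancy reflects sensory noise alone; under
    # separate causes it has extra (assumed) spread sd_sep.
    d = log_a - log_v
    sd_same = math.sqrt(sd_a ** 2 + sd_v ** 2)
    sd_diff = math.sqrt(sd_a ** 2 + sd_v ** 2 + sd_sep ** 2)
    like_same = p_common * gaussian_pdf(d, sd_same)
    like_diff = (1 - p_common) * gaussian_pdf(d, sd_diff)
    post_common = like_same / (like_same + like_diff)

    # Model averaging: mix the fused estimate with the purely visual one.
    estimate = post_common * fused + (1 - post_common) * log_v
    return math.exp(estimate), post_common


if __name__ == "__main__":
    irregular = [80, 120, 100, 150, 60]  # hypothetical irregular sequence (ms)
    print("geometric mean:", geometric_mean(irregular))
    print("small discrepancy:", partial_integration([120] * 5, 100))
    print("large discrepancy:", partial_integration([400] * 5, 100))
```

With these illustrative defaults, an auditory sequence of 120 ms intervals pulls a 100 ms visual interval up to about 108 ms, whereas a sequence of 400 ms intervals leaves it at about 101 ms, mirroring the abstract's point that no interaction occurs when the discrepancy is too large. Working in log units makes the geometric mean the natural summary, since it is simply the arithmetic mean of the log intervals.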
Publisher
Cold Spring Harbor Laboratory