Abstract
Humans are remarkably skilled at listening to one speaker out of an acoustic mixture of several speech sources. Two speakers are easily segregated, even without binaural cues, but the neural mechanisms underlying this ability are not well understood. One possibility is that early cortical processing performs a spectro-temporal decomposition of the acoustic mixture, allowing the attended speech to be reconstructed via optimally weighted recombinations that discount spectro-temporal regions where sources heavily overlap. Using human magnetoencephalography (MEG) responses to a two-talker mixture, we show evidence for an alternative possibility, in which early, active segregation occurs even for strongly spectro-temporally overlapping regions. Early (∼70 ms) responses to non-overlapping spectro-temporal features are seen for both talkers. When competing talkers’ spectro-temporal features mask each other, the individual representations persist, but they occur with a ∼20 ms delay. This suggests that the auditory cortex recovers acoustic features that are masked in the mixture, even if they occurred in the ignored speech. The existence of such noise-robust cortical representations, of features present in attended as well as ignored speech, suggests an active cortical stream segregation process, which could explain a range of behavioral effects of ignored background speech.