Abstract
A social scene is particularly informative when people are distinguishable. To understand somebody amid ‘cocktail party’ chatter, we automatically index their voice. This ability is underpinned by parallel processing of vocal spectral contours from speech sounds, but it has not yet been established how this occurs in the brain’s cortex. We investigate single-trial neural tracking of slow modulations in speech formants using electroencephalography. Participants briefly listened to unfamiliar single speakers, and in addition, they performed a cocktail party comprehension task. Stimulus reconstruction methods revealed robust neural tracking of slow (delta-theta range) modulations of the fourth and fifth formant band contours, equivalent to the 3.5–5 kHz audible range. Instantaneous inter-formant spacing (ΔF), which also yields indexical information from the vocal tract, was similarly decodable. Moreover, EEG evidence of listeners’ spectral tracking abilities predicted their chances of succeeding at selective listening when faced with two-speaker speech mixtures. In summary, the results indicate that the communicating brain can rely on locking of cortical rhythms to major changes led by upper resonances of the vocal tract. Their corresponding articulatory mechanics hence continuously issue a fundamental credential for listeners to target in real time.
Significance statement
The human voice is acoustically fingerprinted, thanks in part to the shape and function of the larynx and oral cavities as they change from person to person, and from one instant to the next. This study shows that a substantial portion of these time-varying signatures is consistently traced by cortical networks. People who are more capable of tracking these variations neurally are more likely to grasp speakers of their choice despite masking by others.
Thus, the brain can use these changes to tell which pieces of an unfamiliar voice fit together and which to keep apart. Solving this inference in real time is a cornerstone of understanding communication in the everyday social world.
Publisher
Cold Spring Harbor Laboratory