Abstract
The ability to recognize abstract features of voice during auditory perception is a complex, yet poorly understood, feat of human audition. For the listener, this happens in a near-automatic fashion, seamlessly extracting complex cues from a highly variable auditory signal. Voice perception depends on specialized regions of auditory cortex, including the superior temporal gyrus (STG) and superior temporal sulcus (STS). However, the nature of voice encoding at the cortical level remains unclear. We leverage intracerebral recordings from eight patient-participants undergoing epilepsy surgery evaluation to examine voice encoding across human auditory cortex during presentation of voice and non-voice acoustic stimuli. We show that voice selectivity increases along the auditory hierarchy from the supratemporal plane (STP) to the STG and STS. Vocalizations could be accurately decoded from human auditory cortical activity even in the complete absence of linguistic content. In the STG and STS, an early, less selective temporal window of neural activity was followed by a sustained, strongly voice-selective window. We then developed encoding models that reveal divergent encoding of acoustic features along the auditory hierarchy: STG/STS responses were best explained by voice category rather than by the acoustic features of voice stimuli, whereas STP responses were accounted for by acoustic features. These findings support a model of voice perception that engages categorical encoding mechanisms within the STG and STS.
Significance Statement
Voice perception occurs via specialized networks in higher-order auditory cortex, yet how voice features are encoded remains a central unanswered question. Using human intracerebral recordings of auditory cortex, we provide evidence for categorical encoding of voice in the STG and STS that occurs in the absence of linguistic content. This selectivity strengthens after an initial onset response and cannot be explained by simple acoustic features. Together, these data support the existence of sites within the STG and STS that are specialized for voice perception.
Publisher
Cold Spring Harbor Laboratory