Affiliation:
1. Sensimetrics Corp., Malden, Mass., b Boston University, Boston, Mass., and c Steven Greenberg, Silicon Speech, Santa Venetia, Calif., USA
Abstract
This study was motivated by the prospective role played by brain rhythms in speech perception. The intelligibility, measured as word error rate, of natural-sounding, synthetically generated sentences was assessed using a paradigm that alters the speech-energy rhythm over a range of frequencies. The material comprised 96 semantically unpredictable sentences, each approximately 2 s long (6–8 words per sentence), generated by a high-quality text-to-speech (TTS) synthesis engine. The TTS waveform was time-compressed by a factor of 3, creating a signal whose syllable rhythm is three times faster than the original and whose intelligibility is poor (<50% words correct). A waveform with an artificial rhythm was produced by automatically segmenting the time-compressed waveform into consecutive 40-ms fragments, each followed by a silent interval. The parameters varied were the length of the silent interval (0–160 ms) and whether the silent intervals were of equal length (‘periodic’) or not (‘aperiodic’). The performance curve (word error rate as a function of mean duration of silence) was U-shaped. The lowest word error rate (i.e., highest intelligibility) occurred when the silence was 80 ms long and inserted periodically. This was also the condition in which word error rate increased when the silence was inserted aperiodically. These data are consistent with a model (TEMPO) in which low-frequency brain rhythms affect the ability to decode the speech signal. In TEMPO, optimum intelligibility is achieved when the syllable rhythm is within the range of the high theta-frequency brain rhythms (6–12 Hz), comparable to the rate at which segments and syllables are articulated in conversational speech.
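The periodic silence-insertion step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `insert_silence` and `packet_rate_hz` are hypothetical, the waveform is assumed to be a plain list of samples that has already been time-compressed, and only the periodic condition (equal silent intervals) is covered.

```python
def insert_silence(samples, sample_rate, fragment_ms=40.0, silence_ms=80.0):
    """Split `samples` into consecutive fragments and follow each with silence.

    Hypothetical helper: assumes `samples` is a list of floats holding the
    already time-compressed waveform, and `sample_rate` is in Hz.
    """
    frag_len = int(round(sample_rate * fragment_ms / 1000.0))
    gap_len = int(round(sample_rate * silence_ms / 1000.0))
    if frag_len <= 0:
        raise ValueError("fragment must be at least one sample long")
    out = []
    for start in range(0, len(samples), frag_len):
        out.extend(samples[start:start + frag_len])  # speech fragment
        out.extend([0.0] * gap_len)                  # silent interval
    return out


def packet_rate_hz(fragment_ms, silence_ms):
    """Rate, in Hz, at which fragment-plus-silence packets recur."""
    return 1000.0 / (fragment_ms + silence_ms)
```

Under these assumptions, a 40-ms fragment followed by 80 ms of silence repeats every 120 ms, so `packet_rate_hz(40, 80)` is about 8.3 Hz, which falls inside the 6–12 Hz range quoted in the abstract. With no silence the packets recur at 25 Hz, and with 160 ms of silence at 5 Hz, both outside that range.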
Subject
Linguistics and Language, Acoustics and Ultrasonics, Language and Linguistics
Cited by
245 articles.