Affiliation:
1. University of California, Los Angeles
Abstract
Purpose
Interrater disagreements in ratings of quality plague the study of voice. This study compared 2 methods for handling this variability.
Method
Listeners provided multiple breathiness ratings for 2 sets of pathological voices, one including 20 male and 20 female voices unselected for quality and one including 20 breathy female voices. Ratings for each listener were averaged together, mean ratings were z transformed, and the likelihood that 2 listeners would agree exactly in their ratings was calculated as a function of averaging and standardizing condition. Data were also multidimensionally scaled to examine similarities among listeners in perceptual strategy. Results were compared with parallel analyses of existing breathiness ratings of the same voices gathered using a method-of-adjustment task.
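The averaging, standardizing, and agreement steps described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the toy ratings, and the tolerance-based definition of "exact" agreement (`tol`) are all assumptions made for illustration, and the abstract does not specify how agreement was scored for standardized values.

```python
from itertools import combinations
from statistics import mean, pstdev


def average_ratings(repeated_ratings, n_avg):
    """Average the first n_avg repeated ratings of each voice (one listener)."""
    return [mean(voice[:n_avg]) for voice in repeated_ratings]


def z_transform(values):
    """Standardize one listener's mean ratings to zero mean and unit SD."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A listener who gave every voice the same rating carries no spread.
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


def exact_agreement(scores_by_listener, tol=0.0):
    """Proportion of (listener pair, voice) comparisons that match.

    Two scores count as agreeing when they differ by at most tol.
    The tolerance is an assumption of this sketch, not a detail of the study.
    """
    agree = 0
    total = 0
    for a, b in combinations(scores_by_listener.values(), 2):
        for x, y in zip(a, b):
            total += 1
            agree += abs(x - y) <= tol
    return agree / total if total else float("nan")


# Hypothetical data: listener -> one list of repeated ratings per voice.
toy_ratings = {
    "A": [[1, 3], [4, 4], [7, 7]],
    "B": [[2, 2], [4, 4], [6, 6]],
}

if __name__ == "__main__":
    for n_avg in (1, 2):
        means = {k: average_ratings(v, n_avg) for k, v in toy_ratings.items()}
        zs = {k: z_transform(v) for k, v in means.items()}
        print(n_avg, exact_agreement(means), exact_agreement(zs, tol=0.1))
```

Computing the agreement proportion separately for each combination of `n_avg` and raw versus standardized scores mirrors the "averaging and standardizing condition" comparison in spirit only.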
Results
Three-way interactions between the mean rating for a voice, standardization condition, and the number of voices averaged together were observed, but no main effect of averaging condition emerged. Multidimensional scaling revealed significant residual differences in perceptual strategy across listeners after averaging and standardizing. Ratings from the method-of-adjustment task showed both high agreement levels and consistent perceptual strategies across listeners, as theoretically predicted.
Conclusion
Averaging multiple ratings and standardizing the mean are inadequate in addressing variations in voice quality perception.
Publisher
American Speech-Language-Hearing Association
Subject
Speech and Hearing, Linguistics and Language, Language and Linguistics
Cited by
23 articles.