Author:
Garg Saurabh, Hamarneh Ghassan, Sereno Joan, Jongman Allard, Wang Yue
Abstract
Visual facial information, particularly hyperarticulated lip movements in clear speech, has been shown to benefit segmental speech perception. Little research has focused on prosody, such as lexical tone, presumably because production of prosody primarily involves laryngeal activities not necessarily distinguishable through visible articulatory movements. However, there is evidence that head, eyebrow, and lip movements correlate with production of pitch-related variations. One subsequent question is whether such visual cues are linguistically meaningful. In this study, we compare movements of the head, eyebrows, and lips associated with plain (conversational) vs. clear speech styles of Mandarin tone articulation to examine the extent to which clear-speech modifications involve signal-based overall exaggerated facial movements or code-based enhancement of linguistically relevant articulatory movements. Applying computer-vision techniques to recorded speech, visible movements of the frontal face were tracked and measured for 20 native Mandarin speakers speaking in two speech styles: plain and clear. Thirty-three head, eyebrow, and lip movement features based on distance, time, and kinematics were extracted from each individual tone word. A random forest classifier was used to identify the important features that differentiate the two styles across tones and for each tone. Mixed-effects models were then fitted to determine the features that were significantly different between the two styles. Overall, for all four Mandarin tones, we found longer duration and greater movements of the head, eyebrows, and lips in clear speech than in plain speech. Additionally, across tones, the maximum movement happened relatively earlier in clear than in plain speech. Although limited evidence of tone-specific modifications was also observed, the cues involved overlap with signal-based changes. These findings suggest that visual facial tonal modifications for clear speech primarily adopt signal-based general emphatic cues that strengthen signal saliency.
Funder
Natural Sciences and Engineering Research Council of Canada
Social Sciences and Humanities Research Council of Canada
Subject
Social Sciences (miscellaneous), Communication