Abstract
Forced alignment is a speech processing technique that automatically aligns audio files with their transcripts. With the help of forced alignment tools, annotating audio files and creating annotated speech databases have become much more accessible and efficient. Researchers have recently started to evaluate the benefits and accuracy of forced aligners in speech research and have offered suggestions for improvement. However, little attention has so far been paid to evaluating forced aligners in prosody research, which focuses on suprasegmental features. In this paper, we use ambiguous sentence-level audio input in Mandarin Chinese, which can be disambiguated prosodically, to evaluate the alignment accuracy of the Montreal Forced Aligner (MFA). Having obtained a satisfactory result for syllable-by-syllable alignment, we further explore the possibility and benefits of using the forced alignment tool to generate phrase-by-phrase alignment, a topic that has barely been studied in previous research on forced alignment. Our paper demonstrates that the forced alignment tool can generate accurate alignment at both the syllable and phrase levels for tonal languages such as Mandarin. We found that the average differences between human annotators and MFA were smaller than the gold standard, indicating a satisfactory level of performance by the tool. Moreover, MFA-assisted annotation by human transcribers was at least 20 times faster than previously reported manual annotation, offering significant time and resource savings for prosody researchers. Our results also suggest that the phrase-level alignment accuracy of MFA can be affected by recording quality, so prosody researchers should pay attention to controlling audio quality during recording. The finding that de-stressed words and phrases pose challenges for MFA also provides a reference for improving forced aligners.
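The comparison between human annotators and MFA described above can be illustrated with a minimal sketch. It assumes that each annotation is reduced to a list of boundary times in seconds and that the two lists are paired one to one. The function name, the sample boundary values, and the 20 ms tolerance are all hypothetical and are not taken from the paper, which does not state its threshold here.

```python
from statistics import mean


def mean_boundary_difference(human, aligner):
    """Return the mean absolute difference (in seconds) between paired boundaries."""
    if len(human) != len(aligner):
        raise ValueError("boundary lists must have the same length")
    return mean(abs(h - a) for h, a in zip(human, aligner))


# Hypothetical syllable boundary times in seconds (illustrative, not the paper's data).
human_boundaries = [0.10, 0.35, 0.62, 0.90]
aligner_boundaries = [0.11, 0.33, 0.63, 0.92]

# Hypothetical tolerance; substitute whatever threshold a study adopts.
THRESHOLD_S = 0.020

diff = mean_boundary_difference(human_boundaries, aligner_boundaries)
print(f"mean difference: {diff * 1000:.1f} ms; within tolerance: {diff <= THRESHOLD_S}")
```

The same function could be applied to phrase boundaries instead of syllable boundaries, since it only depends on paired time stamps.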
Publisher
Springer Science and Business Media LLC
Subject
General Economics, Econometrics and Finance; General Psychology; General Social Sciences; General Arts and Humanities; General Business, Management and Accounting
Cited by
1 article.
1. Less Peaky and More Accurate CTC Forced Alignment by Label Priors;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14