Using a forced aligner for prosody research

Author:

Wu Hongchen, Yun Jiwon, Li Xiang, Huang Huiyi, Liu Chuandong

Abstract

Forced alignment is a speech technique that automatically aligns audio files with transcripts. With the help of forced alignment tools, annotating audio files and creating annotated speech databases have become much more accessible and efficient. Researchers have recently started to evaluate the benefits and accuracy of forced aligners in speech research and have provided insightful suggestions for improvement. However, existing work has paid little attention to evaluating forced aligners in prosody research, which focuses on suprasegmental features. In this paper, we use ambiguous sentence-level audio input in Mandarin Chinese, which can be disambiguated prosodically, to evaluate the alignment accuracy of the Montreal Forced Aligner (MFA). Having obtained a satisfactory result for syllable-by-syllable alignment, we further explore the possibility and benefits of using the forced alignment tool to generate phrase-by-phrase alignment, a topic that has barely been studied in previous research on forced alignment. Our paper demonstrates that the forced alignment tool can effectively generate accurate alignment at both the syllable and phrase levels for tonal languages such as Mandarin. We found that the average differences between human annotators and MFA were smaller than the gold standard, indicating a satisfactory level of performance by the tool. Moreover, MFA-assisted annotation by human transcribers was at least 20 times faster than previously reported manual annotation rates, providing significant time and resource savings for prosody researchers. Our results also suggest that the phrase-level alignment accuracy of MFA can be affected by the quality of the recording, which calls prosody researchers' attention to controlling audio quality during recording. The finding that de-stressed words and phrases pose challenges for MFA also provides a reference for improving forced aligners.
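As an illustration of the kind of comparison the abstract describes, the sketch below computes the average absolute difference between boundary times placed by a human annotator and by a forced aligner. It is a minimal sketch and not the authors' procedure: the function names, the sample boundary values, and the 20 ms tolerance are all assumptions made for this example.

```python
from statistics import mean


def boundary_differences(manual_ms, automatic_ms):
    """Absolute differences (ms) between paired boundary times."""
    if len(manual_ms) != len(automatic_ms):
        raise ValueError("boundary lists must have the same length")
    return [abs(m - a) for m, a in zip(manual_ms, automatic_ms)]


def summarize(manual_ms, automatic_ms, tolerance_ms=20):
    """Mean absolute difference and share of boundaries within a tolerance.

    tolerance_ms is a hypothetical threshold chosen for illustration only.
    """
    diffs = boundary_differences(manual_ms, automatic_ms)
    within = sum(d <= tolerance_ms for d in diffs) / len(diffs)
    return {"mean_abs_diff": mean(diffs), "share_within_tolerance": within}


# Hypothetical syllable boundaries in milliseconds (not data from the paper).
human = [100, 350, 620, 900]
aligner = [110, 330, 620, 950]
result = summarize(human, aligner)
print(result)
```

Times are kept as integer milliseconds so that the tolerance comparison is not affected by floating-point rounding. The same functions could be applied to phrase-level boundaries by passing phrase boundary times instead of syllable boundary times.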

Publisher

Springer Science and Business Media LLC

Subject

General Economics, Econometrics and Finance; General Psychology; General Social Sciences; General Arts and Humanities; General Business, Management and Accounting


Cited by 1 article.

1. Less Peaky and More Accurate CTC Forced Alignment by Label Priors;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14
