Automatic Fluency Assessment Method for Spontaneous Speech without Reference Text
Published: 2023-04-09
Issue: 8
Volume: 12
Page: 1775
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics
Author:
Liu Jiajun (1, 2), Wumaier Aishan (2, 3), Fan Cong (2, 3), Guo Shen (2, 3)
Affiliation:
1. College of Software, Xinjiang University, Urumqi 830046, China
2. Key Laboratory of Multilingual Information Technology in Xinjiang Uyghur Autonomous Region, Urumqi 830046, China
3. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Abstract
Automatic fluency assessment of spontaneous speech without reference text is a challenging task that depends heavily on the accuracy of automatic speech recognition (ASR). It is therefore necessary to explore an assessment method that incorporates ASR: acoustic features are essential for assessment, but the text features output by ASR may also carry useful fluency information. However, most existing studies on automatic fluency assessment of spontaneous speech rely solely on audio features and do not use textual information, which may limit how well fluency is captured. To address this, we propose a multimodal automatic speech fluency assessment method that incorporates ASR output. Specifically, we first explore the relevance of the fluency assessment task to the ASR task and fine-tune the Wav2Vec2.0 model with multi-task learning, jointly optimizing the ASR task and the fluency assessment task so that the model produces both fluency assessment results and ASR output. Then, the text features and audio features obtained from the fine-tuned model are fed into the multimodal fluency assessment model, which uses attention mechanisms to obtain more reliable assessment results. Finally, experiments on the PSCPSF and Speechocean762 datasets suggest that the proposed method performs well in different assessment scenarios.
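The two-stage idea in the abstract (a joint ASR and fluency objective, followed by attention-based fusion of text and audio features) can be illustrated with a minimal, dependency-free sketch. Nothing below is taken from the paper itself: the weighted-sum form of the joint loss, the `asr_weight` parameter, the single-head dot-product attention, the mean pooling, and all function names are assumptions chosen for illustration, and the toy vectors stand in for real Wav2Vec2.0 and text-encoder outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(text_feats, audio_feats):
    """Each text vector (query) attends over all audio frame vectors.

    Assumption: plain scaled dot-product attention with no learned
    projections; the paper's actual attention design is not given here.
    """
    dim = len(audio_feats[0])
    attended, all_weights = [], []
    for query in text_feats:
        weights = softmax([dot(query, frame) / math.sqrt(dim) for frame in audio_feats])
        context = [
            sum(w * frame[d] for w, frame in zip(weights, audio_feats))
            for d in range(dim)
        ]
        attended.append(context)
        all_weights.append(weights)
    return attended, all_weights


def mean_pool(vectors):
    count = len(vectors)
    return [sum(v[d] for v in vectors) / count for d in range(len(vectors[0]))]


def fuse(text_feats, audio_feats):
    """Concatenate pooled text features with pooled audio-attended features."""
    attended, weights = cross_attention(text_feats, audio_feats)
    return mean_pool(text_feats) + mean_pool(attended), weights


def joint_loss(asr_loss, fluency_loss, asr_weight=0.5):
    """Hypothetical multi-task objective: a weighted sum of the two task losses."""
    return asr_weight * asr_loss + (1.0 - asr_weight) * fluency_loss


# Toy inputs standing in for real encoder outputs (2 text tokens, 3 audio frames).
text_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = fuse(text_feats, audio_feats)
```

In a real system the fused vector would feed a small regression or classification head that predicts the fluency score, and the task losses would come from the ASR and fluency outputs of the fine-tuned model rather than being passed in as plain numbers.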
Funder
National Science Foundation of China; Basic Research Program of Tianshan Talent Plan of Xinjiang, China
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
2 articles.