Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths


Liu Jiajun (1, 2), Wumaier Aishan (2, 3), Wei Dongping (2, 3), Guo Shen (2, 3)


1. College of Software, Xinjiang University, Urumqi 830046, China

2. Key Laboratory of Multilingual Information Technology in Xinjiang Uyghur Autonomous Region, Urumqi 830046, China

3. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China


Speech is critical for interpersonal communication, but not everyone communicates fluently. Speech disfluencies, such as stuttering and interruptions, impair both emotional expression and clarity of expression for people who stutter. Existing disfluency detection methods rely heavily on annotated data, which is costly to obtain, and they do not address disfluent speech of variable length, which limits their scalability. To address these limitations, this paper proposes an automated speech disfluency detection method that can help individuals overcome disfluency and improve their communication skills, and can assist therapists in tracking the progress of stuttering patients. The method detects four types of disfluency using single-task detection, with embeddings from the pre-trained wav2vec2.0 model and with convolutional neural network (CNN) and Transformer models for feature extraction. To handle variable-length disfluent speech, the model is modified based on the entropy invariance of the attention mechanism, which improves its scalability; this scalability across languages and utterance lengths enhances the method's practical applicability. Experiments show that the model outperforms baseline models on both English and Chinese datasets, demonstrating its generality and scalability for real-world applications.
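The abstract does not give the exact form of the entropy-invariance modification. One widely used formulation of entropy-invariant attention, assumed here and possibly different from the authors' variant, replaces the usual logit scale 1/sqrt(d_k) with log_m(n)/sqrt(d_k), where m is the sequence length seen during training and n is the length at inference. The extra log factor is intended to keep the entropy of the attention distribution roughly constant as n changes, so that attention does not become more diffuse on longer clips. The sketch below shows the idea on plain Python lists; `entropy_invariant_scale`, `attention`, and the `train_len` parameter are hypothetical names, not the paper's code.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_invariant_scale(d_k: int, n: int, train_len: int) -> float:
    """Hypothetical helper: logit scale log_{train_len}(n) / sqrt(d_k).

    Reduces to the standard 1/sqrt(d_k) when n == train_len and grows
    (sharpening the softmax) when n > train_len.
    """
    if n < 2 or train_len < 2:
        raise ValueError("sequence lengths must be at least 2")
    return math.log(n) / math.log(train_len) / math.sqrt(d_k)


def attention(q: Matrix, k: Matrix, v: Matrix,
              train_len: Optional[int] = None) -> Matrix:
    """Single-head scaled dot-product attention on plain Python lists.

    With train_len=None the standard 1/sqrt(d_k) scale is used; otherwise
    the entropy-invariant scale for key length n = len(k) is used.
    """
    n, d_k = len(k), len(q[0])
    if train_len is None:
        scale = 1.0 / math.sqrt(d_k)
    else:
        scale = entropy_invariant_scale(d_k, n, train_len)
    out = []
    for qi in q:
        logits = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(logits)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

For instance, with `train_len=100` and a 400-frame input, the scale is log(400)/log(100) ≈ 1.30 times the standard one; when n equals `train_len`, the function reduces to ordinary scaled dot-product attention.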


Funded by the Central Guiding Local Science and Technology Development Special Fund Project and the Basic Research Program of Tianshan Talent Plan of Xinjiang, China.




