TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation

Published: 2023-09-22 Issue: 19 Volume: 23 Page: 8033
ISSN: 1424-8220
Container-title: Sensors
Language: en

Author:

Basak Krishna¹, Mishra Nilamadhab¹, Chang Hsien-Tsung²³⁴⁵

Affiliation:

1. School of Computing Science & Engineering, VIT Bhopal University, Sehore 466114, India

2. Bachelor Program in Artificial Intelligence, Chang Gung University, Taoyuan 333, Taiwan

3. Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan 333, Taiwan

4. Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Taoyuan 333, Taiwan

5. Artificial Intelligence Research Center, Chang Gung University, Taoyuan 333, Taiwan

Abstract

Stuttering is a prevalent neurodevelopmental disorder that disrupts fluent speech through involuntary interruptions and recurrent sound patterns. This study addresses the need for accurate classification of stuttering types. The researchers introduce “TranStutter”, a convolution-free, Transformer-based deep learning (DL) model for speech disfluency classification. Unlike conventional methods, TranStutter uses Multi-Head Self-Attention and Positional Encoding to capture temporal patterns in speech. The researchers evaluated the model on two benchmark datasets: the Stuttering Events in Podcasts Dataset (SEP-28k) and the FluencyBank Interview Subset. SEP-28k comprises 28,177 audio clips from podcasts, annotated with disfluent and non-disfluent labels, including Block (BL), Prolongation (PR), Sound Repetition (SR), Word Repetition (WR), and Interjection (IJ). The FluencyBank subset comprises 4,144 audio clips from 32 People Who Stutter (PWS). TranStutter achieved an accuracy of 88.1% on SEP-28k and 80.6% on FluencyBank. These results suggest that TranStutter could support the diagnosis and treatment of stuttering and contribute to speech pathology and neurodevelopmental research, with the prospect of more accurate diagnostics and targeted interventions for individuals who stutter.
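
As a rough illustration of the two mechanisms the abstract names, the sketch below applies sinusoidal positional encoding and multi-head self-attention to a toy 2D mel-spectrogram without any convolution. It is not the authors' implementation. The patch-based tokenization, the patch size, the toy spectrogram values, the identity query/key/value projections, and all function names are assumptions made for illustration; a trained model would use learned projections, feed-forward layers, and a classification head.

```python
import math

def patchify(spec, patch_h, patch_w):
    """Split a 2D spectrogram (list of rows) into flattened, non-overlapping patches."""
    n_mels, n_frames = len(spec), len(spec[0])
    patches = []
    for r in range(0, n_mels - patch_h + 1, patch_h):
        for c in range(0, n_frames - patch_w + 1, patch_w):
            patches.append([spec[r + i][c + j]
                            for i in range(patch_h) for j in range(patch_w)])
    return patches

def positional_encoding(n_tokens, dim):
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    pe = []
    for pos in range(n_tokens):
        row = []
        for k in range(dim):
            angle = pos / (10000 ** (2 * (k // 2) / dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (an assumption)."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[t] * tokens[t][d] for t in range(len(tokens)))
                        for d in range(dim)])
    return outputs, weights

def multi_head_self_attention(tokens, n_heads):
    """Split each token into n_heads slices, attend per slice, then concatenate."""
    dim = len(tokens[0])
    assert dim % n_heads == 0, "token dimension must be divisible by n_heads"
    head_dim = dim // n_heads
    heads = []
    for h in range(n_heads):
        sub = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        out, _ = self_attention(sub)
        heads.append(out)
    return [[v for head in heads for v in head[t]] for t in range(len(tokens))]

# Toy "mel-spectrogram": 4 mel bins x 8 time frames (made-up values).
spec = [[(m * 8 + f) / 32 for f in range(8)] for m in range(4)]
patches = patchify(spec, 2, 2)                      # 8 tokens of dimension 4
pe = positional_encoding(len(patches), len(patches[0]))
tokens = [[x + p for x, p in zip(patch, enc)] for patch, enc in zip(patches, pe)]
encoded = multi_head_self_attention(tokens, n_heads=2)
print(len(encoded), len(encoded[0]))
```

Because attention itself is order-agnostic, the positional encoding is what lets the model distinguish an early time frame from a late one, which matters for disfluencies such as repetitions and prolongations that are defined by their temporal structure.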

Funder

National Science and Technology Council

Chang Gung Memorial Hospital

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/19/8033/pdf

References: 53 articles.

1. Why communication is important: A rationale for the centrality of the study of communication; Morreale; J. Assoc. Commun. Adm., 2000

2. Scientists, society, and stuttering; SheikhBahaei; Int. J. Clin. Pract., 2020

3. Epidemiology of stuttering: 21st century advances; Yairi; J. Fluen. Disord., 2013

4. Bloodstein, O., Ratner, N.B., and Brundage, S.B. (2021). A Handbook on Stuttering, Plural Publishing.

5. Guitar, B., and McCauley, R.J. (2010). Treatment of Stuttering: Established and Emerging Interventions, Wolters Kluwer.

Cited by 2 articles.

1. Identification of the Biomechanical Response of the Muscles That Contract the Most during Disfluencies in Stuttered Speech; Sensors; 2024-04-20

2. StutterNet: Stuttering Disfluencies Detection in Synthetic Speech Signals via Mel Frequency Cepstral Coefficients Features Using Deep Learning; IEEE Access; 2024