Intelligibility Improvement of Esophageal Speech Using Sequence-to-Sequence Voice Conversion with Auditory Attention

Published: 2022-07-13
Volume: 12, Issue: 14, Page: 7062
ISSN: 2076-3417
Journal: Applied Sciences
Language: English

Authors:

Ezzine Kadria, Di Martino Joseph, Frikha Mondher

Abstract

Laryngectomees are individuals whose larynx has been surgically removed, usually because of laryngeal cancer. The immediate consequence of this operation is that they are unable to speak. Esophageal speech (ES) remains the preferred alternative speaking method for laryngectomees. However, compared with the laryngeal voice, ES suffers from low intelligibility and poor quality because of a chaotic fundamental frequency (F0), specific noises, and low intensity. To address these problems, we propose using voice conversion (VC) as an effective way to improve speech quality and intelligibility. To this end, we present a novel esophageal-to-laryngeal VC system based on a sequence-to-sequence (Seq2Seq) model combined with an auditory attention mechanism. The originality of the proposed framework lies in its use of auditory attention, which yields more efficient and adaptive feature mapping. In addition, our VC system does not require the classical dynamic time warping (DTW) alignment during training, which avoids erroneous mappings and significantly reduces computation time. Moreover, to preserve the identity of the target speaker, the excitation and phase coefficients are estimated by querying a binary search tree. In experiments, objective and subjective tests confirmed that the proposed approach performs better in terms of speech quality and intelligibility, even in some difficult cases.
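The abstract says only that excitation and phase coefficients are estimated "by querying a binary search tree". The following is an illustrative sketch, not the authors' implementation. It assumes a k-d tree (a binary tree commonly used for nearest-neighbour search) built over target-speaker spectral feature vectors from training. Each converted frame is matched to its nearest target frame, and that frame's excitation and phase values are reused. The function names `build`, `nearest`, and `lookup_excitation_phase`, the squared Euclidean distance, and the toy feature values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class Node:
    point: Vector                   # target-speaker spectral feature vector (search key)
    frame: int                      # index of the training frame that owns this vector
    axis: int                       # dimension this node splits on
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(items: Sequence[Tuple[Vector, int]], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median along a cycling axis."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    ordered = sorted(items, key=lambda item: item[0][axis])
    mid = len(ordered) // 2
    point, frame = ordered[mid]
    return Node(point, frame, axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Vector,
            best: Optional[Tuple[float, Node]] = None) -> Optional[Tuple[float, Node]]:
    """Return (squared distance, node) for the stored vector closest to query."""
    if node is None:
        return best
    d = sq_dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the best match so far.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def lookup_excitation_phase(tree: Optional[Node], converted: List[Vector],
                            excitation: List[float], phase: List[float]
                            ) -> List[Tuple[float, float]]:
    """Hypothetical helper: reuse excitation/phase of the nearest target frame."""
    if tree is None:
        raise ValueError("empty tree")
    out = []
    for frame in converted:
        _, node = nearest(tree, frame)
        out.append((excitation[node.frame], phase[node.frame]))
    return out


# Toy example: 2-D "features" and scalar excitation/phase values (all hypothetical).
target = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 0.0)]
excitation = [100.0, 110.0, 120.0, 130.0]
phase = [0.1, 0.2, 0.3, 0.4]
tree = build([(v, i) for i, v in enumerate(target)])
print(lookup_excitation_phase(tree, [(0.9, 1.2), (8.0, 0.5)], excitation, phase))
# prints [(110.0, 0.2), (130.0, 0.4)]
```

A tree lookup avoids a linear scan over every training frame for each converted frame. In this sketch, the excitation and phase are copied from real target-speaker frames rather than taken from the esophageal source, which is one plausible way to keep the target speaker's identity. A practical system would use higher-dimensional per-frame spectral features than the 2-D toy vectors shown here.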

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/14/7062/pdf

Cited by 3 articles

1. Assessment of Self-Supervised Denoising Methods for Esophageal Speech Enhancement;Applied Sciences;2024-07-31

2. Special Issue on Applications of Speech and Language Technologies in Healthcare;Applied Sciences;2023-06-05

3. Analysis of Phonetic Segments of Oesophageal Speech in People Following Total Laryngectomy;Applied Sciences;2023-04-16