Enrichment of Oesophageal Speech: Voice Conversion with Duration–Matched Synthetic Speech as Target

Published: 2021-06-26 Issue: 13 Volume: 11 Page: 5940
ISSN: 2076-3417
Container-title: Applied Sciences
language: en
Short-container-title: Applied Sciences

Author:

Raman Sneha, Sarasola Xabier, Navas Eva, Hernaez Inma

Abstract

Pathological speech such as oesophageal speech (OS) is difficult to understand because it contains undesired artefacts and lacks the characteristics of normal healthy speech. Modern speech technologies and machine learning make it possible to transform pathological speech to improve its intelligibility and quality. We used a neural-network-based voice conversion method to improve the intelligibility and reduce the listening effort (LE) of speech from four OS speakers of varying speaking proficiency. The novelty of the method is that the target is synthetic speech matched in duration to the source OS, rather than parallel, aligned healthy speech. We evaluated the converted samples using a collection of automatic speech recognition (ASR) systems, an objective intelligibility metric (STOI), and a subjective test. The ASR evaluation shows that the proposed system achieved significantly better word recognition accuracy than both unprocessed OS and baseline systems that used aligned healthy speech as the target. STOI scores improved by at least 15%, indicating higher intelligibility for the proposed system than for unprocessed OS and higher target similarity than for the baseline systems. The subjective test reveals a significant preference for the proposed system over unprocessed OS for all OS speakers except one, the least proficient OS speaker in the data set.

Funder

H2020 Marie Skłodowska-Curie Actions

Basque Government

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/11/13/5940/pdf

References: 48 articles.

1. Head and Neck Cancer: Treatment, Rehabilitation, and Outcomes;Ward,2014

2. Communication, functional disorders and lifestyle changes after total laryngectomy

3. Objective and subjective voice outcomes after total laryngectomy: a systematic review

4. Speech Rehabilitation after Total Laryngectomy

5. Voice and speech after laryngectomy

Cited by 6 articles.

1. Assessment of Self-Supervised Denoising Methods for Esophageal Speech Enhancement;Applied Sciences;2024-07-31

2. Special Issue on Applications of Speech and Language Technologies in Healthcare;Applied Sciences;2023-06-05

3. E-DGAN: An Encoder-Decoder Generative Adversarial Network Based Method for Pathological to Normal Voice Conversion;IEEE Journal of Biomedical and Health Informatics;2023-05

4. A review of IoT systems to enable independence for the elderly and disabled individuals;Internet of Things;2023-04

5. Predicted Phase Using Deep Neural Networks to Enhance Esophageal Speech;Lecture Notes on Data Engineering and Communications Technologies;2023