Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation

Author:

McLoughlin Ian V. (1), Sharifzadeh Hamid Reza (2), Tan Su Lim (3), Li Jingjie (1), Song Yan (1)

Affiliation:

1. The University of Science and Technology of China, Hefei, Anhui, China

2. Unitec Institute of Technology, Auckland, New Zealand

3. Singapore Institute of Technology, Singapore

Abstract

Whispering is a natural, unphonated, secondary aspect of speech communications for most people. However, it is the primary mechanism of communications for some speakers who have impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Unlike most people, who choose when to whisper and when not to, these speakers may have little choice but to rely on whispers for much of their daily vocal interaction. Even though most speakers will whisper at times, and some speakers can only whisper, the majority of today’s computational speech technology systems assume or require phonated speech. This article considers conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper. Speech reconstruction systems can be classified into those requiring training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with proposed formant-derived artificial pitch modulation, is validated through subjective and objective comparison tests alongside state-of-the-art alternatives.

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications, Human-Computer Interaction

Cited by 14 articles.

1. Automated Assessment of Glottal Dysfunction Through Unified Acoustic Voice Analysis; Journal of Voice; 2020-09

2. Glottal Flow Synthesis for Whisper-to-Speech Conversion; IEEE/ACM Transactions on Audio, Speech, and Language Processing; 2020

3. Effectiveness of Cross-Domain Architectures for Whisper-to-Normal Speech Conversion; 2019 27th European Signal Processing Conference (EUSIPCO); 2019-09

4. Whispered Speech to Normal Speech Conversion Using Bidirectional LSTMs with Meta-network; 2019 IEEE 2nd International Conference on Information Communication and Signal Processing (ICICSP); 2019-09

5. Whisper to Normal Speech Conversion Using Sequence-to-Sequence Mapping Model With Auditory Attention; IEEE Access; 2019
