Voice Conversion Using a Perceptual Criterion

Published:2020-04-22 Issue:8 Volume:10 Page:2884
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Lee Ki-Seung

Abstract

In voice conversion (VC), it is highly desirable to obtain transformed speech signals that are perceptually close to a target speaker’s voice. To this end, the proposed VC scheme adopts a perceptually meaningful criterion that takes the human auditory system into account when measuring the distance between the converted and target voices. The conversion rules for the spectral-envelope features and the pitch modification factor were constructed jointly so that this perceptual distance was minimized. The minimization problem was solved within a deep neural network (DNN) framework in which the input features were derived from source speech signals and the target features from a time-aligned version of the target speech signals. Validation tests were carried out on the CMU ARCTIC database to evaluate the effectiveness of the proposed method, especially in terms of perceptual quality. The experimental results showed that the proposed method yielded perceptually preferred results compared with independent conversion using a conventional mean-square error (MSE) criterion. The maximum improvement in the perceptual evaluation of speech quality (PESQ) score over the conventional VC method was 0.312.

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/10/8/2884/pdf

References: 45 articles.

1. An overview of voice conversion systems

2. Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis;Wu;IEEE Trans. Audio Speech Lang. Process.,2006

3. Personalized Spectral and Prosody Conversion Using Frame-Based Codeword Distribution and Adaptive CRF

4. Application of speech conversion to alaryngeal speech enhancement;Bi;IEEE Trans. Audio Speech Lang. Process.,1997

Cited by 4 articles.

1. Investigating the feature extraction capabilities of non-negative matrix factorisation algorithms for black-and-white images;ITM Web of Conferences;2024

2. An Adaptive-Learning-Based Generative Adversarial Network for One-to-One Voice Conversion;IEEE Transactions on Artificial Intelligence;2023-02

3. Transfer Learning, Style Control, and Speaker Reconstruction Loss for Zero-Shot Multilingual Multi-Speaker Text-to-Speech on Low-Resource Languages;IEEE Access;2022

4. Expressive TTS Training With Frame and Style Reconstruction Loss;IEEE/ACM Transactions on Audio, Speech, and Language Processing;2021