Continuous vocoder applied in deep neural network based voice conversion

Published:2019-09-16 Issue:23 Volume:78 Page:33549-33572
ISSN:1380-7501
Container-title:Multimedia Tools and Applications
language:en
Short-container-title:Multimed Tools Appl

Author:

Al-Radhi Mohammed Salah, Csapó Tamás Gábor, Németh Géza

Abstract

In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using a deep neural network, where multiple features from the speech of two speakers (source and target) are converted acoustically. Traditional conversion methods focus on the prosodic feature represented by the discontinuous fundamental frequency (F0) and the spectral envelope. Studies have shown that speech analysis/synthesis solutions play an important role in the overall quality of the converted voice. Recently, we proposed a new continuous vocoder, originally for statistical parametric speech synthesis, in which all parameters are continuous. This work therefore introduces a new method that uses a continuous F0 (contF0) in SVC to avoid alignment errors that may occur in voiced and unvoiced segments and can degrade the converted speech. Our contributions are as follows. (1) We integrate the continuous vocoder, which provides an advanced model of the excitation signal, into the SVC framework by converting its contF0, maximum voiced frequency, and spectral features. (2) We show that a feed-forward deep neural network (FF-DNN) using our vocoder yields high-quality conversion. (3) We apply a geometric approach to spectral subtraction (GA-SS) in the final stage of the proposed framework to improve the signal-to-noise ratio of the converted speech. Our experimental results, using two male speakers and one female speaker, show that the speech converted with the proposed SVC technique is similar to the target speaker and gives state-of-the-art performance as measured by objective evaluation and subjective listening tests.
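
To illustrate the voiced/unvoiced mismatch that the abstract attributes to a discontinuous F0, the following is a minimal sketch and not the authors' method. It assumes unvoiced frames are marked with 0 and uses plain linear interpolation as a stand-in for a continuous F0 estimator; the function name `fill_unvoiced` and the toy F0 values are hypothetical.

```python
import numpy as np

def fill_unvoiced(f0):
    """Return a gap-free F0 contour by linear interpolation over unvoiced frames.

    Assumption: unvoiced frames are marked with 0. Interpolation is only a
    stand-in here; the paper's contF0 comes from its own continuous vocoder.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("no voiced frames to interpolate from")
    frames = np.arange(len(f0))
    # np.interp holds the first/last voiced value at the sequence edges.
    return np.interp(frames, frames[voiced], f0[voiced])

# Toy, already time-aligned frame-level F0 tracks in Hz (hypothetical values).
source_f0 = np.array([0.0, 120.0, 124.0, 0.0, 0.0, 130.0, 0.0])
target_f0 = np.array([0.0, 0.0, 210.0, 214.0, 0.0, 220.0, 224.0])

# Frames where exactly one speaker is voiced: a discontinuous F0 pair has no
# meaningful source-to-target mapping for these frames.
mismatched = (source_f0 > 0) != (target_f0 > 0)

# With continuous contours, every frame carries a usable F0 value.
source_cont = fill_unvoiced(source_f0)
target_cont = fill_unvoiced(target_f0)

print("mismatched frames:", int(mismatched.sum()))
print("source contF0:", source_cont)
print("target contF0:", target_cont)
```

In this toy pair, three of the seven frames are voiced for only one speaker; after filling, both contours are defined at every frame, so a frame-wise mapping can be trained without special handling of voicing boundaries.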

Funder

Budapest University of Technology and Economics

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications,Hardware and Architecture,Media Technology,Software

Link

http://link.springer.com/content/pdf/10.1007/s11042-019-08198-5.pdf

References: 68 articles.

1. Aihara R, Takiguchi T, Ariki Y (2014) Individuality-preserving voice conversion for articulation disorders using dictionary selective non-negative matrix factorization. In: Proceedings of SLPAT, p 29–37

2. Al-Radhi MS, Csapó TG, Németh G (2017) Continuous vocoder in feed-forward deep neural network based speech synthesis. In: Proceedings of the digital speech and image processing. Serbia

3. Al-Radhi MS, Csapó TG, Németh G (2017) Time-domain envelope modulating the noise component of excitation in a continuous residual-based vocoder for statistical parametric speech synthesis. In: Proceedings of Interspeech. Stockholm, p 434–438

4. ANSI (1997) Methods for the calculation of the speech intelligibility index. American National Standards Institute, ANSI Standard S3.5

5. Chen LH, Ling ZH, Liu LJ, Dai LR (2014) Voice conversion using deep neural networks with layer-wise generative training. IEEE Trans Audio Speech Lang Process 22(12):1859–1872. https://doi.org/10.1109/TASLP.2014.2353991

Cited by 4 articles.

1. Perturbation AUTOVC: Voice Conversion From Perturbation and Autoencoder Loss;IEEE Access;2023

2. MixGAN-TTS: Efficient and Stable Speech Synthesis Based on Diffusion Model;IEEE Access;2023

3. Investigations on speaker adaptation using a continuous vocoder within recurrent neural network based text-to-speech synthesis;Multimedia Tools and Applications;2022-10-22

4. Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning;Applied Sciences;2021-08-15