Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition

Published: 2019-09-11; Volume: 2019; Issue: 1
ISSN: 1687-4722
Journal: EURASIP Journal on Audio, Speech, and Music Processing
Language: English
Abbreviated journal title: J AUDIO SPEECH MUSIC PROC.

Authors:

Takashima Yuki, Nakashika Toru, Takiguchi Tetsuya, Ariki Yasuo

Abstract

Voice conversion (VC) is a technique for converting only the speaker-specific information in source speech while preserving its phonemic content. VC based on non-negative matrix factorization (NMF) has been widely studied because it produces more natural-sounding speech than conventional Gaussian mixture model (GMM)-based VC. In conventional NMF-VC, models are trained on parallel data, so the speech data must undergo elaborate pre-processing to create that parallel data. NMF-VC also tends to yield a large model, because its dictionary matrix holds many parallel exemplars, which leads to a high computational cost. In this study, a novel non-parallel dictionary-learning method using non-negative Tucker decomposition (NTD) is proposed. The method applies tensor decomposition to factor an input observation into a set of mode matrices and one core tensor, and it estimates the dictionary matrix for NMF-VC without using parallel data. The experimental results show that the proposed method outperforms the other methods evaluated in both parallel and non-parallel settings.
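The abstract states that NTD factors an observation into a set of mode matrices and one core tensor. The Python sketch below illustrates only that generic decomposition. It is not the authors' dictionary-learning procedure, whose cost function and update rules are not given in this record. It fits a non-negative Tucker model to a 3-way tensor using multiplicative updates on the squared Euclidean error. The function names `tucker_product` and `ntd`, the Euclidean cost, the random initialisation, and the iteration count are all assumptions made for illustration, and the code requires NumPy.

```python
import numpy as np


def tucker_product(core, a, b, c):
    """Rebuild a 3-way tensor: core x_1 a x_2 b x_3 c."""
    return np.einsum("pqr,ip,jq,kr->ijk", core, a, b, c)


def ntd(x, ranks, n_iter=300, eps=1e-12, seed=0):
    """Hypothetical helper: generic non-negative Tucker decomposition.

    Assumption: squared Euclidean cost with Lee-Seung style multiplicative
    updates; this is an illustrative sketch, not the paper's algorithm.
    Factors stay non-negative because every update multiplies by a ratio
    of non-negative terms.
    """
    rng = np.random.default_rng(seed)
    (n1, n2, n3), (r1, r2, r3) = x.shape, ranks
    a = rng.random((n1, r1))
    b = rng.random((n2, r2))
    c = rng.random((n3, r3))
    core = rng.random((r1, r2, r3))
    for _ in range(n_iter):
        # Mode-1 matrix: x_hat[i,j,k] = sum_p a[i,p] * w[p,j,k]
        w = np.einsum("pqr,jq,kr->pjk", core, b, c)
        x_hat = np.einsum("ip,pjk->ijk", a, w)
        a *= np.einsum("ijk,pjk->ip", x, w) / (np.einsum("ijk,pjk->ip", x_hat, w) + eps)
        # Mode-2 matrix: x_hat[i,j,k] = sum_q b[j,q] * w[q,i,k]
        w = np.einsum("pqr,ip,kr->qik", core, a, c)
        x_hat = np.einsum("jq,qik->ijk", b, w)
        b *= np.einsum("ijk,qik->jq", x, w) / (np.einsum("ijk,qik->jq", x_hat, w) + eps)
        # Mode-3 matrix: x_hat[i,j,k] = sum_r c[k,r] * w[r,i,j]
        w = np.einsum("pqr,ip,jq->rij", core, a, b)
        x_hat = np.einsum("kr,rij->ijk", c, w)
        c *= np.einsum("ijk,rij->kr", x, w) / (np.einsum("ijk,rij->kr", x_hat, w) + eps)
        # Core tensor: project data and model onto all three mode matrices
        x_hat = tucker_product(core, a, b, c)
        num = np.einsum("ijk,ip,jq,kr->pqr", x, a, b, c)
        den = np.einsum("ijk,ip,jq,kr->pqr", x_hat, a, b, c)
        core *= num / (den + eps)
    return core, a, b, c


if __name__ == "__main__":
    # Toy data (assumption): an exactly low-rank non-negative 3-way tensor.
    rng = np.random.default_rng(1)
    x = tucker_product(rng.random((3, 3, 2)), rng.random((6, 3)),
                       rng.random((5, 3)), rng.random((4, 2)))
    core, a, b, c = ntd(x, ranks=(3, 3, 2))
    err = np.linalg.norm(x - tucker_product(core, a, b, c)) / np.linalg.norm(x)
    print(f"relative reconstruction error: {err:.4f}")
```

Multiplicative updates are chosen here because they keep every factor non-negative without an explicit projection step. How the paper maps the learned mode matrices onto source and target dictionaries for NMF-VC is not covered by this sketch.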

Publisher

Springer Science and Business Media LLC

Subject

Electrical and Electronic Engineering, Acoustics and Ultrasonics

Link

http://link.springer.com/content/pdf/10.1186/s13636-019-0160-1.pdf
