Overview of Voice Conversion Methods Based on Deep Learning

Published: 2023-02-28 Issue: 5 Volume: 13 Page: 3100
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Walczyna Tomasz¹, Piotrowski Zbigniew¹

Affiliation:

1. Institute of Communication Systems, Faculty of Electronics, Military University of Technology, 00-908 Warsaw, Poland

Abstract

Voice conversion is the process of transferring the essence of one speaker’s identity to another speaker while preserving the content of the speech. It is accomplished with algorithms that combine speech processing techniques such as speech analysis, speaker classification, and vocoding. State-of-the-art voice conversion relies on deep neural networks that effectively separate a speaker’s voice from the linguistic content. This article offers a comprehensive overview of the development status of this field, based on current state-of-the-art voice conversion methods.

Funder

National Centre for Research and Development

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/5/3100/pdf

Reference: 66 articles.

1. Voice conversion;Childers;Speech Commun.,1989

2. An Overview of Voice Conversion Systems;Mohammadi;Speech Commun.,2017

3. An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning;Sisman;IEEE/ACM Trans. Audio Speech Lang. Process.,2020

4. Variani, E., Lei, X., McDermott, E., Moreno, I.L., and Gonzalez-Dominguez, J. (2014, January 4–9). Deep Neural Networks for Small Footprint Text-dependent Speaker Verification. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.

5. Voice conversion versus speaker verification: An overview;Wu;APSIPA Trans. Signal Inf. Process.,2014

Cited by 12 articles.

1. Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion;Frontiers in Signal Processing;2024-08-16

2. Assessment of Self-Supervised Denoising Methods for Esophageal Speech Enhancement;Applied Sciences;2024-07-31

3. GNNAE-AVSS: Graph Neural Network Based Autoencoders for Audio-Visual Speech Synthesis;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

4. Review of Existing Methods for Generating and Detecting Fake and Partially Fake Audio;Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics;2024-06-19

5. Introduction to Audio Deepfake Generation: Academic Insights for Non-Experts;3rd ACM International Workshop on Multimedia AI against Disinformation;2024-06-10