Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion

Authors:

Bargum Anders R., Serafin Stefania, Erkut Cumhur

Abstract

Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios is becoming increasingly popular. Although many works in the field share a common global pipeline, there is considerable diversity in the underlying structures, methods, and neural sub-blocks used across research efforts. As a result, it can be challenging to understand why particular methods are chosen when training voice conversion models, and the actual hurdles in the proposed solutions are often unclear. To shed light on these aspects, this paper presents a scoping review that explores the use of deep learning in speech analysis, synthesis, and disentangled speech representation learning within modern voice conversion systems. We screened 628 publications from more than 38 venues between 2017 and 2023, followed by an in-depth review of a final database of 130 eligible studies. Based on the review, we summarise the most frequently used approaches to voice conversion based on deep learning and highlight common pitfalls. We condense the knowledge gathered to identify the main challenges, supply solutions grounded in the analysis, and provide recommendations for future research directions.

Publisher

Frontiers Media SA

