Mel-S3R: Combining Mel-spectrogram and self-supervised speech representation with VQ-VAE for any-to-any voice conversion

Published: 2023-06 Volume: 151 Pages: 52-63
ISSN: 0167-6393
Container-title: Speech Communication
Language: en
Short-container-title: Speech Communication

Author:

Yang Jichen, Zhou Yi, Huang Hao

Publisher

Elsevier BV

Subject

Computer Science Applications, Computer Vision and Pattern Recognition, Linguistics and Language, Language and Linguistics, Communication, Modeling and Simulation, Software

References: 69 articles.

1. Vq-wav2vec: Self-supervised learning of discrete speech representations;Baevski,2019

2. Baevski, A., Zhou, Y., Mohamed, A., Auli, M., 2020. wav2vec 2.0: a framework for self-supervised learning of speech representations. In: Annual Conference on Neural Information Processing Systems.

3. Voice conversion using deep neural networks with layer-wise generative training;Chen;IEEE/ACM Trans. Audio Speech Lang. Process.,2014

4. Chen, Y.-N., Liu, L.-J., Hu, Y.-J., Jiang, Y., Ling, Z.-H., 2022. Improving Recognition-Synthesis Based any-to-one Voice Conversion with Cyclic Training. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 7007–7011.

5. WavLM: Large-scale self-supervised pre-training for full stack speech processing;Chen,2021

Cited by 3 articles.

1. Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion;Frontiers in Signal Processing;2024-08-16

2. Improving Voice Style Conversion via Self-attention VAE with Feature Disentanglement;Communications in Computer and Information Science;2024

3. Coal Gangue Recognition in the Strong Background Noise Using Two-Level Auditory Feature Fusion with Attention Mechanism;2024