RPCA-DRNN technique for monaural singing voice separation

Published:2022-02-05 Issue:1 Volume:2022 Page:
ISSN:1687-4722
Container-title:EURASIP Journal on Audio, Speech, and Music Processing
language:en
Short-container-title:J AUDIO SPEECH MUSIC PROC.

Author:

Lai Wen-Hsing, Wang Siou-Lin

Abstract

In this study, we propose a method for separating a singing voice from the musical accompaniment in a monaural musical mixture. The proposed method first decomposes the mixture using robust principal component analysis (RPCA), followed by postprocessing that includes a median filter, morphology, and a high-pass filter. A deep recurrent neural network, comprising two jointly optimized parallel-stacked recurrent neural networks (sRNNs) with mask layers and trained with limited data and computation, is then applied to the decomposed components. This network refines the final estimates of the separated singing voice and background music by correcting misclassified or residual singing and background music left by the initial separation. Experimental results on the MIR-1K, ccMixter, and MUSDB18 datasets, together with a comparison against ten existing techniques, indicate that the proposed method achieves competitive performance in monaural singing voice separation. On MUSDB18, the proposed method reaches comparable separation quality with less training data and lower computational cost than the other state-of-the-art technique.
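
The RPCA step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard principal component pursuit formulation solved by ADMM with a fixed penalty, the usual default weight 1/sqrt(max(m, n)), and the common convention in RPCA-based separation (not stated in the abstract) that the sparse part approximates the voice and the low-rank part the accompaniment. The names `shrink`, `svd_threshold`, `rpca`, `binary_mask`, and the `gain` parameter are hypothetical, and the postprocessing and DRNN stages are not shown.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Soft-threshold the singular values of X by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(M, lam=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S so that M is close to L + S.

    Assumed formulation: minimize ||L||_* + lam * ||S||_1 subject to
    L + S = M, solved by ADMM with a fixed penalty mu.
    """
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(M).sum())  # common heuristic; M must be nonzero
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                  # Lagrange multiplier
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S                     # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S

def binary_mask(L, S, gain=1.0):
    """Hypothetical time-frequency mask: True where the sparse part dominates."""
    return np.abs(S) > gain * np.abs(L)
```

In use, `M` would be the magnitude spectrogram of the mixture; applying the mask (and its complement) to the mixture spectrogram and inverting with the mixture phase gives initial voice and accompaniment estimates, which a later stage could then refine.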

Funder

Ministry of Science and Technology, Taiwan

Publisher

Springer Science and Business Media LLC

Subject

Electrical and Electronic Engineering, Acoustics and Ultrasonics

Link

https://link.springer.com/content/pdf/10.1186/s13636-022-00236-9.pdf

References: 77 articles.

1. K. Hu, D. Wang, An unsupervised approach to cochannel speech separation. IEEE Trans. Audio. Speech. Lang. Process. 21(1), 122–131 (2013). https://doi.org/10.1109/TASL.2012.2215591

2. Z. Jin, D. Wang, Reverberant speech segregation based on multipitch tracking and classification. IEEE Trans. Audio. Speech. Lang. Process. 19(8), 2328–2337 (2011). https://doi.org/10.1109/TASL.2011.2134086

3. D. Kawai, K. Yamamoto, S. Nakagawa, in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Speech analysis of sung-speech and lyric recognition in monophonic singing (IEEE, Shanghai, 2016), pp. 271–275