Unsupervised Single-Channel Singing Voice Separation with Weighted Robust Principal Component Analysis Based on Gammatone Auditory Filterbank and Vocal Activity Detection
Author:
Li Feng 1,2, Hu Yujun 1, Wang Lingling 1
Affiliation:
1. Department of Computer Science and Technology, Anhui University of Finance and Economics, Bengbu 233030, China
2. School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
Abstract
Singing-voice separation is the task of isolating the singing voice from the musical accompaniment in a mixture. In this paper, we propose a novel unsupervised method for extracting the singing voice from the background accompaniment in a single-channel musical mixture. The method modifies robust principal component analysis (RPCA) with a weighting scheme based on a gammatone auditory filterbank and vocal activity detection. Although RPCA is effective for separating the voice from a music mixture, it fails when one singular value, such as that contributed by drums, is much larger than the others (e.g., those of the accompanying instruments). The proposed approach therefore exploits the differing singular values of the low-rank (background) and sparse (singing voice) components. In addition, we extend RPCA to the cochleagram by applying coalescent masking on the gammatone representation. Finally, we apply vocal activity detection to further improve the separation results by removing residual music from the extracted voice. Evaluation results show that the proposed approach achieves better separation than RPCA on the ccMixter and DSD100 datasets.
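The baseline decomposition that the proposed weighting builds on is standard RPCA, which splits the mixture's magnitude time-frequency representation M into a low-rank part L (repetitive accompaniment) and a sparse part S (singing voice) by solving min ||L||_* + λ||S||_1 subject to M = L + S. The sketch below is a minimal NumPy implementation of this unweighted baseline via the inexact augmented Lagrange multiplier (IALM) method, checked on a synthetic low-rank-plus-sparse matrix; it does not include the paper's gammatone weighting, coalescent masking, or vocal activity detection, and the function name rpca_ialm is illustrative only.

```python
import numpy as np

def rpca_ialm(M, lam=None, tol=1e-7, max_iter=500):
    """Unweighted RPCA via the inexact ALM method:
    minimize ||L||_* + lam * ||S||_1  subject to  M = L + S.
    Returns the low-rank part L and the sparse part S.
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # standard choice of lambda
    norm_fro = np.linalg.norm(M, "fro")
    norm_two = np.linalg.norm(M, 2)              # largest singular value
    Y = M / max(norm_two, np.abs(M).max() / lam)  # dual variable initialization
    mu = 1.25 / norm_two
    mu_bar = mu * 1e7
    rho = 1.6
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding of (M - S + Y/mu)
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        sig = np.maximum(sig - 1.0 / mu, 0.0)
        L = (U * sig) @ Vt
        # Sparse update: elementwise soft thresholding of (M - L + Y/mu)
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # Dual ascent and penalty update
        Z = M - L - S
        Y = Y + mu * Z
        mu = min(mu * rho, mu_bar)
        if np.linalg.norm(Z, "fro") / norm_fro < tol:
            break
    return L, S

if __name__ == "__main__":
    # Synthetic check: a rank-5 "accompaniment" plus a 5%-dense "voice" term.
    rng = np.random.default_rng(0)
    low_rank = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 300))
    sparse = np.zeros((200, 300))
    idx = rng.random((200, 300)) < 0.05
    sparse[idx] = 10.0 * rng.standard_normal(idx.sum())
    L, S = rpca_ialm(low_rank + sparse)
    print("relative low-rank recovery error:",
          np.linalg.norm(L - low_rank) / np.linalg.norm(low_rank))
```

In a separation pipeline, M would be the magnitude spectrogram (or, in the proposed method, the gammatone cochleagram) of the mixture; L then approximates the accompaniment and S the vocal, and a binary time-frequency mask such as |S| > |L| is typically applied to the mixture before resynthesis.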
Funder
National Natural Science Foundation of China
Innovation Support Program for Returned Overseas Students in Anhui Province
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry