Abstract
Music source separation has traditionally followed the encoder-decoder paradigm (e.g., hourglass, U-Net, DeconvNet, SegNet) to isolate individual components from music mixtures. Such networks, however, lose location sensitivity, because their low-resolution intermediate representations discard useful harmonic patterns along the temporal dimension. We overcame this problem by performing singing voice separation with a high-resolution representation learning network (HRNet) coupled with a long short-term memory (LSTM) module, which retains high-resolution feature maps while capturing the temporal behavior of the acoustic signal. We named this joint combination of HRNet and LSTM HR-LSTM. The spectrograms predicted by this system are close to the ground truth, and it separates music sources more successfully than past methods. The proposed network was tested on four datasets: DSD100, MIR-1K, Korean Pansori, and Nepal Idol singing voice separation (NISVS). Our experiments confirmed that the proposed HR-LSTM outperforms state-of-the-art networks at singing voice separation on DSD100, performs comparably to alternative methods on MIR-1K, and separates the voice and accompaniment components well on the Pansori and NISVS datasets. In addition to proposing and validating our network, we also developed and shared our Nepal Idol dataset.
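As a rough illustration of the idea the abstract describes, the sketch below (PyTorch) couples a convolutional branch that keeps the spectrogram at full time-frequency resolution with an LSTM over time frames that predicts a soft vocal mask. This is an assumption-laden simplification, not the authors' implementation: the full HRNet with its parallel multi-resolution streams and exchange units is omitted, and all names and shapes (HRLSTMSketch, n_freq_bins, lstm_hidden, the toy input) are hypothetical.

```python
import torch
import torch.nn as nn


class HRLSTMSketch(nn.Module):
    """Illustrative sketch only (not the paper's code): a stride-1
    convolutional branch that never downsamples the time-frequency
    representation, followed by a bidirectional LSTM over time frames
    that predicts a soft mask for the vocal spectrogram."""

    def __init__(self, n_freq_bins=256, conv_channels=16, lstm_hidden=128):
        super().__init__()
        # "High-resolution" branch: stride-1 convolutions with padding,
        # so the input's time-frequency resolution is preserved throughout.
        self.hr_branch = nn.Sequential(
            nn.Conv2d(1, conv_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(conv_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(conv_channels, conv_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(conv_channels),
            nn.ReLU(inplace=True),
        )
        # LSTM over the time axis to model temporal behavior of the signal.
        self.lstm = nn.LSTM(
            input_size=conv_channels * n_freq_bins,
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.mask_head = nn.Linear(2 * lstm_hidden, n_freq_bins)

    def forward(self, mixture_mag):
        # mixture_mag: (batch, time, freq) magnitude spectrogram
        b, t, f = mixture_mag.shape
        x = self.hr_branch(mixture_mag.unsqueeze(1))   # (b, c, t, f)
        x = x.permute(0, 2, 1, 3).reshape(b, t, -1)    # (b, t, c*f)
        x, _ = self.lstm(x)                            # (b, t, 2*hidden)
        mask = torch.sigmoid(self.mask_head(x))        # (b, t, f), values in [0, 1]
        return mask * mixture_mag                      # estimated vocal magnitude


# Toy usage: mask a random "mixture" magnitude spectrogram.
model = HRLSTMSketch(n_freq_bins=256)
mixture = torch.rand(1, 130, 256)   # (batch, frames, bins), hypothetical shape
vocals_est = model(mixture)
print(vocals_est.shape)             # torch.Size([1, 130, 256])
```

The design point the sketch isolates is the one the abstract argues for: unlike encoder-decoder models, no pooling or striding reduces the representation, so harmonic detail survives to the masking stage, while the LSTM supplies the temporal context.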
Funder
National Research Foundation of Korea
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Signal Processing
Cited by
3 articles.