Harmonic-aware tri-path convolution recurrent network for singing voice separation

Published:2023-07-01 Issue:7 Volume:3
ISSN:2691-1191
Container-title:JASA Express Letters
language:en

Author:

Shen Yih-Liang¹, Lai Ya-Ching¹, Chi Tai-Shih¹

Affiliation:

1. Department of Electronics and Electrical Engineering, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan; yihliang.ee06@nycu.edu.tw; r7.ee08@nycu.edu.tw; tschi@nycu.edu.tw

Abstract

Temporal coherence and spectral regularity are critical cues for human auditory streaming processes and are considered in many sound separation models. Examples include the Conv-TasNet model, which focuses on temporal coherence by using short-length kernels to analyze sound, and the dual-path convolution recurrent network (DPCRN) model, which uses two recurrent neural networks (RNNs) to analyze general patterns along the temporal and spectral dimensions of a spectrogram. A harmonic-aware tri-path convolution recurrent network model is proposed that extends DPCRN with an additional inter-band RNN. Evaluation results on public datasets show that this addition further improves the separation performance of DPCRN.
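
The three scan directions named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the split of the frequency axis into equal-width bands, the plain tanh RNN standing in for the recurrent layers, the residual connections, the order of the three passes, and all sizes and names (`simple_rnn`, `scan_along`, `tri_path_block`) are assumptions made only to show how one recurrent layer might run along each of three axes of a spectrogram-like array.

```python
import numpy as np


def simple_rnn(seqs, w_in, w_rec):
    """Run a plain tanh RNN over axis 1 of seqs, shaped (batch, steps, feat)."""
    batch, steps, _ = seqs.shape
    h = np.zeros((batch, w_rec.shape[0]))
    outputs = []
    for t in range(steps):
        h = np.tanh(seqs[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    return np.stack(outputs, axis=1)  # (batch, steps, hidden)


def scan_along(x, axis, w_in, w_rec):
    """Apply the RNN along one axis of x; all other leading axes act as batch.

    x is shaped (time, band, bin, channel). The hidden size equals the
    channel count so a residual connection can be added (an assumption).
    """
    moved = np.moveaxis(x, axis, -2)  # (..., steps, channel)
    flat = moved.reshape(-1, moved.shape[-2], moved.shape[-1])
    out = simple_rnn(flat, w_in, w_rec).reshape(moved.shape)
    return x + np.moveaxis(out, -2, axis)


def tri_path_block(x, params):
    """Three sequential passes: spectral, temporal, then inter-band (hypothetical order)."""
    x = scan_along(x, 2, *params["intra"])  # along frequency bins inside each band
    x = scan_along(x, 0, *params["time"])   # along time frames
    x = scan_along(x, 1, *params["band"])   # across bands (the added third path)
    return x


# Toy input: 10 frames, 3 bands, 8 bins per band, 4 channels (all hypothetical).
rng = np.random.default_rng(0)
channels = 4
params = {
    name: (0.1 * rng.standard_normal((channels, channels)),
           0.1 * rng.standard_normal((channels, channels)))
    for name in ("intra", "time", "band")
}
x = rng.standard_normal((10, 3, 8, channels))
y = tri_path_block(x, params)
```

In this sketch the output keeps the input shape, and only the temporal pass mixes information across frames, so each output frame depends on the current and earlier frames only.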

Funder

Ministry of Science and Technology, Taiwan

Publisher

Acoustical Society of America (ASA)

Subject

Electrical and Electronic Engineering; Atomic and Molecular Physics, and Optics

Link

https://pubs.aip.org/asa/jel/article-pdf/doi/10.1121/10.0019997/18025136/074801_1_10.0019997.pdf
