Exploring Channel Properties to Improve Singing Voice Detection with Convolutional Neural Networks

Published:2021-12-13 Issue:24 Volume:11 Page:11838
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Gui Wenming, Li Yukun, Zang Xian, Zhang Jinglan

Abstract

Singing voice detection remains a challenging task because the voice can be obscured by instruments that occupy the same frequency band, and may even share the same timbre when they mimic the mechanism of human singing. Because feature engineering is complex and adapts poorly, there is a recent trend towards feature learning, in which deep neural networks perform both feature extraction and classification. In this paper, we present two methods that explore channel properties in the convolutional neural network to improve singing voice detection by feature learning. First, channel attention learning is presented to measure the importance of a feature, using two attention mechanisms: scaled dot-product and squeeze-and-excitation. This method learns the importance of each feature map so that the neurons can place more attention on the more important feature maps. Second, multi-scale representations are fed to the input channels to add more information in terms of scale. Generally, different songs need a spectrogram at different scales to be represented well, and multi-scale representations let the network choose the best one for the task. In the experimental stage, we demonstrated the effectiveness of the two methods on three public datasets, with accuracy increasing by up to 2.13 percent over its already high initial level.
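
The squeeze-and-excitation mechanism named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `squeeze_excite`, the tensor sizes, the reduction ratio `r`, and the random weights `w1` and `w2` (which stand in for parameters a network would learn) are all assumptions made for illustration.

```python
import numpy as np

def squeeze_excite(feature_maps, w1, w2):
    """Reweight the channels of a (C, H, W) array by learned importance."""
    # Squeeze: global average pooling yields one descriptor per channel.
    z = feature_maps.mean(axis=(1, 2))                 # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(0.0, w1 @ z)                   # shape (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # shape (C,)
    # Scale: multiply every feature map by its channel weight.
    return feature_maps * weights[:, None, None], weights

# Hypothetical sizes; random weights stand in for trained parameters.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

y, weights = squeeze_excite(x, w1, w2)
print(y.shape, weights.shape)
```

Each channel of the output is the input feature map scaled by a single weight between 0 and 1, which is how the attention step lets later layers emphasize the more important feature maps.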

Funder

Open Research Fund of Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/11/24/11838/pdf

References: 37 articles.

1. Automatic singer recognition of popular music recordings via estimation and modeling of solo vocal signals

Cited by 2 articles.

1. Data augmentation and deep neural networks for the classification of Pakistani racial speakers recognition;PeerJ Computer Science;2022-08-03

2. Singing Voice Detection: A Survey;Entropy;2022-01-12