Dual Attention Network for Pitch Estimation of Monophonic Music
Author:
Ma Wenfang, Hu Ying, Huang Hao
Abstract
Pitch estimation is an essential step in many audio signal processing applications. In this paper, we propose a data-driven pitch estimation network, the Dual Attention Network (DA-Net), which operates directly on the time-domain samples of monophonic music. DA-Net comprises six Dual Attention Modules (DA-Modules), each of which combines two kinds of attention: element-wise and channel-wise. DA-Net applies both element-wise and channel-wise attention to convolutional features, which reflects the idea of "symmetry". The DA-Modules model the semantic interdependencies between element-wise and channel-wise features. Within a DA-Module, element-wise attention is realized by a Convolutional Gated Linear Unit (ConvGLU), and channel-wise attention is realized by a Squeeze-and-Excitation (SE) block. We explored three ways of combining the two attention mechanisms: serial mode, parallel mode, and tightly coupled mode. Element-wise attention selectively emphasizes useful features by re-weighting the features at all positions. Channel-wise attention learns to use global information to emphasize the informative feature maps and suppress the less useful ones. DA-Net therefore adaptively integrates local features with their global dependencies. The outputs of DA-Net are fed into a fully connected layer that generates a 360-dimensional vector corresponding to 360 pitches. We trained the proposed network separately on the iKala and MDB-stem-synth datasets. In our experiments, the dual attention network with the tightly coupled mode achieved the best performance.
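The two attention mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the kernel size, channel count, SE reduction ratio, random weights, and all function names (`conv1d_same`, `conv_glu`, `se_block`, `serial_da_module`) are assumptions made for illustration. Only a serial combination (ConvGLU followed by SE) is shown, because the abstract does not specify how the parallel and tightly coupled modes are wired.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_same(x, w, b):
    """1-D convolution (cross-correlation, as in deep learning) with zero 'same' padding.
    x: (C_in, T), w: (C_out, C_in, K) with odd K, b: (C_out,). Returns (C_out, T)."""
    c_out, _, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))  # zero-pad the time axis only
    t = x.shape[1]
    out = np.empty((c_out, t))
    for i in range(t):
        # Contract each (C_in, K) filter with the current (C_in, K) window.
        out[:, i] = np.tensordot(w, xp[:, i:i + k], axes=([1, 2], [0, 1])) + b
    return out

def conv_glu(x, w_lin, b_lin, w_gate, b_gate):
    """Element-wise attention: a sigmoid gate in (0, 1) re-weights every position."""
    return conv1d_same(x, w_lin, b_lin) * sigmoid(conv1d_same(x, w_gate, b_gate))

def se_block(x, w1, b1, w2, b2):
    """Channel-wise attention: squeeze (global average), excite (two FC layers), rescale."""
    z = x.mean(axis=1)                  # squeeze: one scalar per channel, shape (C,)
    h = np.maximum(0.0, w1 @ z + b1)    # bottleneck with ReLU, shape (C // r,)
    s = sigmoid(w2 @ h + b2)            # per-channel weights in (0, 1), shape (C,)
    return x * s[:, None]

def serial_da_module(x, glu_params, se_params):
    """Assumed serial mode: element-wise attention first, then channel-wise attention."""
    return se_block(conv_glu(x, *glu_params), *se_params)

# Illustrative sizes (assumptions): 1 input channel, 8 feature maps, kernel 3, ratio 4.
rng = np.random.default_rng(0)
c_in, c, k, r, t = 1, 8, 3, 4, 64
x = rng.standard_normal((c_in, t))      # stand-in for a frame of time-domain samples
glu_params = (rng.standard_normal((c, c_in, k)), np.zeros(c),
              rng.standard_normal((c, c_in, k)), np.zeros(c))
se_params = (rng.standard_normal((c // r, c)), np.zeros(c // r),
             rng.standard_normal((c, c // r)), np.zeros(c))
y = serial_da_module(x, glu_params, se_params)  # y has shape (8, 64)
```

In this sketch the ConvGLU gate varies per channel and per time position, while the SE weights are a single scalar per channel computed from a global average, which is one way to read the abstract's contrast between local re-weighting and global dependencies.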
Funder
National Natural Science Foundation of China (NSFC); The Funds for Creative Research Groups of Higher Education of Xinjiang Uygur Autonomous Region under Grant
Subject
Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)
Cited by
1 article.
1. Sonar Target Detection Based on a Dual Channel Attention Convolutional Network;2022 12th International Conference on Information Science and Technology (ICIST);2022-10-14