Affiliation:
1. Institute of Marine Science and Technology, Shandong University, Qingdao 266237, China
2. Shandong Zhengzhong Information Technology Co., Ltd., Jinan 250098, China
Abstract
Music genre classification (MGC) underpins the efficient organization, retrieval, and recommendation of music resources and is therefore an important research topic. Convolutional neural networks (CNNs) have been widely used in MGC and have achieved excellent results. However, because of their local receptive fields, CNNs cannot model global features well, and these global features are crucial for classifying music signals, which have temporal structure. Transformers can capture long-range dependencies within an image through the self-attention mechanism. Nevertheless, gaps in performance and computational cost remain between Transformers and existing CNNs. In this paper, we propose a hybrid architecture (CNN-TE) based on a CNN and a Transformer encoder for MGC. Specifically, we convert the audio signals into mel spectrograms and feed them into the hybrid model for training. Our model first employs a CNN to capture low-level, localized features from the spectrogram. These features are then processed by a Transformer encoder, which models them globally to extract high-level, abstract semantic information. This refined information is then classified using a multi-layer perceptron. Our experiments demonstrate that this approach surpasses many existing CNN architectures on the GTZAN and FMA datasets. Notably, it achieves these results with fewer parameters and faster inference.
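The contrast drawn above between a CNN's local receptive field and the global context of self-attention can be illustrated with a minimal sketch. This is not the CNN-TE model: the function names, the toy 8-frame sequence, the smoothing kernel, and the use of identity query/key/value projections (no learned weights) are all assumptions made purely for illustration.

```python
import math


def conv1d_same(x, kernel):
    """Zero-padded 1-D cross-correlation (odd-length kernel assumed).

    Each output value depends only on len(kernel) neighbouring inputs.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(x))
    ]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (an assumption).

    Every output token is a weighted mix of ALL input tokens.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        )
    return outputs, weights


# Toy "spectrogram": 8 time frames with 2 features each (hypothetical values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
          [0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [0.2, 0.8]]

# Perturb only the LAST frame and compare what happens at the FIRST frame.
changed = [row[:] for row in frames]
changed[-1] = [5.0, 5.0]

kernel = [0.25, 0.5, 0.25]  # hypothetical 3-tap smoothing kernel
conv_before = conv1d_same([f[0] for f in frames], kernel)
conv_after = conv1d_same([f[0] for f in changed], kernel)

attn_before, weights = self_attention(frames)
attn_after, _ = self_attention(changed)

print("conv output at frame 0 unchanged:", conv_before[0] == conv_after[0])
print("attention output at frame 0 unchanged:", attn_before[0] == attn_after[0])
```

With a 3-tap kernel, a change in the last frame cannot reach the first convolution output, whereas the first attention output shifts because its softmax weights span every frame. A hybrid design of the kind described would let the convolutional stage summarize local patterns before attention relates them across the whole clip.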
Funder
Special Funds for Central Guiding Local Science and Technology Development: Industrialisation of Internet of Things Terminal Safety Inspection Platform
Jinan Science and Technology Programme Project: Demonstration Application of High Performance Big Data Security Storage System
Shandong Provincial Natural Science Foundation