Authors:
Meng Shiting, Hao Qingbo, Xiao Yingyuan, Zheng Wenguang
Abstract
Convolutional neural networks (CNNs) have been applied successfully to music genre classification. As music has diversified, genre fusion has become common, and fused music exhibits multiple similar musical features, such as rhythm, timbre, and structure, that typically arise from the temporal information in the spectrum. Traditional CNNs, however, cannot effectively capture this temporal information, making fused music difficult to distinguish. To address this issue, this study proposes MusicNeXt, a CNN model for music genre classification. Its goal is to improve the feature extraction method so that the model attends more closely to musical features and increases the distinctiveness between genres, thereby reducing bias in the classification results. Specifically, we construct a feature extraction module that fully exploits temporal information, strengthening the model's focus on musical features and its handling of the complexity of fused music. In addition, we introduce a genre-sensitive adjustment layer that applies within-class angle constraints to strengthen the learning of differences between genres, increasing inter-genre distinctiveness and lending interpretability to the classification results. Experimental results demonstrate that MusicNeXt outperforms baseline networks and other state-of-the-art methods on music genre classification tasks without introducing category bias into the classification results.
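The abstract does not specify the exact form of the within-class angle constraint in the genre-sensitive adjustment layer. A common realization of such a constraint is an ArcFace-style additive angular margin on L2-normalized embeddings, and the sketch below illustrates that idea in PyTorch under this assumption; the class name AngularMarginHead and the scale/margin values are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngularMarginHead(nn.Module):
    """Hypothetical within-class angle constraint (ArcFace-style sketch).

    Logits are cosines between L2-normalized embeddings and per-genre
    weight vectors; an additive angular margin on the true-genre angle
    pulls same-genre embeddings together, widening the angular gap
    between genres.
    """

    def __init__(self, embed_dim: int, num_genres: int,
                 scale: float = 30.0, margin: float = 0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_genres, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale
        self.margin = margin

    def forward(self, embeddings: torch.Tensor,
                labels: torch.Tensor) -> torch.Tensor:
        # Cosine of the angle between each embedding and each genre center.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Tighten the true genre's cluster by adding the margin to its angle.
        theta = torch.acos(cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        cos_margin = torch.cos(theta + self.margin)
        one_hot = F.one_hot(labels, num_classes=self.weight.size(0)).float()
        return self.scale * (one_hot * cos_margin + (1.0 - one_hot) * cos)

# Toy usage: 256-d embeddings from a CNN backbone, 10 genres.
head = AngularMarginHead(embed_dim=256, num_genres=10)
feats = torch.randn(8, 256)
labels = torch.randint(0, 10, (8,))
loss = F.cross_entropy(head(feats, labels), labels)
```

Because the margin acts only on the angle to the true genre's weight vector, training shrinks within-class angles while leaving between-class angles intact, which is one plausible way such a layer could make inter-genre differences more distinct and interpretable.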
Subjects
Artificial Intelligence, Computer Vision and Pattern Recognition, Theoretical Computer Science