Authors:
Greer, Timothy; Shi, Xuan; Ma, Benjamin; Narayanan, Shrikanth
Abstract
Computational machine intelligence approaches have enabled a variety of music-centric technologies in support of creating, sharing and interacting with music content. A strong performance on specific downstream application tasks, such as music genre detection and music emotion recognition, is paramount to ensuring broad capabilities for computational music understanding and Music Information Retrieval. Traditional approaches have relied on supervised learning to train models to support these music-related tasks. However, such approaches require copious annotated data and still may only provide insight into one view of music—namely, that related to the specific task at hand. We present a new model for generating audio-musical features that support music understanding, leveraging self-supervision and cross-domain learning. After pre-training using masked reconstruction of musical input features using self-attention bidirectional transformers, output representations are fine-tuned using several downstream music understanding tasks. Results show that the features generated by our multi-faceted, multi-task, music transformer model, which we call M3BERT, tend to outperform other audio and music embeddings on several diverse music-related tasks, indicating the potential of self-supervised and semi-supervised learning approaches toward a more generalized and robust computational approach to modeling music. Our work can offer a starting point for many music-related modeling tasks, with potential applications in learning deep representations and enabling robust technology applications.
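The abstract outlines the core training recipe: mask portions of the frame-level musical input features, reconstruct them with a bidirectional self-attention transformer encoder, and then fine-tune the learned representations on downstream tasks such as genre detection and emotion recognition. The listing below is a minimal PyTorch sketch of that masked-reconstruction pre-training step; the module names, feature dimensionality, masking probability, and loss choice are illustrative assumptions, not the authors' M3BERT implementation.

# Minimal sketch of masked-reconstruction pre-training for frame-level music
# features with a bidirectional transformer encoder. All dimensions, the
# masking scheme, and the L1 loss are illustrative assumptions.
import torch
import torch.nn as nn

class MaskedFeatureEncoder(nn.Module):
    def __init__(self, feat_dim=40, d_model=256, n_heads=4, n_layers=3):
        super().__init__()
        self.in_proj = nn.Linear(feat_dim, d_model)             # project input features
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)   # bidirectional self-attention
        self.out_proj = nn.Linear(d_model, feat_dim)            # reconstruct masked frames

    def forward(self, feats, mask):
        # feats: (batch, time, feat_dim); mask: (batch, time) boolean, True = masked
        x = feats.masked_fill(mask.unsqueeze(-1), 0.0)           # zero out masked frames
        h = self.encoder(self.in_proj(x))                        # contextual representations
        return self.out_proj(h), h

def pretrain_step(model, feats, optimizer, mask_prob=0.15):
    mask = torch.rand(feats.shape[:2]) < mask_prob               # random frame masking
    recon, _ = model(feats, mask)
    loss = (recon - feats).abs()[mask].mean()                    # loss on masked positions only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = MaskedFeatureEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    dummy = torch.randn(8, 200, 40)                              # e.g. 200 frames of 40-d features
    print(pretrain_step(model, dummy, opt))

For fine-tuning, the reconstruction head would be replaced by task-specific heads (for example, genre or emotion classifiers) trained on the contextual representations h.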
Publisher
Springer Science and Business Media LLC
Cited by
1 article.
1. Visualisations of Jazz Standards Derived from Transformer-based Multi-task Learning. In Proceedings of the 27th Pan-Hellenic Conference on Progress in Computing and Informatics (2023-11-24).