Multimodal music datasets? Challenges and future goals in music processing

Published:2024-08-28 Issue:3 Volume:13
ISSN:2192-6611
Container-title:International Journal of Multimedia Information Retrieval
language:en
Short-container-title:Int J Multimed Info Retr

Author:

Christodoulou Anna-Maria, Lartillot Olivier, Jensenius Alexander Refsum

Abstract

The term “multimodal music dataset” is often used to describe music-related datasets that represent music as a multimedia art form and multimodal experience. However, the term “multimodality” is often used differently in disciplines such as musicology, music psychology, and music technology. This paper proposes a definition of multimodality that works across different music disciplines. Many challenges are related to constructing, evaluating, and using multimodal music datasets. We provide a task-based categorization of multimodal datasets and suggest guidelines for their development. Diverse data pre-processing methods are illuminated, highlighting their contributions to transparent and reproducible music analysis. Additionally, evaluation metrics, methods, and benchmarks tailored for multimodal music processing tasks are scrutinized, empowering researchers to make informed decisions and facilitating cross-study comparisons.

Funder

University of Oslo

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s13735-024-00344-6.pdf

References (97 articles)

1. Abhyankar SG, Bharadwaj SS, Rani GS, et al (2023) A survey on music genre classification using multimodal information processing and retrieval. In: Proceedings of the 2023 international conference on recent trends in electronics and communication (ICRTEC). IEEE, Mysore, India, pp 1–6. https://doi.org/10.1109/ICRTEC56977.2023.10111926

2. Agostinelli A, Denk TI, Borsos Z, et al (2023) MusicLM: generating music from text. https://doi.org/10.48550/arXiv.2301.11325

3. Alfaro-Contreras M, Valero-Mas JJ, Iñesta JM, et al (2023) Late multimodal fusion for image and audio music transcription. Expert Syst Appl 216:119491. https://doi.org/10.1016/j.eswa.2022.119491

4. Arola KL, Sheppard J, Ball CE (2014) Writer/designer: a guide to making multimodal projects. Bedford/St. Martin's, Boston

5. Aryafar K, Shokoufandeh A (2014) Multimodal music and lyrics fusion classifier for artist identification. In: 2014 13th international conference on machine learning and applications. IEEE, Detroit, MI, USA, pp 506–509. https://doi.org/10.1109/ICMLA.2014.88