Authors:
Han Bo, Li Yuheng, Shen Yixuan, Ren Yi, Han Feilin
Abstract
Dance-driven music generation aims to generate musical pieces conditioned on dance videos. Previous works focus on monophonic or raw audio generation, while the multi-instrument scenario remains under-explored. Dance-driven multi-instrument music (MIDI) generation poses two challenges: (i) the lack of a publicly available dataset pairing multi-instrument MIDI with dance video, and (ii) the weak correlation between music and video. To tackle these challenges, we have built the first paired dataset of multi-instrument MIDI and dance (D2MIDI). Based on this dataset, we introduce Dance2MIDI, a multi-instrument MIDI generation framework conditioned on dance video. First, to capture the relationship between dance and music, we employ a graph convolutional network to encode the dance motion, which allows us to extract features related to both dance movement and dance style. Second, to generate a harmonious rhythm, we use a transformer model with a cross-attention mechanism to decode the drum track sequence. Third, we model the generation of the remaining tracks from the drum track as a sequence understanding and completion task, and employ a BERT-like model that learns the context of the entire music piece through self-supervised learning. We evaluate the music generated by our framework trained on the D2MIDI dataset and show that our method achieves state-of-the-art performance.
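The three stages named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Dance2MIDI implementation: the three-joint skeleton, feature sizes, identity weights, mean-pooling over joints, track names, and the `[MASK]` token are all hypothetical. The graph convolution uses the common symmetric-normalized form ReLU(D^-1/2 (A + I) D^-1/2 H W), which the abstract does not specify. The sketch shows (1) a GCN layer encoding per-frame joint features, (2) scaled dot-product cross-attention in which drum-token queries attend to motion features, and (3) BERT-style masking that keeps the drum track visible and hides the other tracks as completion targets.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]
MASK = "[MASK]"  # hypothetical mask token for BERT-style completion


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def normalized_adjacency(num_joints: int, edges: List[Tuple[int, int]]) -> Matrix:
    """D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph with self-loops."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution over joints: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, x) for x in row] for row in matmul(matmul(a_hat, h), w)]


def encode_frame(a_hat: Matrix, joint_feats: Matrix, w: Matrix) -> List[float]:
    """Apply the GCN layer, then mean-pool over joints to get one vector per frame."""
    h = gcn_layer(a_hat, joint_feats, w)
    return [sum(col) / len(h) for col in zip(*h)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: drum-token queries attend to motion features."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)


def mask_for_completion(tracks: Dict[str, List[str]], keep: str = "drums"):
    """Keep the conditioning track visible; mask every other track as a prediction target."""
    inputs: Dict[str, List[str]] = {}
    targets: Dict[str, List[str]] = {}
    for name, tokens in tracks.items():
        if name == keep:
            inputs[name] = list(tokens)
        else:
            inputs[name] = [MASK] * len(tokens)
            targets[name] = list(tokens)
    return inputs, targets


if __name__ == "__main__":
    # Hypothetical 3-joint chain skeleton (e.g. hip-knee-ankle) with 2-D joint features.
    a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
    identity = [[1.0, 0.0], [0.0, 1.0]]
    frames = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
              [[0.5, 0.5], [1.0, 0.0], [0.0, 0.5]]]
    motion = [encode_frame(a_hat, f, identity) for f in frames]  # one vector per frame
    drum_queries = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for embedded drum tokens
    print(cross_attention(drum_queries, motion, motion))
    print(mask_for_completion({"drums": ["kick", "snare"], "bass": ["C2", "G2"]}))
```

This sketch omits temporal modeling across frames, learned weights, multi-head attention, and the training objective, which a trained system would typically add.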
Publisher
Springer Science and Business Media LLC