Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs

Published:2021-05-18 Issue:1 Volume:35 Page:178-186
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Hsiao Wen-Yi, Liu Jen-Yu, Yeh Yin-Cheng, Yang Yi-Hsuan

Abstract

To apply neural sequence models such as Transformers to music generation tasks, one has to represent a piece of music as a sequence of tokens drawn from a finite, pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note’s pitch, duration, velocity (dynamics), and placement (onset time) along the time grid. While different types of tokens may possess different properties, existing models usually treat them equally, in the same way as modeling words in natural languages. In this paper, we present a conceptually different approach that explicitly takes into account the type of the tokens, such as note types and metric types. We propose a new Transformer decoder architecture that uses different feed-forward heads to model tokens of different types. With an expansion-compression trick, we convert a piece of music to a sequence of compound words by grouping neighboring tokens, greatly reducing the length of the token sequences. We show that the resulting model can be viewed as a learner over dynamic directed hypergraphs, and we employ it to learn to compose expressive Pop piano music of full-song length (involving up to 10K individual tokens per song), both conditionally and unconditionally. Our experiment shows that, compared to state-of-the-art models, the proposed model converges 5 to 10 times faster at training (i.e., within a day on a single GPU with 11 GB memory), with comparable quality in the generated music.
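
The grouping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the token type names, the two families, the `[ignore]` placeholder, and the function `to_compound_words` are all assumptions made for illustration. The sketch merges neighboring tokens of one family into a single fixed-width record, so the sequence gets shorter while every record exposes the same set of fields.

```python
from typing import Dict, List, Tuple

# Hypothetical token types split into two families (an assumption for
# illustration; the abstract only names "note types and metric types").
FAMILIES = {
    "metric": ("bar", "beat", "tempo"),
    "note": ("pitch", "duration", "velocity"),
}
TYPE_TO_FAMILY = {t: fam for fam, types in FAMILIES.items() for t in types}
ALL_TYPES = [t for types in FAMILIES.values() for t in types]
IGNORE = "[ignore]"  # hypothetical placeholder for fields a word does not use

Token = Tuple[str, object]


def to_compound_words(tokens: List[Token]) -> List[Dict[str, object]]:
    """Group neighboring tokens of one family into fixed-width compound words."""
    words: List[Dict[str, object]] = []
    current = None
    for tok_type, value in tokens:
        family = TYPE_TO_FAMILY[tok_type]
        # Start a new word when the family changes or a type would repeat.
        if (
            current is None
            or current["family"] != family
            or current[tok_type] != IGNORE
        ):
            # "Expansion": every word carries a slot for every token type.
            current = {"family": family, **{t: IGNORE for t in ALL_TYPES}}
            words.append(current)
        # "Compression": neighboring tokens share one word.
        current[tok_type] = value
    return words


tokens = [
    ("bar", 1), ("beat", 0), ("tempo", 120),
    ("pitch", 60), ("duration", 4), ("velocity", 80),
    ("pitch", 64), ("duration", 4), ("velocity", 80),
]
words = to_compound_words(tokens)
print(len(tokens), "tokens ->", len(words), "compound words")
```

In this toy input, nine individual tokens collapse into three compound words. A model along the lines the abstract describes would then predict each field of a word with its own output head, although that part is not shown here.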

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Cited by 52 articles.

1. Score Images as a Modality: Enhancing Symbolic Music Understanding through Large-Scale Multimodal Pre-Training;Sensors;2024-08-02

2. AI-Based Affective Music Generation Systems: A Review of Methods and Challenges;ACM Computing Surveys;2024-07-19

3. Popular Hooks: A Multimodal Dataset of Musical Hooks for Music Understanding and Generation;2024 IEEE International Conference on Multimedia and Expo Workshops (ICMEW);2024-07-15

4. MemoMusic 4.0: Personalized Emotion Music Generation Conditioned by Valence and Arousal as Virtual Tokens;2024 IEEE International Conference on Multimedia and Expo Workshops (ICMEW);2024-07-15

5. Suno: potential, prospects, and trends;Frontiers of Information Technology & Electronic Engineering;2024-06-20